├── Audio2BodyPrediction.gif ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── data └── README.md ├── data_utils ├── data.py └── transform_keypoints.py ├── generate_stitched_video.py ├── model.py ├── pytorch_A2B_dynamics.py ├── requirements.txt ├── run_pipeline.sh └── visualize.py /Audio2BodyPrediction.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookarchive/Audio2BodyDynamics/e79ff68e8d0799ef4452810d5efe9e1506db75d0/Audio2BodyPrediction.gif -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. 4 | Please read the [full text](https://code.fb.com/codeofconduct/) 5 | so that you can understand what actions will and will not be tolerated. 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Audio2BodyDynamics 2 | 3 | While we are seeding this project with an initial set of popular tasks and a few 4 | models and examples, ongoing contributions from the research community are 5 | desired to increase the pool of tasks, models, and baselines. 6 | 7 | ## Pull Requests 8 | We actively welcome your pull requests. 9 | 10 | 1. Fork the repo and create your branch from `master`. 11 | 2. If you've added code that should be tested, add tests. 12 | 3. If you've changed APIs, update the documentation. 13 | 4. Make sure your code lints. 14 | 5. If you haven't already, complete the Contributor License Agreement ("CLA"). 15 | 16 | ## Contributor License Agreement ("CLA") 17 | In order to accept your pull request, we need you to submit a CLA. You only need 18 | to do this once to work on any of Facebook's open source projects. 19 | 20 | Complete your CLA here: 21 | 22 | ## Issues 23 | We use GitHub issues for general feature discussion, Q&A and public bugs tracking. 24 | Please ensure your description is clear and has sufficient instructions to be able to 25 | reproduce the issue or understand the problem. 26 | 27 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 28 | disclosure of security bugs. In those cases, please go through the process 29 | outlined on that page and do not file a public issue. 30 | 31 | ## Coding Style 32 | We try to follow the PEP style guidelines and encourage you to as well. 33 | 34 | ## License 35 | By contributing to AudioToBodyDynamics, you agree that your contributions will be licensed 36 | under the LICENSE file in the root directory of this source tree. 37 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. 
Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. 
Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 
136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. 
To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. 
To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Audio2BodyDynamics 2 | 3 | ## Introduction 4 | This repository contains the code to predict skeleton movements that correspond to music, published in: 5 | * [Audio To Body Dynamics](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shlizerman_Audio_to_Body_CVPR_2018_paper.pdf), CVPR 2018 6 | * Project Page https://arviolin.github.io/AudioBodyDynamics/ 7 | 8 | ## Abstract 9 | We present a method that gets as input an audio of violin 10 | or piano playing, and outputs a video of skeleton predictions 11 | which are further used to animate an avatar. The key 12 | idea is to create an animation of an avatar that moves their 13 | hands similarly to how a pianist or violinist would do, just 14 | from audio. Notably, it’s not clear if body movement can 15 | be predicted from music at all and our aim in this work is 16 | to explore this possibility. In this paper, we present the first 17 | result that shows that natural body dynamics can be predicted. 18 | We built an LSTM network that is trained on violin 19 | and piano recital videos uploaded to the Internet. 
The predicted 20 | points are applied onto a rigged avatar to create the 21 | animation. 22 | 23 | ## Predicted Skeleton Video 24 | ![Predicted Skeleton Video](Audio2BodyPrediction.gif) 25 | 26 | ## Getting Started 27 | 28 | * Install requirements by running: `pip install -r requirements.txt` 29 | * Download [ffmpeg](https://www.ffmpeg.org/download.html) to enable visualization 30 | * This repository contains starter data in the **data** folder. We provide json files formatted as follows: 31 | * Naming convention - {**split**}_{**body part**}.json 32 | * **video_id** : **(audio mfcc features, keypoints)** 33 | * keypoints : NxC where N is the number of frames and C is the number of keypoints 34 | * audio mfcc features : NxD where N is the number of frames and D is the number of MFCC Features 35 | 36 | 37 | ## Training Instructions for All Keypoints Together 38 | 39 | * Run python **pytorch\_A2B\_dynamics.py --help** for the full argument list 40 | * For training 41 | * python pytorch\_A2B\_dynamics.py --logfldr {...} --data data/train_all.json --device {...} ... 42 | * See run_pipeline.sh for an example 43 | * For testing - generates a video from the test model 44 | * python pytorch\_A2B\_dynamics.py --test_model {...} --logfldr {...} --data test_all.json --device {...} ... --audio_file {...} --batch_size 1 45 | * See run_pipeline.sh for an example 46 | * **NB** : Testing is constrained to 1 video at a time. We restrict the batch size to 1 for the test video and generate the whole test sequence at once instead of breaking it up. 47 | 48 | ## Training Instructions for Separate Training of Body, Lefthand and Righthand 49 | 50 | * We expose data and functionality for training and testing on keypoints of individual parts of the body and stitching the final results into a single video. 51 | * Run **sh run_pipeline.sh** 52 | * Outputs are logged to **$HOME/logfldr** by default 53 | 54 | ## Other Quirks 55 | * Checkpointing saves training data statistics for use in testing. 56 | * Modify FFMPEG_LOC in visualize.py to specify the path to ffmpeg. 57 | * Pass the --visualize flag to turn on visualization during testing (it is off by default) 58 | * The losses observed for the provided data differ from those reported in the paper because the train and test images for this dataset use a different resolution. 59 | 60 | ## Citation 61 | 62 | Please cite the [Audio To Body Dynamics paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shlizerman_Audio_to_Body_CVPR_2018_paper.pdf) if you use this code: 63 | ``` 64 | @inproceedings{shlizerman2018audio, 65 | title={Audio to body dynamics}, 66 | author={Shlizerman, Eli and Dery, Lucio and Schoen, Hayden and Kemelmacher-Shlizerman, Ira}, 67 | journal={CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition}, 68 | year={2018} 69 | } 70 | ``` 71 | 72 | ## License 73 | Audio2BodyDynamics is released under a Non-Commercial Creative Commons license. Please refer to [LICENSE](LICENSE). 74 | -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | ## Train and test data 2 | Download [data.zip](https://github.com/facebookresearch/Audio2BodyDynamics/releases/download/v1.0/data.zip) containing the files below and extract it into the "data" folder. 3 | The data.zip archive is attached to the v1.0 GitHub release. A quick sanity check of the extracted files is sketched below.
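The following snippet is a minimal sketch (the split file name is just one example from the listing below): it loads one extracted JSON split and prints the array shapes, which should match the `{video_id: (audio mfcc features, keypoints)}` layout described in the top-level README.

```python
import json
import numpy as np

# Load one extracted split (any of the train_*/test_*.json files works the same way).
with open("data/train_body.json", "r") as fhandle:
    data = json.load(fhandle)

for video_id, (audio_feats, keyps) in data.items():
    audio_feats = np.array(audio_feats)  # N x D per-frame MFCC features
    keyps = np.array(keyps)              # N x C flattened keypoint coordinates
    print(video_id, audio_feats.shape, keyps.shape)
```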
4 | 5 | data.zip: 6 | * -logfldr (trained models) 7 | * --body 8 | * ---best_model_db.pth 9 | * --righthand 10 | * ---best_model_db.pth 11 | * --lefthand 12 | * ---best_model_db.pth 13 | 14 | * -list_of_recital_videos.txt (list of videos used for training) 15 | * -pred_audio.mp4 (sample visualized output generated with trained models) 16 | 17 | * -test_audio.wav (testing music) 18 | * -test_body.json (body keypoints for testing) 19 | * -test_righthand.json (righthand keypoints for testing) 20 | * -test_lefthand.json (lefthand keypoints for testing) 21 | 22 | * -train_body.json (body keypoints for training) 23 | * -train_righthand.json (righthand keypoints for testing) 24 | * -train_lefthand.json (lefthand keypoints for training) 25 | -------------------------------------------------------------------------------- /data_utils/data.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import, division, print_function, unicode_literals 9 | import json 10 | import numpy as np 11 | from sklearn.decomposition import PCA 12 | 13 | 14 | class dataBatcher(object): 15 | 16 | def __init__(self, data, seq_len, batch_size, delay, shuffle=False): 17 | self.cur_batch = 0 18 | self.data = data 19 | self.seq_len = seq_len 20 | self.batch_size = batch_size 21 | self.delay = delay 22 | 23 | self.num_batches = len(self.data[0]) // (self.seq_len * self.batch_size) 24 | assert(self.num_batches != 0), 'Size of data must be > time_steps x batch_size' 25 | self.indices = range(self.num_batches) 26 | if shuffle: 27 | self.indices = np.random.permutation(self.num_batches) 28 | 29 | def hasNext(self): 30 | return self.cur_batch < len(self.indices) 31 | 32 | def correctDimensions(self, arr): 33 | num_feats = arr.shape[-1] 34 | arr = np.reshape(arr, [self.batch_size, self.seq_len, num_feats]) 35 | arr = np.transpose(arr, (1, 0, 2)) # LSTM expects seqlen x batchsize x D 36 | return arr 37 | 38 | def delayArray(self, arr, dummy_var=0): 39 | arr[self.delay:, :, :] = arr[:(self.seq_len - self.delay), :, :] 40 | arr[:self.delay, :, :] = dummy_var 41 | return arr 42 | 43 | def reconstructKeypsOrder(self, arr): 44 | arr = np.reshape(arr, [self.seq_len, self.batch_size, -1]) 45 | arr = arr[self.delay:, :, :] 46 | arr = np.transpose(arr, (1, 0, 2)) 47 | num_pts = arr.shape[2] 48 | arr = np.reshape(arr, [-1, num_pts]) 49 | arr = np.reshape(arr, [-1, 2, num_pts // 2]) # convert to X-Y format 50 | return arr 51 | 52 | def getNext(self): 53 | start = self.indices[self.cur_batch] * self.seq_len * self.batch_size 54 | end = (self.indices[self.cur_batch] + 1) * self.seq_len * self.batch_size 55 | assert((end - start) == (self.seq_len * self.batch_size)) 56 | cur_aud = np.copy(self.data[0][start: end]) 57 | cur_keyps = np.copy(self.data[1][start: end]) 58 | 59 | cur_aud = self.correctDimensions(cur_aud) 60 | cur_keyps = self.correctDimensions(cur_keyps) 61 | cur_keyps = self.delayArray(cur_keyps) 62 | cur_keyps = np.reshape(cur_keyps, [-1, cur_keyps.shape[2]]) 63 | 64 | mask = np.ones((self.seq_len * self.batch_size, 1)) 65 | mask = self.correctDimensions(mask) 66 | mask = self.delayArray(mask) 67 | mask = np.reshape(mask, [-1, mask.shape[2]]) 68 | 69 | self.cur_batch += 1 70 | return cur_aud, cur_keyps, mask 71 | 72 | 73 | class 
DataIterator(object): 74 | 75 | def __init__(self, args, data_loc, test_mode=False): 76 | super(DataIterator, self).__init__() 77 | self.test_mode = test_mode 78 | self.seq_len = args.time_steps 79 | if self.test_mode: 80 | assert(args.batch_size == 1), \ 81 | 'No batching at test time. Run on full sequence.' 82 | self.batch_sz = args.batch_size 83 | self.delay = args.time_delay 84 | val_split = args.val_split if not self.test_mode else 1.0 85 | self.loadData(data_loc, val_split, args.upsample_times) 86 | if not self.test_mode: 87 | if (args.numpca > 0): 88 | self.performPCA(args.numpca) 89 | else: 90 | self.pca = None 91 | self.getDataStats() 92 | self.normalizeDataset() 93 | 94 | def stateDict(self): 95 | state_dict = {} 96 | state_dict['pca'] = self.pca 97 | state_dict['audio_stats'] = (self.aud_means, self.aud_stds) 98 | state_dict['keyps_stats'] = (self.means, self.stds) 99 | return state_dict 100 | 101 | def loadStateDict(self, state_dict): 102 | self.pca = state_dict['pca'] 103 | self.aud_means, self.aud_stds = state_dict['audio_stats'] 104 | self.means, self.stds = state_dict['keyps_stats'] 105 | return state_dict 106 | 107 | def getPCASeq(self, seq, pca_dim=0, batch_dim=0): 108 | seq = np.reshape(seq, [self.seq_len, self.batch_sz, -1]) 109 | seq = seq[self.delay:, :, :] 110 | seq = np.transpose(seq, (1, 0, 2)) 111 | return seq[batch_dim, : , pca_dim] 112 | 113 | def processTestData(self, upsample_times=1): 114 | if self.pca: 115 | self.val_keyps = self.pca.transform(self.val_keyps) 116 | self.normalizeDataset() 117 | 118 | def loadData(self, data_loc, val_split, upsample_times): 119 | with open(data_loc, "r+") as fhandle: 120 | data = json.load(fhandle) 121 | self.train_audio, self.train_keyps = [], [] 122 | self.val_audio, self.val_keyps = [], [] 123 | 124 | # Data Format : video_id : (audio_features, body_keypoints, raw_audio) 125 | # body keypoints may or may not be transformed into fixed reference depending 126 | # depending on user. 
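# Schematic example of one record, following the format described in the README:
#     "some_video_id": [audio_feats, keyps]
# where audio_feats is an N x D list of per-frame MFCC features and keyps is an
# N x C list of per-frame keypoint coordinates. In training mode each video is
# chopped into seq_len-frame chunks and split into train/validation sets below;
# in test mode the first (and only) video is kept whole.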
127 | num_pts_per_batch = self.seq_len 128 | 129 | for _, (audio_feats, keyps) in data.items(): 130 | audio_feats = np.array(audio_feats) 131 | keyps = np.array(keyps) 132 | 133 | if (upsample_times > 0): 134 | audio_feats = self.upsample(audio_feats, upsample_times) 135 | keyps = self.upsample(keyps, upsample_times) 136 | 137 | if not self.test_mode: 138 | # In Training mode, split the data 139 | num_batches = len(audio_feats) // num_pts_per_batch 140 | num_train = int(num_batches * (1 - val_split)) 141 | 142 | # Throw away extra points 143 | audio_feats = audio_feats[:(num_batches * num_pts_per_batch)] 144 | keyps = keyps[:(num_batches * num_pts_per_batch)] 145 | 146 | audio_split = np.split(audio_feats, num_batches) 147 | keyps_split = np.split(keyps, num_batches) 148 | 149 | # Split into Train and Test 150 | train_indices = np.random.choice(num_batches, num_train, replace=False) 151 | val_indices = [x for x in range(num_batches) if x not in train_indices] 152 | 153 | for ind in train_indices: 154 | assert(len(audio_split[ind]) == num_pts_per_batch) 155 | self.train_audio.extend(audio_split[ind]) 156 | self.train_keyps.extend(keyps_split[ind]) 157 | 158 | for ind in val_indices: 159 | assert(len(keyps_split[ind]) == num_pts_per_batch) 160 | self.val_audio.extend(audio_split[ind]) 161 | self.val_keyps.extend(keyps_split[ind]) 162 | else: 163 | self.val_audio = audio_feats 164 | self.val_keyps = keyps 165 | 166 | # Perform Inference on whole video at once 167 | self.seq_len = len(self.val_keyps) 168 | 169 | break # Testing is executed on 1 video at a time. 170 | 171 | self.train_audio, self.train_keyps = \ 172 | np.array(self.train_audio), np.array(self.train_keyps) 173 | self.val_audio, self.val_keyps = \ 174 | np.array(self.val_audio), np.array(self.val_keyps) 175 | 176 | def performPCA(self, num_components): 177 | self.pca = PCA(n_components=num_components) 178 | self.train_keyps = self.pca.fit_transform(self.train_keyps) 179 | self.val_keyps = self.pca.transform(self.val_keyps) 180 | 181 | def getDataStats(self): 182 | self.means = self.train_keyps.mean(axis=0) 183 | self.stds = np.max(self.train_keyps.std(axis=0)) 184 | 185 | self.aud_means = 0.0 186 | self.aud_stds = 1.0 187 | 188 | def normalizeDataset(self): 189 | 190 | def normalize(dataset, mean, std): 191 | EPSILON = 1E-8 192 | if not len(dataset): 193 | return dataset 194 | return (dataset - mean) / (std + EPSILON) 195 | 196 | self.train_keyps = normalize(self.train_keyps, self.means, self.stds) 197 | self.val_keyps = normalize(self.val_keyps, self.means, self.stds) 198 | 199 | self.train_audio = normalize(self.train_audio, self.aud_means, self.aud_stds) 200 | self.val_audio = normalize(self.val_audio, self.aud_means, self.aud_stds) 201 | 202 | def getInOutDimensions(self): 203 | return self.val_audio.shape[1], self.val_keyps.shape[1] 204 | 205 | def reset(self): 206 | self.val_iterator = self.createIterator(is_test=True) 207 | if not self.test_mode: 208 | self.train_iterator = self.createIterator(is_test=False) 209 | 210 | def getNumBatches(self): 211 | train_batches = len(self.train_keyps) // (self.seq_len * self.batch_sz) 212 | val_batches = len(self.val_keyps) // (self.seq_len * self.batch_sz) 213 | return train_batches, val_batches 214 | 215 | def createIterator(self, is_test=False): 216 | dataset = (self.val_audio, self.val_keyps) \ 217 | if is_test else (self.train_audio, self.train_keyps) 218 | return dataBatcher(dataset, self.seq_len, self.batch_sz, self.delay, 219 | shuffle=(not is_test)) 220 | 221 | def 
hasNext(self, is_test=False): 222 | if is_test: 223 | return self.val_iterator.hasNext() 224 | else: 225 | return self.train_iterator.hasNext() 226 | 227 | def nextBatch(self, is_test=False): 228 | if is_test: 229 | return self.val_iterator.getNext() 230 | else: 231 | return self.train_iterator.getNext() 232 | 233 | def reconstructKeypsOrder(self, batch): 234 | return self.val_iterator.reconstructKeypsOrder(batch) 235 | 236 | def reconstructAudioOrder(self, batch): 237 | return self.val_iterator.reconstructAudioOrder(batch) 238 | 239 | def toPixelSpace(self, predictions): 240 | recon = (predictions * self.stds) + self.means 241 | if self.pca: 242 | recon = self.pca.inverse_transform(recon) 243 | return recon 244 | 245 | def upsample(self, array, n_times, use_repeat=False): 246 | if not len(array): 247 | return array 248 | result = array 249 | for _ in range(n_times): 250 | if use_repeat: 251 | result = np.repeat(result, 2, axis=0) 252 | result = result[:-1] 253 | else: 254 | n_examples, n_feats = result.shape 255 | new_arry = np.zeros((n_examples * 2 - 1, n_feats)) 256 | new_arry[0::2, :] = result 257 | new_arry[1::2, :] = (result[1:, :] + result[:-1, :]) / 2.0 258 | result = new_arry 259 | return result 260 | -------------------------------------------------------------------------------- /data_utils/transform_keypoints.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | """ 9 | Code for transforming keypoints to fixed reference frame in order to isolate 10 | motion due to music. 11 | """ 12 | from __future__ import absolute_import, division, print_function, unicode_literals 13 | import numpy as np 14 | 15 | MIN_VALID_PTS = 2 # The minimum number of valid points to be 16 | 17 | 18 | def normalizePts(pts): 19 | N = pts.shape[0] 20 | cent = np.mean(pts, axis=0) 21 | ptsNorm = pts - cent 22 | sumOfPointDistancesFromOriginSquared = np.sum(np.power(ptsNorm[:, 0:2], 2)) 23 | if sumOfPointDistancesFromOriginSquared > 0: 24 | scaleFactor = \ 25 | np.sqrt(2 * N) / np.sqrt(sumOfPointDistancesFromOriginSquared) 26 | else: 27 | scaleFactor = 1 28 | 29 | ptsNorm = ptsNorm * scaleFactor 30 | 31 | normMtxInv = np.array([[1 / scaleFactor, 0, 0], 32 | [0, 1 / scaleFactor, 0], 33 | [cent[0], cent[1], 1]]) 34 | 35 | return ptsNorm, normMtxInv 36 | 37 | 38 | def transformPtsWithT(pts, T): 39 | if pts.ndim != 2: 40 | raise Exception("Must 2-D array") 41 | newPts = np.zeros(pts.shape) 42 | newPts[:, 0] = (T[0, 0] * pts[:, 0]) + (T[1, 0] * pts[:, 1]) + T[2, 0] 43 | newPts[:, 1] = (T[0, 1] * pts[:, 0]) + (T[1, 1] * pts[:, 1]) + T[2, 1] 44 | return newPts 45 | 46 | 47 | def alignKeypoints(keypoints, reference=None, keyptstouse=None, confthresh=None): 48 | can_transform = (reference is not None) and (keyptstouse is not None) 49 | if can_transform: 50 | pts = keypoints[keyptstouse] 51 | valid_ind = np.where(pts[:, 2] > confthresh)[0] 52 | if (len(valid_ind) >= MIN_VALID_PTS): 53 | fixed_pts = reference[keyptstouse][valid_ind] 54 | valid_keyps = pts[valid_ind] 55 | try: 56 | transform = findNonreflectiveSimilarity(valid_keyps, fixed_pts) 57 | alignedKeypoints = transformPtsWithT(keypoints, transform) 58 | except Exception as e: 59 | print(e) 60 | transform = np.zeros((3, 3)) 61 | transform[0, 0] = transform[1, 1] = 1 62 | alignedKeypoints = keypoints 63 | 64 
| else: 65 | transform = np.zeros((3, 3)) 66 | transform[0, 0] = transform[1, 1] = 1 67 | alignedKeypoints = keypoints 68 | return alignedKeypoints, transform 69 | 70 | 71 | def findNonreflectiveSimilarity(src, dst): 72 | src, normMatrix1 = normalizePts(src) 73 | dst, normMatrix2 = normalizePts(dst) 74 | 75 | minRequiredNonCollinearPairs = 2 76 | M = dst.shape[0] 77 | 78 | x = np.expand_dims(dst[:, 0], axis=1) 79 | y = np.expand_dims(dst[:, 1], axis=1) 80 | X = np.concatenate((np.concatenate((x, y, np.ones((M, 1)), np.zeros((M, 1))), axis=1), 81 | np.concatenate((y, -x, np.zeros((M, 1)), np.ones((M, 1))), axis=1)), axis=0) 82 | 83 | u = np.expand_dims(src[:, 0], axis=1) 84 | v = np.expand_dims(src[:, 1], axis=1) 85 | U = np.concatenate((u, v), axis=0) 86 | 87 | # We know that X * r = U 88 | if np.linalg.matrix_rank(X) >= 2 * minRequiredNonCollinearPairs: 89 | r, _, _, _ = np.linalg.lstsq(X, U) 90 | else: 91 | raise ValueError('images:geotrans:requiredNonCollinearPoints', 92 | minRequiredNonCollinearPairs, 'nonreflectivesimilarity') 93 | 94 | sc = float(r[0]) 95 | ss = float(r[1]) 96 | tx = float(r[2]) 97 | ty = float(r[3]) 98 | 99 | Tinv = np.array([[sc, -ss, 0], 100 | [ss, sc, 0], 101 | [tx, ty, 1]]) 102 | 103 | Tinv = np.linalg.solve(normMatrix2, np.dot(Tinv, normMatrix1)) 104 | T = np.linalg.inv(Tinv) 105 | T[:, 2] = np.array([0, 0, 1]) 106 | 107 | return T 108 | 109 | 110 | def buildRotT(sina): 111 | cosa = np.sqrt(1 - sina ** 2) 112 | T = np.array([[cosa, -sina, 0], 113 | [sina, cosa, 0], 114 | [0, 0, 1]]) 115 | return T 116 | -------------------------------------------------------------------------------- /generate_stitched_video.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | from __future__ import absolute_import, division, print_function, unicode_literals 9 | import json 10 | import numpy as np 11 | from visualize import visualizeKeypoints 12 | import argparse 13 | 14 | ''' 15 | Example Run 16 | 17 | python generateStitchedVideo.py --vidtype piano --body_path final_body/body_data.json 18 | --righthand_path final_righthand/righthand_data.json 19 | --lefthand_path final_lefthand/lefthand_data.json 20 | --vid_path viz/all_pts.mp4 --pred_path viz/just_pred.mp4 21 | --audio_path audio.wav 22 | ''' 23 | 24 | 25 | def stitchHandsToBody(body_keyps, rh_keyps, lh_keyps): 26 | body_keyps = np.array(body_keyps) 27 | rh_keyps = np.array(rh_keyps) 28 | lh_keyps = np.array(lh_keyps) 29 | for ind, pts in enumerate(body_keyps): 30 | rh_diff = np.expand_dims(pts[:, 2] - rh_keyps[ind][:, 0], axis=1) 31 | lh_diff = np.expand_dims(pts[:, 5] - lh_keyps[ind][:, 0], axis=1) 32 | rh_keyps[ind] = rh_keyps[ind] + rh_diff 33 | lh_keyps[ind] = lh_keyps[ind] + lh_diff 34 | return body_keyps, rh_keyps, lh_keyps 35 | 36 | 37 | def createOptions(): 38 | # Default configuration for PianoNet 39 | parser = argparse.ArgumentParser( 40 | description="Pytorch: Audio To Body Dynamics Model" 41 | ) 42 | parser.add_argument("--body_path", type=str, default="body_data.json", 43 | help="Path to body keypoints") 44 | parser.add_argument("--righthand_path", type=str, default="righthand_data.json", 45 | help="Path to righthand keypoints") 46 | parser.add_argument("--lefthand_path", type=str, default="lefthand_data.json", 47 | help="Path to righthand keypoints") 48 | parser.add_argument("--gt_path", type=str, default="ground_truth.mp4", 49 | help="Where to save the ground_truth video.") 50 | parser.add_argument("--pred_path", type=str, default="predictions.mp4", 51 | help="Where to save the resulting prediction video.") 52 | parser.add_argument("--vidtype", type=str, default='piano', 53 | help="Type of video whether piano or violin") 54 | parser.add_argument("--audio_path", type=str, default=None, 55 | help="Only in for Test. Location audio file for" 56 | " generating test video") 57 | args = parser.parse_args() 58 | return args 59 | 60 | 61 | def main(): 62 | args = createOptions() 63 | body = json.load(open(args.body_path, 'r+')) 64 | righthand = json.load(open(args.righthand_path, 'r+')) 65 | lefthand = json.load(open(args.lefthand_path, 'r+')) 66 | 67 | all_pred_pts = np.concatenate(stitchHandsToBody(body[1], righthand[1], 68 | lefthand[1]), axis=2) 69 | all_targ_pts = np.concatenate((body[0], righthand[0], lefthand[0]), axis=2) 70 | 71 | # Just Gt 72 | visualizeKeypoints(args.vidtype, all_targ_pts, all_pred_pts, 73 | args.audio_path, args.gt_path, show_pred=False) 74 | 75 | # Just Pred 76 | visualizeKeypoints(args.vidtype, all_targ_pts, all_pred_pts, args.audio_path, 77 | args.pred_path, show_gt=False) 78 | 79 | 80 | if __name__ == '__main__': 81 | main() 82 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.init as init 15 | from torch.autograd import Variable 16 | 17 | 18 | class AudioToKeypointRNN(nn.Module): 19 | 20 | def __init__(self, options): 21 | super(AudioToKeypointRNN, self).__init__() 22 | 23 | # Instantiating the model 24 | self.init = None 25 | 26 | hidden_dim = options['hidden_dim'] 27 | if options['trainable_init']: 28 | device = options['device'] 29 | batch_sz = options['batch_size'] 30 | # Create the trainable initial state 31 | h_init = \ 32 | init.constant_(torch.empty(1, batch_sz, hidden_dim, device=device), 0.0) 33 | c_init = \ 34 | init.constant_(torch.empty(1, batch_sz, hidden_dim, device=device), 0.0) 35 | h_init = Variable(h_init, requires_grad=True) 36 | c_init = Variable(c_init, requires_grad=True) 37 | self.init = (h_init, c_init) 38 | 39 | # Declare the model 40 | self.lstm = nn.LSTM(options['input_dim'], hidden_dim, 1) 41 | self.dropout = nn.Dropout(options['dropout']) 42 | self.fc = nn.Linear(hidden_dim, options['output_dim']) 43 | 44 | self.initialize() 45 | 46 | def initialize(self): 47 | # Initialize LSTM Weights and Biases 48 | for layer in self.lstm._all_weights: 49 | for param_name in layer: 50 | if 'weight' in param_name: 51 | weight = getattr(self.lstm, param_name) 52 | init.xavier_normal_(weight.data) 53 | else: 54 | bias = getattr(self.lstm, param_name) 55 | init.uniform_(bias.data, 0.25, 0.5) 56 | 57 | # Initialize FC 58 | init.xavier_normal_(self.fc.weight.data) 59 | init.constant_(self.fc.bias.data, 0) 60 | 61 | def forward(self, inputs): 62 | # perform the Forward pass of the model 63 | output, (h_n, c_n) = self.lstm(inputs, self.init) 64 | output = output.view(-1, output.size()[-1]) # flatten before FC 65 | dped_output = self.dropout(output) 66 | predictions = self.fc(dped_output) 67 | return predictions 68 | -------------------------------------------------------------------------------- /pytorch_A2B_dynamics.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torch 13 | from torch import optim 14 | from model import AudioToKeypointRNN 15 | from torch.autograd import Variable 16 | import matplotlib.pyplot as plt 17 | import os 18 | import argparse 19 | import logging 20 | import numpy as np 21 | import json 22 | from data_utils.data import DataIterator 23 | from visualize import visualizeKeypoints 24 | 25 | ''' 26 | This script takes an input of audio MFCC features and uses 27 | an LSTM recurrent neural network to learn to predict 28 | body joints coordinates 29 | ''' 30 | 31 | logging.basicConfig() 32 | log = logging.getLogger("mannerisms_rnn") 33 | log.setLevel(logging.DEBUG) 34 | torch.manual_seed(1234) 35 | np.random.seed(1234) 36 | 37 | 38 | class AudoToBodyDynamics(object): 39 | 40 | def __init__(self, args, data_locs, is_test=False): 41 | super(AudoToBodyDynamics, self).__init__() 42 | 43 | self.is_test_mode = is_test 44 | self.data_iterator = DataIterator(args, data_locs, test_mode=is_test) 45 | 46 | # Refresh data configuration from checkpoint 47 | if self.is_test_mode: 48 | self.loadDataCheckpoint(args.test_model, args.upsample_times) 49 | 50 | input_dim, output_dim = self.data_iterator.getInOutDimensions() 51 | 52 | # construct the model 53 | model_options = { 54 | 'device': args.device, 55 | 'dropout': args.dp, 56 | 'batch_size': args.batch_size, 57 | 'hidden_dim': args.hidden_size, 58 | 'input_dim': input_dim, 59 | 'output_dim': output_dim, 60 | 'trainable_init': args.trainable_init 61 | } 62 | self.vidtype = args.vidtype 63 | self.device = args.device 64 | self.log_frequency = args.log_frequency 65 | self.upsample_times = args.upsample_times 66 | self.model = AudioToKeypointRNN(model_options).cuda(args.device) 67 | self.optim = optim.Adam(self.model.parameters(), lr=args.lr) 68 | 69 | # Load checkpoint model 70 | if self.is_test_mode: 71 | self.loadModelCheckpoint(args.test_model) 72 | 73 | def buildLoss(self, rnn_out, target, mask): 74 | square_diff = (rnn_out - target)**2 75 | out = torch.sum(square_diff, 1, keepdim=True) 76 | masked_out = out * mask 77 | return torch.mean(masked_out), masked_out 78 | 79 | def saveModel(self, state_info, path): 80 | torch.save(state_info, path) 81 | 82 | def loadModelCheckpoint(self, path): 83 | checkpoint = torch.load(path) 84 | self.model.load_state_dict(checkpoint['model_state_dict']) 85 | self.optim.load_state_dict(checkpoint['optim_state_dict']) 86 | 87 | def loadDataCheckpoint(self, path, upsample_times): 88 | checkpoint = torch.load(path) 89 | self.data_iterator.loadStateDict(checkpoint['data_state_dict']) 90 | self.data_iterator.processTestData(upsample_times=upsample_times) 91 | 92 | def runNetwork(self, validate=False): 93 | def to_numpy(x): 94 | return x.cpu().data.numpy() 95 | 96 | # Set up inputs to the network 97 | batch_info = self.data_iterator.nextBatch(is_test=validate) 98 | in_batch, out_batch, mask_batch = batch_info 99 | inputs = Variable(torch.FloatTensor(in_batch).to(self.device)) 100 | targets = Variable(torch.FloatTensor(out_batch).to(self.device)) 101 | masks = Variable(torch.FloatTensor(mask_batch).to(self.device)) 102 | 103 | # Run the network 104 | predictions = self.model.forward(inputs) 105 | 106 | # Get loss in pca coefficient space 107 | loss, _ = self.buildLoss(predictions, targets, masks) 108 | 109 | # Get loss in pixel space 110 | pixel_predictions = 
self.data_iterator.toPixelSpace(to_numpy(predictions)) 111 | pixel_predictions = torch.FloatTensor(pixel_predictions).to(self.device) 112 | 113 | pixel_targets = self.data_iterator.toPixelSpace(out_batch) 114 | pixel_targets = torch.FloatTensor(pixel_targets).to(self.device) 115 | _, frame_loss = self.buildLoss(pixel_predictions, pixel_targets, masks) 116 | 117 | frame_loss = frame_loss / pixel_targets.size()[1] 118 | # Gives the average deviation of prediction from target pixel 119 | pixel_loss = torch.mean(torch.sqrt(frame_loss)) 120 | 121 | return (to_numpy(predictions), to_numpy(targets)), loss, pixel_loss 122 | 123 | def runEpoch(self): 124 | pixel_losses, coeff_losses = [], [] 125 | val_pix_losses, val_coeff_losses = [], [] 126 | predictions, targets = [], [] 127 | 128 | while (not self.is_test_mode and self.data_iterator.hasNext(is_test=False)): 129 | self.model.train() 130 | _, pca_coeff_loss, pixel_loss = self.runNetwork(validate=False) 131 | self.optim.zero_grad() 132 | pca_coeff_loss.backward() 133 | self.optim.step() 134 | 135 | pca_coeff_loss = pca_coeff_loss.data.tolist() 136 | pixel_loss = pixel_loss.data.tolist() 137 | pixel_losses.append(pixel_loss) 138 | coeff_losses.append(pca_coeff_loss) 139 | 140 | while(self.data_iterator.hasNext(is_test=True)): 141 | self.model.eval() 142 | vis_data, pca_coeff_loss, pixel_loss = self.runNetwork(validate=True) 143 | pca_coeff_loss = pca_coeff_loss.data.tolist() 144 | pixel_loss = pixel_loss.data.tolist() 145 | 146 | val_pix_losses.append(pixel_loss) 147 | val_coeff_losses.append(pca_coeff_loss) 148 | 149 | predictions.append(vis_data[0]) 150 | targets.append(vis_data[1]) 151 | 152 | train_info = (pixel_losses, coeff_losses) 153 | val_info = (val_pix_losses, val_coeff_losses) 154 | return train_info, val_info, predictions, targets 155 | 156 | def trainModel(self, max_epochs, logfldr, patience): 157 | log.debug("Training model") 158 | epoch_losses = [] 159 | batch_losses = [] 160 | val_losses = [] 161 | i, best_loss, iters_without_improvement = 0, float('inf'), 0 162 | best_train_loss, best_val_loss = float('inf'), float('inf') 163 | 164 | while(i < max_epochs): 165 | i += 1 166 | self.data_iterator.reset() 167 | iter_train, iter_val, predictions, targets = self.runEpoch() 168 | iter_mean = np.mean(iter_train[0]), np.mean(iter_train[1]) 169 | iter_val_mean = np.mean(iter_val[0]), np.mean(iter_val[1]) 170 | 171 | epoch_losses.append(iter_mean) 172 | batch_losses.extend(iter_train) 173 | val_losses.append(iter_val_mean) 174 | 175 | log.info("Epoch {} / {}".format(i, max_epochs)) 176 | log.info("Training Loss (1980 x 1080): {}".format(iter_mean)) 177 | log.info("Validation Loss (1980 x 1080): {}".format(iter_val_mean)) 178 | 179 | improved = iter_val_mean[1] < best_loss 180 | if improved: 181 | best_loss = iter_val_mean[1] 182 | best_val_loss = iter_val_mean 183 | best_train_loss = iter_mean 184 | iters_without_improvement = 0 185 | else: 186 | iters_without_improvement += 1 187 | if iters_without_improvement >= patience: 188 | log.info("Stopping Early because no improvment in {}".format( 189 | iters_without_improvement)) 190 | break 191 | if improved or (i % self.log_frequency) == 0: 192 | # Save the model information 193 | path = os.path.join(logfldr, "Epoch_{}".format(i)) 194 | os.makedirs(path) 195 | path = os.path.join(path, "model_db.pth") 196 | state_info = { 197 | 'epoch': i, 198 | 'epoch_losses': epoch_losses, 199 | 'batch_losses': batch_losses, 200 | 'validation_losses': val_losses, 201 | 'model_state_dict': 
self.model.state_dict(), 202 | 'optim_state_dict': self.optim.state_dict(), 203 | 'data_state_dict': self.data_iterator.stateDict() 204 | } 205 | self.saveModel(state_info, path) 206 | if improved: 207 | path = os.path.join(logfldr, "best_model_db.pth") 208 | self.saveModel(state_info, path) 209 | 210 | # Visualize the PCA Coefficients 211 | num_vis = min(3, targets[0].shape[-1]) 212 | for j in range(num_vis): 213 | save_path = os.path.join( 214 | logfldr, "Epoch_{}/pca_{}.png".format(i, j)) 215 | self.visualizePCA(predictions[0], targets[0], j, save_path) 216 | 217 | self.plotResults(logfldr, epoch_losses, batch_losses, val_losses) 218 | return best_train_loss, best_val_loss 219 | 220 | def formatVizArrays(self, predictions, targets): 221 | final_pred, final_targ = [], [] 222 | for ind, pred in enumerate(predictions): 223 | pred = self.data_iterator.toPixelSpace(pred) 224 | targ = self.data_iterator.toPixelSpace(targets[ind]) 225 | pred = self.data_iterator.reconstructKeypsOrder(pred) 226 | targ = self.data_iterator.reconstructKeypsOrder(targ) 227 | final_pred.append(pred) 228 | final_targ.append(targ) 229 | 230 | final_pred, final_targ = np.vstack(final_pred), np.vstack(final_targ) 231 | final_pred = final_pred[0::(2**self.upsample_times)] 232 | final_targ = final_targ[0::(2**self.upsample_times)] 233 | 234 | return final_pred, final_targ 235 | 236 | def visualizePCA(self, preds, targets, pca_dim, save_path): 237 | preds = self.data_iterator.getPCASeq(preds, pca_dim=pca_dim) 238 | targs = self.data_iterator.getPCASeq(targets, pca_dim=pca_dim) 239 | assert(len(preds) == len(targs)) 240 | plt.plot(preds, color='red', label='Predictions') 241 | plt.plot(targs, color='green', label='Ground Truth') 242 | plt.legend() 243 | plt.savefig(save_path) 244 | plt.close() 245 | 246 | def plotResults(self, logfldr, epoch_losses, batch_losses, val_losses): 247 | losses = [epoch_losses, batch_losses, val_losses] 248 | names = [ 249 | ["Epoch pixel losses", "Epoch coeff losses"], 250 | ["Batch pixel losses", "Batch coeff losses"], 251 | ["Val pixel losses", "Val coeff losses"]] 252 | _, ax = plt.subplots(nrows=len(losses), ncols=2) 253 | for index, pair in enumerate(zip(losses, names)): 254 | for i in range(2): 255 | data = [pair[0][j][i] for j in range(len(pair[0]))] 256 | ax[index][i].plot(data, label=pair[1][i]) 257 | ax[index][i].legend() 258 | save_filename = os.path.join(logfldr, "results.png") 259 | plt.savefig(save_filename) 260 | plt.close() 261 | 262 | 263 | def createOptions(): 264 | # Default configuration for PianoNet 265 | parser = argparse.ArgumentParser( 266 | description="Pytorch: Audio To Body Dynamics Model" 267 | ) 268 | parser.add_argument("--data", type=str, default="piano_data.json", 269 | help="Path to data file") 270 | parser.add_argument("--audio_file", type=str, default=None, 271 | help="Only in for Test. Location audio file for" 272 | " generating test video") 273 | parser.add_argument("--logfldr", type=str, default=None, 274 | help="Path to folder to save training information", 275 | required=True) 276 | parser.add_argument("--batch_size", type=int, default=100, 277 | help="Training batch size. 
Set to 1 in test") 278 | parser.add_argument("--val_split", type=float, default=0.2, 279 | help="The fraction of the training data to use as validation") 280 | parser.add_argument("--hidden_size", type=int, default=200, 281 | help="Dimension of the hidden representation") 282 | parser.add_argument("--test_model", type=str, default=None, 283 | help="Location for saved model to load") 284 | parser.add_argument("--vidtype", type=str, default='piano', 285 | help="Type of video whether piano or violin") 286 | parser.add_argument("--visualize", type=bool, default=False, 287 | help="Visualize the output of the model. Use only in Test") 288 | parser.add_argument("--save_predictions", type=bool, default=True, 289 | help="Whether or not to save predictions. Use only in Test") 290 | parser.add_argument("--device", type=str, default="cuda:0", 291 | help="Device to train on. Use 'cpu' if to train on cpu.") 292 | parser.add_argument("--max_epochs", type=int, default=300, 293 | help="max number of epochs to run for") 294 | parser.add_argument("--lr", type=float, default=1e-3, 295 | help="Learning Rate for optimizer") 296 | parser.add_argument("--time_steps", type=int, default=60, 297 | help="Prediction Timesteps") 298 | parser.add_argument("--patience", type=int, default=100, 299 | help="Number of epochs with no validation improvement" 300 | " before stopping training") 301 | parser.add_argument("--time_delay", type=int, default=6, 302 | help="Time delay for RNN. Negative values mean no delay." 303 | "Give in terms of frames. 30 frames = 1 second.") 304 | parser.add_argument("--dp", type=float, default=0.1, 305 | help="Dropout Ratio For Trainining") 306 | parser.add_argument("--upsample_times", type=int, default=2, 307 | help="number of times to upsample") 308 | parser.add_argument("--numpca", type=int, default=15, 309 | help="number of pca dimensions. Use -1 if no pca - " 310 | "Train on XY coordinates") 311 | parser.add_argument("--log_frequency", type=int, default=10, 312 | help="The frequency with which to checkpoint the model") 313 | parser.add_argument("--trainable_init", action='store_false', 314 | help="LSTM initial state should be trained. 
Default is True") 315 | 316 | args = parser.parse_args() 317 | return args 318 | 319 | 320 | def main(): 321 | args = createOptions() 322 | args.device = torch.device(args.device) 323 | data_loc = args.data 324 | is_test_mode = args.test_model is not None 325 | 326 | dynamics_learner = AudoToBodyDynamics(args, data_loc, is_test=is_test_mode) 327 | logfldr = args.logfldr 328 | if not os.path.isdir(logfldr): 329 | os.makedirs(logfldr) 330 | 331 | if not is_test_mode: 332 | min_train, min_val = dynamics_learner.trainModel( 333 | args.max_epochs, logfldr, args.patience) 334 | else: 335 | dynamics_learner.data_iterator.reset() 336 | outputs = dynamics_learner.runEpoch() 337 | iter_train, iter_val, targ, pred = outputs 338 | min_train, min_val = np.mean(iter_train[0]), np.mean(iter_val[0]) 339 | 340 | # Format the visualization appropriately 341 | targ, pred = dynamics_learner.formatVizArrays(pred, targ) 342 | 343 | # Save the predictions 344 | if args.save_predictions: 345 | viz_info = (targ.tolist(), pred.tolist()) 346 | save_path = "{}/{}_data.json".format(logfldr, args.vidtype) 347 | json.dump(viz_info, open(save_path, 'w+')) 348 | 349 | # Create Video of Results 350 | if args.visualize: 351 | vid_path = "{}/{}.mp4".format(logfldr, args.vidtype) 352 | visualizeKeypoints(args.vidtype, targ, pred, args.audio_file, vid_path) 353 | 354 | best_lossess = [min_train, min_val] 355 | log.info("The best validation is : {}".format(best_lossess)) 356 | 357 | 358 | if __name__ == '__main__': 359 | main() 360 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2018.8.24 2 | cycler==0.10.0 3 | kiwisolver==1.0.1 4 | matplotlib==3.0.0 5 | numpy==1.15.2 6 | Pillow==5.3.0 7 | pyparsing==2.2.2 8 | python-dateutil==2.7.3 9 | scikit-learn==0.20.0 10 | scipy==1.1.0 11 | six==1.11.0 12 | sklearn==0.0 13 | torch==0.4.1 14 | torchvision==0.2.1 15 | opencv-python==3.4.3.18 16 | -------------------------------------------------------------------------------- /run_pipeline.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | # 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
8 | # 9 | 10 | # This script provides end-to-end training, testing and visualization on separate body parts 11 | 12 | test_audio="data/test_audio.wav" 13 | 14 | # Update Arguments with desired configuration 15 | 16 | # Train on Body 17 | echo "TRAINING MODEL ON BODY KEYPOINTS" 18 | part="body" 19 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca -1 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 20 | # Test on Body 21 | echo "TESTING MODEL ON BODY KEYPOINTS" 22 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 23 | 24 | # Train on Left Hand 25 | echo "TRAINING MODEL ON LEFTHAND KEYPOINTS" 26 | part="lefthand" 27 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca 15 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 28 | # Test on Left Hand 29 | echo "TESTING MODEL ON LEFTHAND KEYPOINTS" 30 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 31 | 32 | # Train on Right Hand 33 | echo "TRAINING MODEL ON RIGHTHAND KEYPOINTS" 34 | part="righthand" 35 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca 15 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 36 | # Test on Right Hand 37 | echo "TESTING MODEL ON RIGHTHAND KEYPOINTS" 38 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 39 | 40 | # Generate the Stitched Video 41 | echo "GENERATING VIDEO OF STITCHED PART KEYPOINTS" 42 | vidtype="piano" 43 | python generate_stitched_video.py --vidtype "$vidtype" --body_path "$HOME/logfldr/body/body_data.json" --righthand_path "$HOME/logfldr/righthand/righthand_data.json" --lefthand_path \ 44 | "$HOME/logfldr/lefthand/lefthand_data.json" --gt_path "$HOME/logfldr/gt.mp4" --pred_path "$HOME/logfldr/pred.mp4" --audio_path "$test_audio" 45 | -------------------------------------------------------------------------------- /visualize.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
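# Rendering utilities: draw predicted and ground-truth keypoint skeletons with
# OpenCV, assemble the frames into an MP4, and mux the audio track back in with
# ffmpeg (see visualizeKeypoints at the bottom of this file).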
2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | from __future__ import unicode_literals 7 | import cv2 8 | import os 9 | import numpy as np 10 | from data_utils.transform_keypoints import transformPtsWithT 11 | from scipy.io.wavfile import write 12 | 13 | THRESH = 0 14 | FFMPEG_LOC = "ffmpeg " 15 | 16 | 17 | def getUpperOPBodyKeypsLines(): 18 | kp_lines = [[0, 1], [1, 2], [0, 3], [3, 4], [4, 5]] 19 | return kp_lines 20 | 21 | 22 | def getUpperOPSHELBodyKeypsLines(): 23 | kp_lines = [[0, 1], [0, 2], [2, 3]] 24 | return kp_lines 25 | 26 | 27 | def getUpperOPELWRBodyKeypsLines(): 28 | kp_lines = [[0, 1], [2, 3]] 29 | return kp_lines 30 | 31 | 32 | def getUpperOPWRBodyKeypsLines(): 33 | kp_lines = [[0, 1]] 34 | return kp_lines 35 | 36 | 37 | def getUpperOPONESIDEBodyKeypsLines(): 38 | kp_lines = [[0, 1], [1, 2]] 39 | return kp_lines 40 | 41 | 42 | def getUpperOPHandsKeypsLines(): 43 | kp_lines = [[0, 1], [1, 2], [2, 3], [3, 4], [0, 5], 44 | [5, 6], [6, 7], [7, 8], [0, 9], [9, 10], 45 | [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], 46 | [15, 16], [0, 17], [17, 18], [18, 19], [19, 20]] 47 | return kp_lines 48 | 49 | 50 | def drawMouth(image, mouthlmk, color=(0, 255, 0)): 51 | x = mouthlmk[:, 0] 52 | y = mouthlmk[:, 1] 53 | for indices in range(len(x) - 1): 54 | x1, x2 = int(x[indices]), int(x[indices + 1]) 55 | y1, y2 = int(y[indices]), int(y[indices + 1]) 56 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 57 | pt1, pt2 = (x1, y1), (x2, y2) 58 | cv2.line(image, pt1, pt2, color, 1, cv2.LINE_AA) 59 | return image 60 | 61 | 62 | def drawBody(image, bodylmk, tform=None, confidences=None, color=(0, 255, 0), 63 | diffx=-750, diffy=-100): 64 | lines = getUpperOPBodyKeypsLines() 65 | x = bodylmk[0] 66 | y = bodylmk[1] 67 | for indices in lines: 68 | # incorporating tform 69 | if confidences[indices[0]] < THRESH or confidences[indices[1]] < THRESH: 70 | continue 71 | 72 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 73 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 74 | 75 | if tform is not None: 76 | 77 | tpts = transformPtsWithT(np.array([[x1, y1], [x2, y2]]), tform) 78 | x1, x2 = int(tpts[0, 0]), int(tpts[1, 0]) 79 | y1, y2 = int(tpts[0, 1]), int(tpts[1, 1]) 80 | 81 | # pdb.set_trace() 82 | x1, x2 = x1 + diffx, x2 + diffx 83 | y1, y2 = y1 + diffy, y2 + diffy 84 | 85 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 86 | pt1, pt2 = (x1, y1), (x2, y2) 87 | cv2.circle(image, pt1, 4, color, -1, cv2.LINE_AA) 88 | cv2.circle(image, pt2, 4, color, -1, cv2.LINE_AA) 89 | cv2.line(image, pt1, pt2, color, 3, cv2.LINE_AA) 90 | return image 91 | 92 | 93 | def drawBodyAndFingers(vidtype, image, bodylmk, tform=None, confidences=None, 94 | color=(0, 255, 0)): 95 | 96 | if vidtype == 'shouldelbows': 97 | lines = getUpperOPSHELBodyKeypsLines() 98 | elif vidtype == 'elbowswrists': 99 | lines = getUpperOPELWRBodyKeypsLines() 100 | elif vidtype == 'wrists' or vidtype == 'vshould': 101 | lines = getUpperOPWRBodyKeypsLines() 102 | elif vidtype == 'righthand' or vidtype == 'vrighthand': 103 | lines = [] 104 | elif vidtype == 'lefthand' or vidtype == 'vlefthand': 105 | lines = [] 106 | elif vidtype == 'violinleft' or vidtype == 'violinright': 107 | lines = getUpperOPONESIDEBodyKeypsLines() 108 | else: 109 | lines = getUpperOPBodyKeypsLines() 110 | 111 | x = bodylmk[0] 112 | y = bodylmk[1] 113 | for indices in lines: 114 | 115 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 116 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 117 | 
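        # If a transform is supplied, map both segment endpoints through it before drawing.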
118 | if tform is not None: 119 | tpts = transformPtsWithT(np.array([[x1, y1], [x2, y2]]), tform) 120 | x1, x2 = int(tpts[0, 0]), int(tpts[1, 0]) 121 | y1, y2 = int(tpts[0, 1]), int(tpts[1, 1]) 122 | 123 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 124 | pt1, pt2 = (x1, y1), (x2, y2) 125 | cv2.circle(image, pt1, 3, color, -1, cv2.LINE_AA) 126 | cv2.circle(image, pt2, 3, color, -1, cv2.LINE_AA) 127 | cv2.line(image, pt1, pt2, color, 2, cv2.LINE_AA) 128 | 129 | if vidtype == 'violin': 130 | shft = 8 131 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 132 | hlines = np.append(hlines, np.array(getUpperOPHandsKeypsLines()) + 21 + shft, 0) 133 | elif vidtype == 'piano': 134 | shft = 7 135 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 136 | hlines = np.append(hlines, np.array(getUpperOPHandsKeypsLines()) + 21 + shft, 0) 137 | elif vidtype == 'righthand' or vidtype == 'lefthand' or \ 138 | vidtype == 'vrighthand' or vidtype == 'vlefthand': 139 | shft = 0 140 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 141 | else: 142 | return image 143 | 144 | for indices in hlines: 145 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 146 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 147 | 148 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 149 | pt1, pt2 = (x1, y1), (x2, y2) 150 | cv2.circle(image, pt1, 2, color, -1, cv2.LINE_AA) 151 | cv2.circle(image, pt2, 2, color, -1, cv2.LINE_AA) 152 | cv2.line(image, pt1, pt2, color, 2, cv2.LINE_AA) 153 | return image 154 | 155 | 156 | def writeAudio(vid_loc, audio_loc): 157 | new_vid_loc = vid_loc.split(".mp4")[0] + "_audio.mp4" 158 | cmd = FFMPEG_LOC + " -loglevel panic -i " + vid_loc + " -i " + audio_loc 159 | cmd += " -c:v copy -c:a aac -strict experimental " + new_vid_loc 160 | os.system(cmd) 161 | return new_vid_loc 162 | 163 | 164 | def videoFromImages(imgs, outputfile, audio_path, fps=27.1): 165 | fourcc_format = cv2.VideoWriter_fourcc(*'MP4V') 166 | size = imgs[0].shape[1], imgs[0].shape[0] 167 | vid = cv2.VideoWriter(outputfile, fourcc_format, fps, size) 168 | for img in imgs: 169 | vid.write(img) 170 | vid.release() 171 | if audio_path is not None: 172 | writeAudio(outputfile, audio_path) 173 | return outputfile 174 | 175 | 176 | def visualizeKeypoints(vidtype, targetKeypts, predictedKeypts, audio_path, 177 | outfile, img_size=300, show_pred=True, show_gt=True, fps=27.1): 178 | targ_color = (0, 255, 0) 179 | pred_color = (0, 0, 255) 180 | images = [] 181 | 182 | for ind, targkeyps in enumerate(targetKeypts): 183 | img_y, img_x = (img_size, img_size) 184 | if show_gt and show_pred: 185 | img_y, img_x = (img_size, img_size * 2) 186 | targkeyps[0] += img_size # Shift the ground truth points 187 | 188 | newImage = np.ones((img_y, img_x, 3), dtype=np.uint8) * 255 189 | predkeyps = predictedKeypts[ind] 190 | if show_pred: 191 | newImage = drawBodyAndFingers(vidtype, newImage, predkeyps, 192 | color=pred_color) 193 | if show_gt: 194 | newImage = drawBodyAndFingers(vidtype, newImage, targkeyps, 195 | color=targ_color) 196 | 197 | newImage = cv2.resize(newImage, None, fx=2, fy=2, 198 | interpolation=cv2.INTER_CUBIC) 199 | 200 | images.append(newImage) 201 | if images: 202 | videoFromImages(images, outfile, audio_path, fps=fps) 203 | --------------------------------------------------------------------------------
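A minimal replay sketch (not part of the repository): after a test run with --save_predictions, pytorch_A2B_dynamics.py writes <vidtype>_data.json as a (targets, predictions) pair, which can be re-rendered offline with visualizeKeypoints. The paths below are examples only, and the arrays are assumed to keep the (frames, 2, num_keypoints) layout produced by formatVizArrays.

import json

import numpy as np

from visualize import visualizeKeypoints

# Load the (targets, predictions) pair saved by pytorch_A2B_dynamics.py.
with open("logfldr/piano/piano_data.json") as fp:  # example path
    targets, predictions = json.load(fp)

targets = np.array(targets, dtype=np.float32)
predictions = np.array(predictions, dtype=np.float32)

# Ground truth is drawn in green and predictions in red, side by side.
# Pass None instead of an audio path to skip the ffmpeg muxing step.
visualizeKeypoints("piano", targets, predictions,
                   "data/test_audio.wav", "piano_replay.mp4")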