├── Audio2BodyPrediction.gif ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── data └── README.md ├── data_utils ├── data.py └── transform_keypoints.py ├── generate_stitched_video.py ├── model.py ├── pytorch_A2B_dynamics.py ├── requirements.txt ├── run_pipeline.sh └── visualize.py /Audio2BodyPrediction.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookarchive/Audio2BodyDynamics/e79ff68e8d0799ef4452810d5efe9e1506db75d0/Audio2BodyPrediction.gif -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. 4 | Please read the [full text](https://code.fb.com/codeofconduct/) 5 | so that you can understand what actions will and will not be tolerated. 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Audio2BodyDynamics 2 | 3 | While we are seeding this project with an initial set of popular tasks and a few 4 | models and examples, ongoing contributions from the research community are 5 | desired to increase the pool of tasks, models, and baselines. 6 | 7 | ## Pull Requests 8 | We actively welcome your pull requests. 9 | 10 | 1. Fork the repo and create your branch from `master`. 11 | 2. If you've added code that should be tested, add tests. 12 | 3. If you've changed APIs, update the documentation. 13 | 4. Make sure your code lints. 14 | 5. If you haven't already, complete the Contributor License Agreement ("CLA"). 15 | 16 | ## Contributor License Agreement ("CLA") 17 | In order to accept your pull request, we need you to submit a CLA. You only need 18 | to do this once to work on any of Facebook's open source projects. 19 | 20 | Complete your CLA here: 21 | 22 | ## Issues 23 | We use GitHub issues for general feature discussion, Q&A and public bugs tracking. 24 | Please ensure your description is clear and has sufficient instructions to be able to 25 | reproduce the issue or understand the problem. 26 | 27 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 28 | disclosure of security bugs. In those cases, please go through the process 29 | outlined on that page and do not file a public issue. 30 | 31 | ## Coding Style 32 | We try to follow the PEP style guidelines and encourage you to as well. 33 | 34 | ## License 35 | By contributing to AudioToBodyDynamics, you agree that your contributions will be licensed 36 | under the LICENSE file in the root directory of this source tree. 37 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution-NonCommercial 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. 
Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution-NonCommercial 4.0 International Public 58 | License 59 | 60 | By exercising the Licensed Rights (defined below), You accept and agree 61 | to be bound by the terms and conditions of this Creative Commons 62 | Attribution-NonCommercial 4.0 International Public License ("Public 63 | License"). To the extent this Public License may be interpreted as a 64 | contract, You are granted the Licensed Rights in consideration of Your 65 | acceptance of these terms and conditions, and the Licensor grants You 66 | such rights in consideration of benefits the Licensor receives from 67 | making the Licensed Material available under these terms and 68 | conditions. 69 | 70 | Section 1 -- Definitions. 71 | 72 | a. 
Adapted Material means material subject to Copyright and Similar 73 | Rights that is derived from or based upon the Licensed Material 74 | and in which the Licensed Material is translated, altered, 75 | arranged, transformed, or otherwise modified in a manner requiring 76 | permission under the Copyright and Similar Rights held by the 77 | Licensor. For purposes of this Public License, where the Licensed 78 | Material is a musical work, performance, or sound recording, 79 | Adapted Material is always produced where the Licensed Material is 80 | synched in timed relation with a moving image. 81 | 82 | b. Adapter's License means the license You apply to Your Copyright 83 | and Similar Rights in Your contributions to Adapted Material in 84 | accordance with the terms and conditions of this Public License. 85 | 86 | c. Copyright and Similar Rights means copyright and/or similar rights 87 | closely related to copyright including, without limitation, 88 | performance, broadcast, sound recording, and Sui Generis Database 89 | Rights, without regard to how the rights are labeled or 90 | categorized. For purposes of this Public License, the rights 91 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 92 | Rights. 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. NonCommercial means not primarily intended for or directed towards 116 | commercial advantage or monetary compensation. For purposes of 117 | this Public License, the exchange of the Licensed Material for 118 | other material subject to Copyright and Similar Rights by digital 119 | file-sharing or similar means is NonCommercial provided there is 120 | no payment of monetary compensation in connection with the 121 | exchange. 122 | 123 | j. Share means to provide material to the public by any means or 124 | process that requires permission under the Licensed Rights, such 125 | as reproduction, public display, public performance, distribution, 126 | dissemination, communication, or importation, and to make material 127 | available to the public including in ways that members of the 128 | public may access the material from a place and at a time 129 | individually chosen by them. 130 | 131 | k. Sui Generis Database Rights means rights other than copyright 132 | resulting from Directive 96/9/EC of the European Parliament and of 133 | the Council of 11 March 1996 on the legal protection of databases, 134 | as amended and/or succeeded, as well as other essentially 135 | equivalent rights anywhere in the world. 
136 | 137 | l. You means the individual or entity exercising the Licensed Rights 138 | under this Public License. Your has a corresponding meaning. 139 | 140 | Section 2 -- Scope. 141 | 142 | a. License grant. 143 | 144 | 1. Subject to the terms and conditions of this Public License, 145 | the Licensor hereby grants You a worldwide, royalty-free, 146 | non-sublicensable, non-exclusive, irrevocable license to 147 | exercise the Licensed Rights in the Licensed Material to: 148 | 149 | a. reproduce and Share the Licensed Material, in whole or 150 | in part, for NonCommercial purposes only; and 151 | 152 | b. produce, reproduce, and Share Adapted Material for 153 | NonCommercial purposes only. 154 | 155 | 2. Exceptions and Limitations. For the avoidance of doubt, where 156 | Exceptions and Limitations apply to Your use, this Public 157 | License does not apply, and You do not need to comply with 158 | its terms and conditions. 159 | 160 | 3. Term. The term of this Public License is specified in Section 161 | 6(a). 162 | 163 | 4. Media and formats; technical modifications allowed. The 164 | Licensor authorizes You to exercise the Licensed Rights in 165 | all media and formats whether now known or hereafter created, 166 | and to make technical modifications necessary to do so. The 167 | Licensor waives and/or agrees not to assert any right or 168 | authority to forbid You from making technical modifications 169 | necessary to exercise the Licensed Rights, including 170 | technical modifications necessary to circumvent Effective 171 | Technological Measures. For purposes of this Public License, 172 | simply making modifications authorized by this Section 2(a) 173 | (4) never produces Adapted Material. 174 | 175 | 5. Downstream recipients. 176 | 177 | a. Offer from the Licensor -- Licensed Material. Every 178 | recipient of the Licensed Material automatically 179 | receives an offer from the Licensor to exercise the 180 | Licensed Rights under the terms and conditions of this 181 | Public License. 182 | 183 | b. No downstream restrictions. You may not offer or impose 184 | any additional or different terms or conditions on, or 185 | apply any Effective Technological Measures to, the 186 | Licensed Material if doing so restricts exercise of the 187 | Licensed Rights by any recipient of the Licensed 188 | Material. 189 | 190 | 6. No endorsement. Nothing in this Public License constitutes or 191 | may be construed as permission to assert or imply that You 192 | are, or that Your use of the Licensed Material is, connected 193 | with, or sponsored, endorsed, or granted official status by, 194 | the Licensor or others designated to receive attribution as 195 | provided in Section 3(a)(1)(A)(i). 196 | 197 | b. Other rights. 198 | 199 | 1. Moral rights, such as the right of integrity, are not 200 | licensed under this Public License, nor are publicity, 201 | privacy, and/or other similar personality rights; however, to 202 | the extent possible, the Licensor waives and/or agrees not to 203 | assert any such rights held by the Licensor to the limited 204 | extent necessary to allow You to exercise the Licensed 205 | Rights, but not otherwise. 206 | 207 | 2. Patent and trademark rights are not licensed under this 208 | Public License. 209 | 210 | 3. 
To the extent possible, the Licensor waives any right to 211 | collect royalties from You for the exercise of the Licensed 212 | Rights, whether directly or through a collecting society 213 | under any voluntary or waivable statutory or compulsory 214 | licensing scheme. In all other cases the Licensor expressly 215 | reserves any right to collect such royalties, including when 216 | the Licensed Material is used other than for NonCommercial 217 | purposes. 218 | 219 | Section 3 -- License Conditions. 220 | 221 | Your exercise of the Licensed Rights is expressly made subject to the 222 | following conditions. 223 | 224 | a. Attribution. 225 | 226 | 1. If You Share the Licensed Material (including in modified 227 | form), You must: 228 | 229 | a. retain the following if it is supplied by the Licensor 230 | with the Licensed Material: 231 | 232 | i. identification of the creator(s) of the Licensed 233 | Material and any others designated to receive 234 | attribution, in any reasonable manner requested by 235 | the Licensor (including by pseudonym if 236 | designated); 237 | 238 | ii. a copyright notice; 239 | 240 | iii. a notice that refers to this Public License; 241 | 242 | iv. a notice that refers to the disclaimer of 243 | warranties; 244 | 245 | v. a URI or hyperlink to the Licensed Material to the 246 | extent reasonably practicable; 247 | 248 | b. indicate if You modified the Licensed Material and 249 | retain an indication of any previous modifications; and 250 | 251 | c. indicate the Licensed Material is licensed under this 252 | Public License, and include the text of, or the URI or 253 | hyperlink to, this Public License. 254 | 255 | 2. You may satisfy the conditions in Section 3(a)(1) in any 256 | reasonable manner based on the medium, means, and context in 257 | which You Share the Licensed Material. For example, it may be 258 | reasonable to satisfy the conditions by providing a URI or 259 | hyperlink to a resource that includes the required 260 | information. 261 | 262 | 3. If requested by the Licensor, You must remove any of the 263 | information required by Section 3(a)(1)(A) to the extent 264 | reasonably practicable. 265 | 266 | 4. If You Share Adapted Material You produce, the Adapter's 267 | License You apply must not prevent recipients of the Adapted 268 | Material from complying with this Public License. 269 | 270 | Section 4 -- Sui Generis Database Rights. 271 | 272 | Where the Licensed Rights include Sui Generis Database Rights that 273 | apply to Your use of the Licensed Material: 274 | 275 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 276 | to extract, reuse, reproduce, and Share all or a substantial 277 | portion of the contents of the database for NonCommercial purposes 278 | only; 279 | 280 | b. if You include all or a substantial portion of the database 281 | contents in a database in which You have Sui Generis Database 282 | Rights, then the database in which You have Sui Generis Database 283 | Rights (but not its individual contents) is Adapted Material; and 284 | 285 | c. You must comply with the conditions in Section 3(a) if You Share 286 | all or a substantial portion of the contents of the database. 287 | 288 | For the avoidance of doubt, this Section 4 supplements and does not 289 | replace Your obligations under this Public License where the Licensed 290 | Rights include other Copyright and Similar Rights. 291 | 292 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 293 | 294 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 295 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 296 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 297 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 298 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 299 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 300 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 301 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 302 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 303 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 304 | 305 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 306 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 307 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 308 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 309 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 310 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 311 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 312 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 313 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 314 | 315 | c. The disclaimer of warranties and limitation of liability provided 316 | above shall be interpreted in a manner that, to the extent 317 | possible, most closely approximates an absolute disclaimer and 318 | waiver of all liability. 319 | 320 | Section 6 -- Term and Termination. 321 | 322 | a. This Public License applies for the term of the Copyright and 323 | Similar Rights licensed here. However, if You fail to comply with 324 | this Public License, then Your rights under this Public License 325 | terminate automatically. 326 | 327 | b. Where Your right to use the Licensed Material has terminated under 328 | Section 6(a), it reinstates: 329 | 330 | 1. automatically as of the date the violation is cured, provided 331 | it is cured within 30 days of Your discovery of the 332 | violation; or 333 | 334 | 2. upon express reinstatement by the Licensor. 335 | 336 | For the avoidance of doubt, this Section 6(b) does not affect any 337 | right the Licensor may have to seek remedies for Your violations 338 | of this Public License. 339 | 340 | c. For the avoidance of doubt, the Licensor may also offer the 341 | Licensed Material under separate terms or conditions or stop 342 | distributing the Licensed Material at any time; however, doing so 343 | will not terminate this Public License. 344 | 345 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 346 | License. 347 | 348 | Section 7 -- Other Terms and Conditions. 349 | 350 | a. The Licensor shall not be bound by any additional or different 351 | terms or conditions communicated by You unless expressly agreed. 352 | 353 | b. Any arrangements, understandings, or agreements regarding the 354 | Licensed Material not stated herein are separate from and 355 | independent of the terms and conditions of this Public License. 356 | 357 | Section 8 -- Interpretation. 358 | 359 | a. For the avoidance of doubt, this Public License does not, and 360 | shall not be interpreted to, reduce, limit, restrict, or impose 361 | conditions on any use of the Licensed Material that could lawfully 362 | be made without permission under this Public License. 363 | 364 | b. 
To the extent possible, if any provision of this Public License is 365 | deemed unenforceable, it shall be automatically reformed to the 366 | minimum extent necessary to make it enforceable. If the provision 367 | cannot be reformed, it shall be severed from this Public License 368 | without affecting the enforceability of the remaining terms and 369 | conditions. 370 | 371 | c. No term or condition of this Public License will be waived and no 372 | failure to comply consented to unless expressly agreed to by the 373 | Licensor. 374 | 375 | d. Nothing in this Public License constitutes or may be interpreted 376 | as a limitation upon, or waiver of, any privileges and immunities 377 | that apply to the Licensor or You, including from the legal 378 | processes of any jurisdiction or authority. 379 | 380 | ======================================================================= 381 | 382 | Creative Commons is not a party to its public 383 | licenses. Notwithstanding, Creative Commons may elect to apply one of 384 | its public licenses to material it publishes and in those instances 385 | will be considered the “Licensor.” The text of the Creative Commons 386 | public licenses is dedicated to the public domain under the CC0 Public 387 | Domain Dedication. Except for the limited purpose of indicating that 388 | material is shared under a Creative Commons public license or as 389 | otherwise permitted by the Creative Commons policies published at 390 | creativecommons.org/policies, Creative Commons does not authorize the 391 | use of the trademark "Creative Commons" or any other trademark or logo 392 | of Creative Commons without its prior written consent including, 393 | without limitation, in connection with any unauthorized modifications 394 | to any of its public licenses or any other arrangements, 395 | understandings, or agreements concerning use of licensed material. For 396 | the avoidance of doubt, this paragraph does not form part of the 397 | public licenses. 398 | 399 | Creative Commons may be contacted at creativecommons.org. 400 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Audio2BodyDynamics 2 | 3 | ## Introduction 4 | This repository contains the code to predict skeleton movements that correspond to music, published in: 5 | * [Audio To Body Dynamics](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shlizerman_Audio_to_Body_CVPR_2018_paper.pdf), CVPR 2018 6 | * Project Page https://arviolin.github.io/AudioBodyDynamics/ 7 | 8 | ## Abstract 9 | We present a method that gets as input an audio of violin 10 | or piano playing, and outputs a video of skeleton predictions 11 | which are further used to animate an avatar. The key 12 | idea is to create an animation of an avatar that moves their 13 | hands similarly to how a pianist or violinist would do, just 14 | from audio. Notably, it’s not clear if body movement can 15 | be predicted from music at all and our aim in this work is 16 | to explore this possibility. In this paper, we present the first 17 | result that shows that natural body dynamics can be predicted. 18 | We built an LSTM network that is trained on violin 19 | and piano recital videos uploaded to the Internet. 
The predicted 20 | points are applied onto a rigged avatar to create the 21 | animation. 22 | 23 | ## Predicted Skeleton Video 24 | ![Predicted Skeleton Video](Audio2BodyPrediction.gif) 25 | 26 | ## Getting Started 27 | 28 | * Install requirements by running: `pip install -r requirements.txt` 29 | * Download [ffmpeg](https://www.ffmpeg.org/download.html) to enable visualization 30 | * This repository contains starter data in the **data** folder. We provide json files formatted as follows: 31 | * Naming convention - {**split**}_{**body part**}.json 32 | * **video_id** : **(audio mfcc features, keypoints)** 33 | * keypoints : NxC where N is the number of frames and C is the number of keypoints 34 | * audio mfcc features : NxD where N is the number of frames and D is the number of MFCC Features 35 | 36 | 37 | ## Training Instructions for All Keypoints Together 38 | 39 | * Run python **pytorch\_A2B\_dynamics.py --help** for the full argument list 40 | * For training 41 | * python pytorch\_A2B\_dynamics.py --logfldr {...} --data data/train_all.json --device {...} ... 42 | * See run_pipeline.sh for an example 43 | * For testing - generates a video from the test model 44 | * python pytorch\_A2B\_dynamics.py --test_model {...} --logfldr {...} --data test_all.json --device {...} ... --audio_file {...} --batch_size 1 45 | * See run_pipeline.sh for an example 46 | * **NB** : Testing is constrained to 1 video at a time. We restrict the batch size to 1 for the test video and generate the whole test sequence at once instead of breaking it up. 47 | 48 | ## Training Instructions for Separate Training of Body, Lefthand and Righthand 49 | 50 | * We expose data and functionality for training and testing on keypoints of individual parts of the body and stitching the final results into a single video. 51 | * Run **sh run_pipeline.sh** 52 | * Outputs are logged to **$HOME/logfldr** by default 53 | 54 | ## Other Quirks 55 | * Checkpointing saves training data statistics for use in testing. 56 | * Modify FFMPEG_LOC in visualize.py to specify the path to ffmpeg. 57 | * Pass the --visualize flag to turn on visualization during testing (it is off by default) 58 | * The losses observed for the provided data differ from those reported in the paper because the train and test images for this dataset use a different resolution. 59 | 60 | ## Citation 61 | 62 | Please cite the [Audio To Body Dynamics paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shlizerman_Audio_to_Body_CVPR_2018_paper.pdf) if you use this code: 63 | ``` 64 | @inproceedings{shlizerman2018audio, 65 | title={Audio to body dynamics}, 66 | author={Shlizerman, Eli and Dery, Lucio and Schoen, Hayden and Kemelmacher-Shlizerman, Ira}, 67 | journal={CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition}, 68 | year={2018} 69 | } 70 | ``` 71 | 72 | ## License 73 | Audio2BodyDynamics is released under a Non-Commercial Creative Commons license. Please refer to [LICENSE](LICENSE). 74 | -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | ## Train and test data 2 | Download [data.zip](https://github.com/facebookresearch/Audio2BodyDynamics/releases/download/v1.0/data.zip) containing the files below and extract it into the "data" folder. 3 | The data.zip archive is attached to the v1.0 GitHub release. A quick sanity check of the extracted files is sketched below.
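The following snippet is a minimal sketch (the split file name is just one example from the listing below): it loads one extracted JSON split and prints the array shapes, which should match the `{video_id: (audio mfcc features, keypoints)}` layout described in the top-level README.

```python
import json
import numpy as np

# Load one extracted split (any of the train_*/test_*.json files works the same way).
with open("data/train_body.json", "r") as fhandle:
    data = json.load(fhandle)

for video_id, (audio_feats, keyps) in data.items():
    audio_feats = np.array(audio_feats)  # N x D per-frame MFCC features
    keyps = np.array(keyps)              # N x C flattened keypoint coordinates
    print(video_id, audio_feats.shape, keyps.shape)
```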
4 | 5 | data.zip: 6 | * -logfldr (trained models) 7 | * --body 8 | * ---best_model_db.pth 9 | * --righthand 10 | * ---best_model_db.pth 11 | * --lefthand 12 | * ---best_model_db.pth 13 | 14 | * -list_of_recital_videos.txt (list of videos used for training) 15 | * -pred_audio.mp4 (sample visualized output generated with trained models) 16 | 17 | * -test_audio.wav (testing music) 18 | * -test_body.json (body keypoints for testing) 19 | * -test_righthand.json (righthand keypoints for testing) 20 | * -test_lefthand.json (lefthand keypoints for testing) 21 | 22 | * -train_body.json (body keypoints for training) 23 | * -train_righthand.json (righthand keypoints for testing) 24 | * -train_lefthand.json (lefthand keypoints for training) 25 | -------------------------------------------------------------------------------- /data_utils/data.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import, division, print_function, unicode_literals 9 | import json 10 | import numpy as np 11 | from sklearn.decomposition import PCA 12 | 13 | 14 | class dataBatcher(object): 15 | 16 | def __init__(self, data, seq_len, batch_size, delay, shuffle=False): 17 | self.cur_batch = 0 18 | self.data = data 19 | self.seq_len = seq_len 20 | self.batch_size = batch_size 21 | self.delay = delay 22 | 23 | self.num_batches = len(self.data[0]) // (self.seq_len * self.batch_size) 24 | assert(self.num_batches != 0), 'Size of data must be > time_steps x batch_size' 25 | self.indices = range(self.num_batches) 26 | if shuffle: 27 | self.indices = np.random.permutation(self.num_batches) 28 | 29 | def hasNext(self): 30 | return self.cur_batch < len(self.indices) 31 | 32 | def correctDimensions(self, arr): 33 | num_feats = arr.shape[-1] 34 | arr = np.reshape(arr, [self.batch_size, self.seq_len, num_feats]) 35 | arr = np.transpose(arr, (1, 0, 2)) # LSTM expects seqlen x batchsize x D 36 | return arr 37 | 38 | def delayArray(self, arr, dummy_var=0): 39 | arr[self.delay:, :, :] = arr[:(self.seq_len - self.delay), :, :] 40 | arr[:self.delay, :, :] = dummy_var 41 | return arr 42 | 43 | def reconstructKeypsOrder(self, arr): 44 | arr = np.reshape(arr, [self.seq_len, self.batch_size, -1]) 45 | arr = arr[self.delay:, :, :] 46 | arr = np.transpose(arr, (1, 0, 2)) 47 | num_pts = arr.shape[2] 48 | arr = np.reshape(arr, [-1, num_pts]) 49 | arr = np.reshape(arr, [-1, 2, num_pts // 2]) # convert to X-Y format 50 | return arr 51 | 52 | def getNext(self): 53 | start = self.indices[self.cur_batch] * self.seq_len * self.batch_size 54 | end = (self.indices[self.cur_batch] + 1) * self.seq_len * self.batch_size 55 | assert((end - start) == (self.seq_len * self.batch_size)) 56 | cur_aud = np.copy(self.data[0][start: end]) 57 | cur_keyps = np.copy(self.data[1][start: end]) 58 | 59 | cur_aud = self.correctDimensions(cur_aud) 60 | cur_keyps = self.correctDimensions(cur_keyps) 61 | cur_keyps = self.delayArray(cur_keyps) 62 | cur_keyps = np.reshape(cur_keyps, [-1, cur_keyps.shape[2]]) 63 | 64 | mask = np.ones((self.seq_len * self.batch_size, 1)) 65 | mask = self.correctDimensions(mask) 66 | mask = self.delayArray(mask) 67 | mask = np.reshape(mask, [-1, mask.shape[2]]) 68 | 69 | self.cur_batch += 1 70 | return cur_aud, cur_keyps, mask 71 | 72 | 73 | class 
DataIterator(object): 74 | 75 | def __init__(self, args, data_loc, test_mode=False): 76 | super(DataIterator, self).__init__() 77 | self.test_mode = test_mode 78 | self.seq_len = args.time_steps 79 | if self.test_mode: 80 | assert(args.batch_size == 1), \ 81 | 'No batching at test time. Run on full sequence.' 82 | self.batch_sz = args.batch_size 83 | self.delay = args.time_delay 84 | val_split = args.val_split if not self.test_mode else 1.0 85 | self.loadData(data_loc, val_split, args.upsample_times) 86 | if not self.test_mode: 87 | if (args.numpca > 0): 88 | self.performPCA(args.numpca) 89 | else: 90 | self.pca = None 91 | self.getDataStats() 92 | self.normalizeDataset() 93 | 94 | def stateDict(self): 95 | state_dict = {} 96 | state_dict['pca'] = self.pca 97 | state_dict['audio_stats'] = (self.aud_means, self.aud_stds) 98 | state_dict['keyps_stats'] = (self.means, self.stds) 99 | return state_dict 100 | 101 | def loadStateDict(self, state_dict): 102 | self.pca = state_dict['pca'] 103 | self.aud_means, self.aud_stds = state_dict['audio_stats'] 104 | self.means, self.stds = state_dict['keyps_stats'] 105 | return state_dict 106 | 107 | def getPCASeq(self, seq, pca_dim=0, batch_dim=0): 108 | seq = np.reshape(seq, [self.seq_len, self.batch_sz, -1]) 109 | seq = seq[self.delay:, :, :] 110 | seq = np.transpose(seq, (1, 0, 2)) 111 | return seq[batch_dim, : , pca_dim] 112 | 113 | def processTestData(self, upsample_times=1): 114 | if self.pca: 115 | self.val_keyps = self.pca.transform(self.val_keyps) 116 | self.normalizeDataset() 117 | 118 | def loadData(self, data_loc, val_split, upsample_times): 119 | with open(data_loc, "r+") as fhandle: 120 | data = json.load(fhandle) 121 | self.train_audio, self.train_keyps = [], [] 122 | self.val_audio, self.val_keyps = [], [] 123 | 124 | # Data Format : video_id : (audio_features, body_keypoints, raw_audio) 125 | # body keypoints may or may not be transformed into fixed reference depending 126 | # depending on user. 
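# Schematic example of one record, following the format described in the README:
#     "some_video_id": [audio_feats, keyps]
# where audio_feats is an N x D list of per-frame MFCC features and keyps is an
# N x C list of per-frame keypoint coordinates. In training mode each video is
# chopped into seq_len-frame chunks and split into train/validation sets below;
# in test mode the first (and only) video is kept whole.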
127 | num_pts_per_batch = self.seq_len 128 | 129 | for _, (audio_feats, keyps) in data.items(): 130 | audio_feats = np.array(audio_feats) 131 | keyps = np.array(keyps) 132 | 133 | if (upsample_times > 0): 134 | audio_feats = self.upsample(audio_feats, upsample_times) 135 | keyps = self.upsample(keyps, upsample_times) 136 | 137 | if not self.test_mode: 138 | # In Training mode, split the data 139 | num_batches = len(audio_feats) // num_pts_per_batch 140 | num_train = int(num_batches * (1 - val_split)) 141 | 142 | # Throw away extra points 143 | audio_feats = audio_feats[:(num_batches * num_pts_per_batch)] 144 | keyps = keyps[:(num_batches * num_pts_per_batch)] 145 | 146 | audio_split = np.split(audio_feats, num_batches) 147 | keyps_split = np.split(keyps, num_batches) 148 | 149 | # Split into Train and Test 150 | train_indices = np.random.choice(num_batches, num_train, replace=False) 151 | val_indices = [x for x in range(num_batches) if x not in train_indices] 152 | 153 | for ind in train_indices: 154 | assert(len(audio_split[ind]) == num_pts_per_batch) 155 | self.train_audio.extend(audio_split[ind]) 156 | self.train_keyps.extend(keyps_split[ind]) 157 | 158 | for ind in val_indices: 159 | assert(len(keyps_split[ind]) == num_pts_per_batch) 160 | self.val_audio.extend(audio_split[ind]) 161 | self.val_keyps.extend(keyps_split[ind]) 162 | else: 163 | self.val_audio = audio_feats 164 | self.val_keyps = keyps 165 | 166 | # Perform Inference on whole video at once 167 | self.seq_len = len(self.val_keyps) 168 | 169 | break # Testing is executed on 1 video at a time. 170 | 171 | self.train_audio, self.train_keyps = \ 172 | np.array(self.train_audio), np.array(self.train_keyps) 173 | self.val_audio, self.val_keyps = \ 174 | np.array(self.val_audio), np.array(self.val_keyps) 175 | 176 | def performPCA(self, num_components): 177 | self.pca = PCA(n_components=num_components) 178 | self.train_keyps = self.pca.fit_transform(self.train_keyps) 179 | self.val_keyps = self.pca.transform(self.val_keyps) 180 | 181 | def getDataStats(self): 182 | self.means = self.train_keyps.mean(axis=0) 183 | self.stds = np.max(self.train_keyps.std(axis=0)) 184 | 185 | self.aud_means = 0.0 186 | self.aud_stds = 1.0 187 | 188 | def normalizeDataset(self): 189 | 190 | def normalize(dataset, mean, std): 191 | EPSILON = 1E-8 192 | if not len(dataset): 193 | return dataset 194 | return (dataset - mean) / (std + EPSILON) 195 | 196 | self.train_keyps = normalize(self.train_keyps, self.means, self.stds) 197 | self.val_keyps = normalize(self.val_keyps, self.means, self.stds) 198 | 199 | self.train_audio = normalize(self.train_audio, self.aud_means, self.aud_stds) 200 | self.val_audio = normalize(self.val_audio, self.aud_means, self.aud_stds) 201 | 202 | def getInOutDimensions(self): 203 | return self.val_audio.shape[1], self.val_keyps.shape[1] 204 | 205 | def reset(self): 206 | self.val_iterator = self.createIterator(is_test=True) 207 | if not self.test_mode: 208 | self.train_iterator = self.createIterator(is_test=False) 209 | 210 | def getNumBatches(self): 211 | train_batches = len(self.train_keyps) // (self.seq_len * self.batch_sz) 212 | val_batches = len(self.val_keyps) // (self.seq_len * self.batch_sz) 213 | return train_batches, val_batches 214 | 215 | def createIterator(self, is_test=False): 216 | dataset = (self.val_audio, self.val_keyps) \ 217 | if is_test else (self.train_audio, self.train_keyps) 218 | return dataBatcher(dataset, self.seq_len, self.batch_sz, self.delay, 219 | shuffle=(not is_test)) 220 | 221 | def 
hasNext(self, is_test=False): 222 | if is_test: 223 | return self.val_iterator.hasNext() 224 | else: 225 | return self.train_iterator.hasNext() 226 | 227 | def nextBatch(self, is_test=False): 228 | if is_test: 229 | return self.val_iterator.getNext() 230 | else: 231 | return self.train_iterator.getNext() 232 | 233 | def reconstructKeypsOrder(self, batch): 234 | return self.val_iterator.reconstructKeypsOrder(batch) 235 | 236 | def reconstructAudioOrder(self, batch): 237 | return self.val_iterator.reconstructAudioOrder(batch) 238 | 239 | def toPixelSpace(self, predictions): 240 | recon = (predictions * self.stds) + self.means 241 | if self.pca: 242 | recon = self.pca.inverse_transform(recon) 243 | return recon 244 | 245 | def upsample(self, array, n_times, use_repeat=False): 246 | if not len(array): 247 | return array 248 | result = array 249 | for _ in range(n_times): 250 | if use_repeat: 251 | result = np.repeat(result, 2, axis=0) 252 | result = result[:-1] 253 | else: 254 | n_examples, n_feats = result.shape 255 | new_arry = np.zeros((n_examples * 2 - 1, n_feats)) 256 | new_arry[0::2, :] = result 257 | new_arry[1::2, :] = (result[1:, :] + result[:-1, :]) / 2.0 258 | result = new_arry 259 | return result 260 | -------------------------------------------------------------------------------- /data_utils/transform_keypoints.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | """ 9 | Code for transforming keypoints to fixed reference frame in order to isolate 10 | motion due to music. 11 | """ 12 | from __future__ import absolute_import, division, print_function, unicode_literals 13 | import numpy as np 14 | 15 | MIN_VALID_PTS = 2 # The minimum number of valid points to be 16 | 17 | 18 | def normalizePts(pts): 19 | N = pts.shape[0] 20 | cent = np.mean(pts, axis=0) 21 | ptsNorm = pts - cent 22 | sumOfPointDistancesFromOriginSquared = np.sum(np.power(ptsNorm[:, 0:2], 2)) 23 | if sumOfPointDistancesFromOriginSquared > 0: 24 | scaleFactor = \ 25 | np.sqrt(2 * N) / np.sqrt(sumOfPointDistancesFromOriginSquared) 26 | else: 27 | scaleFactor = 1 28 | 29 | ptsNorm = ptsNorm * scaleFactor 30 | 31 | normMtxInv = np.array([[1 / scaleFactor, 0, 0], 32 | [0, 1 / scaleFactor, 0], 33 | [cent[0], cent[1], 1]]) 34 | 35 | return ptsNorm, normMtxInv 36 | 37 | 38 | def transformPtsWithT(pts, T): 39 | if pts.ndim != 2: 40 | raise Exception("Must 2-D array") 41 | newPts = np.zeros(pts.shape) 42 | newPts[:, 0] = (T[0, 0] * pts[:, 0]) + (T[1, 0] * pts[:, 1]) + T[2, 0] 43 | newPts[:, 1] = (T[0, 1] * pts[:, 0]) + (T[1, 1] * pts[:, 1]) + T[2, 1] 44 | return newPts 45 | 46 | 47 | def alignKeypoints(keypoints, reference=None, keyptstouse=None, confthresh=None): 48 | can_transform = (reference is not None) and (keyptstouse is not None) 49 | if can_transform: 50 | pts = keypoints[keyptstouse] 51 | valid_ind = np.where(pts[:, 2] > confthresh)[0] 52 | if (len(valid_ind) >= MIN_VALID_PTS): 53 | fixed_pts = reference[keyptstouse][valid_ind] 54 | valid_keyps = pts[valid_ind] 55 | try: 56 | transform = findNonreflectiveSimilarity(valid_keyps, fixed_pts) 57 | alignedKeypoints = transformPtsWithT(keypoints, transform) 58 | except Exception as e: 59 | print(e) 60 | transform = np.zeros((3, 3)) 61 | transform[0, 0] = transform[1, 1] = 1 62 | alignedKeypoints = keypoints 63 | 64 
| else: 65 | transform = np.zeros((3, 3)) 66 | transform[0, 0] = transform[1, 1] = 1 67 | alignedKeypoints = keypoints 68 | return alignedKeypoints, transform 69 | 70 | 71 | def findNonreflectiveSimilarity(src, dst): 72 | src, normMatrix1 = normalizePts(src) 73 | dst, normMatrix2 = normalizePts(dst) 74 | 75 | minRequiredNonCollinearPairs = 2 76 | M = dst.shape[0] 77 | 78 | x = np.expand_dims(dst[:, 0], axis=1) 79 | y = np.expand_dims(dst[:, 1], axis=1) 80 | X = np.concatenate((np.concatenate((x, y, np.ones((M, 1)), np.zeros((M, 1))), axis=1), 81 | np.concatenate((y, -x, np.zeros((M, 1)), np.ones((M, 1))), axis=1)), axis=0) 82 | 83 | u = np.expand_dims(src[:, 0], axis=1) 84 | v = np.expand_dims(src[:, 1], axis=1) 85 | U = np.concatenate((u, v), axis=0) 86 | 87 | # We know that X * r = U 88 | if np.linalg.matrix_rank(X) >= 2 * minRequiredNonCollinearPairs: 89 | r, _, _, _ = np.linalg.lstsq(X, U) 90 | else: 91 | raise ValueError('images:geotrans:requiredNonCollinearPoints', 92 | minRequiredNonCollinearPairs, 'nonreflectivesimilarity') 93 | 94 | sc = float(r[0]) 95 | ss = float(r[1]) 96 | tx = float(r[2]) 97 | ty = float(r[3]) 98 | 99 | Tinv = np.array([[sc, -ss, 0], 100 | [ss, sc, 0], 101 | [tx, ty, 1]]) 102 | 103 | Tinv = np.linalg.solve(normMatrix2, np.dot(Tinv, normMatrix1)) 104 | T = np.linalg.inv(Tinv) 105 | T[:, 2] = np.array([0, 0, 1]) 106 | 107 | return T 108 | 109 | 110 | def buildRotT(sina): 111 | cosa = np.sqrt(1 - sina ** 2) 112 | T = np.array([[cosa, -sina, 0], 113 | [sina, cosa, 0], 114 | [0, 0, 1]]) 115 | return T 116 | -------------------------------------------------------------------------------- /generate_stitched_video.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | from __future__ import absolute_import, division, print_function, unicode_literals 9 | import json 10 | import numpy as np 11 | from visualize import visualizeKeypoints 12 | import argparse 13 | 14 | ''' 15 | Example Run 16 | 17 | python generateStitchedVideo.py --vidtype piano --body_path final_body/body_data.json 18 | --righthand_path final_righthand/righthand_data.json 19 | --lefthand_path final_lefthand/lefthand_data.json 20 | --vid_path viz/all_pts.mp4 --pred_path viz/just_pred.mp4 21 | --audio_path audio.wav 22 | ''' 23 | 24 | 25 | def stitchHandsToBody(body_keyps, rh_keyps, lh_keyps): 26 | body_keyps = np.array(body_keyps) 27 | rh_keyps = np.array(rh_keyps) 28 | lh_keyps = np.array(lh_keyps) 29 | for ind, pts in enumerate(body_keyps): 30 | rh_diff = np.expand_dims(pts[:, 2] - rh_keyps[ind][:, 0], axis=1) 31 | lh_diff = np.expand_dims(pts[:, 5] - lh_keyps[ind][:, 0], axis=1) 32 | rh_keyps[ind] = rh_keyps[ind] + rh_diff 33 | lh_keyps[ind] = lh_keyps[ind] + lh_diff 34 | return body_keyps, rh_keyps, lh_keyps 35 | 36 | 37 | def createOptions(): 38 | # Default configuration for PianoNet 39 | parser = argparse.ArgumentParser( 40 | description="Pytorch: Audio To Body Dynamics Model" 41 | ) 42 | parser.add_argument("--body_path", type=str, default="body_data.json", 43 | help="Path to body keypoints") 44 | parser.add_argument("--righthand_path", type=str, default="righthand_data.json", 45 | help="Path to righthand keypoints") 46 | parser.add_argument("--lefthand_path", type=str, default="lefthand_data.json", 47 | help="Path to righthand keypoints") 48 | parser.add_argument("--gt_path", type=str, default="ground_truth.mp4", 49 | help="Where to save the ground_truth video.") 50 | parser.add_argument("--pred_path", type=str, default="predictions.mp4", 51 | help="Where to save the resulting prediction video.") 52 | parser.add_argument("--vidtype", type=str, default='piano', 53 | help="Type of video whether piano or violin") 54 | parser.add_argument("--audio_path", type=str, default=None, 55 | help="Only in for Test. Location audio file for" 56 | " generating test video") 57 | args = parser.parse_args() 58 | return args 59 | 60 | 61 | def main(): 62 | args = createOptions() 63 | body = json.load(open(args.body_path, 'r+')) 64 | righthand = json.load(open(args.righthand_path, 'r+')) 65 | lefthand = json.load(open(args.lefthand_path, 'r+')) 66 | 67 | all_pred_pts = np.concatenate(stitchHandsToBody(body[1], righthand[1], 68 | lefthand[1]), axis=2) 69 | all_targ_pts = np.concatenate((body[0], righthand[0], lefthand[0]), axis=2) 70 | 71 | # Just Gt 72 | visualizeKeypoints(args.vidtype, all_targ_pts, all_pred_pts, 73 | args.audio_path, args.gt_path, show_pred=False) 74 | 75 | # Just Pred 76 | visualizeKeypoints(args.vidtype, all_targ_pts, all_pred_pts, args.audio_path, 77 | args.pred_path, show_gt=False) 78 | 79 | 80 | if __name__ == '__main__': 81 | main() 82 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.init as init 15 | from torch.autograd import Variable 16 | 17 | 18 | class AudioToKeypointRNN(nn.Module): 19 | 20 | def __init__(self, options): 21 | super(AudioToKeypointRNN, self).__init__() 22 | 23 | # Instantiating the model 24 | self.init = None 25 | 26 | hidden_dim = options['hidden_dim'] 27 | if options['trainable_init']: 28 | device = options['device'] 29 | batch_sz = options['batch_size'] 30 | # Create the trainable initial state 31 | h_init = \ 32 | init.constant_(torch.empty(1, batch_sz, hidden_dim, device=device), 0.0) 33 | c_init = \ 34 | init.constant_(torch.empty(1, batch_sz, hidden_dim, device=device), 0.0) 35 | h_init = Variable(h_init, requires_grad=True) 36 | c_init = Variable(c_init, requires_grad=True) 37 | self.init = (h_init, c_init) 38 | 39 | # Declare the model 40 | self.lstm = nn.LSTM(options['input_dim'], hidden_dim, 1) 41 | self.dropout = nn.Dropout(options['dropout']) 42 | self.fc = nn.Linear(hidden_dim, options['output_dim']) 43 | 44 | self.initialize() 45 | 46 | def initialize(self): 47 | # Initialize LSTM Weights and Biases 48 | for layer in self.lstm._all_weights: 49 | for param_name in layer: 50 | if 'weight' in param_name: 51 | weight = getattr(self.lstm, param_name) 52 | init.xavier_normal_(weight.data) 53 | else: 54 | bias = getattr(self.lstm, param_name) 55 | init.uniform_(bias.data, 0.25, 0.5) 56 | 57 | # Initialize FC 58 | init.xavier_normal_(self.fc.weight.data) 59 | init.constant_(self.fc.bias.data, 0) 60 | 61 | def forward(self, inputs): 62 | # perform the Forward pass of the model 63 | output, (h_n, c_n) = self.lstm(inputs, self.init) 64 | output = output.view(-1, output.size()[-1]) # flatten before FC 65 | dped_output = self.dropout(output) 66 | predictions = self.fc(dped_output) 67 | return predictions 68 | -------------------------------------------------------------------------------- /pytorch_A2B_dynamics.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torch 13 | from torch import optim 14 | from model import AudioToKeypointRNN 15 | from torch.autograd import Variable 16 | import matplotlib.pyplot as plt 17 | import os 18 | import argparse 19 | import logging 20 | import numpy as np 21 | import json 22 | from data_utils.data import DataIterator 23 | from visualize import visualizeKeypoints 24 | 25 | ''' 26 | This script takes an input of audio MFCC features and uses 27 | an LSTM recurrent neural network to learn to predict 28 | body joints coordinates 29 | ''' 30 | 31 | logging.basicConfig() 32 | log = logging.getLogger("mannerisms_rnn") 33 | log.setLevel(logging.DEBUG) 34 | torch.manual_seed(1234) 35 | np.random.seed(1234) 36 | 37 | 38 | class AudoToBodyDynamics(object): 39 | 40 | def __init__(self, args, data_locs, is_test=False): 41 | super(AudoToBodyDynamics, self).__init__() 42 | 43 | self.is_test_mode = is_test 44 | self.data_iterator = DataIterator(args, data_locs, test_mode=is_test) 45 | 46 | # Refresh data configuration from checkpoint 47 | if self.is_test_mode: 48 | self.loadDataCheckpoint(args.test_model, args.upsample_times) 49 | 50 | input_dim, output_dim = self.data_iterator.getInOutDimensions() 51 | 52 | # construct the model 53 | model_options = { 54 | 'device': args.device, 55 | 'dropout': args.dp, 56 | 'batch_size': args.batch_size, 57 | 'hidden_dim': args.hidden_size, 58 | 'input_dim': input_dim, 59 | 'output_dim': output_dim, 60 | 'trainable_init': args.trainable_init 61 | } 62 | self.vidtype = args.vidtype 63 | self.device = args.device 64 | self.log_frequency = args.log_frequency 65 | self.upsample_times = args.upsample_times 66 | self.model = AudioToKeypointRNN(model_options).cuda(args.device) 67 | self.optim = optim.Adam(self.model.parameters(), lr=args.lr) 68 | 69 | # Load checkpoint model 70 | if self.is_test_mode: 71 | self.loadModelCheckpoint(args.test_model) 72 | 73 | def buildLoss(self, rnn_out, target, mask): 74 | square_diff = (rnn_out - target)**2 75 | out = torch.sum(square_diff, 1, keepdim=True) 76 | masked_out = out * mask 77 | return torch.mean(masked_out), masked_out 78 | 79 | def saveModel(self, state_info, path): 80 | torch.save(state_info, path) 81 | 82 | def loadModelCheckpoint(self, path): 83 | checkpoint = torch.load(path) 84 | self.model.load_state_dict(checkpoint['model_state_dict']) 85 | self.optim.load_state_dict(checkpoint['optim_state_dict']) 86 | 87 | def loadDataCheckpoint(self, path, upsample_times): 88 | checkpoint = torch.load(path) 89 | self.data_iterator.loadStateDict(checkpoint['data_state_dict']) 90 | self.data_iterator.processTestData(upsample_times=upsample_times) 91 | 92 | def runNetwork(self, validate=False): 93 | def to_numpy(x): 94 | return x.cpu().data.numpy() 95 | 96 | # Set up inputs to the network 97 | batch_info = self.data_iterator.nextBatch(is_test=validate) 98 | in_batch, out_batch, mask_batch = batch_info 99 | inputs = Variable(torch.FloatTensor(in_batch).to(self.device)) 100 | targets = Variable(torch.FloatTensor(out_batch).to(self.device)) 101 | masks = Variable(torch.FloatTensor(mask_batch).to(self.device)) 102 | 103 | # Run the network 104 | predictions = self.model.forward(inputs) 105 | 106 | # Get loss in pca coefficient space 107 | loss, _ = self.buildLoss(predictions, targets, masks) 108 | 109 | # Get loss in pixel space 110 | pixel_predictions = 
self.data_iterator.toPixelSpace(to_numpy(predictions)) 111 | pixel_predictions = torch.FloatTensor(pixel_predictions).to(self.device) 112 | 113 | pixel_targets = self.data_iterator.toPixelSpace(out_batch) 114 | pixel_targets = torch.FloatTensor(pixel_targets).to(self.device) 115 | _, frame_loss = self.buildLoss(pixel_predictions, pixel_targets, masks) 116 | 117 | frame_loss = frame_loss / pixel_targets.size()[1] 118 | # Gives the average deviation of prediction from target pixel 119 | pixel_loss = torch.mean(torch.sqrt(frame_loss)) 120 | 121 | return (to_numpy(predictions), to_numpy(targets)), loss, pixel_loss 122 | 123 | def runEpoch(self): 124 | pixel_losses, coeff_losses = [], [] 125 | val_pix_losses, val_coeff_losses = [], [] 126 | predictions, targets = [], [] 127 | 128 | while (not self.is_test_mode and self.data_iterator.hasNext(is_test=False)): 129 | self.model.train() 130 | _, pca_coeff_loss, pixel_loss = self.runNetwork(validate=False) 131 | self.optim.zero_grad() 132 | pca_coeff_loss.backward() 133 | self.optim.step() 134 | 135 | pca_coeff_loss = pca_coeff_loss.data.tolist() 136 | pixel_loss = pixel_loss.data.tolist() 137 | pixel_losses.append(pixel_loss) 138 | coeff_losses.append(pca_coeff_loss) 139 | 140 | while(self.data_iterator.hasNext(is_test=True)): 141 | self.model.eval() 142 | vis_data, pca_coeff_loss, pixel_loss = self.runNetwork(validate=True) 143 | pca_coeff_loss = pca_coeff_loss.data.tolist() 144 | pixel_loss = pixel_loss.data.tolist() 145 | 146 | val_pix_losses.append(pixel_loss) 147 | val_coeff_losses.append(pca_coeff_loss) 148 | 149 | predictions.append(vis_data[0]) 150 | targets.append(vis_data[1]) 151 | 152 | train_info = (pixel_losses, coeff_losses) 153 | val_info = (val_pix_losses, val_coeff_losses) 154 | return train_info, val_info, predictions, targets 155 | 156 | def trainModel(self, max_epochs, logfldr, patience): 157 | log.debug("Training model") 158 | epoch_losses = [] 159 | batch_losses = [] 160 | val_losses = [] 161 | i, best_loss, iters_without_improvement = 0, float('inf'), 0 162 | best_train_loss, best_val_loss = float('inf'), float('inf') 163 | 164 | while(i < max_epochs): 165 | i += 1 166 | self.data_iterator.reset() 167 | iter_train, iter_val, predictions, targets = self.runEpoch() 168 | iter_mean = np.mean(iter_train[0]), np.mean(iter_train[1]) 169 | iter_val_mean = np.mean(iter_val[0]), np.mean(iter_val[1]) 170 | 171 | epoch_losses.append(iter_mean) 172 | batch_losses.extend(iter_train) 173 | val_losses.append(iter_val_mean) 174 | 175 | log.info("Epoch {} / {}".format(i, max_epochs)) 176 | log.info("Training Loss (1980 x 1080): {}".format(iter_mean)) 177 | log.info("Validation Loss (1980 x 1080): {}".format(iter_val_mean)) 178 | 179 | improved = iter_val_mean[1] < best_loss 180 | if improved: 181 | best_loss = iter_val_mean[1] 182 | best_val_loss = iter_val_mean 183 | best_train_loss = iter_mean 184 | iters_without_improvement = 0 185 | else: 186 | iters_without_improvement += 1 187 | if iters_without_improvement >= patience: 188 | log.info("Stopping Early because no improvment in {}".format( 189 | iters_without_improvement)) 190 | break 191 | if improved or (i % self.log_frequency) == 0: 192 | # Save the model information 193 | path = os.path.join(logfldr, "Epoch_{}".format(i)) 194 | os.makedirs(path) 195 | path = os.path.join(path, "model_db.pth") 196 | state_info = { 197 | 'epoch': i, 198 | 'epoch_losses': epoch_losses, 199 | 'batch_losses': batch_losses, 200 | 'validation_losses': val_losses, 201 | 'model_state_dict': 
self.model.state_dict(), 202 | 'optim_state_dict': self.optim.state_dict(), 203 | 'data_state_dict': self.data_iterator.stateDict() 204 | } 205 | self.saveModel(state_info, path) 206 | if improved: 207 | path = os.path.join(logfldr, "best_model_db.pth") 208 | self.saveModel(state_info, path) 209 | 210 | # Visualize the PCA Coefficients 211 | num_vis = min(3, targets[0].shape[-1]) 212 | for j in range(num_vis): 213 | save_path = os.path.join( 214 | logfldr, "Epoch_{}/pca_{}.png".format(i, j)) 215 | self.visualizePCA(predictions[0], targets[0], j, save_path) 216 | 217 | self.plotResults(logfldr, epoch_losses, batch_losses, val_losses) 218 | return best_train_loss, best_val_loss 219 | 220 | def formatVizArrays(self, predictions, targets): 221 | final_pred, final_targ = [], [] 222 | for ind, pred in enumerate(predictions): 223 | pred = self.data_iterator.toPixelSpace(pred) 224 | targ = self.data_iterator.toPixelSpace(targets[ind]) 225 | pred = self.data_iterator.reconstructKeypsOrder(pred) 226 | targ = self.data_iterator.reconstructKeypsOrder(targ) 227 | final_pred.append(pred) 228 | final_targ.append(targ) 229 | 230 | final_pred, final_targ = np.vstack(final_pred), np.vstack(final_targ) 231 | final_pred = final_pred[0::(2**self.upsample_times)] 232 | final_targ = final_targ[0::(2**self.upsample_times)] 233 | 234 | return final_pred, final_targ 235 | 236 | def visualizePCA(self, preds, targets, pca_dim, save_path): 237 | preds = self.data_iterator.getPCASeq(preds, pca_dim=pca_dim) 238 | targs = self.data_iterator.getPCASeq(targets, pca_dim=pca_dim) 239 | assert(len(preds) == len(targs)) 240 | plt.plot(preds, color='red', label='Predictions') 241 | plt.plot(targs, color='green', label='Ground Truth') 242 | plt.legend() 243 | plt.savefig(save_path) 244 | plt.close() 245 | 246 | def plotResults(self, logfldr, epoch_losses, batch_losses, val_losses): 247 | losses = [epoch_losses, batch_losses, val_losses] 248 | names = [ 249 | ["Epoch pixel losses", "Epoch coeff losses"], 250 | ["Batch pixel losses", "Batch coeff losses"], 251 | ["Val pixel losses", "Val coeff losses"]] 252 | _, ax = plt.subplots(nrows=len(losses), ncols=2) 253 | for index, pair in enumerate(zip(losses, names)): 254 | for i in range(2): 255 | data = [pair[0][j][i] for j in range(len(pair[0]))] 256 | ax[index][i].plot(data, label=pair[1][i]) 257 | ax[index][i].legend() 258 | save_filename = os.path.join(logfldr, "results.png") 259 | plt.savefig(save_filename) 260 | plt.close() 261 | 262 | 263 | def createOptions(): 264 | # Default configuration for PianoNet 265 | parser = argparse.ArgumentParser( 266 | description="Pytorch: Audio To Body Dynamics Model" 267 | ) 268 | parser.add_argument("--data", type=str, default="piano_data.json", 269 | help="Path to data file") 270 | parser.add_argument("--audio_file", type=str, default=None, 271 | help="Only in for Test. Location audio file for" 272 | " generating test video") 273 | parser.add_argument("--logfldr", type=str, default=None, 274 | help="Path to folder to save training information", 275 | required=True) 276 | parser.add_argument("--batch_size", type=int, default=100, 277 | help="Training batch size. 
Set to 1 in test") 278 | parser.add_argument("--val_split", type=float, default=0.2, 279 | help="The fraction of the training data to use as validation") 280 | parser.add_argument("--hidden_size", type=int, default=200, 281 | help="Dimension of the hidden representation") 282 | parser.add_argument("--test_model", type=str, default=None, 283 | help="Location for saved model to load") 284 | parser.add_argument("--vidtype", type=str, default='piano', 285 | help="Type of video whether piano or violin") 286 | parser.add_argument("--visualize", type=bool, default=False, 287 | help="Visualize the output of the model. Use only in Test") 288 | parser.add_argument("--save_predictions", type=bool, default=True, 289 | help="Whether or not to save predictions. Use only in Test") 290 | parser.add_argument("--device", type=str, default="cuda:0", 291 | help="Device to train on. Use 'cpu' if to train on cpu.") 292 | parser.add_argument("--max_epochs", type=int, default=300, 293 | help="max number of epochs to run for") 294 | parser.add_argument("--lr", type=float, default=1e-3, 295 | help="Learning Rate for optimizer") 296 | parser.add_argument("--time_steps", type=int, default=60, 297 | help="Prediction Timesteps") 298 | parser.add_argument("--patience", type=int, default=100, 299 | help="Number of epochs with no validation improvement" 300 | " before stopping training") 301 | parser.add_argument("--time_delay", type=int, default=6, 302 | help="Time delay for RNN. Negative values mean no delay." 303 | "Give in terms of frames. 30 frames = 1 second.") 304 | parser.add_argument("--dp", type=float, default=0.1, 305 | help="Dropout Ratio For Trainining") 306 | parser.add_argument("--upsample_times", type=int, default=2, 307 | help="number of times to upsample") 308 | parser.add_argument("--numpca", type=int, default=15, 309 | help="number of pca dimensions. Use -1 if no pca - " 310 | "Train on XY coordinates") 311 | parser.add_argument("--log_frequency", type=int, default=10, 312 | help="The frequency with which to checkpoint the model") 313 | parser.add_argument("--trainable_init", action='store_false', 314 | help="LSTM initial state should be trained. 
Default is True") 315 | 316 | args = parser.parse_args() 317 | return args 318 | 319 | 320 | def main(): 321 | args = createOptions() 322 | args.device = torch.device(args.device) 323 | data_loc = args.data 324 | is_test_mode = args.test_model is not None 325 | 326 | dynamics_learner = AudoToBodyDynamics(args, data_loc, is_test=is_test_mode) 327 | logfldr = args.logfldr 328 | if not os.path.isdir(logfldr): 329 | os.makedirs(logfldr) 330 | 331 | if not is_test_mode: 332 | min_train, min_val = dynamics_learner.trainModel( 333 | args.max_epochs, logfldr, args.patience) 334 | else: 335 | dynamics_learner.data_iterator.reset() 336 | outputs = dynamics_learner.runEpoch() 337 | iter_train, iter_val, targ, pred = outputs 338 | min_train, min_val = np.mean(iter_train[0]), np.mean(iter_val[0]) 339 | 340 | # Format the visualization appropriately 341 | targ, pred = dynamics_learner.formatVizArrays(pred, targ) 342 | 343 | # Save the predictions 344 | if args.save_predictions: 345 | viz_info = (targ.tolist(), pred.tolist()) 346 | save_path = "{}/{}_data.json".format(logfldr, args.vidtype) 347 | json.dump(viz_info, open(save_path, 'w+')) 348 | 349 | # Create Video of Results 350 | if args.visualize: 351 | vid_path = "{}/{}.mp4".format(logfldr, args.vidtype) 352 | visualizeKeypoints(args.vidtype, targ, pred, args.audio_file, vid_path) 353 | 354 | best_lossess = [min_train, min_val] 355 | log.info("The best validation is : {}".format(best_lossess)) 356 | 357 | 358 | if __name__ == '__main__': 359 | main() 360 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2018.8.24 2 | cycler==0.10.0 3 | kiwisolver==1.0.1 4 | matplotlib==3.0.0 5 | numpy==1.15.2 6 | Pillow==5.3.0 7 | pyparsing==2.2.2 8 | python-dateutil==2.7.3 9 | scikit-learn==0.20.0 10 | scipy==1.1.0 11 | six==1.11.0 12 | sklearn==0.0 13 | torch==0.4.1 14 | torchvision==0.2.1 15 | opencv-python==3.4.3.18 16 | -------------------------------------------------------------------------------- /run_pipeline.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) Facebook, Inc. and its affiliates. 4 | # All rights reserved. 5 | # 6 | # This source code is licensed under the license found in the 7 | # LICENSE file in the root directory of this source tree. 
8 | # 9 | 10 | # This script provides end-to-end training, testing and visualization on separate body parts 11 | 12 | test_audio="data/test_audio.wav" 13 | 14 | # Update Arguments with desired configuration 15 | 16 | # Train on Body 17 | echo "TRAINING MODEL ON BODY KEYPOINTS" 18 | part="body" 19 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca -1 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 20 | # Test on Body 21 | echo "TESTING MODEL ON BODY KEYPOINTS" 22 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 23 | 24 | # Train on Left Hand 25 | echo "TRAINING MODEL ON LEFTHAND KEYPOINTS" 26 | part="lefthand" 27 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca 15 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 28 | # Test on Left Hand 29 | echo "TESTING MODEL ON LEFTHAND KEYPOINTS" 30 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 31 | 32 | # Train on Right Hand 33 | echo "TRAINING MODEL ON RIGHTHAND KEYPOINTS" 34 | part="righthand" 35 | # python pytorch_A2B_dynamics.py --data "data/train_$part.json" --vidtype body --numpca 15 --upsample_times 1 --logfldr "$HOME/logfldr/$part" --max_epoch 100 --time_steps 150 36 | # Test on Right Hand 37 | echo "TESTING MODEL ON RIGHTHAND KEYPOINTS" 38 | python pytorch_A2B_dynamics.py --data "data/test_$part.json" --test_model "$HOME/logfldr/$part/best_model_db.pth" --vidtype "$part" --batch_size 1 --audio_file "$test_audio" --logfldr "$HOME/logfldr/$part" 39 | 40 | # Generate the Stitched Video 41 | echo "GENERATING VIDEO OF STITCHED PART KEYPOINTS" 42 | vidtype="piano" 43 | python generate_stitched_video.py --vidtype "$vidtype" --body_path "$HOME/logfldr/body/body_data.json" --righthand_path "$HOME/logfldr/righthand/righthand_data.json" --lefthand_path \ 44 | "$HOME/logfldr/lefthand/lefthand_data.json" --gt_path "$HOME/logfldr/gt.mp4" --pred_path "$HOME/logfldr/pred.mp4" --audio_path "$test_audio" 45 | -------------------------------------------------------------------------------- /visualize.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
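# Rendering utilities: draw predicted and ground-truth keypoint skeletons with
# OpenCV, assemble the frames into an MP4, and mux the audio track back in with
# ffmpeg (see visualizeKeypoints at the bottom of this file).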
2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | from __future__ import unicode_literals 7 | import cv2 8 | import os 9 | import numpy as np 10 | from data_utils.transform_keypoints import transformPtsWithT 11 | from scipy.io.wavfile import write 12 | 13 | THRESH = 0 14 | FFMPEG_LOC = "ffmpeg " 15 | 16 | 17 | def getUpperOPBodyKeypsLines(): 18 | kp_lines = [[0, 1], [1, 2], [0, 3], [3, 4], [4, 5]] 19 | return kp_lines 20 | 21 | 22 | def getUpperOPSHELBodyKeypsLines(): 23 | kp_lines = [[0, 1], [0, 2], [2, 3]] 24 | return kp_lines 25 | 26 | 27 | def getUpperOPELWRBodyKeypsLines(): 28 | kp_lines = [[0, 1], [2, 3]] 29 | return kp_lines 30 | 31 | 32 | def getUpperOPWRBodyKeypsLines(): 33 | kp_lines = [[0, 1]] 34 | return kp_lines 35 | 36 | 37 | def getUpperOPONESIDEBodyKeypsLines(): 38 | kp_lines = [[0, 1], [1, 2]] 39 | return kp_lines 40 | 41 | 42 | def getUpperOPHandsKeypsLines(): 43 | kp_lines = [[0, 1], [1, 2], [2, 3], [3, 4], [0, 5], 44 | [5, 6], [6, 7], [7, 8], [0, 9], [9, 10], 45 | [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], 46 | [15, 16], [0, 17], [17, 18], [18, 19], [19, 20]] 47 | return kp_lines 48 | 49 | 50 | def drawMouth(image, mouthlmk, color=(0, 255, 0)): 51 | x = mouthlmk[:, 0] 52 | y = mouthlmk[:, 1] 53 | for indices in range(len(x) - 1): 54 | x1, x2 = int(x[indices]), int(x[indices + 1]) 55 | y1, y2 = int(y[indices]), int(y[indices + 1]) 56 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 57 | pt1, pt2 = (x1, y1), (x2, y2) 58 | cv2.line(image, pt1, pt2, color, 1, cv2.LINE_AA) 59 | return image 60 | 61 | 62 | def drawBody(image, bodylmk, tform=None, confidences=None, color=(0, 255, 0), 63 | diffx=-750, diffy=-100): 64 | lines = getUpperOPBodyKeypsLines() 65 | x = bodylmk[0] 66 | y = bodylmk[1] 67 | for indices in lines: 68 | # incorporating tform 69 | if confidences[indices[0]] < THRESH or confidences[indices[1]] < THRESH: 70 | continue 71 | 72 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 73 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 74 | 75 | if tform is not None: 76 | 77 | tpts = transformPtsWithT(np.array([[x1, y1], [x2, y2]]), tform) 78 | x1, x2 = int(tpts[0, 0]), int(tpts[1, 0]) 79 | y1, y2 = int(tpts[0, 1]), int(tpts[1, 1]) 80 | 81 | # pdb.set_trace() 82 | x1, x2 = x1 + diffx, x2 + diffx 83 | y1, y2 = y1 + diffy, y2 + diffy 84 | 85 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 86 | pt1, pt2 = (x1, y1), (x2, y2) 87 | cv2.circle(image, pt1, 4, color, -1, cv2.LINE_AA) 88 | cv2.circle(image, pt2, 4, color, -1, cv2.LINE_AA) 89 | cv2.line(image, pt1, pt2, color, 3, cv2.LINE_AA) 90 | return image 91 | 92 | 93 | def drawBodyAndFingers(vidtype, image, bodylmk, tform=None, confidences=None, 94 | color=(0, 255, 0)): 95 | 96 | if vidtype == 'shouldelbows': 97 | lines = getUpperOPSHELBodyKeypsLines() 98 | elif vidtype == 'elbowswrists': 99 | lines = getUpperOPELWRBodyKeypsLines() 100 | elif vidtype == 'wrists' or vidtype == 'vshould': 101 | lines = getUpperOPWRBodyKeypsLines() 102 | elif vidtype == 'righthand' or vidtype == 'vrighthand': 103 | lines = [] 104 | elif vidtype == 'lefthand' or vidtype == 'vlefthand': 105 | lines = [] 106 | elif vidtype == 'violinleft' or vidtype == 'violinright': 107 | lines = getUpperOPONESIDEBodyKeypsLines() 108 | else: 109 | lines = getUpperOPBodyKeypsLines() 110 | 111 | x = bodylmk[0] 112 | y = bodylmk[1] 113 | for indices in lines: 114 | 115 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 116 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 117 | 
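        # If a transform is supplied, map both segment endpoints through it before drawing.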
118 | if tform is not None: 119 | tpts = transformPtsWithT(np.array([[x1, y1], [x2, y2]]), tform) 120 | x1, x2 = int(tpts[0, 0]), int(tpts[1, 0]) 121 | y1, y2 = int(tpts[0, 1]), int(tpts[1, 1]) 122 | 123 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 124 | pt1, pt2 = (x1, y1), (x2, y2) 125 | cv2.circle(image, pt1, 3, color, -1, cv2.LINE_AA) 126 | cv2.circle(image, pt2, 3, color, -1, cv2.LINE_AA) 127 | cv2.line(image, pt1, pt2, color, 2, cv2.LINE_AA) 128 | 129 | if vidtype == 'violin': 130 | shft = 8 131 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 132 | hlines = np.append(hlines, np.array(getUpperOPHandsKeypsLines()) + 21 + shft, 0) 133 | elif vidtype == 'piano': 134 | shft = 7 135 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 136 | hlines = np.append(hlines, np.array(getUpperOPHandsKeypsLines()) + 21 + shft, 0) 137 | elif vidtype == 'righthand' or vidtype == 'lefthand' or \ 138 | vidtype == 'vrighthand' or vidtype == 'vlefthand': 139 | shft = 0 140 | hlines = np.array(getUpperOPHandsKeypsLines()) + shft 141 | else: 142 | return image 143 | 144 | for indices in hlines: 145 | x1, x2 = int(x[indices[0]]), int(x[indices[1]]) 146 | y1, y2 = int(y[indices[0]]), int(y[indices[1]]) 147 | 148 | if x1 >= 0 and x2 >= 0 and y1 >= 0 and y2 >= 0: 149 | pt1, pt2 = (x1, y1), (x2, y2) 150 | cv2.circle(image, pt1, 2, color, -1, cv2.LINE_AA) 151 | cv2.circle(image, pt2, 2, color, -1, cv2.LINE_AA) 152 | cv2.line(image, pt1, pt2, color, 2, cv2.LINE_AA) 153 | return image 154 | 155 | 156 | def writeAudio(vid_loc, audio_loc): 157 | new_vid_loc = vid_loc.split(".mp4")[0] + "_audio.mp4" 158 | cmd = FFMPEG_LOC + " -loglevel panic -i " + vid_loc + " -i " + audio_loc 159 | cmd += " -c:v copy -c:a aac -strict experimental " + new_vid_loc 160 | os.system(cmd) 161 | return new_vid_loc 162 | 163 | 164 | def videoFromImages(imgs, outputfile, audio_path, fps=27.1): 165 | fourcc_format = cv2.VideoWriter_fourcc(*'MP4V') 166 | size = imgs[0].shape[1], imgs[0].shape[0] 167 | vid = cv2.VideoWriter(outputfile, fourcc_format, fps, size) 168 | for img in imgs: 169 | vid.write(img) 170 | vid.release() 171 | if audio_path is not None: 172 | writeAudio(outputfile, audio_path) 173 | return outputfile 174 | 175 | 176 | def visualizeKeypoints(vidtype, targetKeypts, predictedKeypts, audio_path, 177 | outfile, img_size=300, show_pred=True, show_gt=True, fps=27.1): 178 | targ_color = (0, 255, 0) 179 | pred_color = (0, 0, 255) 180 | images = [] 181 | 182 | for ind, targkeyps in enumerate(targetKeypts): 183 | img_y, img_x = (img_size, img_size) 184 | if show_gt and show_pred: 185 | img_y, img_x = (img_size, img_size * 2) 186 | targkeyps[0] += img_size # Shift the ground truth points 187 | 188 | newImage = np.ones((img_y, img_x, 3), dtype=np.uint8) * 255 189 | predkeyps = predictedKeypts[ind] 190 | if show_pred: 191 | newImage = drawBodyAndFingers(vidtype, newImage, predkeyps, 192 | color=pred_color) 193 | if show_gt: 194 | newImage = drawBodyAndFingers(vidtype, newImage, targkeyps, 195 | color=targ_color) 196 | 197 | newImage = cv2.resize(newImage, None, fx=2, fy=2, 198 | interpolation=cv2.INTER_CUBIC) 199 | 200 | images.append(newImage) 201 | if images: 202 | videoFromImages(images, outfile, audio_path, fps=fps) 203 | --------------------------------------------------------------------------------
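A minimal replay sketch (not part of the repository): after a test run with --save_predictions, pytorch_A2B_dynamics.py writes <vidtype>_data.json as a (targets, predictions) pair, which can be re-rendered offline with visualizeKeypoints. The paths below are examples only, and the arrays are assumed to keep the (frames, 2, num_keypoints) layout produced by formatVizArrays.

import json

import numpy as np

from visualize import visualizeKeypoints

# Load the (targets, predictions) pair saved by pytorch_A2B_dynamics.py.
with open("logfldr/piano/piano_data.json") as fp:  # example path
    targets, predictions = json.load(fp)

targets = np.array(targets, dtype=np.float32)
predictions = np.array(predictions, dtype=np.float32)

# Ground truth is drawn in green and predictions in red, side by side.
# Pass None instead of an audio path to skip the ffmpeg muxing step.
visualizeKeypoints("piano", targets, predictions,
                   "data/test_audio.wav", "piano_replay.mp4")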