├── assets
│   └── images
│       └── pipeline.png
└── README.md
/assets/images/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMotionLab/MotionChain/HEAD/assets/images/pipeline.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Official repo for MotionChain

# MotionChain: Conversational Motion Controllers via Multimodal Prompts

[Arxiv Paper](https://arxiv.org/abs/2404.01700) • Demo • FAQ • Citation

## Intro MotionChain

MotionChain is a unified vision-motion-language generative pre-trained model that performs **conversational** generation tasks via **multi-modal** inputs with language models.

**Technical details**

Recent advancements in language models have demonstrated their adeptness at conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations into the control of continuous virtual human movements, generative human motion models can achieve an intuitive, step-by-step process of human task execution for humanoid robotics, game agents, or other embodied systems. In this work, we present MotionChain, a conversational human motion controller that generates continuous and long-term human motion through multimodal prompts. Specifically, MotionChain consists of multi-modal tokenizers that transform various data types, such as text, image, and motion, into discrete tokens, coupled with a Vision-Motion-aware Language model. By leveraging large-scale language, vision-language, and vision-motion data to assist motion-related generation tasks, MotionChain comprehends each instruction in a multi-turn conversation and generates human motions that follow these prompts. Extensive experiments validate the efficacy of MotionChain, demonstrating state-of-the-art performance in conversational motion generation, as well as a more intuitive manner of controlling and interacting with virtual humans.

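As a rough illustration of the tokenize-then-generate pipeline described above, here is a minimal PyTorch-style sketch. All names (`MotionChainSketch`, `reply`, the tokenizer and language-model interfaces) are hypothetical placeholders for illustration only, not the repository's actual API.

```python
# Illustrative sketch only: the tokenizer/LM interfaces below are hypothetical
# placeholders, not MotionChain's actual modules.
import torch


class MotionChainSketch(torch.nn.Module):
    """Multi-modal prompts -> discrete tokens -> language model -> continuous motion."""

    def __init__(self, text_tokenizer, image_tokenizer, motion_tokenizer, language_model):
        super().__init__()
        self.text_tokenizer = text_tokenizer      # text  -> token id tensor
        self.image_tokenizer = image_tokenizer    # image -> discrete visual tokens
        self.motion_tokenizer = motion_tokenizer  # motion <-> discrete motion tokens (e.g. a VQ-VAE)
        self.language_model = language_model      # vision-motion-aware language model

    @torch.no_grad()
    def reply(self, dialogue_history, image=None):
        """Generate the motion for the next turn of a multi-turn conversation."""
        # Tokenize every prior turn so the model retains conversational context.
        token_chunks = [self.text_tokenizer(turn) for turn in dialogue_history]
        if image is not None:
            token_chunks.append(self.image_tokenizer(image))
        prompt = torch.cat(token_chunks, dim=-1)
        # Autoregressively decode motion tokens, then map them back to poses.
        motion_tokens = self.language_model.generate(prompt)
        return self.motion_tokenizer.decode(motion_tokens)
```
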
## 🚩 News

- [2024/07/15] [Conversation dataset](https://huggingface.co/datasets/OpenMotionLab/MotionChain_Conv) released (see the loading sketch below).
- [2024/04/02] Uploaded paper and initialized project 🔥🔥🔥

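Since the conversation data is hosted on the Hugging Face Hub, it can presumably be pulled with the standard `datasets` library. A minimal sketch, assuming a `train` split exists (check the dataset card for the actual splits and fields):

```python
from datasets import load_dataset

# Minimal sketch: pulls the released conversation data from the Hugging Face Hub.
# The split name "train" is an assumption; see the dataset card for the real layout.
conv = load_dataset("OpenMotionLab/MotionChain_Conv", split="train")
print(conv[0])  # inspect one conversation sample
```
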
## ⚡ Quick Start

## ▶️ Demo

## 👀 Visualization

## ⚠️ FAQ

Question-and-Answer

## 📖 Citation

If you find our code or paper helpful, please consider citing:

```bibtex
@misc{jiang2024motionchain,
      title={MotionChain: Conversational Motion Controllers via Multimodal Prompts},
      author={Biao Jiang and Xin Chen and Chi Zhang and Fukun Yin and Zhuoyuan Li and Gang YU and Jiayuan Fan},
      year={2024},
      eprint={2404.01700},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Acknowledgments

Thanks to [BEDLAM](https://github.com/pixelite1201/BEDLAM), [TMR](https://github.com/Mathux/TMR), [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch), [Motion-GPT](https://github.com/OpenMotionLab/MotionGPT), [Motion-latent-diffusion](https://github.com/ChenFengYe/motion-latent-diffusion), [T2m-gpt](https://github.com/Mael-zys/T2M-GPT), [TEMOS](https://github.com/Mathux/TEMOS), [ACTOR](https://github.com/Mathux/ACTOR), [HumanML3D](https://github.com/EricGuo5513/HumanML3D), and [joints2smpl](https://github.com/wangsen1312/joints2smpl); our code partially borrows from them.

## License

This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each come with their own licenses, which must also be followed.

--------------------------------------------------------------------------------