├── .gitignore
├── README.md
├── RELEASE.md
└── media
├── Aki_ArchFlow_Banner.png
└── Mace_banner.png
/.gitignore:
--------------------------------------------------------------------------------
1 | ../
2 | .vscode/
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # NVIDIA Audio2Face-3D
2 |
3 | 
4 | Audio2Face-3D is an advanced technology that generates high-fidelity 3D facial animation from an audio source, supporting both pre-recorded files and real-time streams.
5 |
6 | The system analyzes vocal data to synthesize detailed and realistic articulation. This includes the precise, synchronized motion of the jaw, tongue, and eyes, as well as the subtle deformations of the facial skin. This comprehensive approach results in two key outputs:
7 |
8 | - Accurate lip-sync based on phonetic analysis.
9 | - Nuanced emotional expression inferred from the tone of the speech.
10 |
11 | Together, these elements produce a complete and lifelike facial performance that matches the source audio. Audio2Face-3D drives a character's facial performance through direct mesh deformations, joint transformations, or blend shape weights.
12 |
13 | This page serves as the central hub for all official Audio2Face-3D technologies and tools, including pre-trained models, the development SDK, plugins for Autodesk Maya and Unreal Engine 5, and sample datasets.
14 |
15 | ## Audio2Face-3D Collection
16 |
17 | The Audio2Face-3D collection is distributed across two platforms:
18 |
19 | - **NVIDIA Repositories:** Access source code, helper scripts, packaged builds, and documentation.
20 |
21 | - **Hugging Face Hub:** Download pre-trained models and sample training datasets.
22 |
23 | ## NVIDIA Repositories
24 |
25 |
26 | | Package | Use | Format | License | Info |
27 | |---|---|---|---|---|
28 | | Audio2Face-3D SDK | Authoring and runtime facial animations on-device or in the cloud. | Source Code - C++ & Python | MIT | [More](#audio2face-3d-sdk) |
29 | | Audio2Face-3D Training Framework | Framework (v1.0) for creating Audio2Face-3D models with your data. | Source Code - Python & Docker Container | Apache | [More](#audio2face-3d-training-framework) |
30 | | Maya ACE (MACE) | Animation authoring plugin with local execution (v2.0). | Autodesk Maya Plugin Module, Source Code - C++ & Python | MIT | [More](#maya-ace-plugin-mace) |
31 | | Unreal Engine 5 plugin | Unreal Engine 5 plugin (v2.5) for UE 5.5 and UE 5.6 game engine workflows. | Unreal Engine (.uplugin) & Source Code - C++ | MIT | [More](#audio2face-3d-unreal-engine-5-plugin) |
32 | | Audio2Face-3D NIM | Simplifies the large-scale deployment of Audio2Face-3D models. | Docker Container | NVIDIA Software License Agreement and Product Specific Terms for AI Products | [More](#audio2face-3d-nim) |
33 |
71 | ## Hugging Face Hub
72 |
73 |
74 | | Package | Use | Format | License | Links |
75 | |---|---|---|---|---|
76 | | Audio2Face-3D Models | Regression (v2.3) and diffusion (v3.0) pre-trained models for generating lip-sync. | Open Weights / onnx-trt | NVIDIA Open Model | v3.0, Mark v2.3, Claire v2.3.1, James v2.3.1 |
77 | | Audio2Emotion Models | Production (v2.2) and experimental (v3.0) models to infer emotional state from audio. | Open Weights / onnx-trt | Custom (use allowed with Audio2Face-3D only) | v2.2, v3.0 |
78 | | Audio2Face-3D Training Sample Data | Example dataset to get started with the training framework. | Audio files, blendshape data, animated geometry caches, geometry files, and transform files | Custom (evaluation only) | Claire Dataset v1.0.0 |
79 |
109 | ## Audio2Face-3D SDK
110 |
111 | **Target audience:** C++ developers, tools & pipeline engineers
112 |
113 | **Link:** [Audio2Face-3D SDK on GitHub](https://github.com/NVIDIA/Audio2Face-3D-SDK)
114 |
115 | The Audio2Face-3D SDK is a C++ library for developers who need to integrate Audio2Face-3D's real-time inference capabilities directly into their own software. It provides the tools to build custom applications, plugins, or batch-processing tools that generate high-quality facial animation data from an audio source using any compatible Audio2Face-3D model; a hypothetical sketch of such a tool appears at the end of this section.
116 |
117 | **Key Features**
118 |
119 | - Cross-Platform C++ Library: Ensures wide compatibility across different operating systems.
120 | - Real-Time Performance: Highly optimized for low-latency inference suitable for interactive applications.
121 | - Hardware Acceleration: Leverages NVIDIA GPUs for maximum performance, with a CPU fallback for broader hardware support.
122 |
123 | **Best Use Cases**
124 |
125 | - Game Development: Integrate real-time facial animation for in-game character dialogue and cinematics.
126 | - Custom Applications: Add Audio2Face-3D functionality to proprietary content creation tools.
127 | - Production Deployment: Build scalable, automated facial animation pipelines for games, films, or broadcast.
128 |
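The SDK's concrete interfaces are documented in the repository itself; the sketch below only conveys the shape of a batch tool built on top of it. Every name in it (the `audio2face3d` module, `load_model`, `infer`) is a hypothetical placeholder rather than the SDK's real API.

```python
# Hypothetical sketch of a batch lip-sync tool built on the Audio2Face-3D SDK.
# Module, class, and method names are placeholders; consult the SDK
# documentation on GitHub for the real bindings.
from pathlib import Path

import audio2face3d  # placeholder module name, not the real binding

def batch_animate(model_card: str, audio_dir: str, out_dir: str) -> None:
    """Run offline inference over a folder of audio clips."""
    # Models are selected via their JSON model card (the same mechanism
    # used by MACE and the training framework below).
    engine = audio2face3d.load_model(model_card)  # hypothetical call
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        frames = engine.infer(str(wav))  # hypothetical: per-frame animation data
        frames.save(Path(out_dir) / (wav.stem + ".json"))  # hypothetical export

if __name__ == "__main__":
    batch_animate("claire_v2.3.1.json", "audio/", "animations/")
```

The same pattern extends to real-time use: instead of reading files, audio buffers would be fed to the engine as they arrive from a stream.
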
129 | ## Audio2Face-3D Training Framework
130 |
131 | **Target audience:** 3D animators, animation technical and art directors, Python scripters, ML researchers
132 |
133 | **Link:** [Audio2Face-3D Training Framework on GitHub](https://github.com/NVIDIA/Audio2Face-3D-training-framework)
134 |
135 | The Audio2Face-3D Training Framework provides the complete toolset required to create custom, high-performance facial animation models from your own datasets. This is the same powerful framework used by NVIDIA to train the official Mark, Claire, and James pre-trained models.
136 |
137 | By training on your own character's lip-synced animations, you can generate models that perfectly capture a desired performance style, personality, or language. A provided sample dataset helps users understand the complete workflow from data preparation to a fully trained model.
138 |
139 | **Key Features**
140 |
141 | - Custom Model Training: Train unique Audio2Face-3D models from scratch on your proprietary data.
142 | - Flexible Environment: A Python-based framework distributed via Docker for consistent and reproducible setups.
143 | - Multi-Language Support: Natively supports training on custom datasets in single or multiple languages.
144 | - Standardized Output: Exports trained models along with a JSON model card for seamless integration with other Audio2Face-3D tools (see the sketch at the end of this section).
145 |
146 | **Best Use Cases**
147 |
148 | - Develop unique models that capture a character's specific personality and performance style.
149 | - Create language-specific models to support multilingual dialogue and localization.
150 | - Build highly specialized models to meet unique research, pipeline, or production requirements.
151 |
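Because MACE and the SDK both load models through this JSON model card, a pipeline script often only needs to parse the card itself. The field names below are assumptions for illustration; the framework's documentation defines the real schema.

```python
# Illustrative only: the real model-card schema is defined by the
# Audio2Face-3D Training Framework. The field names here are assumptions.
import json

def summarize_model_card(path: str) -> None:
    """Print the basics a pipeline might check before loading a trained model."""
    with open(path, "r", encoding="utf-8") as f:
        card = json.load(f)
    print(f"model:   {card.get('name', 'unknown')}")     # hypothetical field
    print(f"version: {card.get('version', 'unknown')}")  # hypothetical field
    print(f"weights: {card.get('network', 'unknown')}")  # hypothetical field

summarize_model_card("claire_model_card.json")  # hypothetical file name
```
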
152 | ## Maya ACE Plugin (MACE)
153 |
154 | 
155 |
156 | **Target audience:** 3D animators, animation technical directors, Python scripters
157 |
158 | **Link:** [Maya ACE on GitHub](https://github.com/NVIDIA/Maya-ACE)
159 |
160 | MACE is a comprehensive plugin that integrates the full power of Audio2Face-3D directly within the Autodesk Maya environment.
161 |
162 | It provides an intuitive user interface for generating real-time facial animation from an audio source, with the flexibility to output direct mesh deformations, object transformations, or blend shape weights. A key feature is its ability to load any compatible Audio2Face-3D model via its JSON model card, supporting both pre-trained and custom-trained models. For advanced integration, the plugin's nodes are editable within Maya, and sample scripts demonstrate how its functionality can be extended; a minimal scripting sketch appears at the end of this section.
163 |
164 | **Key Features**
165 |
166 | - Real-Time Feedback: Generate and preview facial animation directly in the Maya viewport.
167 | - Flexible Inference: Supports both local inference on the user's machine and remote inference by connecting to an Audio2Face-3D microservice.
168 | - Audio2Face-3D Model Support: Load any compatible Audio2Face-3D model using its JSON model card.
169 | - Production-Ready Export: Export the final animation as native Maya keyframes or in the FBX format for use in other applications.
170 | - Artist-Friendly User Interface (UI): Fully integrated into the Maya UI, requiring no coding for standard operations.
171 |
172 | **Best Use Cases**
173 |
174 | - 3D Animators: Rapidly create high-quality facial animation and iterate on performances.
175 | - Technical Artists: Create and test inference configurations and integrate Audio2Face-3D into a studio pipeline.
176 | - Directors & Supervisors: Conduct quick previs and real-time reviews of animated dialogue.
177 | - Animation Pipeline: Create configuration files for use in batch pipelines or Audio2Face-3D NIM deployments.
178 |
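For pipeline work, the plugin's nodes can be driven with standard `maya.cmds` calls like any other Maya node. The plugin, node type, and attribute names below are placeholders (the shipped sample scripts show the real ones); this sketch only illustrates the pattern.

```python
# Hypothetical MACE scripting sketch for Maya's Script Editor. The plugin,
# node type, and attribute names are placeholders; see the sample scripts
# shipped with Maya ACE for the actual names.
from maya import cmds

cmds.loadPlugin("MayaACE", quiet=True)  # plugin binary name is an assumption

# Create an inference node and point it at an audio clip and a model card.
node = cmds.createNode("AceAudio2Face")  # hypothetical node type
cmds.setAttr(node + ".audioFile", "line01.wav", type="string")
cmds.setAttr(node + ".modelCard", "claire_v2.3.1.json", type="string")

# Drive a blend shape weight from one of the node's outputs
# (both attribute names are assumptions).
cmds.connectAttr(node + ".jawOpen", "faceBlendShapes.jawOpen")
```
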
179 | ## Audio2Face-3D Unreal Engine 5 Plugin
180 |
181 | **Target audience:** game developers using Unreal Engine 5
182 |
183 | **Links:**
184 |
185 | * [ACE Unreal 5.6 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.6-v2.5.0rc3.zip)
186 | * [ACE Unreal 5.5 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.5-v2.5.0rc3.zip)
187 | * [ACE Unreal 5.4 Plugin](https://developer.nvidia.com/downloads/assets/ace/nv_ace_reference-ue5.4-v2.4.0.zip)
188 | * [Audio2Face-3D 3.0 Models Plugin](https://developer.nvidia.com/downloads/assets/ace/ace_3.0_a2f_models.zip)
189 | * [Audio2Face-3D 2.3 Models Plugin](https://developer.nvidia.com/downloads/assets/ace/ace_2.5_v2.3_a2f_models.zip)
190 | * [Unreal Sample Project](https://developer.nvidia.com/downloads/assets/ace/aceunrealsample-1.0.0.7z)
191 |
192 | The Audio2Face-3D Plugin for Unreal Engine 5 brings real-time, AI-driven facial animation directly into your game development and virtual production workflows.
193 |
194 | It's integrated into the editor via Blueprint nodes, allowing artists to easily connect the animation data to any character's facial rig. A comprehensive sample project is provided to demonstrate a complete setup, including a pre-configured mapping for Epic Games' MetaHumans.
195 |
196 | This solution requires two components to be installed: the core ACE Unreal plugin and the Audio2Face-3D Models plugin, as listed above.
197 |
198 | **Key Features**
199 |
200 | - Real-Time Animation: Drive character facial rigs directly within the Unreal Engine editor from an audio source.
201 | - Blueprint Integration: Provides an artist-friendly workflow through Blueprints, requiring no C++ coding for standard use.
202 | - Flexible Inference: Supports both local inference on your machine and remote inference from a separate server.
203 | - MetaHuman Ready: The included sample project demonstrates how to rig and animate a MetaHuman character.
204 | - Extensible Source Code: The full C++ source code is available for developers who wish to modify or extend the plugin's functionality.
205 |
206 | **Best Use Cases**
207 |
208 | - Game Developers: Create dynamic, audio-driven facial animation for NPCs and player characters.
209 | - Virtual Production & Cinematics: Rapidly generate facial performances for real-time filmmaking and previs.
210 |
211 | ## Audio2Face-3D NIM
212 |
213 | **Target audience:** Linux system administrators, cloud & DevOps engineers
214 |
215 | **Link:** [A2F NIM on build.nvidia.com](https://build.nvidia.com/nvidia/audio2face-3d)
216 |
217 | The Audio2Face-3D NIM (NVIDIA Inference Microservice) is a scalable, containerized microservice designed for large-scale, multi-user deployments. It exposes the full functionality of the Audio2Face-3D inference engine via the high-performance gRPC protocol, allowing multiple client applications to connect simultaneously and generate real-time facial animation; a sketch of the client-side pattern appears at the end of this section.
218 |
219 | This architecture is ideal for a variety of production workflows, from powering interactive web avatars to serving as a centralized animation tool for an entire studio.
220 |
221 | **Key Features**
222 |
223 | - Large-Scale Deployment: Built to serve concurrent inference requests in a demanding production environment.
224 | - Multi-Stream Support: Capable of processing multiple simultaneous audio streams for different users or applications.
225 | - High-Performance Communication: Utilizes the gRPC protocol for low-latency and robust client-server communication.
226 | - Centralized Resource: Allows powerful GPU hardware to be centralized on a server, making Audio2Face-3D accessible to users on less powerful client machines.
227 |
228 | **Best Use Cases**
229 |
230 | - Web & Cloud Applications: Powering interactive, web-based digital humans and cloud-native services.
231 | - Multi-Platform Runtime: Providing a backend for real-time animation inference in games and applications across various platforms (for example, mobile or consoles).
232 | - Studio Animation Services: Creating a central Audio2Face-3D service for a team of animators, democratizing access to the technology.
233 | - Automated Batch Processing: Serving as a high-throughput rendering service for generating thousands of animations in a large production pipeline.
234 |
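On the client side the interaction follows the usual gRPC pattern: open a channel, create a stub, send audio, and consume animation frames. The generated module, stub, and message names below are placeholders; the real ones come from the .proto files distributed with the service.

```python
# Shape of a gRPC client for the Audio2Face-3D NIM. The generated-module,
# stub, and message names are placeholders; generate real stubs from the
# protos shipped with the microservice.
import grpc

import a2f_pb2 as pb        # placeholder for the generated messages
import a2f_pb2_grpc as rpc  # placeholder for the generated service stub

def stream_audio(host: str, wav_bytes: bytes) -> None:
    with grpc.insecure_channel(host) as channel:
        stub = rpc.Audio2FaceStub(channel)  # hypothetical stub name
        # Assumed server-streaming call: one audio payload in,
        # a stream of animation frames out.
        for frame in stub.ProcessAudio(pb.AudioRequest(audio=wav_bytes)):
            print(frame)  # e.g., blend shape weights and timestamps

if __name__ == "__main__":
    with open("line01.wav", "rb") as f:
        stream_audio("localhost:52000", f.read())  # host and port are assumptions
```
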
--------------------------------------------------------------------------------
/RELEASE.md:
--------------------------------------------------------------------------------
1 | # Release Notes
2 |
3 | ## 2025-09-16
4 |
5 | * Add Maya-ACE information and template files
6 |
--------------------------------------------------------------------------------
/media/Aki_ArchFlow_Banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA/Audio2Face-3D/4d61b6b81ad7b5108512ea0eab10d8712ea4a236/media/Aki_ArchFlow_Banner.png
--------------------------------------------------------------------------------
/media/Mace_banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA/Audio2Face-3D/4d61b6b81ad7b5108512ea0eab10d8712ea4a236/media/Mace_banner.png
--------------------------------------------------------------------------------