├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── assets
│   └── main.png
├── data
│   └── .gitkeep
├── data_utils
│   ├── deepspeech_features
│   │   ├── README.md
│   │   ├── deepspeech_features.py
│   │   ├── deepspeech_store.py
│   │   ├── extract_ds_features.py
│   │   ├── extract_wav.py
│   │   └── fea_win.py
│   ├── face_parsing
│   │   ├── logger.py
│   │   ├── model.py
│   │   ├── resnet.py
│   │   └── test.py
│   ├── face_tracking
│   │   ├── __init__.py
│   │   ├── convert_BFM.py
│   │   ├── data_loader.py
│   │   ├── face_tracker.py
│   │   ├── facemodel.py
│   │   ├── geo_transform.py
│   │   ├── render_3dmm.py
│   │   ├── render_land.py
│   │   └── util.py
│   ├── hubert.py
│   ├── process.py
│   ├── wav2mel.py
│   ├── wav2mel_hparams.py
│   └── wav2vec.py
├── encoding.py
├── freqencoder
│   ├── __init__.py
│   ├── backend.py
│   ├── freq.py
│   ├── setup.py
│   └── src
│       ├── bindings.cpp
│       ├── freqencoder.cu
│       └── freqencoder.h
├── gridencoder
│   ├── __init__.py
│   ├── backend.py
│   ├── grid.py
│   ├── setup.py
│   └── src
│       ├── bindings.cpp
│       ├── gridencoder.cu
│       └── gridencoder.h
├── main.py
├── nerf_triplane
│   ├── asr.py
│   ├── gui.py
│   ├── network.py
│   ├── provider.py
│   ├── renderer.py
│   └── utils.py
├── raymarching
│   ├── __init__.py
│   ├── backend.py
│   ├── raymarching.py
│   ├── setup.py
│   └── src
│       ├── bindings.cpp
│       ├── raymarching.cu
│       └── raymarching.h
├── requirements.txt
├── scripts
│   └── train_obama.sh
└── shencoder
    ├── __init__.py
    ├── backend.py
    ├── setup.py
    ├── sphere_harmonics.py
    └── src
        ├── bindings.cpp
        ├── shencoder.cu
        └── shencoder.h

/.gitignore:
--------------------------------------------------------------------------------
__pycache__/
build/
*.egg-info/
*.so
*.mp4

tmp*
trial*/

data/*
!data/.gitkeep
data_utils/face_tracking/3DMM/*
data_utils/face_parsing/79999_iter.pth

scripts/*
!scripts/train_obama.sh

pretrained
*.mp4

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
ARG BASE_IMAGE=nvcr.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
FROM $BASE_IMAGE

VOLUME [ "/ernerf" ]

RUN apt-get update -yq --fix-missing \
 && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
    pkg-config \
    wget \
    cmake \
    curl \
    git \
    vim \
    portaudio19-dev \
    ffmpeg \
    libsm6 \
    libxext6

SHELL ["/bin/bash", "-i", "-c"]

RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN sh Miniconda3-latest-Linux-x86_64.sh -b -u -p ~/miniconda3
RUN ~/miniconda3/bin/conda init
RUN source ~/.bashrc
RUN rm Miniconda3-latest-Linux-x86_64.sh

RUN conda install nvidia/label/cuda-11.7.1::libcufft nvidia/label/cuda-11.7.1::libcublas nvidia/label/cuda-11.7.1::libnvjpeg nvidia/label/cuda-11.7.1::libcusparse nvidia/label/cuda-11.7.1::cuda-cudart conda-forge::libnvjitlink-dev nvidia/label/cuda-11.7.1::cuda-toolkit -y
RUN conda remove libnvjitlink-dev -y
RUN conda install python==3.10 pytorch==1.13.1 torchvision==0.14.1 cudatoolkit==11.7.1 -c pytorch -y
COPY requirements.txt ./
RUN pip install -r requirements.txt

RUN conda install -c fvcore -c iopath -c conda-forge fvcore iopath -y
RUN conda install -c bottler nvidiacub -y
# RUN pip install "git+https://github.com/facebookresearch/pytorch3d.git"
RUN pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu117_pyt1131/download.html

RUN pip install tensorflow-gpu==2.8.0

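# Note (added comment, our reading of the pin below): tensorflow-gpu 2.8 pulls in a protobuf 4.x
# release that TensorFlow builds of that era generally cannot load, which is presumably why
# protobuf is reinstalled at 3.20.1 here.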
RUN pip uninstall protobuf -y
RUN pip install protobuf==3.20.1

# RUN conda install ffmpeg -y

RUN echo 'export LD_LIBRARY_PATH="$CONDA_PREFIX/lib"' >> ~/.bashrc
RUN ln -s $CONDA_PREFIX/lib/libcudart.so /usr/lib/libcudart.so

COPY ./ /ernerf
WORKDIR /ernerf

CMD ["/bin/bash"]

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 hawkey

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

This is the official repository for our ICCV 2023 paper **Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis**.

### [Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Li_Efficient_Region-Aware_Neural_Radiance_Fields_for_High-Fidelity_Talking_Portrait_Synthesis_ICCV_2023_paper.html) | [Project](https://fictionarry.github.io/ER-NeRF/) | [ArXiv](https://arxiv.org/abs/2307.09323) | [Video](https://youtu.be/Gc2d3Z8MMuI)

![image](assets/main.png)

## Updates

- [2025/02/28] Our work [InsTaG](https://fictionarry.github.io/InsTaG/) at CVPR 2025 is released! 🔥
- [2024/07/02] Our work [TalkingGaussian](https://fictionarry.github.io/TalkingGaussian/) at ECCV 2024 is released! 🔥
- TODO: Use AU to implement SyncTalk's full expression control, as we do in TalkingGaussian.

## Installation

Tested on Ubuntu 18.04, PyTorch 1.12 and CUDA 11.3.

### Install dependency

```bash
conda create -n ernerf python=3.10
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
```
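The CUDA extensions (`freqencoder`, `gridencoder`, `shencoder`, `raymarching`) are compiled just-in-time the first time you run the code. If you would rather build them ahead of time, each extension directory contains a `setup.py`, so an installation along the following lines should work; this is a sketch inferred from the repository layout, not an official step:

```bash
# Optional: pre-build the CUDA extensions instead of letting them compile on first run.
pip install ./freqencoder
pip install ./gridencoder
pip install ./shencoder
pip install ./raymarching
```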
### Preparation

- Prepare face-parsing model.

```bash
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
```

- Prepare the 3DMM model for head pose estimation.

```bash
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
```

- Download the 3DMM model from [Basel Face Model 2009](https://faces.dmi.unibas.ch/bfm/main.php?nav=1-1-0&id=details):

```
# 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
# 2. run the following
cd data_utils/face_tracking
python convert_BFM.py
```

## Datasets and pretrained models

We get the experiment videos mainly from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF), [DFRF](https://github.com/sstzal/DFRF), [GeneFace](https://github.com/yerfor/GeneFace) and YouTube. Due to copyright restrictions, we cannot distribute all of them. You may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF at a resolution of 450x450.

```
mkdir -p data/obama
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4
```

We also provide pretrained checkpoints on the Obama video clip. After completing the data pre-processing step, you can [download](https://github.com/Fictionarry/ER-NeRF/releases/tag/ckpt) and test them by:

```bash
python main.py data/obama/ --workspace trial_obama/ -O --test --ckpt trial_obama/checkpoints/ngp.pth # head
python main.py data/obama/ --workspace trial_obama_torso/ -O --test --torso --ckpt trial_obama_torso/checkpoints/ngp.pth # head+torso
```

The test results should be about:

| setting    | PSNR   | LPIPS  | LMD   |
| ---------- | ------ | ------ | ----- |
| head       | 35.607 | 0.0178 | 2.525 |
| head+torso | 26.594 | 0.0446 | 2.550 |

## Usage

### Pre-processing Custom Training Video

* Put the training video under `data/<ID>/<ID>.mp4`.

  The video **must be 25FPS, with all frames containing the talking person**.
  The resolution should be about 512x512, and the duration about 1-5 min.

* Run the script to process the video (this may take several hours).

```bash
python data_utils/process.py data/<ID>/<ID>.mp4
```

* Obtain AU45 for eye blinking.

  Run `FeatureExtraction` in [OpenFace](https://github.com/TadasBaltrusaitis/OpenFace), then rename and move the output CSV file to `data/<ID>/au.csv`; a rough example is sketched below.
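For reference, a typical OpenFace call looks roughly like the following. The `FeatureExtraction` binary path and the name of the generated CSV depend on your OpenFace build, so treat this as a hypothetical sketch rather than the exact required invocation:

```bash
# Hypothetical paths: adjust the OpenFace location and <ID> to your setup.
./OpenFace/build/bin/FeatureExtraction -f data/<ID>/<ID>.mp4 -out_dir data/<ID>/openface
mv data/<ID>/openface/<ID>.csv data/<ID>/au.csv
```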
### Audio Pre-process

In our paper, we use DeepSpeech features for evaluation.

You should specify the type of audio feature by `--asr_model <deepspeech, esperanto, hubert>` when **training and testing**.

* DeepSpeech

```bash
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/<name>.npy
```

* Wav2Vec

You can also try to extract audio features via Wav2Vec like [RAD-NeRF](https://github.com/ashawkey/RAD-NeRF) by:

```bash
python data_utils/wav2vec.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
```

* HuBERT

In our tests, the HuBERT extractor performs better for more languages; it has already been used in [GeneFace](https://github.com/yerfor/GeneFace).

```bash
# Borrowed from GeneFace. English pre-trained.
python data_utils/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy
```

### Train

The first run will take some time to compile the CUDA extensions.

```bash
# train (head and lpips finetune, run in sequence)
python main.py data/obama/ --workspace trial_obama/ -O --iters 100000
python main.py data/obama/ --workspace trial_obama/ -O --iters 125000 --finetune_lips --patch_size 32

# train (torso)
# <head>.pth should be the latest checkpoint in trial_obama
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
```

### Test

```bash
# test on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test # only render the head and use GT image for torso
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test # render both head and torso
```

### Inference with target audio

```bash
# Adding "--smooth_path" may help decrease the jitter of the head, while being less accurate to the original pose.
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --aud