└── README.md

/README.md:
--------------------------------------------------------------------------------
# awesome-audio-visual-deepfake
This repository contains papers related to audio-visual deepfake.

## DeepFake and adversarial attack
+ Chen, Xuanjun, et al. "[Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection](https://arxiv.org/abs/2210.00753)." 2022 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2023. [[demo]](https://xjchen.tech/Push-Pull/index.html)
+ Zhou, Yipin, and Ser-Nam Lim. "[Joint audio-visual deepfake detection](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhou_Joint_Audio-Visual_Deepfake_Detection_ICCV_2021_paper.pdf)." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
+ Tian, Yapeng, and Chenliang Xu. "[Can audio-visual integration strengthen robustness under multimodal attacks?](https://arxiv.org/pdf/2104.02000.pdf)." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021. [[code]](https://github.com/YapengTian/AV-Robustness-CVPR21)

## Multimodal model
+ Zhang, Rui, et al. "[UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization](https://arxiv.org/abs/2308.14395)." Proceedings of the 31st ACM International Conference on Multimedia. 2023. [[code]](https://github.com/ymhzyj/UMMAFormer)

## Audio-only model
+ Shin, Hyun-seo, et al. "[HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods](https://arxiv.org/abs/2309.08208)." IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2024.
+ Rosello, et al. "[A conformer-based classifier for variable-length utterance processing in anti-spoofing](https://www.isca-archive.org/interspeech_2023/rosello23_interspeech.html)." INTERSPEECH, 2023. [[code]](https://github.com/ErosRos/conformer-based-classifier-for-anti-spoofing)

## Visual-only model

## Dataset
+ Cai, Zhixi, et al. "[AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset](https://arxiv.org/pdf/2311.15308v1.pdf)." arXiv preprint arXiv:2311.15308 (2023).
+ Cai, Zhixi, et al. "["Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization](https://arxiv.org/abs/2305.01979)." CVIU, 2023. [[code]](https://github.com/ControlNet/LAV-DF?tab=readme-ov-file)
+ Cai, Zhixi, et al. "[Do you really mean that? Content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization](https://ieeexplore.ieee.org/document/10034605)." 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2022. [[code]](https://github.com/ControlNet/LAV-DF?tab=readme-ov-file)
+ Khalid, Hasam, et al. "[FakeAVCeleb: A novel audio-video multimodal deepfake dataset](https://arxiv.org/abs/2108.05080)." arXiv preprint arXiv:2108.05080 (2021). [[code]](https://github.com/DASH-Lab/FakeAVCeleb)

--------------------------------------------------------------------------------