# 😈🛡️Awesome-Jailbreak-against-Multimodal-Generative-Models
🔥🔥🔥 **Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey**

**[Paper](https://arxiv.org/abs/2411.09259)**

We've curated a collection of the latest 😋, most comprehensive 😎, and most valuable 🤩 resources on Jailbreak Attack and Defense against Multimodal Generative Models.
But we don't stop there; our repository is constantly updated to ensure you have the most current information at your fingertips.

![survey model](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/page_1_00.png)

## 🤗Introduction

**This survey presents a comprehensive review of existing jailbreak attacks and defenses against multimodal generative models.**
**Given the generalized lifecycle of a multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output.**

**🧑‍💻 Four Levels of the Multimodal Jailbreak Lifecycle**
- Input Level: Attackers and defenders operate solely on the input data. Attackers modify inputs to execute attacks, while defenders incorporate protective cues to enhance detection.
- Encoder Level: With access to the encoder, attackers optimize adversarial inputs to inject malicious information into the encoding process, while defenders work to prevent harmful information from being encoded within the latent space (see the sketch after this list).
- Generator Level: With full access to the generative models, attackers leverage inference information, such as activations and gradients, and fine-tune models to increase adversarial effectiveness, while defenders use these techniques to strengthen model robustness.
- Output Level: With the output from the generative model, attackers can iteratively refine adversarial inputs, while defenders can apply post-processing techniques to enhance detection.
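
As a concrete illustration of the encoder level, below is a minimal, hypothetical sketch (not any specific paper's method) of optimizing an image perturbation so that the image embedding drifts toward a target text embedding. It assumes PyTorch, Hugging Face `transformers`, and the public `openai/clip-vit-base-patch32` checkpoint; `benign_image` is a `PIL.Image` supplied by the caller.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Frozen CLIP encoder standing in for the victim model's vision/text encoders.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encoder_level_attack(benign_image, target_text, steps=200, eps=8 / 255, lr=1e-2):
    """Maximize cosine similarity between the adversarial image and a target text."""
    with torch.no_grad():
        text_inputs = processor(text=[target_text], return_tensors="pt", padding=True)
        target = model.get_text_features(**text_inputs)
        target = target / target.norm(dim=-1, keepdim=True)

    pixels = processor(images=benign_image, return_tensors="pt").pixel_values
    delta = torch.zeros_like(pixels, requires_grad=True)  # adversarial perturbation
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        image_emb = model.get_image_features(pixel_values=pixels + delta)
        image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
        loss = -(image_emb * target).sum()  # negative cosine similarity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the perturbation small; note that pixel_values are normalized, so
        # this bound is approximate rather than a true 8/255 pixel-space budget.
        delta.data.clamp_(-eps, eps)

    return (pixels + delta).detach()
```

Encoder-level defenses invert this objective, for example by fine-tuning the encoder so that perturbed inputs no longer land near harmful text embeddings.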

**Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models.**
**We cover a wide range of input-output configurations, including modalities such as Any-to-Text, Any-to-Vision, and Any-to-Any within generative systems.**

![survey model](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/survey_model.png)

## 🚀Table of Contents

- [😈🛡️Awesome-Jailbreak-against-Multimodal-Generative-Models](#️awesome-jailbreak-against-multimodal-generative-models)
  - [🤗Introduction](#introduction)
  - [🚀Table of Contents](#table-of-contents)
  - [🔥Multimodal Generative Models](#multimodal-generative-models)
    - [📑Any-to-Text Models (LLM Backbone)](#any-to-text-models-llm-backbone)
    - [📖Any-to-Vision (Diffusion Backbone)](#any-to-vision-diffusion-backbone)
    - [📰Any-to-Any (Unified Backbone)](#any-to-any-unified-backbone)
  - [😈JailBreak Attack](#jailbreak-attack)
    - [📖Attack-Intro](#attack-intro)
    - [📑Papers](#papers)
      - [Jailbreak Attack of Any-to-Text Models](#jailbreak-attack-of-any-to-text-models)
      - [Jailbreak Attack of Any-to-Vision Models](#jailbreak-attack-of-any-to-vision-models)
      - [Jailbreak Attack of Any-to-Any Models](#jailbreak-attack-of-any-to-any-models)
  - [🛡️Jailbreak Defense](#️jailbreak-defense)
    - [📖Defense-Intro](#defense-intro)
    - [📑Papers](#papers-1)
      - [Jailbreak Defense of Any-to-Text Models](#jailbreak-defense-of-any-to-text-models)
      - [Jailbreak Defense of Any-to-Vision Models](#jailbreak-defense-of-any-to-vision-models)
      - [Jailbreak Defense of Any-to-Any Models](#jailbreak-defense-of-any-to-any-models)
  - [💯Evaluation](#evaluation)
    - [⭐️Evaluation Datasets](#️evaluation-datasets)
      - [Used for Any-to-Text Models](#used-for-any-to-text-models)
      - [Used for Any-to-Vision Models](#used-for-any-to-vision-models)
    - [📚Evaluation Methods](#evaluation-methods)
      - [Text Detector](#text-detector)
      - [Image Detector](#image-detector)
  - [😉Citation](#citation)


## 🔥Multimodal Generative Models

**Below are tables of model short names and representative generative models used for jailbreak. For input/output modalities, I: Image, T: Text, V: Video, A: Audio.**

### 📑Any-to-Text Models (LLM Backbone)
| Short Name | Modality | Representative Model |
|:--------|:--------:|:--------:|
| IT2T | I + T → T |[LLaVA](https://arxiv.org/abs/2304.08485), [MiniGPT4](https://arxiv.org/abs/2304.10592), [InstructBLIP](https://arxiv.org/abs/2305.06500) |
| VT2T | V + T → T |[Video-LLaVA](https://arxiv.org/abs/2311.10122), [Video-LLaMA](https://arxiv.org/abs/2306.02858) |
| AT2T | A + T → T |[Audio Flamingo](https://arxiv.org/abs/2402.01831), [AudioPaLM](https://arxiv.org/abs/2306.12925) |

### 📖Any-to-Vision (Diffusion Backbone)
| Short Name | Modality | Representative Model |
|:--------|:--------:|:--------:|
| T2I | T → I |[Stable Diffusion](https://arxiv.org/abs/2112.10752), [Midjourney](https://www.midjourney.com/), [DALLE](https://platform.openai.com/docs/guides/moderation/overview) |
| IT2I | I + T → I |[DreamBooth](https://arxiv.org/abs/2208.12242), [InstructP2P](https://arxiv.org/abs/2306.07154) |
| T2V | T → V |[Open-Sora](https://github.com/hpcaitech/Open-Sora), [Stable Video Diffusion](https://arxiv.org/abs/2311.15127) |
| IT2V | I + T → V |[VideoPoet](https://arxiv.org/abs/2312.14125), [CogVideoX](https://arxiv.org/abs/2408.06072) |

### 📰Any-to-Any (Unified Backbone)
| Short Name | Modality | Representative Model |
|:--------|:--------:|:--------:|
| IT2IT | I + T → I + T |[Next-GPT](https://arxiv.org/abs/2309.05519), [Chameleon](https://arxiv.org/abs/2405.09818) |
| TIV2TIV | T + I + V → T + I + V |[EMU3](https://arxiv.org/abs/2409.18869)|
| Any2Any | Any → Any |[GPT-4o](https://openai.com/index/gpt-4o-system-card/), [Gemini Ultra](https://arxiv.org/abs/2312.11805)|


## 😈JailBreak Attack

### 📖Attack-Intro

**We categorize attack methods into black-box, gray-box, and white-box attacks. In a black-box setting, where the model is inaccessible to the attacker, the attack is limited to surface-level interactions, focusing solely on the model’s input and/or output. Regarding gray-box and white-box attacks, we consider model-level attacks, including attacks at both the encoder and the generator.**

- Input-level attack: Attackers are compelled to develop more sophisticated input templates across prompt engineering, image engineering, and role-play techniques.
- Output-level attack: Attackers focus on querying outputs across multiple input variants. Driven by specific adversarial goals, attackers employ estimation-based and search-based attack techniques to iteratively refine these input variants (see the sketch at the end of this subsection).

![jailbreak_attack_black_box](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/jailbreak_attack_A.png)

- Encoder-level attack: Attackers are restricted to accessing only the encoders to provoke harmful responses. In this case, attackers typically seek to maximize cosine similarity within the latent space, ensuring the adversarial input retains similar semantics to the target malicious content while still being classified as safe.
- Generator-level attack: Attackers have unrestricted access to the generative model’s architecture and checkpoints, enabling them to conduct thorough investigations and manipulations, and thus mount sophisticated attacks.

![jailbreak_attack_white_and_gray_box](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/jailbreak_attack_B.png)
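
As a toy illustration of the output-level loop above, the following model-agnostic sketch refines prompt variants using only input/output access. `query_model`, `mutate`, and `harm_score` are hypothetical callables supplied by the attacker (a black-box generation API, a perturbation heuristic, and a toxicity/goal scorer); none of them comes from a specific paper.

```python
def output_level_search(seed_prompt, query_model, mutate, harm_score,
                        iterations=50, pool_size=8):
    """Search-based refinement: keep whichever input variant scores highest."""
    best_prompt = seed_prompt
    best_score = harm_score(query_model(seed_prompt))
    for _ in range(iterations):
        for _ in range(pool_size):
            candidate = mutate(best_prompt)             # propose an input variant
            score = harm_score(query_model(candidate))  # output is the only feedback
            if score > best_score:
                best_prompt, best_score = candidate, score
        if best_score >= 1.0:  # adversarial goal reached (scorer-dependent)
            break
    return best_prompt, best_score
```

Estimation-based variants replace the blind mutation step with gradient estimates computed from output differences (e.g., zeroth-order optimization).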
### 📑Papers

Below are the papers related to jailbreak attacks.

## Jailbreak Attack of Any-to-Text Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|[**Jailbreaking Large Vision Language Models in Intelligent Transportation Systems**](https://arxiv.org/abs/2511.13892) | Arxiv 2025 | 2025/11/17 | None | --- | I+T→T |
|[**An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs**](https://arxiv.org/abs/2511.16163) | Arxiv 2025 | 2025/11/20 | None | --- | I+T→T |
|[**The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks**](https://arxiv.org/abs/2511.16347) | Arxiv 2025 | 2025/11/20 | None | --- | I+T→T |
|[**Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs**](https://arxiv.org/abs/2505.11842) | NeurIPS 2025 | 2025/05/17 | [Homepage](https://liuxuannan.github.io/Video-SafetyBench.github.io/) | Input Level | V+T→T |
|[**Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity**](https://arxiv.org/abs/2508.09218) | Arxiv 2025 | 2025/08/11 | [Github](https://github.com/LumaLab-ai/BSD) | --- | I+T→T |
|[**JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering**](https://arxiv.org/abs/2508.05087) | ACM MM 2025 | 2025/08/07 | [Github](https://github.com/thu-coai/JPS) | --- | I+T→T |
|[**PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking**](https://arxiv.org/abs/2507.21540) | Arxiv 2025 | 2025/07/29 | None | --- | I+T→T |
|[**Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models**](https://arxiv.org/abs/2506.16760) | Arxiv 2025 | 2025/07/20 | None | --- | I+T→T |
|[**Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models**](https://arxiv.org/abs/2507.13761) | TMM 2025 | 2025/07/18 | None | --- | I+T→T |
|[**Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection**](https://arxiv.org/abs/2507.02844) | Arxiv 2025 | 2025/07/03 | [Github](https://github.com/Dtc7w3PQ/Visco-Attack) | --- | I+T→T |
|[**Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models**](http://arxiv.org/abs/2506.01307) | TCSVT 2025 | 2025/07/02 | [Github](https://github.com/Dtc7w3PQ/Visco-Attack) | --- | I+T→T |
|[**USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models**](https://arxiv.org/abs/2505.23793) | Arxiv 2025 | 2025/05/26 | [Github](https://github.com/Hongqiong12/USB-SafeBench) | --- | I+T→T |
|[**VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration**](https://arxiv.org/abs/2505.20362) | Arxiv 2025 | 2025/05/26 | [Github](https://github.com/jiahuigeng/VSCBench) | --- | I+T→T |
|[**JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models**](https://arxiv.org/abs/2505.19610) | Arxiv 2025 | 2025/05/26 | None | --- | I+T→T |
|[**Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models**](https://arxiv.org/abs/2501.13772) | Arxiv 2025 | 2025/05/26 | None | --- | A+T→T |
|[**Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework**](https://arxiv.org/abs/2505.18864) | Arxiv 2025 | 2025/05/24 | None | --- | A+T→T |
|[**JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models**](https://arxiv.org/abs/2505.17568) | Arxiv 2025 | 2025/05/23 | None | --- | A+T→T |
|[**BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation**](https://arxiv.org/abs/2505.16446) | Arxiv 2025 | 2025/05/22 | None | --- | I+T→T |
|[**AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models**](https://arxiv.org/abs/2505.14103) | Arxiv 2025 | 2025/05/20 | None | --- | A+T→T |
|[**Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models**](https://arxiv.org/abs/2505.12443) | Arxiv 2025 | 2025/05/18 | None | --- | I+T→T |
|[**Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model**](https://arxiv.org/abs/2505.06538) | Arxiv 2025 | 2025/05/10 | [Github](https://github.com/xinyuelou/Think-in-Safety) | --- | I+T→T |
|[**SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models**](https://arxiv.org/abs/2504.08813) | Arxiv 2025 | 2025/04/09 | [Github](https://github.com/fangjf1/OpenSafeMLRM) | --- | I+T→T |
|[**PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization**](https://arxiv.org/abs/2504.01444) | Arxiv 2025 | 2025/04/02 | None | --- | I+T→T |
|[**Multilingual and Multi-Accent Jailbreaking of Audio LLMs**](https://arxiv.org/abs/2504.01094) | Arxiv 2025 | 2025/04/01 | None | --- | A+T→T |
|[**Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy**](https://arxiv.org/abs/2503.20823) | CVPR 2025 | 2025/03/26 | [Github](https://github.com/naver-ai/JOOD) | --- | I+T→T |
|[**MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks**](https://arxiv.org/abs/2503.19134) | Arxiv 2025 | 2025/03/24 | None | --- | I+T→T |
|[**Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization**](https://arxiv.org/abs/2503.11750) | Arxiv 2025 | 2025/03/14 | None | --- | I+T→T |
|[**ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content**](https://arxiv.org/abs/2503.09964) | Arxiv 2025 | 2025/03/13 | None | --- | I+T→T |
|[**Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs**](https://arxiv.org/abs/2503.06989) | Arxiv 2025 | 2025/03/10 | None | --- | I+T→T |
|[**FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts**](https://arxiv.org/abs/2502.21059) | Arxiv 2025 | 2025/02/28 | None | --- | I+T→T |
|[**EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models**](https://arxiv.org/abs/2502.14976) | Arxiv 2025 | 2025/02/20 | None | --- | I+T→T |
|[**Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs**](https://arxiv.org/abs/2502.11184) | Arxiv 2025 | 2025/02/16 | None | --- | I+T→T |
|[**Distraction is All You Need for Multimodal Large Language Model Jailbreaking**](https://arxiv.org/abs/2502.10794) | CVPR 2025 | 2025/02/15 | None | --- | I+T→T |
|[**ELITE: Enhanced Language-Image Toxicity Evaluation for Safety**](https://arxiv.org/abs/2502.04757) | Arxiv 2025 | 2025/02/07 | None | --- | I+T→T |
|[**'Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs**](https://arxiv.org/abs/2502.00735) | Arxiv 2025 | 2025/02/02 | None | --- | A+T→T |
|[**"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models**](https://arxiv.org/abs/2502.00718) | Arxiv 2025 | 2025/02/02 | [Homepage](https://isha-gpt.github.io/) | --- | A+T→T |
|[**Failures to Find Transferable Image Jailbreaks Between Vision-Language Models**](https://openreview.net/forum?id=wvFnqVVUhN) | ICLR 2025 | 2025/01/23 | [Github](https://github.com/RylanSchaeffer/AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer) | Generator Level | I+T→T |
|[**Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency**](https://arxiv.org/abs/2501.04931) | ICCV 2025 | 2025/01/09 | None | --- | I+T→T |
|[**Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models**](https://arxiv.org/abs/2412.16555) | Arxiv 2024 | 2024/12/21 | None | --- | I+T+A→T |
|[**AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models**](https://arxiv.org/abs/2412.08608) | ICLR 2025 | 2024/12/11 | [Github](https://github.com/kangmintong/AdvWave) | Generator Level | A+T→T |
|[**Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models**](https://arxiv.org/abs/2412.05934) | ICCV 2025 | 2024/12/08 | [Github](https://github.com/MaTengSYSU/HIMRD-jailbreak) | --- | I+T→T |
|[**PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization**](https://arxiv.org/abs/2412.05892) | Arxiv 2024 | 2024/12/08 | None | --- | I+T→T |
|[**Jailbreak Large Vision-Language Models Through Multi-Modal Linkage**](https://arxiv.org/abs/2412.00473) | Arxiv 2024 | 2024/11/30 | [Github](https://github.com/wangyu-ovo/MML) | --- | I+T→T |
|[**Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models**](https://arxiv.org/abs/2411.18000) | CVPR 2025 | 2024/11/27 | None | --- | I+T→T |
|[**The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense**](https://arxiv.org/abs/2411.08410) | Arxiv 2024 | 2024/11/13 | None | --- | I+T→T |
|[**MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models**](https://arxiv.org/abs/2408.08464) | Arxiv 2024 | 2024/08/16 | None | --- | I+T→T |
|[**Failures to Find Transferable Image Jailbreaks Between Vision-Language Models**](https://arxiv.org/abs/2407.15211) | NeurIPS 2024 Workshops | 2024/07/21 | None | --- | I+T→T |
|[**MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models**](https://arxiv.org/abs/2406.07594) | NeurIPS 2024 | 2024/06/11 | [Github](https://github.com/Carol-gutianle/MLLMGuard) | --- | I+T→T |
|[**Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks**](https://arxiv.org/abs/2406.06302) | Arxiv 2024 | 2024/06/10 | [Github](https://github.com/NY1024/Jailbreak_GPT4o) | --- | I+T→T |
|[**Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?**](https://arxiv.org/abs/2404.03411) | Arxiv 2024 | 2024/04/04 | [Github](https://github.com/chenxshuo/RedTeamingGPT4V) | --- | I+T→T |
|[**VLSBench: Unveiling Visual Leakage in Multimodal Safety**](https://arxiv.org/abs/2411.19939) | ACL 2025 | 2024/11/29 | [Homepage](https://hxhcreate.github.io/vlsbench.github.io/) | Input Level | I+T→T |
|[**Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models**](https://arxiv.org/abs/2411.11496) | Arxiv 2024 | 2024/11/18 | [Github](https://github.com/gzcch/Safety_Snowball_Agent) | Output Level | I+T→T |
|[**IDEATOR: Jailbreaking Large Vision-Language Models Using Themselves**](https://arxiv.org/abs/2411.00827) | ICCV 2025 | 2024/11/15 | [Github](https://github.com/roywang021/IDEATOR) | Output Level | I+T→T |
|[**Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models**](https://arxiv.org/abs/2411.07559) | NeurIPS SafeGenAi Workshop 2024 | 2024/11/12 | None | Output Level | I+T→T |
|[**Audio is the Achilles’ heel: Red teaming audio large multimodal models**](https://arxiv.org/abs/2410.23861) | Arxiv 2024 | 2024/10/31 | [Github](https://github.com/YangHao97/RedteamAudioLMMs) | Input Level | A+T→T |
|[**Advweb: Controllable black-box attacks on vlm-powered web agents**](https://arxiv.org/abs/2410.17401) | Arxiv 2024 | 2024/10/22 | None | Input Level | I+T→T |
|[**Can Large Language Models Automatically Jailbreak GPT-4V?**](https://arxiv.org/abs/2407.16686) | NAACL Workshop 2024 | 2024/07/23 | None | Input Level | I+T→T |
|[**Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts**](https://arxiv.org/abs/2407.15050) | ACM MM 2024 | 2024/07/21 | None | Input Level | I+T→T |
|[**Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything**](https://arxiv.org/abs/2407.02534) | Arxiv 2024 | 2024/07/01 | None | Input Level | I+T→T |
|[**From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking**](https://arxiv.org/abs/2406.14859) | EMNLP 2024 | 2024/06/21 | None | Encoder Level | I+T→T |
|[**Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt**](https://arxiv.org/abs/2406.04031) | Arxiv 2024 | 2024/06/06 | [Github](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt) | Generator Level | I+T→T |
|[**Efficient LLM-Jailbreaking by Introducing Visual Modality**](https://arxiv.org/abs/2405.20015) | Arxiv 2024 | 2024/05/30 | [Github](https://github.com/nobody235/LLM-jailbreak) | Generator Level | I+T→T |
|[**White-box Multimodal Jailbreaks Against Large Vision-Language Models**](https://arxiv.org/abs/2405.17894) | ACM MM 2024 | 2024/05/28 | [Github](https://github.com/roywang021/UMK) | Generator Level | I+T→T |
|[**Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models**](https://arxiv.org/abs/2405.20775) | Arxiv 2024 | 2024/05/26 | [Github](https://github.com/dirtycomputer/O2M_attack) | --- | I+T→T |
|[**Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character**](https://arxiv.org/abs/2405.20773) | Arxiv 2024 | 2024/05/25 | [Github](https://github.com/SiyuanMaCS/VisualRoleplay) | Input Level | I+T→T |
|[**Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models**](https://arxiv.org/abs/2403.09792) | ECCV 2024 | 2024/05/14 | [Github](https://github.com/RUCAIBox/HADES)| Generator Level | I+T→T |
|[**Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast**](https://arxiv.org/abs/2402.08567) | ICML 2024 | 2024/02/13 | [Github](https://github.com/sail-sg/Agent-Smith) | Generator Level | I+T→T |
|[**Jailbreaking Attack against Multimodal Large Language Model**](https://arxiv.org/abs/2402.02309) | Arxiv 2024 | 2024/02/04 | [Github](https://github.com/abc03570128/Jailbreaking-Attack-against-Multimodal-Large-Language-Model) | Generator Level | I+T→T |
|[**Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models**](https://arxiv.org/abs/2307.14539) | ICLR 2024 Spotlight | 2024/01/16 | [Github](https://github.com/erfanshayegani/Jailbreak-In-Pieces) | Encoder Level | I+T→T |
|[**MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models**](https://arxiv.org/abs/2311.17600) | ECCV 2024 | 2023/11/29 | [Github](https://github.com/isXinLiu/MM-SafetyBench) | Input Level | I+T→T |
|[**How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs**](https://arxiv.org/abs/2311.16101) | ECCV 2024 | 2023/11/27 | [Github](https://github.com/UCSC-VLAA/vllm-safety-benchmark) | Encoder Level | I+T→T |
|[**Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts**](https://arxiv.org/abs/2311.09127) | Arxiv 2023 | 2023/11/15 | None | Output Level | I+T→T |
|[**FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts**](https://arxiv.org/abs/2311.05608) | AAAI 2025 | 2023/11/09 | [Github](https://github.com/ThuCCSLab/FigStep) | Input Level | I+T→T |
|[**Image Hijacks: Adversarial Images can Control Generative Models at Runtime**](https://arxiv.org/abs/2309.00236) | ICML 2024 | 2023/09/01 | [Github](https://github.com/euanong/image-hijacks) | Generator Level | I+T→T |
|[**Are aligned neural networks adversarially aligned?**](https://arxiv.org/abs/2306.15447) | NeurIPS 2023 | 2023/06/26 | None | Generator Level | I+T→T |
|[**Visual Adversarial Examples Jailbreak Aligned Large Language Models**](https://arxiv.org/abs/2306.13213) | AAAI 2024 | 2023/06/22 | [Github](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models) | Generator Level | I+T→T |
|[**On Evaluating Adversarial Robustness of Large Vision-Language Models**](https://proceedings.neurips.cc/paper_files/paper/2023/hash/a97b58c4f7551053b0512f92244b0810-Abstract-Conference.html) | NeurIPS 2023 | 2023/05/26 | [Homepage](https://yunqing-me.github.io/AttackVLM/) | Encoder Level | I+T→T |


## Jailbreak Attack of Any-to-Vision Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|[**Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards**](https://arxiv.org/abs/2508.05658) | ACM MM 2025 | 2025/07/30 | [Github](https://github.com/yszbb/U3-Attack) | --- | T→I |
|[**From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models**](https://arxiv.org/abs/2507.17922) | Arxiv 2025 | 2025/07/23 | None | --- | T→I |
|[**PLA: Prompt Learning Attack against Text-to-Image Generative Models**](https://arxiv.org/abs/2508.03696) | ICCV 2025 | 2025/07/14 | None | --- | T→I |
|[**GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization**](https://arxiv.org/abs/2505.18979) | Arxiv 2025 | 2025/05/25 | None | --- | T→I |
|[**TokenProber: Jailbreaking Text-to-image Models via Fine-grained Word Impact Analysis**](https://arxiv.org/abs/2505.08804) | Arxiv 2025 | 2025/05/11 | None | --- | T→I |
|[**T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks**](https://arxiv.org/abs/2505.06679) | Arxiv 2025 | 2025/05/10 | None | --- | T→V |
|[**Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems**](https://arxiv.org/abs/2504.20376) | Arxiv 2025 | 2025/04/29 | None | --- | T→I |
|[**Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models**](https://arxiv.org/abs/2504.11106) | Arxiv 2025 | 2025/04/15 | None | --- | T→I |
|[**Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking**](https://arxiv.org/abs/2504.05838) | CVPR 2025 Highlight | 2025/04/08 | [Github](https://github.com/fhdnskfbeuv/attackIPA) | --- | T→I |
|[**Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning**](https://arxiv.org/abs/2503.17987) | Arxiv 2025 | 2025/03/23 | None | --- | T→I |
|[**Jailbreaking Safeguarded Text-to-Image Models via Large Language Models**](https://arxiv.org/abs/2503.01839) | Arxiv 2025 | 2025/03/03 | None | --- | T→I |
|[**Unified Prompt Attack Against Text-to-Image Generation Models**](https://arxiv.org/abs/2502.16423) | TPAMI 2025 | 2025/02/23 | None | --- | T→I |
|[**T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation**](https://arxiv.org/abs/2501.12612) | Arxiv 2025 | 2025/02/22 | [Github](https://github.com/adwardlee/t2i_safety) | --- | T→I |
|[**CogMorph: Cognitive Morphing Attacks for Text-to-Image Models**](https://arxiv.org/abs/2501.11815) | Arxiv 2025 | 2025/01/21 | None | --- | T→I |
|[**FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models**](https://arxiv.org/abs/2412.18302) | Arxiv 2024 | 2024/12/24 | None | --- | T→I |
|[**Antelope: Potent and Concealed Jailbreak Attack Strategy**](https://arxiv.org/abs/2412.08156) | Arxiv 2024 | 2024/12/11 | None | --- | T→I |
|[**Multimodal Pragmatic Jailbreak on Text-to-image Models**](https://arxiv.org/abs/2409.19149) | Arxiv 2024 | 2024/09/27 | None | --- | T→I |
|[**In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models**](https://arxiv.org/abs/2411.16769) | Arxiv 2024 | 2024/11/25 | None | Output Level | T→I |
|[**Unfiltered and Unseen: Universal Multimodal Jailbreak Attacks on Text-to-Image Model Defenses**](https://openreview.net/forum?id=sshYEYQ82L) | Openreview | 2024/11/13 | None | --- | T→I |
|[**AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models**](https://arxiv.org/abs/2410.21471) | Arxiv 2024 | 2024/10/28 | [Github](https://github.com/Spinozaaa/AdvI2I) | Encoder Level | T→I |
|[**Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step**](https://arxiv.org/abs/2410.03869) | Arxiv 2024 | 2024/10/04 | None | Output Level | T→I |
|[**ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation**](https://openreview.net/forum?id=eGIzeTmAtE) | NeurIPS 2024 | 2024/09/25 | [Github](https://github.com/tsingqguo/coljailbreak) | Input Level | T→I |
|[**HTS-Attack: Heuristic Token Search for Jailbreaking Text-to-Image Models**](https://arxiv.org/abs/2408.13896) | Arxiv 2024 | 2024/08/25 | None | Output Level | T→I |
|[**Perception-guided Jailbreak against Text-to-Image Models**](https://arxiv.org/abs/2408.10848) | AAAI 2025 | 2024/08/20 | [Github](https://github.com/LeLiang-SJTU/Perception-guided-Jailbreak-) | Input Level | T→I |
|[**DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization**](https://arxiv.org/abs/2408.11071) | NAACL 2025 | 2024/08/18 | [Github](https://github.com/CherryBlueberry/DiffZOO) | Output Level | T→I |
|[**Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models**](https://arxiv.org/abs/2404.02928) | Arxiv 2024 | 2024/08/02 | None | Encoder Level | T→I |
|[**Jailbreaking Text-to-Image Models with LLM-Based Agents**](https://arxiv.org/abs/2408.00523) | Arxiv 2024 | 2024/08/01 | None | Output Level | T→I |
|[**Automatic Jailbreaking of the Text-to-Image Generative AI Systems**](https://arxiv.org/abs/2405.16567) | ICML 2024 Workshop NextGenAISafety | 2024/05/26 | [Github](https://github.com/Kim-Minseon/APGP) | Output Level | T→I |
|[**UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers**](https://arxiv.org/abs/2405.11336) | ICML 2024 | 2024/05/18 | None | Input Level | T→I |
|[**BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators**](https://arxiv.org/abs/2402.15218) | Arxiv 2024 | 2024/02/23 | None | Input Level | T→I |
|[**Harnessing LLM to Attack LLM-Guarded Text-to-Image Models**](https://arxiv.org/abs/2312.07130) | Arxiv 2023 | 2023/12/12 | [Github](https://github.com/researchcode001/Divide-and-Conquer-Attack) | Input Level | T→I |
|[**MMA-Diffusion: MultiModal Attack on Diffusion Models**](https://arxiv.org/abs/2311.17516) | CVPR 2024 | 2023/11/29 | [Github](https://github.com/cure-lab/MMA-Diffusion) | Encoder Level | T→I |
|[**VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models**](https://arxiv.org/abs/2312.00057) | CVPR 2024 | 2023/11/29 | [Github](https://github.com/South7X/VA3) | Generator Level | T→I |
|[**To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now**](https://arxiv.org/abs/2310.11868) | ECCV 2024 | 2023/10/18 | [Github](https://github.com/OPTML-Group/Diffusion-MU-Attack) | Generator Level | T→I |
|[**Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?**](https://arxiv.org/abs/2310.10012) | ICLR 2024 | 2023/10/16 | [Github](https://github.com/chiayi-hsu/Ring-A-Bell) | Encoder Level | T→I |
|[**SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution**](https://arxiv.org/abs/2309.14122) | CCS 2024 | 2023/09/25 | None | Input Level | T→I |
|[**Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts**](https://arxiv.org/abs/2309.06135) | ICML 2024 | 2023/09/12 | [Github](https://github.com/joycenerd/P4D) | Generator Level | T→I |
|[**SneakyPrompt: Jailbreaking Text-to-image Generative Models**](https://arxiv.org/abs/2305.12082) | Symposium on Security and Privacy 2024 | 2023/05/20 | [Github](https://github.com/Yuchen413/text2image_safety) | Output Level | T→I |
|[**Red-Teaming the Stable Diffusion Safety Filter**](https://arxiv.org/abs/2210.04610) | NeurIPSW 2022 | 2022/10/03 | None | Input Level | T→I |


## Jailbreak Attack of Any-to-Any Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|[**Gradient-based Jailbreak Images for Multimodal Fusion Models**](https://arxiv.org/abs/2410.03489) | Arxiv 2024 | 2024/10/04 | [Github](https://github.com/facebookresearch/multimodal-fusion-jailbreaks) | Generator Level | I+T→I+T |
|[**Voice jailbreak attacks against gpt-4o**](https://arxiv.org/abs/2405.19103) | Arxiv 2024 | 2024/05/29 | [Github](https://github.com/TrustAIRLab/VoiceJailbreakAttack) | Output Level | Any→Any |

## 🛡️Jailbreak Defense

### 📖Defense-Intro

**Current efforts made in the jailbreak defense of multimodal generative models include two lines of work: discriminative defense and transformative defense.**
- Discriminative defense: constrained to classification tasks that assign binary safe/unsafe labels (see the sketch at the end of this subsection).

![jailbreak_discriminative_defense](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/jailbreak_discriminative_defense.png)

- Transformative defense: aims to produce appropriate and safe responses in the presence of malicious or adversarial inputs.

![jailbreak_transformative_defense](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/jailbreak_transformative_defense.png)
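
To make the distinction concrete, here is a minimal, hypothetical sketch of a discriminative gate with the simplest transformative-style fallback. It assumes Hugging Face `transformers`; `unitary/toxic-bert` is one publicly available toxicity classifier (the model behind the Detoxify library listed under Evaluation), and `generate` is a caller-supplied interface to the underlying multimodal model.

```python
from transformers import pipeline

# Discriminative component: a toxicity classifier over generated text.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def guarded_generate(prompt, generate, threshold=0.5):
    """Run the model, then either pass the response through or replace it."""
    response = generate(prompt)
    verdict = toxicity(response[:512])[0]  # e.g. {'label': 'toxic', 'score': 0.97}
    if verdict["label"] == "toxic" and verdict["score"] >= threshold:
        # Simplest transformative behavior: substitute a safe refusal.
        return "Sorry, I can't help with that request."
    return response
```

A full transformative defense would go further, e.g. rewriting the response or steering generation toward a safe completion rather than merely refusing.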
### 📑Papers

Below are the papers related to jailbreak defense.

## Jailbreak Defense of Any-to-Text Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|[**Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security**](https://arxiv.org/abs/2511.16229) | Arxiv 2025 | 2025/11/20 | [Github](https://github.com/Amadeuszhao/QMLLM) | --- | I+T→T |
|[**Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models: A Unified and Accurate Approach**](https://arxiv.org/abs/2508.09201) | Arxiv 2025 | 2025/08/08 | None | --- | I+T→T |
|[**Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security**](https://arxiv.org/abs/2507.22037) | Arxiv 2025 | 2025/07/29 | None | --- | I+T→T |
|[**SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism**](https://arxiv.org/abs/2507.01513) | Arxiv 2025 | 2025/07/02 | None | --- | I+T→T |
|[**The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models**](https://arxiv.org/abs/2506.15734) | Arxiv 2025 | 2025/06/15 | None | --- | I+T→T |
|[**Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models**](https://arxiv.org/abs/2505.22271) | Arxiv 2025 | 2025/05/28 | None | --- | I+T→T |
|[**GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning**](https://arxiv.org/abs/2505.11049) | Arxiv 2025 | 2025/05/16 | [Github](https://github.com/yueliu1999/GuardReasoner-VL/) | --- | I+T→T |
|[**DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models**](https://arxiv.org/abs/2504.18053) | Arxiv 2025 | 2025/04/25 | [Github](https://github.com/Kizna1ver/DREAM) | --- | I+T→T |
|[**Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?**](https://arxiv.org/abs/2504.10000) | CVPR 2025 | 2025/04/14 | None | --- | I+T→T |
|[**JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model**](https://arxiv.org/abs/2504.03770) | Arxiv 2025 | 2025/04/03 | [Github](https://github.com/ShenzheZhu/JailDAM) | --- | I+T→T |
|[**Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks**](https://arxiv.org/abs/2504.01308) | ICCV 2025 | 2025/04/02 | [Github](https://github.com/JarvisUSTC/DiffPure-RobustVLM) | --- | I+T→T |
|[**Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense**](https://arxiv.org/abs/2503.11619) | Arxiv 2025 | 2025/03/14 | None | --- | I+T→T |
|[**Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs**](https://arxiv.org/abs/2503.06989) | Arxiv 2025 | 2025/03/10 | None | --- | I+T→T |
|[**Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks**](https://arxiv.org/abs/2503.04833) | Arxiv 2025 | 2025/03/05 | None | --- | I+T→T |
|[**HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States**](https://arxiv.org/abs/2502.14744) | ACL 2025 | 2025/02/20 | [Github](https://github.com/leigest519/HiddenDetect) | --- | I+T→T |
|[**SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning**](https://arxiv.org/abs/2502.12520) | Arxiv 2025 | 2025/02/18 | None | --- | I+T→T |
|[**Understanding and Rectifying Safety Perception Distortion in VLMs**](https://arxiv.org/abs/2502.13095) | Arxiv 2025 | 2025/02/18 | None | --- | I+T→T |
|[**Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training**](https://arxiv.org/abs/2502.11455) | Arxiv 2025 | 2025/02/17 | None | --- | I+T→T |
|[**Towards Robust Multimodal Large Language Models Against Jailbreak Attacks**](https://arxiv.org/abs/2502.00653) | Arxiv 2025 | 2025/02/02 | None | --- | I+T→T |
|[**Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models**](https://arxiv.org/abs/2501.18533) | Arxiv 2025 | 2025/01/30 | [Github](https://github.com/DripNowhy/MIS) | --- | I+T→T |
|[**Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update**](https://arxiv.org/abs/2501.16378) | Arxiv 2025 | 2025/01/24 | None | --- | I+T→T |
|[**MSTS: A Multimodal Safety Test Suite for Vision-Language Models**](https://arxiv.org/abs/2501.10057) | Arxiv 2025 | 2025/01/17 | [Github](https://github.com/paul-rottger/msts-multimodal-safety) | --- | I+T→T |
|[**Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models**](https://arxiv.org/abs/2501.02029) | Arxiv 2025 | 2025/01/03 | [Github](https://github.com/Ziwei-Zheng/SAHs) | --- | I+T→T |
|[**Defending LVLMs Against Vision Attacks through Partial-Perception Supervision**](https://arxiv.org/abs/2412.12722) | Arxiv 2024 | 2024/12/17 | None | --- | I+T→T |
|[**VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data**](https://arxiv.org/abs/2410.00296) | Arxiv 2024 | 2024/10/01 | None | --- | I+T→T |
|[**Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment**](https://arxiv.org/abs/2411.18688) | CVPR 2025 | 2024/11/27 | [Github](https://github.com/itsvaibhav01/Immune) | Output Level | I+T→T |
|[**Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks**](https://arxiv.org/abs/2411.16721) | CVPR 2025 | 2024/11/23 | [Github](https://github.com/ASTRAL-Group/ASTRA) | Generator Level | I+T→T |
|[**Uniguard: Towards universal safety guardrails for jailbreak attacks on multimodal large language models**](https://arxiv.org/abs/2411.01703) | Arxiv 2024 | 2024/11/03 | None | Input Level | I+T→T |
|[**Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector**](https://arxiv.org/abs/2410.22888) | Arxiv 2024 | 2024/10/30 | [Github](https://github.com/mob-scu/RADAR-NEARSIDE) | Generator Level | I+T→T |
|[**BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks**](https://arxiv.org/abs/2410.20971) | ICLR 2025 | 2024/10/28 | [Github](https://github.com/Vinsonzyh/BlueSuffix) | Input Level | I+T→T |
|[**The Great Contradiction Showdown: How Jailbreak and Stealth Wrestle in Vision-Language Models?**](https://arxiv.org/abs/2410.01438) | Arxiv 2024 | 2024/10/02 | None | Input Level | I+T→T |
|[**CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration**](https://arxiv.org/abs/2409.11365) | COLM 2024 | 2024/09/17 | None | Output Level | I+T→T |
|[**Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks**](https://arxiv.org/abs/2409.07353) | BigData 2024 | 2024/09/11 | None | Encoder Level | I+T→T |
|[**Bathe: Defense against the jailbreak attack in multimodal large language models by treating harmful instruction as backdoor trigger**](https://arxiv.org/abs/2408.09093) | Arxiv 2024 | 2024/08/17 | None | Generator Level | I+T→T |
|[**Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models**](https://arxiv.org/abs/2407.21659v4) | EMNLP 2024 Findings | 2024/07/31 | [Github](https://github.com/pandragonxiii/cider) | Encoder Level | I+T→T |
|[**Sim-clip: Unsupervised siamese adversarial fine-tuning for robust and semantically-rich vision-language models**](https://arxiv.org/abs/2407.14971) | Arxiv 2024 | 2024/07/20 | [Github](https://github.com/speedlab-git/SimCLIP) | Encoder Level | I+T→T |
|[**Can Textual Unlearning Solve Cross-Modality Safety Alignment?**](https://aclanthology.org/2024.findings-emnlp.574.pdf) | EMNLP 2024 Findings | 2024/05/27 | None | Generator Level | I+T→T |
|[**Safety alignment for vision language models**](https://arxiv.org/abs/2405.13581) | Arxiv 2024 | 2024/05/22 | None | Generator Level | I+T→T |
|[**Adashield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting**](https://arxiv.org/abs/2403.09513) | ECCV 2024 | 2024/05/14 | [Github](https://github.com/SaFoLab-WISC/AdaShield) | Input Level | I+T→T |
|[**Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation**](https://arxiv.org/abs/2403.09572) | ECCV 2024 | 2024/03/14 | [Github](https://github.com/gyhdog99/ECSO) | Output Level | I+T→T |
|[**Safety fine-tuning at (almost) no cost: A baseline for vision large language models**](https://arxiv.org/abs/2402.02207) | ICML 2024 | 2024/02/03 | [Github](https://github.com/ys-zong/VLGuard) | Generator Level | I+T→T |
|[**Inferaligner: Inference-time alignment for harmlessness through cross-model guidance**](https://arxiv.org/abs/2401.11206) | EMNLP 2024 | 2024/01/20 | [Github](https://github.com/Jihuai-wpy/InferAligner) | Generator Level | I+T→T |
|[**Mllm-protector: Ensuring mllm’s safety without hurting performance**](https://arxiv.org/abs/2401.02906) | EMNLP 2024 | 2024/01/05 | [Github](https://github.com/pipilurj/MLLM-protector) | Output Level | I+T→T |
|[**Jailguard: A universal detection framework for llm prompt-based attacks**](https://arxiv.org/abs/2312.10766) | Arxiv 2023 | 2023/12/17 | [Github](https://github.com/shiningrain/JailGuard) | Output Level | I+T→T |
|[**Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions**](https://arxiv.org/abs/2309.07875) | ICLR 2024 | 2023/09/14 | [Github](https://github.com/vinid/safety-tuned-llamas) | Generator Level | I+T→T |


## Jailbreak Defense of Any-to-Vision Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|[**Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models**](https://arxiv.org/abs/2508.21099) | Arxiv 2025 | 2025/08/28 | None | --- | T→I |
|[**Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models**](https://arxiv.org/abs/2508.03006) | Arxiv 2025 | 2025/08/05 | None | --- | T→I |
|[**PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation**](https://arxiv.org/abs/2508.01272) | Arxiv 2025 | 2025/08/02 | None | --- | T→I |
|[**Wukong Framework for Not Safe For Work Detection in Text-to-Image Systems**](https://arxiv.org/abs/2508.00591) | Arxiv 2025 | 2025/08/01 | None | --- | T→I |
|[**NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation**](https://arxiv.org/abs/2506.18325) | Arxiv 2025 | 2025/07/23 | None | --- | T→I |
|[**T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models**](https://arxiv.org/abs/2504.15512) | Arxiv 2025 | 2025/04/22 | None | --- | T→V |
|[**Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization**](https://arxiv.org/abs/2504.14290) | Arxiv 2025 | 2025/04/19 | None | --- | T→I |
|**I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models** | CVPR 2025 | --- | None | --- | I+T→V |
|[**Hyperbolic Safety-Aware Vision-Language Models**](https://arxiv.org/abs/2503.12127) | CVPR 2025 | 2025/03/15 | [Github](https://github.com/aimagelab/HySAC) | --- | T→I |
|[**TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models**](https://arxiv.org/abs/2503.07389) | Arxiv 2025 | 2025/03/10 | [Github](https://github.com/ddgoodgood/TRCE) | --- | T→I |
|[**SafeText: Safe Text-to-image Models via Aligning the Text Encoder**](https://arxiv.org/abs/2502.20623) | Arxiv 2025 | 2025/02/28 | None | --- | T→I |
|[**Comprehensive Assessment and Analysis for NSFW Content Erasure in Text-to-Image Diffusion Models**](https://arxiv.org/abs/2502.12527) | Arxiv 2025 | 2025/02/18 | None | --- | T→I |
|[**A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models**](https://arxiv.org/abs/2502.14896) | Arxiv 2025 | 2025/02/17 | None | --- | T→I |
|[**Training-Free Safe Denoisers for Safe Use of Diffusion Models**](https://arxiv.org/abs/2502.08011) | Arxiv 2025 | 2025/02/11 | None | --- | T→I |
|[**Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images**](https://arxiv.org/abs/2502.05066) | Arxiv 2025 | 2025/02/07 | None | --- | T→I |
|[**Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models**](https://arxiv.org/abs/2501.18877) | Arxiv 2025 | 2025/01/30 | [Github](https://github.com/aei13/DES) | --- | T→I |
|[**CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary**](https://arxiv.org/abs/2501.15562) | Arxiv 2025 | 2025/01/26 | None | --- | T→I |
|[**CROPS: Model-Agnostic Training-Free Framework for Safe Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2501.05359) | Arxiv 2025 | 2025/01/09 | None | --- | T→I |
|[**PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models**](https://arxiv.org/abs/2501.03544) | Arxiv 2025 | 2025/01/07 | [Homepage](https://prompt-guard.github.io/) | --- | T→I |
|[**DuMo: Dual Encoder Modulation Network for Precise Concept Erasure**](https://arxiv.org/abs/2501.01125) | AAAI 2025 | 2025/01/02 | [Github](https://github.com/Maplebb/DuMo) | --- | T→I |
|[**AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models**](https://arxiv.org/abs/2412.18123) | Arxiv 2024 | 2024/12/24 | None | --- | T→I |
|[**SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation**](https://arxiv.org/abs/2412.16039) | Arxiv 2024 | 2024/12/20 | None | --- | T→I |
|[**SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation**](https://arxiv.org/abs/2412.10493) | ICCV 2025 | 2024/12/13 | [Github](https://github.com/Visualignment/SafetyDPO) | --- | T→I |
|[**TraSCE: Trajectory Steering for Concept Erasure**](https://arxiv.org/abs/2412.07658) | Arxiv 2024 | 2024/12/10 | [Github](https://github.com/anubhav1997/TraSCE/) | --- | T→I |
|[**Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation**](https://arxiv.org/abs/2412.07249) | Arxiv 2024 | 2024/12/10 | None | --- | T→I |
|[**Safeguarding Text-to-Image Generation via Inference-Time Prompt-Noise Optimization**](https://arxiv.org/abs/2412.03876) | Arxiv 2024 | 2024/12/05 | [Github](https://github.com/JonP07/Diffusion-PNO) | --- | T→I |
|[**Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models**](https://arxiv.org/abs/2412.00357) | Arxiv 2024 | 2024/11/30 | None | --- | T→I |
|[**Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction**](https://arxiv.org/abs/2411.13982) | Arxiv 2024 | 2024/11/21 | None | --- | T→I |
|[**Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding**](https://arxiv.org/abs/2411.10329) | Arxiv 2024 | 2024/11/15 | None | Encoder Level | T→I |
|[**Safree: Training-free and adaptive guard for safe text-to-image and video generation**](https://arxiv.org/abs/2410.12761) | ICLR 2025 | 2024/10/16 | [Github](https://github.com/jaehong31/SAFREE) | Generator Level | T→I/T→V |
|[**Shielddiff: Suppressing sexual content generation from diffusion models through reinforcement learning**](https://arxiv.org/abs/2410.05309) | Arxiv 2024 | 2024/10/04 | [Github](https://github.com/DongHan9722/ShieldDiff) | Generator Level | T→I |
|[**Dark miner: Defend against unsafe generation for text-to-image diffusion models**](https://arxiv.org/html/2409.17682) | Arxiv 2024 | 2024/09/26 | None | Generator Level | T→I |
|[**Score forgetting distillation: A swift, data-free method for machine unlearning in diffusion models**](https://arxiv.org/abs/2409.11219) | ICLR 2025 | 2024/09/17 | None | Generator Level | T→I |
|[**EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts**](https://arxiv.org/abs/2408.01014) | Arxiv 2024 | 2024/08/02 | None | Generator Level | T→I |
|[**Direct Unlearning Optimization for Robust and Safe Text-to-Image Models**](https://arxiv.org/abs/2407.21035) | NeurIPS 2024 | 2024/07/17 | [Github](https://github.com/naver-ai/DUO) | Generator Level | T→I |
|[**Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models**](https://arxiv.org/abs/2407.12383) | ECCV 2024 | 2024/07/17 | [Github](https://github.com/CharlesGong12/RECE) | Generator Level | T→I |
|[**Conceptprune: Concept editing in diffusion models via skilled neuron pruning**](https://arxiv.org/abs/2405.19237) | Arxiv 2024 | 2024/05/29 | [Github](https://github.com/ruchikachavhan/concept-prune) | Generator Level | T→I |
|[**Pruning for Robust Concept Erasing in Diffusion Models**](https://arxiv.org/abs/2405.16534) | NeurIPS SafeGenAi Workshop 2024 | 2024/05/26 | None | Generator Level | T→I |
|[**Defensive unlearning with adversarial training for robust concept erasure in diffusion models**](https://arxiv.org/abs/2405.15234) | NeurIPS 2024 | 2024/05/24 | [Github](https://github.com/OPTML-Group/AdvUnlearn) | Encoder Level | T→I |
|[**Unlearning concepts in diffusion model via concept domain correction and concept preserving gradient**](https://arxiv.org/abs/2405.15304) | AAAI 2025 | 2024/05/24 | [Github](https://github.com/yongliang-wu/DoCo) | Generator Level | T→I |
|[**Espresso: Robust Concept Filtering in Text-to-Image Models**](https://arxiv.org/abs/2404.19227) | CODASPY 2025 | 2024/04/30 | None | Output Level | T→I |
|[**Latent Guard: a Safety Framework for Text-to-image Generation**](https://arxiv.org/abs/2404.08031) | ECCV 2024 | 2024/04/11 | [Github](https://github.com/rt219/LatentGuard) | Encoder Level | T→I |
|[**SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models**](https://arxiv.org/abs/2404.06666) | ACM CCS 2024 | 2024/04/10 | [Github](https://github.com/LetterLiGo/SafeGen_CCS2024) | Generator Level | T→I |
|[**Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation**](https://arxiv.org/abs/2310.12508) | ICLR 2024 | 2024/04/04 | [Github](https://github.com/OPTML-Group/Unlearn-Saliency) | Generator Level | T→I |
|[**GuardT2I: Defending Text-to-Image Models from Adversarial Prompts**](https://arxiv.org/abs/2403.01446) | NeurIPS 2024 | 2024/03/03 | None | Encoder Level | T→I |
|[**Universal prompt optimizer for safe text-to-image generation**](https://arxiv.org/abs/2402.10882) | NAACL 2024 | 2024/02/16 | [Github](https://github.com/Wu-Zongyu/POSI) | Input Level | T→I |
|[**Erasediff: Erasing data influence in diffusion models**](https://arxiv.org/abs/2401.05779) | Arxiv 2024 | 2024/01/11 | [Github](https://github.com/JingWu321/EraseDiff) | Generator Level | T→I |
|[**Localization and manipulation of immoral visual cues for safe text-to-image generation**](https://openaccess.thecvf.com/content/WACV2024/papers/Park_Localization_and_Manipulation_of_Immoral_Visual_Cues_for_Safe_Text-to-Image_WACV_2024_paper.pdf) | WACV 2024 | 2024/01/01 | None | Output Level | T→I |
|[**Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers**](https://arxiv.org/abs/2311.17717) | ECCV 2024 | 2023/11/29 | [Github](https://github.com/jasper0314-huang/Receler) | Generator Level | T→I |
|[**Self-discovering interpretable diffusion latent directions for responsible text-to-image generation**](https://arxiv.org/abs/2311.17216) | CVPR 2024 | 2023/11/28 | [Github](https://github.com/hangligit/InterpretDiffusion) | Encoder Level | T→I |
|[**Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models**](https://arxiv.org/abs/2311.16254) | ECCV 2024 | 2023/11/27 | [Github](https://github.com/aimagelab/safe-clip) | Encoder Level | T→I |
|[**Mace: Mass concept erasure in diffusion models**](https://arxiv.org/abs/2403.06135) | CVPR 2024 | 2023/10/19 | [Github](https://github.com/Shilin-LU/MACE) | Generator Level | T→I |
|[**Implicit concept removal of diffusion models**](https://arxiv.org/abs/2310.05873) | ECCV 2024 | 2023/10/09 | None | Input Level | T→I |
|[**Unified concept editing in diffusion models**](https://arxiv.org/abs/2308.14761) | WACV 2024 | 2023/08/25 | [Github](https://github.com/rohitgandikota/unified-concept-editing) | Generator Level | T→I |
|[**Towards safe self-distillation of internet-scale text-to-image diffusion models**](https://arxiv.org/abs/2307.05977) | ICML 2023 Workshop on Challenges in Deployable Generative AI | 2023/07/12 | [Github](https://github.com/nannullna/safe-diffusion) | Generator Level | T→I |
|[**Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models**](https://arxiv.org/abs/2303.17591) | CVPR Workshop 2024 | 2023/05/30 | [Github](https://github.com/SHI-Labs/Forget-Me-Not) | Generator Level | T→I |
|[**Erasing concepts from diffusion models**](https://arxiv.org/abs/2303.07345) | ICCV 2023 | 2023/05/13 | [Github](https://github.com/rohitgandikota/erasing) | Generator Level | T→I |
|[**Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models**](https://arxiv.org/abs/2211.05105) | CVPR 2023 | 2022/11/09 | [Github](https://github.com/ml-research/safe-latent-diffusion) | Generator Level | T→I |


## Jailbreak Defense of Any-to-Any Models
| Title | Venue | Date | Code | Taxonomy | Multimodal Model |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|


## 💯Evaluation

### ⭐️Evaluation Datasets
**Below is a comparison table of publicly available representative evaluation datasets and a description of each attribute in the table.**
- Collected: Raw data created by humans or collected from real-world websites.
- Reconstructed: Data reorganized from other existing datasets.
- Synthesized: AI-generated data using LLMs or diffusion models.
- Adversarial: Adversarial data generated by jailbreak attack methods.
#### Used for Any-to-Text Models
| Dataset | Text Source | Image Source | Volume | Theme | Access |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|**Figstep** | Synthesized | Adversarial | 500 | 10 | [Github](https://github.com/ThuCCSLab/FigStep) |
|**AdvBench** | Synthesized | --- | 500 | --- | [Github](https://github.com/llm-attacks/llm-attacks) |
|**RedTeam-2K** | Collected & Reconstructed & Synthesized | N/A | 2000 | 16 | [Huggingface](https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k) |
|**HarmBench** | Collected | --- | 510 | 4 | [Github](https://github.com/centerforaisafety/HarmBench) |
|**HADES** | Synthesized | Collected & Synthesized & Adversarial | 750 | 5 | [Github](https://github.com/AoiDragon/HADES) |
|**MM-SafetyBench** | Synthesized | Synthesized & Adversarial | 5040 | 13 | [Github](https://github.com/isXinLiu/MM-SafetyBench) |
|**JailBreakV-28K** | Adversarial | Reconstructed & Synthesized | 28000 | 16 | [Huggingface](https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k) |

#### Used for Any-to-Vision Models
| Dataset | Text Source | Image Source | Volume | Theme | Access |
|:--------|:--------:|:--------:|:--------:|:--------:|:--------:|
|**NSFW-200** | Synthesized | --- | 200 | --- | [Github](https://github.com/Yuchen413/text2image_safety) |
|**MMA** | Reconstructed & Adversarial | Adversarial | 1000 | --- | [Huggingface](https://huggingface.co/datasets/YijunYang280/MMA-Diffusion-NSFW-adv-prompts-benchmark) |
|**VBCDE** | Reconstructed & Adversarial | --- | 100 | 5 | [Github](https://github.com/researchcode001/Divide-and-Conquer-Attack) |
|**I2P** | Collected | Collected | 4703 | 7 | [Huggingface](https://huggingface.co/datasets/AIML-TUDA/i2p) |
|**Unsafe Diffusion** | Collected & Reconstructed | --- | 1434 | --- | [Github](https://github.com/YitingQu/unsafe-diffusion) |
|**MACE-Celebrity** | Collected | --- | 1000 | --- | [Github](https://github.com/Shilin-LU/MACE) |
|**MACE-Art** | Reconstructed | --- | 1000 | --- | [Github](https://github.com/Shilin-LU/MACE) |
|**MPUP** | Synthesized | --- | 1200 | 4 | [Huggingface](https://huggingface.co/datasets/tongliuphysics/multimodalpragmatic) |
|**T2VSafetyBench** | Reconstructed & Synthesized & Adversarial | --- | 4400 | 12 | [Github](https://github.com/yibo-miao/T2VSafetyBench/tree/main) |


### 📚Evaluation Methods
**Current evaluation methods are primarily classified into two categories: manual evaluation and automated evaluation.**
- Manual evaluation involves human assessment to determine if the content is toxic, offering a direct and interpretable method of evaluation.
- Automated approaches assess the safety of multimodal generative models by employing a range of techniques, including detector-based, GPT-based, and rule-based methods (see the example below).
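
For example, a detector-based check with Detoxify (listed in the Text Detector table below) can score a batch of model responses. Here, `responses` is a hypothetical list of outputs collected from a jailbreak attempt, and the 0.5 threshold is an arbitrary choice for this sketch.

```python
from detoxify import Detoxify

detector = Detoxify("original")  # downloads the pretrained toxicity model
responses = [
    "Sorry, I can't help with that.",
    "Sure, here is how to ...",
]
scores = detector.predict(responses)  # dict of per-attribute score lists
flagged = [s > 0.5 for s in scores["toxicity"]]
print(f"attack success rate (proxy): {sum(flagged) / len(flagged):.2f}")
```

GPT-based and rule-based judges are typically wired in the same way, with the detector call swapped for an LLM prompt or a string-matching rule.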

![jailbreak_evaluation](https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak/blob/main/pic/jailbreak_evaluation.png)


#### Text Detector
| Toxicity detector | Access |
|:--------|:--------:|
|**Llama-Guard** | [Huggingface](https://huggingface.co/meta-llama) |
|**Llama-Guard 2** | [Huggingface](https://huggingface.co/meta-llama) |
|**Detoxify** | [Github](https://github.com/unitaryai/detoxify) |
|**GPTFUZZER** | [Huggingface](https://huggingface.co/hubert233/GPTFuzz/tree/main) |
|**Perspective API** | [Website](https://perspectiveapi.com/) |

#### Image Detector
| Toxicity detector | Access |
|:--------|:--------:|
|**NudeNet** | [Github](https://github.com/platelminto/NudeNetClassifier) |
|**Q16** | [Github](https://github.com/ml-research/Q16) |
|**Safety Checker** | [Huggingface](https://huggingface.co/CompVis/stable-diffusion-safety-checker) |
|**Imgcensor** | [Github](https://github.com/lucasxlu/XCloud/tree/master/research/imgcensor) |
|**Multi-headed Safety Classifier** | [Github](https://github.com/YitingQu/unsafe-diffusion) |

## 😉Citation

If you find this work useful in your research, please kindly cite it using the following BibTeX:
```bib
@article{liu2024jailbreak,
  title={Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey},
  author={Liu, Xuannan and Cui, Xing and Li, Peipei and Li, Zekun and Huang, Huaibo and Xia, Shuhan and Zhang, Miaoxuan and Zou, Yueying and He, Ran},
  journal={arXiv preprint arXiv:2411.09259},
  year={2024},
}
```