# LVLM-Safety

## Full Paper on ResearchGate
[Large Vision-Language Model Security: A Survey](https://www.researchgate.net/publication/387444860_Large_Vision-Language_Model_Security_A_Survey)

## Collection of Papers
- 1: Introduction
  - [The title of the paper...](https://arxiv.org/abs/2403.17336)
- 2: LVLM Architecture
  - [Natural language processing applied to mental illness detection: a narrative review](https://www.nature.com/articles/s41746-022-00589-7)
  - [Learning transferable visual models from natural language supervision](http://proceedings.mlr.press/v139/radford21a)
  - [Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models](https://proceedings.mlr.press/v202/li23q.html)
  - [Improved baselines with visual instruction tuning](https://arxiv.org/abs/2310.03744)
  - [Visual instruction tuning](https://proceedings.neurips.cc/paper_files/paper/2023/hash/6dcf277ea32ce3288914faf369fe6de0-Abstract-Conference.html)
  - [Minigpt-v2: large language model as a unified interface for vision-language multi-task learning](https://arxiv.org/abs/2310.09478)
  - [Flamingo: a visual language model for few-shot learning](https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html)
  - [Minigpt-4: Enhancing vision-language understanding with advanced large language models](https://arxiv.org/abs/2304.10592)
- 3: Threats and Attacks
  - Jailbreak
    - [The title of the paper...](https://arxiv.org/abs/2403.17336)
    - Attacks Including LLMs
      - [Are aligned neural networks adversarially aligned?](http://arxiv.org/abs/2306.15447)
      - [How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs](http://arxiv.org/abs/2311.16101)
      - [Jailbreaking Attack against Multimodal Large Language Model](http://arxiv.org/abs/2402.02309)
      - [Visual Adversarial Examples Jailbreak Aligned Large Language Models](http://arxiv.org/abs/2306.13213)
    - Attacks Excluding LLMs
      - [FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts](http://arxiv.org/abs/2311.05608)
      - [Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models](http://arxiv.org/abs/2307.14539)
  - Backdoor
    - Classical Backdoor Attack (a toy poisoning sketch follows this list)
      - [Badnets: Identifying vulnerabilities in the machine learning model supply chain](https://arxiv.org/abs/1708.06733)
      - [Protecting intellectual property of deep neural networks with watermarking](https://dl.acm.org/doi/pdf/10.1145/3196494.3196550)
      - [Latent backdoor attacks on deep neural networks](https://dl.acm.org/doi/pdf/10.1145/3319535.3354209)
      - [Input-aware dynamic backdoor attack](https://proceedings.neurips.cc/paper_files/paper/2020/file/234e691320c0ad5b45ee3c96d0d7b8f8-Paper.pdf)
      - [Complex backdoor detection by symmetric feature differencing](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Complex_Backdoor_Detection_by_Symmetric_Feature_Differencing_CVPR_2022_paper.pdf)
      - [Distribution preserving backdoor attack in self-supervised learning](https://www.computer.org/csdl/proceedings-article/sp/2024/313000a029/1RjEa5rjsHK)
      - [Rethinking the reverse-engineering of trojan triggers](https://proceedings.neurips.cc/paper_files/paper/2022/file/3f9bf45ea04c98ad7cb857f951f499e2-Paper-Conference.pdf)
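      For orientation, the threat model shared by the classical papers above can be stated in one line: poison a small fraction of the training set with a fixed pixel trigger and a flipped label, so that a model trained on it misclassifies any triggered input. A toy NumPy sketch on synthetic data (function and parameter names are our own, not taken from any of the papers):

      ```python
      import numpy as np

      def poison(images, labels, target_label, rate=0.05, patch=3, seed=0):
          """BadNets-style poisoning sketch: stamp a white square into a
          random subset of images and relabel them to the target class."""
          rng = np.random.default_rng(seed)
          images, labels = images.copy(), labels.copy()
          idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
          images[idx, -patch:, -patch:, :] = 255   # trigger: bottom-right corner
          labels[idx] = target_label
          return images, labels

      # toy data: 100 random 32x32 RGB "images", 10 classes
      x = np.random.randint(0, 256, (100, 32, 32, 3), dtype=np.uint8)
      y = np.random.randint(0, 10, size=100)
      px, py = poison(x, y, target_label=0)
      ```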
    - LLMs Backdoor Attack
      - [Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection](https://openreview.net/pdf?id=A3y6CdiUP5)
      - [Badchain: Backdoor Chain-of-Thought Prompting for Large Language Models](https://arxiv.org/pdf/2401.12242.pdf)
      - [Composite Backdoor Attacks Against Large Language Models](https://arxiv.org/pdf/2310.07676.pdf)
      - [Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models](https://arxiv.org/abs/2305.14710)
      - [Learning to Poison Large Language Models During Instruction Tuning](https://arxiv.org/pdf/2402.13459.pdf)
      - [LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario](https://openreview.net/pdf?id=EV46z1RKhz3)
      - [PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models](https://arxiv.org/pdf/2402.07867.pdf)
      - [Poisoning Language Models During Instruction Tuning](https://proceedings.mlr.press/v202/wan23b/wan23b.pdf)
      - [PoisonPrompt: Backdoor Attack on Prompt-Based Large Language Models](https://arxiv.org/html/2310.12439v2)
      - [Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization](https://arxiv.org/pdf/2402.09179.pdf)
      - [Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models](https://arxiv.org/pdf/2402.18945.pdf)
      - [The Poison of Alignment](https://arxiv.org/pdf/2308.13449.pdf)
      - [Universal Jailbreak Backdoors from Poisoned Human Feedback](https://arxiv.org/pdf/2311.14455.pdf)
      - [Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning](https://www.researchgate.net/publication/377810700_Universal_Vulnerabilities_in_Large_Language_Models_Backdoor_Attacks_for_In-context_Learning)
      - [The impact of reasoning step length on large language models](https://arxiv.org/pdf/2401.04925.pdf)
    - LVLMs Backdoor Attack
      - [ImgTrojan: Jailbreaking Vision-Language Models with ONE Image](https://arxiv.org/pdf/2403.02910.pdf)
      - [Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models](https://arxiv.org/pdf/2402.06659.pdf)
      - [Test-Time Backdoor Attacks on Multimodal Large Language Models](https://arxiv.org/pdf/2402.08577.pdf)
      - [VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models](https://arxiv.org/pdf/2402.13851.pdf)
  - Other Attack Methods
    - [On Evaluating Adversarial Robustness of Large Vision-Language Models](https://arxiv.org/pdf/2305.16934.pdf)
    - [Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks](https://arxiv.org/pdf/2402.00626.pdf)
    - [INSTRUCTTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models](https://arxiv.org/pdf/2312.01886.pdf)
    - [On the Robustness of Large Multimodal Models Against Image Adversarial Attacks](https://arxiv.org/pdf/2312.03777.pdf)
    - [How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs](https://arxiv.org/pdf/2311.16101.pdf)
    - [On the Adversarial Robustness of Multi-Modal Foundation Models](https://arxiv.org/pdf/2308.10741.pdf)
  - Privacy and Watermarking
    - [The science of detecting llm-generated texts](https://arxiv.org/abs/2303.07205)
    - [A survey of text watermarking in the era of large language models](https://arxiv.org/abs/2312.07913)
    - Training-Time Watermark
      - [Watermarking text data on large language models for dataset copyright protection](https://arxiv.org/abs/2305.13257)
      - [Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning](https://arxiv.org/abs/2402.14883)
      - [Did you train on my dataset? Towards public dataset protection with clean-label backdoor watermarking](https://dl.acm.org/doi/abs/10.1145/3606274.3606279)
      - [Codemark: Imperceptible watermarking for code datasets against neural code completion models](https://dl.acm.org/doi/abs/10.1145/3611643.3616297)
      - [Resilient Watermarking for LLM-Generated Codes](https://arxiv.org/abs/2402.07518)
    - Generation-Time Watermark (a minimal sketch of the green-list scheme follows this list)
      - [A watermark for large language models](https://proceedings.mlr.press/v202/kirchenbauer23a.html)
      - [Who wrote this code? watermarking for code generation](https://arxiv.org/abs/2305.15060)
      - [Towards codable text watermarking for large language models](https://arxiv.org/abs/2307.15992)
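      The first entry above (Kirchenbauer et al.) is the canonical generation-time scheme: at each decoding step, seed a PRNG with the previous token, mark a γ fraction of the vocabulary "green", and add a small bias δ to the green logits before sampling; detection is a one-proportion z-test on the observed green fraction. A minimal illustrative sketch of the partition and the detector (helper names and the hash choice are ours, not the paper's reference code):

      ```python
      import hashlib
      import numpy as np

      def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
          """Pseudorandom vocabulary partition keyed on the previous token;
          the first gamma fraction of the permutation is the 'green' list."""
          seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % 2**32
          perm = np.random.default_rng(seed).permutation(vocab_size)
          return set(perm[: int(gamma * vocab_size)].tolist())

      def detect_z_score(tokens: list, vocab_size: int, gamma: float = 0.5) -> float:
          """One-proportion z-test: in unwatermarked text each token is green
          with probability gamma, so a large z suggests a watermark."""
          hits = sum(tok in green_list(prev, vocab_size, gamma)
                     for prev, tok in zip(tokens, tokens[1:]))
          n = len(tokens) - 1
          return (hits - gamma * n) / np.sqrt(n * gamma * (1 - gamma))
      ```

      At generation time the same `green_list` would be used to add δ to the green logits before softmax, so a watermarked sequence scores a far larger z than natural text.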
- 4: Defense Mechanisms
  - Defense for Jailbreak
    - Datasets
      - [Red Teaming Visual Language Models](http://arxiv.org/abs/2401.12915)
      - [Self-Guard: Empower the LLM to Safeguard Itself](http://arxiv.org/abs/2310.15851)
    - Defense (a self-examination sketch follows this list)
      - [A Mutation-Based Method for Multi-Modal Jailbreaking Attack Detection](http://arxiv.org/abs/2312.10766)
      - [Diffusion Models for Adversarial Purification](http://arxiv.org/abs/2205.07460)
      - [LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked](http://arxiv.org/abs/2308.07308)
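      The last entry above reduces to a simple loop: generate a candidate response, then ask the model itself whether that response is harmful, and suppress it if the answer is yes. A minimal sketch of that idea; `generate` stands for any prompt-to-completion callable (hypothetical here), and the probe wording is ours, simplified from the paper:

      ```python
      def self_defense_filter(response: str, generate) -> str:
          """Self-examination filter in the spirit of 'LLM Self Defense':
          the model judges its own candidate output before release.
          `generate` is any prompt -> completion callable (hypothetical)."""
          probe = (
              "Does the following text contain harmful content? "
              "Answer strictly 'yes' or 'no'.\n\n" + response
          )
          verdict = generate(probe).strip().lower()
          if verdict.startswith("yes"):
              return "[withheld by self-examination filter]"
          return response
      ```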
  - Defense for Backdoor
    - Security and Reliability
      - [Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space](https://arxiv.org/pdf/2402.12026.pdf)
      - [LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors](https://arxiv.org/pdf/2308.13904.pdf)
      - [The Philosopher’s Stone: Trojaning Plugins of Large Language Models](https://arxiv.org/abs/2312.00374)
    - Defense
      - [Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning](https://arxiv.org/pdf/2402.12168.pdf)
      - [Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations](https://arxiv.org/pdf/2311.09763.pdf)
      - [Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections](https://arxiv.org/pdf/2312.00027.pdf)
  - Defense for Other Attack Methods
    - [The title of the paper...](https://arxiv.org/abs/2403.17336)
- 5: Application Risks and Solutions
  - Hallucination and Corresponding Solutions
    - [The title of the paper...](https://arxiv.org/abs/2403.17336)
  - Misinformation
    - [The title of the paper...](https://arxiv.org/abs/2403.17336)
  - Privacy
    - [A survey on large language model security and privacy: The good, the bad, and the ugly](https://www.sciencedirect.com/science/article/pii/S266729522400014X)
    - Membership Inference Attack (a toy loss-threshold sketch follows this list)
      - [Revisiting membership inference under realistic assumptions](https://arxiv.org/abs/2005.10881)
      - [Label-only membership inference attacks](https://proceedings.mlr.press/v139/choquette-choo21a.html)
      - [Membership inference attacks from first principles](https://ieeexplore.ieee.org/abstract/document/9833649)
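      A useful baseline behind all three papers above: training examples tend to incur lower loss than unseen examples, so simply thresholding the per-example loss already yields a membership signal; the "first principles" paper sharpens this into calibrated likelihood-ratio tests. A toy sketch on synthetic losses (all names and numbers are ours, for illustration only):

      ```python
      import numpy as np

      def loss_threshold_mia(losses: np.ndarray, threshold: float) -> np.ndarray:
          """Predict membership (True = 'was in the training set') by
          thresholding per-example loss: members tend to have lower loss."""
          return losses < threshold

      # synthetic example: members have lower average loss than non-members
      rng = np.random.default_rng(0)
      member_losses = rng.normal(0.5, 0.3, 1000)      # seen during training
      nonmember_losses = rng.normal(1.5, 0.5, 1000)   # held out
      preds_m = loss_threshold_mia(member_losses, threshold=1.0)
      preds_n = loss_threshold_mia(nonmember_losses, threshold=1.0)
      accuracy = (preds_m.mean() + (1 - preds_n.mean())) / 2   # balanced accuracy
      print(f"balanced attack accuracy: {accuracy:.2f}")
      ```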
    - PII Attack
      - [Are large pre-trained language models leaking your personal information?](https://arxiv.org/abs/2205.12628)
      - [Propile: Probing privacy leakage in large language models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/420678bb4c8251ab30e765bc27c3b047-Abstract-Conference.html)
      - [Analyzing information leakage of updates to natural language models](https://dl.acm.org/doi/abs/10.1145/3372297.3417880)
- 6: Limitations of Existing Works and Future Research Directions
  - [The title of the paper...](https://arxiv.org/abs/2403.17336)