├── 1 ├── .gitignore ├── .vscode └── settings.json ├── attack_processing.png ├── test.py └── README.md /1: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ._* 2 | DP-OPT/* -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "editor.renderFinalNewline": "on" 3 | } -------------------------------------------------------------------------------- /attack_processing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Chen-X666/privacy-preserving-prompt/HEAD/attack_processing.png -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | from transformers import AutoTokenizer, AutoModel 2 | import torch 3 | 4 | tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") 5 | model = AutoModel.from_pretrained("bert-base-uncased") 6 | 7 | def get_similar_words(word): 8 | # 拆分输入文本并将其编码 9 | input_ids = tokenizer.encode(word, return_tensors='pt') 10 | with torch.no_grad(): 11 | embeddings = model(input_ids)[0] 12 | # 计算标记之间的余弦相似度 13 | cosine_similarities = torch.nn.functional.cosine_similarity(embeddings, embeddings) 14 | # 获取相似度最高的标记 15 | similar_word_idx = cosine_similarities.argsort().numpy()[0][-2] 16 | similar_word = tokenizer.decode([similar_word_idx]) 17 | return similar_word -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Privacy-Preserving Prompt Tuning for Large Language Model 2 | 3 | | Symbol | 🌟 | ⬜️ | ⬛️ | 4 | | --- | --- | --- | --- | 5 | | Description | Inspiration | White-box method | Black-box method | 6 | 7 | ## Attacker Methodology by Stages 8 | 9 |
10 | 11 | 12 | ### Prompt Injection Attacks (PIA) 13 | | Paper | Year | Source | Attack Prompt Type | Tasks | 14 | |-------|------|-----------------------|----------------------|-----------| 15 | | ⬛️Effective Prompt Extraction from Language Models | 2024.02 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2307.06865) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://anonymous.4open.science/r/prompt-extraction-83C1) | Instruction Prompt | Information Extration | 16 | | ⬛️Prompt Stealing Attacks Against Large Language Models | 2024.02 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.12959) | Role-Based Prompt, Direct Prompt, In-Context Prompt | Q&A | 17 | | 🌟⬛️TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models | (NeurIPS, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2306.06815) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/UCF-ML-Research/TrojLLM) | Instruction prompt | Classification | 18 | | 🌟⬛️Ignore Previous Prompt : Attack Techniques For Language Models | (NeurIPS, 2022) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2211.09527) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/agencyenterprise/PromptInject) | Instruction prompt, In-context learning | 19 | | 🌟⬜️BadPrompt: Backdoor Attacks on Continuous Prompts | (NeurIPS, 2022) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2211.14719) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/papersPapers/BadPrompt) | Instruction prompt| 20 | | ⬜️PromptAttack: Prompt-based Attack for Language Models via Gradient Search | (NLPCC, 2022) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2209.01882) | Instruction prompt | 21 | 22 | 23 | ### Membership Inference Attacks (MIA) 24 | | Paper | Year | Source | 25 | |-------|------|-----------------| 26 | | 🌟⬛️Do Membership Inference AttacksWork on Large Language Models? | 2024.02 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.07841) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](http://github.com/iamgroot42/mimir) | 27 | | ⬜️Language Model Inversion | 2023.11 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2311.13647) | 28 | | ⬛️Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks | 2023.10 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](2310.13291) | 29 | | ⬛️Beyond Memorization: Violating Privacy Via Inference with Large Language Models | 2023.10 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.07298) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](http://github.com/iamgroot42/mimir) | 30 | | ⬜️Extracting Training Data from Large Language Models | (USENIX Security, 2021) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://www.usenix.org/system/files/sec21-carlini-extracting.pdf) | 31 | 32 | ## Protector Methodology 33 | 34 | ### Differential Privacy (DP) 35 | | Paper | Year | Source | Tasks | Defense | 36 | |-------|------|------------------|---------|---------| 37 | | 🌟⬛️Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation | (ICLR, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.11765) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/microsoft/dp-few-shot-generation) | Classification, Information Extraction | 38 | | 🌟⬛️DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer | (ICLR, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.03724) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/VITA-Group/DP-OPT) | Sentiment Classification | 39 | | 🌟⬛️Privacy-Preserving In-Context Learning For Large Language Models | (ICLR, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.01639) | Classification, Document Q&A, Dialog Summarization | 40 | | ⬛️On the Privacy Risk of In-context Learning | (TrustNLP, 2024)) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://trustnlpworkshop.github.io/papers/13.pdf) | Classification, Generation | MIA 41 | | ⬛️A Customized Text Sanitization Mechanism with Differential Privacy | (ACL, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-acl.355) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/sai4july/CusText) | Classification, Generation | 42 | | 🌟⬛️⬜️Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models | (NeurIPS, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.15594) | Classification | 43 | | 🌟⬛️Locally Differentially Private Document Generation Using Zero Shot Prompting | (EMNLP, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-emnlp.566) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/SaitejaUtpala/dp_prompt) | Text Classification | 44 | | ⬜️DP-forward: Fine-tuning and inference on language models with differential privacy in forward pass | (SIGSAC, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.06746) | Classification | 45 | | ⬛️InferDPT: Privacy-preserving Inference for Black-box Large Language Models | 2023.12 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.12214) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/mengtong0110/InferDPT) | Classification, Generation | 46 | | ⬜️Privacy-Preserving Prompt Tuning for Large Language Model Services | 2023.05 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.06212) | Sentiment Classification, Document Q&A | 47 | | ⬛️Differential Privacy for Text Analytics via Natural Text Sanitization | (ACL-IJCNLP, 2021) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2021.findings-acl.337) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/xiangyue9607/SanText) | Classification | 48 | 49 | 50 | 51 | ### Secure Multi-Party Computing (SMPC) 52 | | Paper | Year | Source | Tasks | Defence | 53 | |-------|------|--------------|------------|---------| 54 | | ⬜️Ciphergpt: Secure two-party gpt inference | (Crypto, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://eprint.iacr.org/2023/1147) | Classification | 55 | | ⬜️SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models | 2024.01 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2401.00793) | Classification, Semantic Similarity, Linguistic Acceptability | Model Inside | 56 | | ⬜️LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers | 2023.05 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.18396) | Classification | Model Inside | 57 | 58 | 59 | 60 | ### Anonymization Techniques 61 | | Paper | Year | Source | Tasks | Keywords | 62 | |-------|------|-------------|----------|----------| 63 | | ⬛️SEmojiCrypt: Prompt Encryption for Secure Communication with Large Language Models | 2024.02 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.05868) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/agiresearch/EmojiCrypt) | Classification | Emoji | 64 | | ⬛️ProPILE: Probing Privacy Leakage in Large Language Models | (NeurIPS, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2307.01881) | PII | 65 | | ⬛️Recovering from Privacy-Preserving Masking with Large Language Models | 2023.12 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.08628) | |[MASK] 66 | | ⬛️Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection | 2023.09 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.03057) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/alohachen/Hide-and-Seek) | PII | 67 | 68 | 69 | ### Other Methods 70 | | Paper | Year | Source | Tasks | Method Keyword | 71 | |-------|------|-----------------------|-------|----------------| 72 | | ⬜️Privatelora For Efficient Privacy Preserving LLM | (CoRR, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2311.14030) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/alipay/private_llm) | | LoRA 73 | | ⬜️TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations | (ACL, 2023) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-acl.337) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/xzhou20/TextObfuscator) | Classification | 74 | 75 | ### Related Survey 76 | | Paper | Year | Source | 77 | |-------|------|-----------------------| 78 | On Protecting the Data Privacy of Large Language Models (LLMs): A Survey | 2024.03 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2403.05156)| 79 | Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory | (ICLR, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs//2310.17884)| 80 | A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly | 2023.12 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.02003) 81 | Privacy in Large Language Models: Attacks, Defenses and Future Directions | 2023.10 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.10383) 82 | Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | 2023.05 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2302.12173) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/greshake/llm-security) | 83 | Privacy-Preserving Large Language Models (PPLLMs) | 2023.01 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://www.researchgate.net/publication/372607103_Privacy-Preserving_Large_Language_Models_PPLLMs) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/greshake/llm-security) | 84 | 85 | ### Fine-tuning 86 | | Paper | Year | Source | 87 | |-------|------|-----------------------| 88 | | SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language Models for Private and Secure Inference | (AAAI, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.17342) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/abhijitmishra/sentinellm-aaai2024) 89 | --------------------------------------------------------------------------------