├── 1
├── .gitignore
├── .vscode
    └── settings.json
├── attack_processing.png
├── test.py
└── README.md


/1:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | ._*
2 | DP-OPT/*


--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |     "editor.renderFinalNewline": "on"
3 | }


--------------------------------------------------------------------------------
/attack_processing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Chen-X666/privacy-preserving-prompt/HEAD/attack_processing.png


--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
 1 | from transformers import AutoTokenizer, AutoModel
 2 | import torch
 3 | 
 4 | tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
 5 | model = AutoModel.from_pretrained("bert-base-uncased")
 6 | 
 7 | def get_similar_words(word):
 8 |     # 拆分输入文本并将其编码
 9 |     input_ids = tokenizer.encode(word, return_tensors='pt')
10 |     with torch.no_grad():
11 |         embeddings = model(input_ids)[0]
12 |     # 计算标记之间的余弦相似度
13 |     cosine_similarities = torch.nn.functional.cosine_similarity(embeddings, embeddings)
14 |     # 获取相似度最高的标记
15 |     similar_word_idx = cosine_similarities.argsort().numpy()[0][-2]
16 |     similar_word = tokenizer.decode([similar_word_idx])
17 |     return similar_word


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Privacy-Preserving Prompt Tuning for Large Language Model
 2 | 
 3 | | Symbol | 🌟 | ⬜️ | ⬛️ |
 4 | | --- | --- | --- | --- |
 5 | | Description | Inspiration | White-box method | Black-box method |
 6 | 
 7 | ## Attacker Methodology by Stages
 8 | 
 9 | <div align="center"><img src="https://github.com/Chen-X666/privacy-preserving-prompt/blob/main/attack_processing.png" height="600px"/></div>
10 | 
11 | 
12 | ### Prompt Injection Attacks (PIA)
13 | | Paper | Year |   Source              |  Attack Prompt Type  |   Tasks   |
14 | |-------|------|-----------------------|----------------------|-----------|
15 | | ⬛️Effective Prompt Extraction from Language Models  |  2024.02   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2307.06865) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://anonymous.4open.science/r/prompt-extraction-83C1)  | Instruction Prompt | Information Extration | 
16 | | ⬛️Prompt Stealing Attacks Against Large Language Models  |  2024.02   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.12959)  | Role-Based Prompt, Direct Prompt, In-Context Prompt | Q&A |
17 | | 🌟⬛️TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models  |  (NeurIPS, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2306.06815) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/UCF-ML-Research/TrojLLM)  | Instruction prompt | Classification | 
18 | | 🌟⬛️Ignore Previous Prompt : Attack Techniques For Language Models  |  (NeurIPS, 2022)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2211.09527) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/agencyenterprise/PromptInject)  | Instruction prompt, In-context learning |
19 | | 🌟⬜️BadPrompt: Backdoor Attacks on Continuous Prompts  |  (NeurIPS, 2022)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2211.14719) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/papersPapers/BadPrompt)  | Instruction prompt|
20 | | ⬜️PromptAttack: Prompt-based Attack for Language Models via Gradient Search  |  (NLPCC, 2022)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2209.01882)  | Instruction prompt |
21 | 
22 | 
23 | ### Membership Inference Attacks (MIA)
24 | | Paper | Year |  Source         |
25 | |-------|------|-----------------|
26 | | 🌟⬛️Do Membership Inference AttacksWork on Large Language Models?  |  2024.02   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.07841) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](http://github.com/iamgroot42/mimir)  |
27 | | ⬜️Language Model Inversion  |  2023.11   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2311.13647)  | 
28 | | ⬛️Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks  |  2023.10   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](2310.13291)  |
29 | | ⬛️Beyond Memorization: Violating Privacy Via Inference with Large Language Models  |  2023.10   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.07298) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](http://github.com/iamgroot42/mimir)  |
30 | | ⬜️Extracting Training Data from Large Language Models  |  (USENIX Security, 2021)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://www.usenix.org/system/files/sec21-carlini-extracting.pdf)  | 
31 | 
32 | ## Protector Methodology
33 | 
34 | ### Differential Privacy (DP)
35 | | Paper | Year |  Source          |  Tasks  | Defense |
36 | |-------|------|------------------|---------|---------|
37 | | 🌟⬛️Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation  |  (ICLR, 2024)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.11765) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/microsoft/dp-few-shot-generation)  | Classification, Information Extraction | 
38 | | 🌟⬛️DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer  |  (ICLR, 2024)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.03724) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/VITA-Group/DP-OPT)  | Sentiment Classification |
39 | | 🌟⬛️Privacy-Preserving In-Context Learning For Large Language Models  |  (ICLR, 2024)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.01639)  |  Classification, Document Q&A, Dialog Summarization |
40 | | ⬛️On the Privacy Risk of In-context Learning  |  (TrustNLP, 2024))   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://trustnlpworkshop.github.io/papers/13.pdf)  |  Classification, Generation |  MIA 
41 | | ⬛️A Customized Text Sanitization Mechanism with Differential Privacy  |  (ACL, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-acl.355) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/sai4july/CusText)   |  Classification, Generation |
42 | | 🌟⬛️⬜️Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models  | (NeurIPS, 2023)  | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.15594)  | Classification |
43 | | 🌟⬛️Locally Differentially Private Document Generation Using Zero Shot Prompting  |  (EMNLP, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-emnlp.566) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/SaitejaUtpala/dp_prompt)  | Text Classification | 
44 | | ⬜️DP-forward: Fine-tuning and inference on language models with differential privacy in forward pass  |  (SIGSAC, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.06746)  |  Classification |
45 | | ⬛️InferDPT: Privacy-preserving Inference for Black-box Large Language Models  |  2023.12   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.12214) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/mengtong0110/InferDPT) |  Classification, Generation |
46 | | ⬜️Privacy-Preserving Prompt Tuning for Large Language Model Services  | 2023.05  | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.06212)  | Sentiment Classification, Document Q&A |
47 | | ⬛️Differential Privacy for Text Analytics via Natural Text Sanitization  |  (ACL-IJCNLP, 2021)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2021.findings-acl.337) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/xiangyue9607/SanText) |  Classification |
48 | 
49 | 
50 | 
51 | ### Secure Multi-Party Computing (SMPC)
52 | | Paper | Year |   Source     |    Tasks   | Defence |
53 | |-------|------|--------------|------------|---------|
54 | | ⬜️Ciphergpt: Secure two-party gpt inference  |  (Crypto, 2024)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://eprint.iacr.org/2023/1147)  |  Classification |
55 | | ⬜️SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models  |  2024.01   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2401.00793)  | Classification, Semantic Similarity, Linguistic Acceptability | Model Inside |
56 | | ⬜️LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers  |  2023.05   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2305.18396)  | Classification | Model Inside |
57 | 
58 | 
59 | 
60 | ### Anonymization Techniques
61 | | Paper | Year |  Source     |  Tasks   | Keywords | 
62 | |-------|------|-------------|----------|----------|
63 | | ⬛️SEmojiCrypt: Prompt Encryption for Secure Communication with Large Language Models  |  2024.02   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2402.05868) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/agiresearch/EmojiCrypt)  |  Classification | Emoji |
64 | | ⬛️ProPILE: Probing Privacy Leakage in Large Language Models  |  (NeurIPS, 2023)  | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2307.01881) | PII | 
65 | | ⬛️Recovering from Privacy-Preserving Masking with Large Language Models  |  2023.12   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.08628) |  |[MASK] 
66 | | ⬛️Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection  |  2023.09   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2309.03057)  [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/alohachen/Hide-and-Seek) | PII | 
67 | 
68 | 
69 | ### Other Methods
70 | | Paper | Year |  Source               | Tasks | Method Keyword |
71 | |-------|------|-----------------------|-------|----------------|
72 | | ⬜️Privatelora For Efficient Privacy Preserving LLM  |  (CoRR, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2311.14030) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/alipay/private_llm)  | | LoRA  
73 | | ⬜️TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations  |  (ACL, 2023)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://aclanthology.org/2023.findings-acl.337) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/xzhou20/TextObfuscator) |  Classification |
74 | 
75 | ### Related Survey
76 | | Paper | Year |  Source               |
77 | |-------|------|-----------------------|
78 | On Protecting the Data Privacy of Large Language Models (LLMs): A Survey | 2024.03 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2403.05156)| 
79 | Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory | (ICLR, 2024) | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs//2310.17884)| 
80 | A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly | 2023.12 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.02003) 
81 | Privacy in Large Language Models: Attacks, Defenses and Future Directions | 2023.10 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2310.10383) 
82 | Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | 2023.05 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2302.12173) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/greshake/llm-security)  | 
83 | Privacy-Preserving Large Language Models (PPLLMs) | 2023.01 | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://www.researchgate.net/publication/372607103_Privacy-Preserving_Large_Language_Models_PPLLMs) [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/greshake/llm-security)  | 
84 | 
85 | ### Fine-tuning
86 | | Paper | Year |  Source               | 
87 | |-------|------|-----------------------|
88 | | SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language Models for Private and Secure Inference  |  (AAAI, 2024)   | [![Static Badge](https://img.shields.io/badge/paper-%23B31B1B?logo=arxiv&labelColor=grey)](https://arxiv.org/abs/2312.17342)  [![Static Badge](https://img.shields.io/badge/code-black?logo=github)](https://github.com/abhijitmishra/sentinellm-aaai2024)  
89 | 


--------------------------------------------------------------------------------