├── .gitignore
├── README.md
├── emotion_token.py
├── example.ipynb
├── exp_data
    ├── malicious_prompt.csv
    └── normal_prompt.csv
├── load_data.py
├── load_model.py
├── picture
    ├── overview1.png
    └── overview2.png
├── requirements.txt
├── resource
    └── modeling_llama.py
├── vis
    ├── Llama-2-7b-chat-hf_16_24.png
    └── acc_Llama-2-7b-chat-hf.png
├── visualization.py
├── w2s_utils.py
└── weak2strong.py

/.gitignore:
--------------------------------------------------------------------------------
./exp_data/jailbreak_input
.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/README.md
--------------------------------------------------------------------------------
/emotion_token.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/emotion_token.py
--------------------------------------------------------------------------------
/example.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/example.ipynb
--------------------------------------------------------------------------------
/exp_data/malicious_prompt.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/exp_data/malicious_prompt.csv
--------------------------------------------------------------------------------
/exp_data/normal_prompt.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/exp_data/normal_prompt.csv
--------------------------------------------------------------------------------
/load_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/load_data.py
--------------------------------------------------------------------------------
/load_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/load_model.py
--------------------------------------------------------------------------------
/picture/overview1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/picture/overview1.png
--------------------------------------------------------------------------------
/picture/overview2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/picture/overview2.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/requirements.txt
--------------------------------------------------------------------------------
/resource/modeling_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/resource/modeling_llama.py
--------------------------------------------------------------------------------
/vis/Llama-2-7b-chat-hf_16_24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/vis/Llama-2-7b-chat-hf_16_24.png
--------------------------------------------------------------------------------
/vis/acc_Llama-2-7b-chat-hf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/vis/acc_Llama-2-7b-chat-hf.png
--------------------------------------------------------------------------------
/visualization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/visualization.py
--------------------------------------------------------------------------------
/w2s_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/w2s_utils.py
--------------------------------------------------------------------------------
/weak2strong.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ydyjya/LLM-IHS-Explanation/HEAD/weak2strong.py
--------------------------------------------------------------------------------