├── .DS_Store
├── LICENSE
├── README.md
├── data
    ├── advbench
    │   ├── harmful_behaviors.csv
    │   └── harmful_strings.csv
    └── transfer_expriment_behaviors.csv
├── environment.yml
├── experiments
    ├── .DS_Store
    ├── __init__.py
    ├── configs
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-38.pyc
    │   │   ├── __init__.cpython-39.pyc
    │   │   ├── individual_llama2.cpython-38.pyc
    │   │   ├── individual_llama2.cpython-39.pyc
    │   │   ├── individual_vicuna.cpython-39.pyc
    │   │   ├── template.cpython-38.pyc
    │   │   ├── template.cpython-39.pyc
    │   │   └── transfer_llama2.cpython-39.pyc
    │   ├── individual_llama2.py
    │   ├── individual_vicuna.py
    │   ├── template.py
    │   ├── transfer_llama2.py
    │   └── transfer_vicuna.py
    ├── eval_scripts
    │   ├── run_eval.sh
    │   └── run_eval_individual.sh
    ├── evaluate.py
    ├── evaluate_individual.py
    ├── launch_scripts
    │   ├── run_gcg_individual.sh
    │   └── run_gcg_multiple.sh
    └── main.py
├── llm_attacks.egg-info
    ├── PKG-INFO
    ├── SOURCES.txt
    ├── dependency_links.txt
    ├── requires.txt
    └── top_level.txt
├── llm_attacks
    ├── README.md
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-38.pyc
    │   └── __init__.cpython-39.pyc
    ├── base
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-38.pyc
    │   │   ├── __init__.cpython-39.pyc
    │   │   ├── attack_manager.cpython-38.pyc
    │   │   └── attack_manager.cpython-39.pyc
    │   └── attack_manager.py
    ├── gcg
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-38.pyc
    │   │   ├── __init__.cpython-39.pyc
    │   │   ├── gcg_attack.cpython-38.pyc
    │   │   └── gcg_attack.cpython-39.pyc
    │   └── gcg_attack.py
    └── minimal_gcg
    │   ├── __init__.py
    │   ├── opt_utils.py
    │   └── string_utils.py
├── probe-sampling.png
├── requirements.txt
└── setup.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/README.md
--------------------------------------------------------------------------------
/data/advbench/harmful_behaviors.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/data/advbench/harmful_behaviors.csv
--------------------------------------------------------------------------------
/data/advbench/harmful_strings.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/data/advbench/harmful_strings.csv
--------------------------------------------------------------------------------
/data/transfer_expriment_behaviors.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/data/transfer_expriment_behaviors.csv
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/environment.yml
--------------------------------------------------------------------------------
/experiments/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/.DS_Store
--------------------------------------------------------------------------------
/experiments/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/experiments/configs/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/__init__.cpython-39.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/individual_llama2.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/individual_llama2.cpython-38.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/individual_llama2.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/individual_llama2.cpython-39.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/individual_vicuna.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/individual_vicuna.cpython-39.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/template.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/template.cpython-38.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/template.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/template.cpython-39.pyc
--------------------------------------------------------------------------------
/experiments/configs/__pycache__/transfer_llama2.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/__pycache__/transfer_llama2.cpython-39.pyc
--------------------------------------------------------------------------------
/experiments/configs/individual_llama2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/individual_llama2.py
--------------------------------------------------------------------------------
/experiments/configs/individual_vicuna.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/individual_vicuna.py
--------------------------------------------------------------------------------
/experiments/configs/template.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/template.py
--------------------------------------------------------------------------------
/experiments/configs/transfer_llama2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/transfer_llama2.py
--------------------------------------------------------------------------------
/experiments/configs/transfer_vicuna.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/configs/transfer_vicuna.py
--------------------------------------------------------------------------------
/experiments/eval_scripts/run_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/eval_scripts/run_eval.sh
--------------------------------------------------------------------------------
/experiments/eval_scripts/run_eval_individual.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/eval_scripts/run_eval_individual.sh
--------------------------------------------------------------------------------
/experiments/evaluate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/evaluate.py
--------------------------------------------------------------------------------
/experiments/evaluate_individual.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/evaluate_individual.py
--------------------------------------------------------------------------------
/experiments/launch_scripts/run_gcg_individual.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/launch_scripts/run_gcg_individual.sh
--------------------------------------------------------------------------------
/experiments/launch_scripts/run_gcg_multiple.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/launch_scripts/run_gcg_multiple.sh
--------------------------------------------------------------------------------
/experiments/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/experiments/main.py
--------------------------------------------------------------------------------
/llm_attacks.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks.egg-info/PKG-INFO
--------------------------------------------------------------------------------
/llm_attacks.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks.egg-info/SOURCES.txt
--------------------------------------------------------------------------------
/llm_attacks.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/llm_attacks.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | transformers==4.28.1
2 | ml_collections
3 | fschat==0.2.20
4 |
--------------------------------------------------------------------------------
/llm_attacks.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | experiments
2 | llm_attacks
3 |
--------------------------------------------------------------------------------
/llm_attacks/README.md:
--------------------------------------------------------------------------------
1 | README.md
2 |
--------------------------------------------------------------------------------
/llm_attacks/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/__init__.py
--------------------------------------------------------------------------------
/llm_attacks/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/llm_attacks/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/__pycache__/__init__.cpython-39.pyc
--------------------------------------------------------------------------------
/llm_attacks/base/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/llm_attacks/base/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/base/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/llm_attacks/base/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/base/__pycache__/__init__.cpython-39.pyc
--------------------------------------------------------------------------------
/llm_attacks/base/__pycache__/attack_manager.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/base/__pycache__/attack_manager.cpython-38.pyc
--------------------------------------------------------------------------------
/llm_attacks/base/__pycache__/attack_manager.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/base/__pycache__/attack_manager.cpython-39.pyc
--------------------------------------------------------------------------------
/llm_attacks/base/attack_manager.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/base/attack_manager.py
--------------------------------------------------------------------------------
/llm_attacks/gcg/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/__init__.py
--------------------------------------------------------------------------------
/llm_attacks/gcg/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/llm_attacks/gcg/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/__pycache__/__init__.cpython-39.pyc
--------------------------------------------------------------------------------
/llm_attacks/gcg/__pycache__/gcg_attack.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/__pycache__/gcg_attack.cpython-38.pyc
--------------------------------------------------------------------------------
/llm_attacks/gcg/__pycache__/gcg_attack.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/__pycache__/gcg_attack.cpython-39.pyc
--------------------------------------------------------------------------------
/llm_attacks/gcg/gcg_attack.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/gcg/gcg_attack.py
--------------------------------------------------------------------------------
/llm_attacks/minimal_gcg/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/llm_attacks/minimal_gcg/opt_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/minimal_gcg/opt_utils.py
--------------------------------------------------------------------------------
/llm_attacks/minimal_gcg/string_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/llm_attacks/minimal_gcg/string_utils.py
--------------------------------------------------------------------------------
/probe-sampling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/probe-sampling.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers==4.28.1
2 | ml_collections
3 | fschat==0.2.23
4 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaoyiran924/Probe-Sampling/HEAD/setup.py
--------------------------------------------------------------------------------