├── .gitignore
├── R1_X_Boundary_demo.py
├── README.md
├── asset
│   ├── experiment.png
│   ├── motivation.png
│   └── reasoning_model_case.png
├── configs
│   ├── accelerate_zero1.yaml
│   ├── accelerate_zero2.yaml
│   ├── accelerate_zero3.yaml
│   └── ds_config.json
├── data
│   ├── test
│   │   ├── OKTest.json
│   │   ├── ORbench_test_300.json
│   │   ├── PHtest.json
│   │   ├── harmbench_test.json
│   │   ├── harmbench_test_ctx.json
│   │   ├── harmbench_test_std.json
│   │   ├── multi_turn
│   │   │   ├── SafeMTData_Attack_test_600.json
│   │   │   └── red_queen_turn5.json
│   │   └── xstest_v2_prompts.json
│   └── train
│       ├── ORbench_retain_set.json
│       ├── SafeMT_train_600.json
│       ├── circuit_breakers_train_2400.json
│       └── circuit_breakers_val.json
├── evaluation
│   ├── api.py
│   ├── evaluate.py
│   ├── judge.py
│   ├── multi_jailbreak_eval.py
│   ├── overrefusal_eval.py
│   ├── process_result.ipynb
│   └── utils.py
├── requirements.txt
├── scripts
│   ├── eval
│   │   ├── multi_turn_eval.sh
│   │   ├── overrefusal_eval.sh
│   │   ├── red_queen_eval.sh
│   │   ├── red_queen_eval_llama.sh
│   │   └── single_turn_eval.sh
│   ├── lorra_x_boundary_deepseek-r1-distill-llama-8b.sh
│   ├── lorra_x_boundary_llama3_8b.sh
│   └── lorra_x_boundary_qwen2_7b.sh
└── src
    ├── args.py
    ├── lorra_x_boundary.py
    ├── utils.py
    └── xb_train_dataset.py


/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/.gitignore
--------------------------------------------------------------------------------

/R1_X_Boundary_demo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/R1_X_Boundary_demo.py
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/README.md
--------------------------------------------------------------------------------

/asset/experiment.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/asset/experiment.png
--------------------------------------------------------------------------------

/asset/motivation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/asset/motivation.png
--------------------------------------------------------------------------------

/asset/reasoning_model_case.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/asset/reasoning_model_case.png
--------------------------------------------------------------------------------

/configs/accelerate_zero1.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/configs/accelerate_zero1.yaml
--------------------------------------------------------------------------------

/configs/accelerate_zero2.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/configs/accelerate_zero2.yaml
--------------------------------------------------------------------------------

/configs/accelerate_zero3.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/configs/accelerate_zero3.yaml
--------------------------------------------------------------------------------

/configs/ds_config.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/configs/ds_config.json
--------------------------------------------------------------------------------

/data/test/OKTest.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/OKTest.json
--------------------------------------------------------------------------------

/data/test/ORbench_test_300.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/ORbench_test_300.json
--------------------------------------------------------------------------------

/data/test/PHtest.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/PHtest.json
--------------------------------------------------------------------------------

/data/test/harmbench_test.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/harmbench_test.json
--------------------------------------------------------------------------------

/data/test/harmbench_test_ctx.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/harmbench_test_ctx.json
--------------------------------------------------------------------------------

/data/test/harmbench_test_std.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/harmbench_test_std.json
--------------------------------------------------------------------------------

/data/test/multi_turn/SafeMTData_Attack_test_600.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/multi_turn/SafeMTData_Attack_test_600.json
--------------------------------------------------------------------------------

/data/test/multi_turn/red_queen_turn5.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/multi_turn/red_queen_turn5.json
--------------------------------------------------------------------------------

/data/test/xstest_v2_prompts.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/test/xstest_v2_prompts.json
--------------------------------------------------------------------------------

/data/train/ORbench_retain_set.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/train/ORbench_retain_set.json
--------------------------------------------------------------------------------

/data/train/SafeMT_train_600.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/train/SafeMT_train_600.json
--------------------------------------------------------------------------------

/data/train/circuit_breakers_train_2400.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/train/circuit_breakers_train_2400.json
--------------------------------------------------------------------------------

/data/train/circuit_breakers_val.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/data/train/circuit_breakers_val.json
--------------------------------------------------------------------------------

/evaluation/api.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/api.py
--------------------------------------------------------------------------------

/evaluation/evaluate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/evaluate.py
--------------------------------------------------------------------------------

/evaluation/judge.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/judge.py
--------------------------------------------------------------------------------

/evaluation/multi_jailbreak_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/multi_jailbreak_eval.py
--------------------------------------------------------------------------------

/evaluation/overrefusal_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/overrefusal_eval.py
--------------------------------------------------------------------------------

/evaluation/process_result.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/process_result.ipynb
--------------------------------------------------------------------------------

/evaluation/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/evaluation/utils.py
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/requirements.txt
--------------------------------------------------------------------------------

/scripts/eval/multi_turn_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/eval/multi_turn_eval.sh
--------------------------------------------------------------------------------

/scripts/eval/overrefusal_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/eval/overrefusal_eval.sh
--------------------------------------------------------------------------------

/scripts/eval/red_queen_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/eval/red_queen_eval.sh
--------------------------------------------------------------------------------

/scripts/eval/red_queen_eval_llama.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/eval/red_queen_eval_llama.sh
--------------------------------------------------------------------------------

/scripts/eval/single_turn_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/eval/single_turn_eval.sh
--------------------------------------------------------------------------------

/scripts/lorra_x_boundary_deepseek-r1-distill-llama-8b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/lorra_x_boundary_deepseek-r1-distill-llama-8b.sh
--------------------------------------------------------------------------------

/scripts/lorra_x_boundary_llama3_8b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/lorra_x_boundary_llama3_8b.sh
--------------------------------------------------------------------------------

/scripts/lorra_x_boundary_qwen2_7b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/scripts/lorra_x_boundary_qwen2_7b.sh
--------------------------------------------------------------------------------

/src/args.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/src/args.py
--------------------------------------------------------------------------------

/src/lorra_x_boundary.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/src/lorra_x_boundary.py
--------------------------------------------------------------------------------

/src/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/src/utils.py
--------------------------------------------------------------------------------

/src/xb_train_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AI45Lab/X-Boundary/HEAD/src/xb_train_dataset.py
--------------------------------------------------------------------------------