├── .gitignore
├── LICENSE
├── README.md
├── conda_recipe.yaml
├── generate_evaluate_completions.py
├── main.py
├── method
    ├── attacks.py
    ├── diff_emb_p=2_new.pth
    ├── eval.py
    ├── logs
    │   ├── log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt
    │   ├── log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt
    │   ├── log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
    │   ├── log_model3_mode=rs-nospace_tkn=5_outit=4_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
    │   ├── log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt
    │   └── log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_dict=diff_emb_p=2_new_mult-2.txt
    ├── logs_precomputed
    │   ├── log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt
    │   ├── log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt
    │   ├── log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
    │   ├── log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt
    │   └── log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_mult-2.txt
    └── utils.py
├── src
    ├── README.md
    ├── __init__.py
    ├── datasets
    │   ├── __init__.py
    │   ├── base.py
    │   ├── constants.py
    │   └── prompt_only.py
    └── models
    │   ├── __init__.py
    │   └── reward_model.py
├── submission-croce_francesco.csv
└── test_eval.sh

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/README.md
--------------------------------------------------------------------------------
/conda_recipe.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/conda_recipe.yaml
--------------------------------------------------------------------------------
/generate_evaluate_completions.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/generate_evaluate_completions.py
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/main.py
--------------------------------------------------------------------------------
/method/attacks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/attacks.py
--------------------------------------------------------------------------------
/method/diff_emb_p=2_new.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/diff_emb_p=2_new.pth
--------------------------------------------------------------------------------
/method/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/eval.py
--------------------------------------------------------------------------------
/method/logs/log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt
--------------------------------------------------------------------------------
/method/logs/log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt
--------------------------------------------------------------------------------
/method/logs/log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
--------------------------------------------------------------------------------
/method/logs/log_model3_mode=rs-nospace_tkn=5_outit=4_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model3_mode=rs-nospace_tkn=5_outit=4_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
--------------------------------------------------------------------------------
/method/logs/log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt
--------------------------------------------------------------------------------
/method/logs/log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_dict=diff_emb_p=2_new_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs/log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_dict=diff_emb_p=2_new_mult-2.txt
--------------------------------------------------------------------------------
/method/logs_precomputed/log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs_precomputed/log_model1_mode=rs-biased-nospace_tkn=5_outit=10_seed=18_mult-2.txt
--------------------------------------------------------------------------------
/method/logs_precomputed/log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs_precomputed/log_model2_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-2-top1000_dict=diff_emb_p=2_new_mult-2.txt
--------------------------------------------------------------------------------
/method/logs_precomputed/log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs_precomputed/log_model3_mode=rs-nospace_tkn=5_outit=10_seed=15_incl=diff_emb-topk=-1_all-3-top1000_dict=diff_emb_p=2_new_fixed.txt
--------------------------------------------------------------------------------
/method/logs_precomputed/log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs_precomputed/log_model4_mode=rs-biased-nospace_tkn=15_outit=4_seed=10_mult-2.txt
--------------------------------------------------------------------------------
/method/logs_precomputed/log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_mult-2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/logs_precomputed/log_model5_mode=rs-nospace_tkn=5_outit=4_seed=10_incl=diff_emb-topk=-1_all-5-top1000_mult-2.txt
--------------------------------------------------------------------------------
/method/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/method/utils.py
--------------------------------------------------------------------------------
/src/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/README.md
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/src/datasets/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/datasets/__init__.py
--------------------------------------------------------------------------------
/src/datasets/base.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/datasets/base.py
--------------------------------------------------------------------------------
/src/datasets/constants.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/datasets/constants.py
--------------------------------------------------------------------------------
/src/datasets/prompt_only.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/datasets/prompt_only.py
--------------------------------------------------------------------------------
/src/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/models/__init__.py
--------------------------------------------------------------------------------
/src/models/reward_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/src/models/reward_model.py
--------------------------------------------------------------------------------
/submission-croce_francesco.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/submission-croce_francesco.csv
--------------------------------------------------------------------------------
/test_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fra31/rlhf-trojan-competition-submission/HEAD/test_eval.sh
--------------------------------------------------------------------------------