├── LICENSE
├── README.md
├── demo.jpg
└── weak-to-strong
    ├── README.md
    ├── data
    │   ├── cai
    │   │   ├── harmless_test.jsonl
    │   │   ├── harmless_train_gt.jsonl
    │   │   └── harmless_train_w2s.jsonl
    │   └── hh-single-turn
    │       ├── harmless_test.jsonl
    │       ├── harmless_train_gt.jsonl
    │       ├── harmless_train_w2s.jsonl
    │       ├── helpful_test.jsonl
    │       └── helpful_train.jsonl
    ├── pyproject.toml
    ├── requirements.txt
    ├── run_dpo.sh
    ├── run_reward_model.sh
    ├── run_simpo.sh
    ├── run_simpo_bootstrapping.sh
    ├── run_simpo_bootstrapping_fsdp.sh
    ├── run_simpo_fsdp.sh
    ├── train_dpo.py
    ├── train_reward_model.py
    ├── train_simpo.py
    ├── train_simpo_bootstrapping.py
    ├── train_simpo_bootstrapping_fsdp.py
    ├── train_simpo_fsdp.py
    └── weak_to_strong
        ├── common.py
        ├── datasets.py
        ├── eval.py
        ├── logger.py
        ├── loss.py
        ├── model.py
        └── train.py

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/README.md
--------------------------------------------------------------------------------
/demo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/demo.jpg
--------------------------------------------------------------------------------
/weak-to-strong/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/README.md
--------------------------------------------------------------------------------
/weak-to-strong/data/cai/harmless_test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/cai/harmless_test.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/cai/harmless_train_gt.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/cai/harmless_train_gt.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/cai/harmless_train_w2s.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/cai/harmless_train_w2s.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/hh-single-turn/harmless_test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/hh-single-turn/harmless_test.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/hh-single-turn/harmless_train_gt.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/hh-single-turn/harmless_train_gt.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/hh-single-turn/harmless_train_w2s.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/hh-single-turn/harmless_train_w2s.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/hh-single-turn/helpful_test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/hh-single-turn/helpful_test.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/data/hh-single-turn/helpful_train.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/data/hh-single-turn/helpful_train.jsonl
--------------------------------------------------------------------------------
/weak-to-strong/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/pyproject.toml
--------------------------------------------------------------------------------
/weak-to-strong/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/requirements.txt
--------------------------------------------------------------------------------
/weak-to-strong/run_dpo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_dpo.sh
--------------------------------------------------------------------------------
/weak-to-strong/run_reward_model.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_reward_model.sh
--------------------------------------------------------------------------------
/weak-to-strong/run_simpo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_simpo.sh
--------------------------------------------------------------------------------
/weak-to-strong/run_simpo_bootstrapping.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_simpo_bootstrapping.sh
--------------------------------------------------------------------------------
/weak-to-strong/run_simpo_bootstrapping_fsdp.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_simpo_bootstrapping_fsdp.sh
--------------------------------------------------------------------------------
/weak-to-strong/run_simpo_fsdp.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/run_simpo_fsdp.sh
--------------------------------------------------------------------------------
/weak-to-strong/train_dpo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_dpo.py
--------------------------------------------------------------------------------
/weak-to-strong/train_reward_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_reward_model.py
--------------------------------------------------------------------------------
/weak-to-strong/train_simpo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_simpo.py
--------------------------------------------------------------------------------
/weak-to-strong/train_simpo_bootstrapping.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_simpo_bootstrapping.py
--------------------------------------------------------------------------------
/weak-to-strong/train_simpo_bootstrapping_fsdp.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_simpo_bootstrapping_fsdp.py
--------------------------------------------------------------------------------
/weak-to-strong/train_simpo_fsdp.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/train_simpo_fsdp.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/common.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/common.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/datasets.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/datasets.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/eval.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/logger.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/logger.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/loss.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/loss.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/model.py
--------------------------------------------------------------------------------
/weak-to-strong/weak_to_strong/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/keven980716/weak-to-strong-deception/HEAD/weak-to-strong/weak_to_strong/train.py
--------------------------------------------------------------------------------