├── .github
    ├── ISSUE_TEMPLATE
    │   ├── bug_report.yaml
    │   └── feature-request.yaml
    └── PULL_REQUEST_TEMPLATE
    │   └── pr_template.md
├── .gitignore
├── LICENSE
├── README.md
├── batch_inference
    ├── 1_get_chunk_info.py
    ├── 2_get_score.py
    └── 3_get_pairs.py
├── data
    └── example
    │   ├── sampled_responses.jsonl
    │   └── sft.jsonl
├── evaluation
    ├── LongBench
    │   ├── 1_aggregate_data.py
    │   ├── 2_pred.py
    │   ├── 3_eval.py
    │   ├── README.md
    │   └── dataset2prompt.json
    └── LongBench_Chat
    │   ├── eval.py
    │   ├── few_shots.json
    │   ├── prompt_fs.txt
    │   └── test_cases.json
├── long_reward
    ├── auto_scorer.py
    └── prompts
    │   ├── completeness_few_shot.txt
    │   ├── extract_info_en.txt
    │   ├── extract_info_zh.txt
    │   ├── faithfulness_few_shot.txt
    │   ├── find_fact_few_shot.txt
    │   ├── helpfulness_few_shot.txt
    │   └── logicality_few_shot.txt
├── requirements.txt
└── utils
    ├── llm_api.py
    ├── retrieve.py
    └── zhipu_embedding.py

/.github/ISSUE_TEMPLATE/bug_report.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/.github/ISSUE_TEMPLATE/bug_report.yaml
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature-request.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/.github/ISSUE_TEMPLATE/feature-request.yaml
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE/pr_template.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/.github/PULL_REQUEST_TEMPLATE/pr_template.md
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*venv
*.DS_Store
*.idea/
dataset
test*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/README.md
--------------------------------------------------------------------------------
/batch_inference/1_get_chunk_info.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/batch_inference/1_get_chunk_info.py
--------------------------------------------------------------------------------
/batch_inference/2_get_score.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/batch_inference/2_get_score.py
--------------------------------------------------------------------------------
/batch_inference/3_get_pairs.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/batch_inference/3_get_pairs.py
--------------------------------------------------------------------------------
/data/example/sampled_responses.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/data/example/sampled_responses.jsonl
--------------------------------------------------------------------------------
/data/example/sft.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/data/example/sft.jsonl
--------------------------------------------------------------------------------
/evaluation/LongBench/1_aggregate_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench/1_aggregate_data.py
--------------------------------------------------------------------------------
/evaluation/LongBench/2_pred.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench/2_pred.py
--------------------------------------------------------------------------------
/evaluation/LongBench/3_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench/3_eval.py
--------------------------------------------------------------------------------
/evaluation/LongBench/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench/README.md
--------------------------------------------------------------------------------
/evaluation/LongBench/dataset2prompt.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench/dataset2prompt.json
--------------------------------------------------------------------------------
/evaluation/LongBench_Chat/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench_Chat/eval.py
--------------------------------------------------------------------------------
/evaluation/LongBench_Chat/few_shots.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench_Chat/few_shots.json
--------------------------------------------------------------------------------
/evaluation/LongBench_Chat/prompt_fs.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench_Chat/prompt_fs.txt
--------------------------------------------------------------------------------
/evaluation/LongBench_Chat/test_cases.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/evaluation/LongBench_Chat/test_cases.json
--------------------------------------------------------------------------------
/long_reward/auto_scorer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/auto_scorer.py
--------------------------------------------------------------------------------
/long_reward/prompts/completeness_few_shot.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/completeness_few_shot.txt
--------------------------------------------------------------------------------
/long_reward/prompts/extract_info_en.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/extract_info_en.txt
--------------------------------------------------------------------------------
/long_reward/prompts/extract_info_zh.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/extract_info_zh.txt
--------------------------------------------------------------------------------
/long_reward/prompts/faithfulness_few_shot.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/faithfulness_few_shot.txt
--------------------------------------------------------------------------------
/long_reward/prompts/find_fact_few_shot.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/find_fact_few_shot.txt
--------------------------------------------------------------------------------
/long_reward/prompts/helpfulness_few_shot.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/helpfulness_few_shot.txt
--------------------------------------------------------------------------------
/long_reward/prompts/logicality_few_shot.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/long_reward/prompts/logicality_few_shot.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/requirements.txt
--------------------------------------------------------------------------------
/utils/llm_api.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/utils/llm_api.py
--------------------------------------------------------------------------------
/utils/retrieve.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/utils/retrieve.py
--------------------------------------------------------------------------------
/utils/zhipu_embedding.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/LongReward/HEAD/utils/zhipu_embedding.py
--------------------------------------------------------------------------------