├── GPQA_Main_5-shot_Mistral_7B_wandb_export_2024-10-02T12_55_32.220+08_00.csv
├── GPQA_Main_5-shot_Mistral_Large_Min_P_wandb_export_2024-10-02T13_05_07.370+08_00.csv
├── GPQA_Main_5-shot_Mistral_Large_Top_P_and_control_wandb_export_2024-10-02T12_58_36.356+08_00.csv
├── GPQA_and_GSM8K_COT_all_llama_runs_wandb_export_2025-02-26T14_51_52.538-08_00.csv
├── GSM8K_COT_8-shot_Mistral_7B_wandb_export_2024-10-02T12_54_57.415+08_00.csv
├── LICENSE
├── README.md
├── [LIVE] min_p user preference study v3.0 (Responses) - Form responses 1.csv
├── [PUBLIC]_Min_P_Evals_Replication_for_GPQA_and_GSM8K_COT.ipynb
├── implementation
├── llama1b_gpqa.csv
├── llama31-70b-min-p-gpqa.csv
├── llama31-70b-min-p-gsm8k.csv
├── llm_as_judge_eval1.csv
├── llm_as_judge_eval2.csv
├── min_p user preference study v2.0 (Responses) - Form responses 1 (1) (2).csv
└── wandb_rebuttal_response_allen.csv

/GPQA_Main_5-shot_Mistral_7B_wandb_export_2024-10-02T12_55_32.220+08_00.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/GPQA_Main_5-shot_Mistral_7B_wandb_export_2024-10-02T12_55_32.220+08_00.csv

--------------------------------------------------------------------------------
/GPQA_Main_5-shot_Mistral_Large_Min_P_wandb_export_2024-10-02T13_05_07.370+08_00.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/GPQA_Main_5-shot_Mistral_Large_Min_P_wandb_export_2024-10-02T13_05_07.370+08_00.csv

--------------------------------------------------------------------------------
/GPQA_Main_5-shot_Mistral_Large_Top_P_and_control_wandb_export_2024-10-02T12_58_36.356+08_00.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/GPQA_Main_5-shot_Mistral_Large_Top_P_and_control_wandb_export_2024-10-02T12_58_36.356+08_00.csv

--------------------------------------------------------------------------------
/GPQA_and_GSM8K_COT_all_llama_runs_wandb_export_2025-02-26T14_51_52.538-08_00.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/GPQA_and_GSM8K_COT_all_llama_runs_wandb_export_2025-02-26T14_51_52.538-08_00.csv

--------------------------------------------------------------------------------
/GSM8K_COT_8-shot_Mistral_7B_wandb_export_2024-10-02T12_54_57.415+08_00.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/GSM8K_COT_8-shot_Mistral_7B_wandb_export_2024-10-02T12_54_57.415+08_00.csv

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/LICENSE

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/README.md

--------------------------------------------------------------------------------
/[LIVE] min_p user preference study v3.0 (Responses) - Form responses 1.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/[LIVE] min_p user preference study v3.0 (Responses) - Form responses 1.csv

--------------------------------------------------------------------------------
/[PUBLIC]_Min_P_Evals_Replication_for_GPQA_and_GSM8K_COT.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/[PUBLIC]_Min_P_Evals_Replication_for_GPQA_and_GSM8K_COT.ipynb

--------------------------------------------------------------------------------
/implementation:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/implementation

--------------------------------------------------------------------------------
/llama1b_gpqa.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/llama1b_gpqa.csv

--------------------------------------------------------------------------------
/llama31-70b-min-p-gpqa.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/llama31-70b-min-p-gpqa.csv

--------------------------------------------------------------------------------
/llama31-70b-min-p-gsm8k.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/llama31-70b-min-p-gsm8k.csv

--------------------------------------------------------------------------------
/llm_as_judge_eval1.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/llm_as_judge_eval1.csv

--------------------------------------------------------------------------------
/llm_as_judge_eval2.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/llm_as_judge_eval2.csv

--------------------------------------------------------------------------------
/min_p user preference study v2.0 (Responses) - Form responses 1 (1) (2).csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/min_p user preference study v2.0 (Responses) - Form responses 1 (1) (2).csv

--------------------------------------------------------------------------------
/wandb_rebuttal_response_allen.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/menhguin/minp_paper/HEAD/wandb_rebuttal_response_allen.csv

--------------------------------------------------------------------------------