├── 01_prepare_submission.ipynb
├── 02_re_evaluate_submission.ipynb
├── LICENSE
├── README.md
├── data
    ├── outputs_official.json
    ├── train_ultrafeedback_gpt-4_vs_nil.pkl
    └── train_ultrafeedback_gpt-4_vs_nil_swap.pkl
├── example
    ├── outputs.json
    └── weighted_alpaca_eval_gpt4_turbo
    │   └── annotations.json
├── notebook_gpt4
    ├── analyze.ipynb
    ├── gpt-4-1106-preview_vs_nil.ipynb
    └── saved
    │   └── gpt-4-1106-preview
    │       └── evaluated_nil_N_10_tokens_128_step_384_stride_16_seed_0.pkl
└── viz
    ├── gpt-4-1106-preview_vs_nil.pdf
    └── leaderboard.jpeg

/01_prepare_submission.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/01_prepare_submission.ipynb
--------------------------------------------------------------------------------
/02_re_evaluate_submission.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/02_re_evaluate_submission.ipynb
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/README.md
--------------------------------------------------------------------------------
/data/outputs_official.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/data/outputs_official.json
--------------------------------------------------------------------------------
/data/train_ultrafeedback_gpt-4_vs_nil.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/data/train_ultrafeedback_gpt-4_vs_nil.pkl
--------------------------------------------------------------------------------
/data/train_ultrafeedback_gpt-4_vs_nil_swap.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/data/train_ultrafeedback_gpt-4_vs_nil_swap.pkl
--------------------------------------------------------------------------------
/example/outputs.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/example/outputs.json
--------------------------------------------------------------------------------
/example/weighted_alpaca_eval_gpt4_turbo/annotations.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/example/weighted_alpaca_eval_gpt4_turbo/annotations.json
--------------------------------------------------------------------------------
/notebook_gpt4/analyze.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/notebook_gpt4/analyze.ipynb
--------------------------------------------------------------------------------
/notebook_gpt4/gpt-4-1106-preview_vs_nil.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/notebook_gpt4/gpt-4-1106-preview_vs_nil.ipynb
--------------------------------------------------------------------------------
/notebook_gpt4/saved/gpt-4-1106-preview/evaluated_nil_N_10_tokens_128_step_384_stride_16_seed_0.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/notebook_gpt4/saved/gpt-4-1106-preview/evaluated_nil_N_10_tokens_128_step_384_stride_16_seed_0.pkl
--------------------------------------------------------------------------------
/viz/gpt-4-1106-preview_vs_nil.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/viz/gpt-4-1106-preview_vs_nil.pdf
--------------------------------------------------------------------------------
/viz/leaderboard.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sail-sg/Cheating-LLM-Benchmarks/HEAD/viz/leaderboard.jpeg
--------------------------------------------------------------------------------