├── .gitignore
├── README.md
├── code_diversity_fla
    └── do_fla.py
├── code_ifd
    ├── data_analysis.py
    ├── put_analysis_to_data.py
    └── select_data.py
├── data
    ├── alpaca_data.json
    └── data_with_ifd
    │   ├── alpaca_data_gpt2_data.json
    │   └── alpaca_gpt4_data_gpt2_data.json
├── evaluation
    ├── generation
    │   ├── eva_generation.py
    │   ├── eval.py
    │   ├── eval_generation_wrap.py
    │   └── review_eval_score.py
    ├── scripts
    │   ├── do_eval.sh
    │   ├── do_eval_generation.sh
    │   ├── do_eval_generation_wrap.sh
    │   └── do_review_eval_score.sh
    └── test_data
    │   ├── koala_test_set.jsonl
    │   ├── lima_test_set.jsonl
    │   ├── sinstruct_test_set.jsonl
    │   ├── vicuna_test_set.jsonl
    │   └── wizardlm_test_set.jsonl
├── images
    ├── fast_alpaca.png
    └── intro.png
├── requirements.txt
└── scripts
    ├── optional_select_data_plus_diversity.sh
    ├── step1_select_data_analysis_gpt2.sh
    ├── step2_put_analysis_to_data.sh
    └── step3_select_data.sh

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/README.md
--------------------------------------------------------------------------------
/code_diversity_fla/do_fla.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/code_diversity_fla/do_fla.py
--------------------------------------------------------------------------------
/code_ifd/data_analysis.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/code_ifd/data_analysis.py
--------------------------------------------------------------------------------
/code_ifd/put_analysis_to_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/code_ifd/put_analysis_to_data.py
--------------------------------------------------------------------------------
/code_ifd/select_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/code_ifd/select_data.py
--------------------------------------------------------------------------------
/data/alpaca_data.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/data/alpaca_data.json
--------------------------------------------------------------------------------
/data/data_with_ifd/alpaca_data_gpt2_data.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/data/data_with_ifd/alpaca_data_gpt2_data.json
--------------------------------------------------------------------------------
/data/data_with_ifd/alpaca_gpt4_data_gpt2_data.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/data/data_with_ifd/alpaca_gpt4_data_gpt2_data.json
--------------------------------------------------------------------------------
/evaluation/generation/eva_generation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/generation/eva_generation.py
--------------------------------------------------------------------------------
/evaluation/generation/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/generation/eval.py
--------------------------------------------------------------------------------
/evaluation/generation/eval_generation_wrap.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/generation/eval_generation_wrap.py
--------------------------------------------------------------------------------
/evaluation/generation/review_eval_score.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/generation/review_eval_score.py
--------------------------------------------------------------------------------
/evaluation/scripts/do_eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/scripts/do_eval.sh
--------------------------------------------------------------------------------
/evaluation/scripts/do_eval_generation.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/scripts/do_eval_generation.sh
--------------------------------------------------------------------------------
/evaluation/scripts/do_eval_generation_wrap.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/scripts/do_eval_generation_wrap.sh
--------------------------------------------------------------------------------
/evaluation/scripts/do_review_eval_score.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/scripts/do_review_eval_score.sh
--------------------------------------------------------------------------------
/evaluation/test_data/koala_test_set.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/test_data/koala_test_set.jsonl
--------------------------------------------------------------------------------
/evaluation/test_data/lima_test_set.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/test_data/lima_test_set.jsonl
--------------------------------------------------------------------------------
/evaluation/test_data/sinstruct_test_set.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/test_data/sinstruct_test_set.jsonl
--------------------------------------------------------------------------------
/evaluation/test_data/vicuna_test_set.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/test_data/vicuna_test_set.jsonl
--------------------------------------------------------------------------------
/evaluation/test_data/wizardlm_test_set.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/evaluation/test_data/wizardlm_test_set.jsonl
--------------------------------------------------------------------------------
/images/fast_alpaca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/images/fast_alpaca.png
--------------------------------------------------------------------------------
/images/intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/images/intro.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
numpy
transformers
sentencepiece
tqdm
scipy
--------------------------------------------------------------------------------
/scripts/optional_select_data_plus_diversity.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/scripts/optional_select_data_plus_diversity.sh
--------------------------------------------------------------------------------
/scripts/step1_select_data_analysis_gpt2.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/scripts/step1_select_data_analysis_gpt2.sh
--------------------------------------------------------------------------------
/scripts/step2_put_analysis_to_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/scripts/step2_put_analysis_to_data.sh
--------------------------------------------------------------------------------
/scripts/step3_select_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tianyi-lab/Superfiltering/HEAD/scripts/step3_select_data.sh
--------------------------------------------------------------------------------