├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
├── checkpoints
│   ├── README.md
│   ├── final_defualt_seed_paths.jsonl
│   └── weka_paths.jsonl
├── eval
│   ├── all_olmes_rc_tasks.txt
│   ├── eval_checkpoints_x_tasks.py
│   ├── get_checkpoint_lists.ipynb
│   └── readme.md
├── perplexity_metrics_by_group.csv
├── pretraining
│   ├── create_ladder_over_scale_script.py
│   ├── eval-for-consistent-ranking-mix-names.jsonl
│   ├── eval-for-consistent-ranking-scales.txt
│   └── readme.md
├── release
│   ├── final_checkpoint_upload_script.sh
│   ├── find_wandb.py
│   ├── loss_curves.ipynb
│   ├── make_final_checkpoint_script.py
│   ├── make_model_table.ipynb
│   ├── make_repo_public.py
│   ├── matched_runs_by_s3.json
│   ├── model_card.md
│   ├── readme.md
│   ├── repo_names.txt
│   ├── requirements.txt
│   ├── upload_checkpoints.py
│   ├── upload_eval.py
│   ├── upload_model.py
│   └── upload_model_card.py
├── requirements.txt
├── results
│   └── readme.md
├── scaling_laws
│   ├── README.md
│   ├── dump
│   │   ├── ladder_ian.py
│   │   └── ladder_tables.py
│   ├── fit_scaling_laws.py
│   ├── notebooks
│   │   ├── example_results.ipynb
│   │   ├── math_and_code.ipynb
│   │   ├── reconcile_bpb.ipynb
│   │   ├── scaling_law_figures.ipynb
│   │   └── scaling_law_tables.ipynb
│   ├── remote
│   │   ├── aws.py
│   │   ├── compile_results.py
│   │   ├── constants.py
│   │   ├── hf.py
│   │   └── preprocess.py
│   ├── render_tables.py
│   ├── requirements.txt
│   └── utils
│       ├── constants
│       │   ├── __init__.py
│       │   ├── constants_models.py
│       │   └── constants_recepies.py
│       ├── dataloader.py
│       ├── plot.py
│       ├── scaling_laws.py
│       ├── stats.py
│       └── table.py
├── single_scale
│   ├── algorithms
│   │   ├── log_fit.py
│   │   └── scaling_law.py
│   ├── config.yaml
│   ├── create_per_task_configs.sh
│   ├── data_exploration_and_cleaning.ipynb
│   ├── data_method.py
│   ├── evaluator.py
│   ├── main.py
│   ├── process_s3_to_csv.py
│   ├── readme.md
│   ├── requirements.txt
│   ├── target_filter.py
│   ├── utils.py
│   └── visualization.py
├── utils
│   ├── constants.py
│   └── undo_normalization.py
└── viz
    ├── compute_story_viz.ipynb
    ├── math_and_code.ipynb
    ├── pred_error.ipynb
    ├── readme.md
    ├── requirements.txt
    ├── rq_4.ipynb
    └── table_gen.ipynb

/.gitattributes:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/.gitattributes
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/README.md
--------------------------------------------------------------------------------
/checkpoints/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/checkpoints/README.md
--------------------------------------------------------------------------------
/checkpoints/final_defualt_seed_paths.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/checkpoints/final_defualt_seed_paths.jsonl
--------------------------------------------------------------------------------
/checkpoints/weka_paths.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/checkpoints/weka_paths.jsonl
--------------------------------------------------------------------------------
/eval/all_olmes_rc_tasks.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/eval/all_olmes_rc_tasks.txt
--------------------------------------------------------------------------------
/eval/eval_checkpoints_x_tasks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/eval/eval_checkpoints_x_tasks.py
--------------------------------------------------------------------------------
/eval/get_checkpoint_lists.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/eval/get_checkpoint_lists.ipynb
--------------------------------------------------------------------------------
/eval/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/eval/readme.md
--------------------------------------------------------------------------------
/perplexity_metrics_by_group.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/perplexity_metrics_by_group.csv
--------------------------------------------------------------------------------
/pretraining/create_ladder_over_scale_script.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/pretraining/create_ladder_over_scale_script.py
--------------------------------------------------------------------------------
/pretraining/eval-for-consistent-ranking-mix-names.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/pretraining/eval-for-consistent-ranking-mix-names.jsonl
--------------------------------------------------------------------------------
/pretraining/eval-for-consistent-ranking-scales.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/pretraining/eval-for-consistent-ranking-scales.txt
--------------------------------------------------------------------------------
/pretraining/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/pretraining/readme.md
--------------------------------------------------------------------------------
/release/final_checkpoint_upload_script.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/final_checkpoint_upload_script.sh
--------------------------------------------------------------------------------
/release/find_wandb.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/find_wandb.py
--------------------------------------------------------------------------------
/release/loss_curves.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/loss_curves.ipynb
--------------------------------------------------------------------------------
/release/make_final_checkpoint_script.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/make_final_checkpoint_script.py
--------------------------------------------------------------------------------
/release/make_model_table.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/make_model_table.ipynb
--------------------------------------------------------------------------------
/release/make_repo_public.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/make_repo_public.py
--------------------------------------------------------------------------------
/release/matched_runs_by_s3.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/matched_runs_by_s3.json
--------------------------------------------------------------------------------
/release/model_card.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/model_card.md
--------------------------------------------------------------------------------
/release/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/readme.md
--------------------------------------------------------------------------------
/release/repo_names.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/repo_names.txt
--------------------------------------------------------------------------------
/release/requirements.txt:
--------------------------------------------------------------------------------
ai2-olmo
transformers
huggingface_hub
datasets
ipywidgets
--------------------------------------------------------------------------------
/release/upload_checkpoints.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/upload_checkpoints.py
--------------------------------------------------------------------------------
/release/upload_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/upload_eval.py
--------------------------------------------------------------------------------
/release/upload_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/upload_model.py
--------------------------------------------------------------------------------
/release/upload_model_card.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/release/upload_model_card.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/requirements.txt
--------------------------------------------------------------------------------
/results/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/results/readme.md
--------------------------------------------------------------------------------
/scaling_laws/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/README.md
--------------------------------------------------------------------------------
/scaling_laws/dump/ladder_ian.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/dump/ladder_ian.py
--------------------------------------------------------------------------------
/scaling_laws/dump/ladder_tables.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/dump/ladder_tables.py
--------------------------------------------------------------------------------
/scaling_laws/fit_scaling_laws.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/fit_scaling_laws.py
--------------------------------------------------------------------------------
/scaling_laws/notebooks/example_results.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/notebooks/example_results.ipynb
--------------------------------------------------------------------------------
/scaling_laws/notebooks/math_and_code.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/notebooks/math_and_code.ipynb
--------------------------------------------------------------------------------
/scaling_laws/notebooks/reconcile_bpb.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/notebooks/reconcile_bpb.ipynb
--------------------------------------------------------------------------------
/scaling_laws/notebooks/scaling_law_figures.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/notebooks/scaling_law_figures.ipynb
--------------------------------------------------------------------------------
/scaling_laws/notebooks/scaling_law_tables.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/notebooks/scaling_law_tables.ipynb
--------------------------------------------------------------------------------
/scaling_laws/remote/aws.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/remote/aws.py
--------------------------------------------------------------------------------
/scaling_laws/remote/compile_results.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/remote/compile_results.py
--------------------------------------------------------------------------------
/scaling_laws/remote/constants.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/remote/constants.py
--------------------------------------------------------------------------------
/scaling_laws/remote/hf.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/remote/hf.py
--------------------------------------------------------------------------------
/scaling_laws/remote/preprocess.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/remote/preprocess.py
--------------------------------------------------------------------------------
/scaling_laws/render_tables.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/render_tables.py
--------------------------------------------------------------------------------
/scaling_laws/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/requirements.txt
--------------------------------------------------------------------------------
/scaling_laws/utils/constants/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/constants/__init__.py
--------------------------------------------------------------------------------
/scaling_laws/utils/constants/constants_models.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/constants/constants_models.py
--------------------------------------------------------------------------------
/scaling_laws/utils/constants/constants_recepies.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/constants/constants_recepies.py
--------------------------------------------------------------------------------
/scaling_laws/utils/dataloader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/dataloader.py
--------------------------------------------------------------------------------
/scaling_laws/utils/plot.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/plot.py
--------------------------------------------------------------------------------
/scaling_laws/utils/scaling_laws.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/scaling_laws.py
--------------------------------------------------------------------------------
/scaling_laws/utils/stats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/stats.py
--------------------------------------------------------------------------------
/scaling_laws/utils/table.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/scaling_laws/utils/table.py
--------------------------------------------------------------------------------
/single_scale/algorithms/log_fit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/algorithms/log_fit.py
--------------------------------------------------------------------------------
/single_scale/algorithms/scaling_law.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/algorithms/scaling_law.py
--------------------------------------------------------------------------------
/single_scale/config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/config.yaml
--------------------------------------------------------------------------------
/single_scale/create_per_task_configs.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/create_per_task_configs.sh
--------------------------------------------------------------------------------
/single_scale/data_exploration_and_cleaning.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/data_exploration_and_cleaning.ipynb
--------------------------------------------------------------------------------
/single_scale/data_method.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/data_method.py
--------------------------------------------------------------------------------
/single_scale/evaluator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/evaluator.py
--------------------------------------------------------------------------------
/single_scale/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/main.py
--------------------------------------------------------------------------------
/single_scale/process_s3_to_csv.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/process_s3_to_csv.py
--------------------------------------------------------------------------------
/single_scale/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/readme.md
--------------------------------------------------------------------------------
/single_scale/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/requirements.txt
--------------------------------------------------------------------------------
/single_scale/target_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/target_filter.py
--------------------------------------------------------------------------------
/single_scale/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/utils.py
--------------------------------------------------------------------------------
/single_scale/visualization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/single_scale/visualization.py
--------------------------------------------------------------------------------
/utils/constants.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/utils/constants.py
--------------------------------------------------------------------------------
/utils/undo_normalization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/utils/undo_normalization.py
--------------------------------------------------------------------------------
/viz/compute_story_viz.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/compute_story_viz.ipynb
--------------------------------------------------------------------------------
/viz/math_and_code.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/math_and_code.ipynb
--------------------------------------------------------------------------------
/viz/pred_error.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/pred_error.ipynb
--------------------------------------------------------------------------------
/viz/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/readme.md
--------------------------------------------------------------------------------
/viz/requirements.txt:
--------------------------------------------------------------------------------
pandas
matplotlib
Jinja2
scipy
datasets
--------------------------------------------------------------------------------
/viz/rq_4.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/rq_4.ipynb
--------------------------------------------------------------------------------
/viz/table_gen.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allenai/DataDecide/HEAD/viz/table_gen.ipynb
--------------------------------------------------------------------------------