├── .gitignore
├── LICENSE
├── README.md
├── analysis
    ├── count_words_in_dataset.py
    └── plot.py
├── clean_and_create
    ├── generate_dataset.sh
    ├── load_data.py
    └── single_job.sh
├── create_only_with_pdfs
    ├── generate_dataset.sh
    ├── load_data.py
    ├── single_job.sh
    └── upload_data.py
├── docmatix_thumbnail.png
├── florence_2_dataset
    └── create_florence_2_dataset.py
├── generation
    ├── base_prompts.py
    └── llm_swarm_script.py
└── zero_shot_exp
    └── zero_shot.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/README.md
--------------------------------------------------------------------------------
/analysis/count_words_in_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/analysis/count_words_in_dataset.py
--------------------------------------------------------------------------------
/analysis/plot.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/analysis/plot.py
--------------------------------------------------------------------------------
/clean_and_create/generate_dataset.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/clean_and_create/generate_dataset.sh
--------------------------------------------------------------------------------
/clean_and_create/load_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/clean_and_create/load_data.py
--------------------------------------------------------------------------------
/clean_and_create/single_job.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/clean_and_create/single_job.sh
--------------------------------------------------------------------------------
/create_only_with_pdfs/generate_dataset.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/create_only_with_pdfs/generate_dataset.sh
--------------------------------------------------------------------------------
/create_only_with_pdfs/load_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/create_only_with_pdfs/load_data.py
--------------------------------------------------------------------------------
/create_only_with_pdfs/single_job.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/create_only_with_pdfs/single_job.sh
--------------------------------------------------------------------------------
/create_only_with_pdfs/upload_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/create_only_with_pdfs/upload_data.py
--------------------------------------------------------------------------------
/docmatix_thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/docmatix_thumbnail.png
--------------------------------------------------------------------------------
/florence_2_dataset/create_florence_2_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/florence_2_dataset/create_florence_2_dataset.py
--------------------------------------------------------------------------------
/generation/base_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/generation/base_prompts.py
--------------------------------------------------------------------------------
/generation/llm_swarm_script.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/generation/llm_swarm_script.py
--------------------------------------------------------------------------------
/zero_shot_exp/zero_shot.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/docmatix/HEAD/zero_shot_exp/zero_shot.py
--------------------------------------------------------------------------------