├── .gitignore
├── EVAL.md
├── LICENSE
├── README.md
├── TRAIN.md
├── app.py
├── assets
    ├── arch.png
    ├── bagel-cot-example.png
    ├── emerging_curves.png
    ├── teaser.webp
    └── zebra_cot_datacard.png
├── data
    ├── __init__.py
    ├── configs
    │   └── example.yaml
    ├── data_utils.py
    ├── dataset_base.py
    ├── dataset_info.py
    ├── distributed_iterable_dataset.py
    ├── interleave_datasets
    │   ├── __init__.py
    │   ├── edit_dataset.py
    │   ├── interleave_t2i_dataset.py
    │   └── think_trace_dataset.py
    ├── parquet_utils.py
    ├── t2i_dataset.py
    ├── transforms.py
    ├── video_utils.py
    └── vlm_dataset.py
├── download_model.py
├── inference.ipynb
├── inferencer.py
├── infz_bf16.py
├── modeling
    ├── __init__.py
    ├── autoencoder.py
    ├── bagel
    │   ├── __init__.py
    │   ├── bagel.py
    │   ├── modeling_utils.py
    │   ├── qwen2_navit.py
    │   └── siglip_navit.py
    ├── qwen2
    │   ├── __init__.py
    │   ├── configuration_qwen2.py
    │   ├── modeling_qwen2.py
    │   ├── tokenization_qwen2.py
    │   └── tokenization_qwen2_fast.py
    └── siglip
    │   ├── __init__.py
    │   ├── configuration_siglip.py
    │   ├── convert_siglip_to_hf.py
    │   ├── image_processing_siglip.py
    │   ├── modeling_siglip.py
    │   ├── processing_siglip.py
    │   └── tokenization_siglip.py
├── requirements.txt
├── scripts
    ├── eval
    │   ├── eval_vlm.sh
    │   ├── run_eval_vlm.sh
    │   ├── run_gedit.sh
    │   ├── run_geneval.sh
    │   ├── run_imgedit.sh
    │   ├── run_kris.sh
    │   ├── run_rise.sh
    │   └── run_wise.sh
    └── train.sh
├── test_images
    ├── image.png
    ├── meme.jpg
    ├── octupusy.jpg
    └── women.jpg
└── train
    ├── __init__.py
    ├── fsdp_utils.py
    ├── pretrain_unified_navit.py
    └── train_utils.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/.gitignore
--------------------------------------------------------------------------------
/EVAL.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/EVAL.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/README.md
--------------------------------------------------------------------------------
/TRAIN.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/TRAIN.md
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/app.py
--------------------------------------------------------------------------------
/assets/arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/assets/arch.png
--------------------------------------------------------------------------------
/assets/bagel-cot-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/assets/bagel-cot-example.png
--------------------------------------------------------------------------------
/assets/emerging_curves.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/assets/emerging_curves.png
--------------------------------------------------------------------------------
/assets/teaser.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/assets/teaser.webp
--------------------------------------------------------------------------------
/assets/zebra_cot_datacard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/assets/zebra_cot_datacard.png
--------------------------------------------------------------------------------
/data/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/__init__.py
--------------------------------------------------------------------------------
/data/configs/example.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/configs/example.yaml
--------------------------------------------------------------------------------
/data/data_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/data_utils.py
--------------------------------------------------------------------------------
/data/dataset_base.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/dataset_base.py
--------------------------------------------------------------------------------
/data/dataset_info.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/dataset_info.py
--------------------------------------------------------------------------------
/data/distributed_iterable_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/distributed_iterable_dataset.py
--------------------------------------------------------------------------------
/data/interleave_datasets/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/interleave_datasets/__init__.py
--------------------------------------------------------------------------------
/data/interleave_datasets/edit_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/interleave_datasets/edit_dataset.py
--------------------------------------------------------------------------------
/data/interleave_datasets/interleave_t2i_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/interleave_datasets/interleave_t2i_dataset.py
--------------------------------------------------------------------------------
/data/interleave_datasets/think_trace_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/interleave_datasets/think_trace_dataset.py
--------------------------------------------------------------------------------
/data/parquet_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/parquet_utils.py
--------------------------------------------------------------------------------
/data/t2i_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/t2i_dataset.py
--------------------------------------------------------------------------------
/data/transforms.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/transforms.py
--------------------------------------------------------------------------------
/data/video_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/video_utils.py
--------------------------------------------------------------------------------
/data/vlm_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/data/vlm_dataset.py
--------------------------------------------------------------------------------
/download_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/download_model.py
--------------------------------------------------------------------------------
/inference.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/inference.ipynb
--------------------------------------------------------------------------------
/inferencer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/inferencer.py
--------------------------------------------------------------------------------
/infz_bf16.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/infz_bf16.py
--------------------------------------------------------------------------------
/modeling/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/__init__.py
--------------------------------------------------------------------------------
/modeling/autoencoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/autoencoder.py
--------------------------------------------------------------------------------
/modeling/bagel/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/bagel/__init__.py
--------------------------------------------------------------------------------
/modeling/bagel/bagel.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/bagel/bagel.py
--------------------------------------------------------------------------------
/modeling/bagel/modeling_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/bagel/modeling_utils.py
--------------------------------------------------------------------------------
/modeling/bagel/qwen2_navit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/bagel/qwen2_navit.py
--------------------------------------------------------------------------------
/modeling/bagel/siglip_navit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/bagel/siglip_navit.py
--------------------------------------------------------------------------------
/modeling/qwen2/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/qwen2/__init__.py
--------------------------------------------------------------------------------
/modeling/qwen2/configuration_qwen2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/qwen2/configuration_qwen2.py
--------------------------------------------------------------------------------
/modeling/qwen2/modeling_qwen2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/qwen2/modeling_qwen2.py
--------------------------------------------------------------------------------
/modeling/qwen2/tokenization_qwen2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/qwen2/tokenization_qwen2.py
--------------------------------------------------------------------------------
/modeling/qwen2/tokenization_qwen2_fast.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/qwen2/tokenization_qwen2_fast.py
--------------------------------------------------------------------------------
/modeling/siglip/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/__init__.py
--------------------------------------------------------------------------------
/modeling/siglip/configuration_siglip.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/configuration_siglip.py
--------------------------------------------------------------------------------
/modeling/siglip/convert_siglip_to_hf.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/convert_siglip_to_hf.py
--------------------------------------------------------------------------------
/modeling/siglip/image_processing_siglip.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/image_processing_siglip.py
--------------------------------------------------------------------------------
/modeling/siglip/modeling_siglip.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/modeling_siglip.py
--------------------------------------------------------------------------------
/modeling/siglip/processing_siglip.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/processing_siglip.py
--------------------------------------------------------------------------------
/modeling/siglip/tokenization_siglip.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/modeling/siglip/tokenization_siglip.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/requirements.txt
--------------------------------------------------------------------------------
/scripts/eval/eval_vlm.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/eval_vlm.sh
--------------------------------------------------------------------------------
/scripts/eval/run_eval_vlm.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_eval_vlm.sh
--------------------------------------------------------------------------------
/scripts/eval/run_gedit.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_gedit.sh
--------------------------------------------------------------------------------
/scripts/eval/run_geneval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_geneval.sh
--------------------------------------------------------------------------------
/scripts/eval/run_imgedit.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_imgedit.sh
--------------------------------------------------------------------------------
/scripts/eval/run_kris.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_kris.sh
--------------------------------------------------------------------------------
/scripts/eval/run_rise.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_rise.sh
--------------------------------------------------------------------------------
/scripts/eval/run_wise.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/eval/run_wise.sh
--------------------------------------------------------------------------------
/scripts/train.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/scripts/train.sh
--------------------------------------------------------------------------------
/test_images/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/test_images/image.png
--------------------------------------------------------------------------------
/test_images/meme.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/test_images/meme.jpg
--------------------------------------------------------------------------------
/test_images/octupusy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/test_images/octupusy.jpg
--------------------------------------------------------------------------------
/test_images/women.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/test_images/women.jpg
--------------------------------------------------------------------------------
/train/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/train/__init__.py
--------------------------------------------------------------------------------
/train/fsdp_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/train/fsdp_utils.py
--------------------------------------------------------------------------------
/train/pretrain_unified_navit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/train/pretrain_unified_navit.py
--------------------------------------------------------------------------------
/train/train_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/multimodal-reasoning-lab/Bagel-Zebra-CoT/HEAD/train/train_utils.py
--------------------------------------------------------------------------------