├── LICENSE
├── README.md
├── app.py
├── checkpoints
│   └── PUT_CKPT_HERE
├── docs
│   ├── eval.md
│   ├── inference.md
│   └── models.md
├── eval
│   ├── eval_egoschema.py
│   ├── eval_mlvu.py
│   ├── eval_mvbench.py
│   └── eval_videomme.py
├── examples
│   ├── video1.mp4
│   ├── video2.mp4
│   └── video3.mp4
├── longvu
│   ├── __init__.py
│   ├── apply_delta.py
│   ├── builder.py
│   ├── cambrian_arch.py
│   ├── consolidate.py
│   ├── constants.py
│   ├── conversation.py
│   ├── file_io.py
│   ├── language_model
│   │   ├── cambrian_llama.py
│   │   └── cambrian_qwen.py
│   ├── make_delta.py
│   ├── mm_datautils.py
│   ├── mm_trainer.py
│   ├── mm_utils.py
│   ├── multimodal_encoder
│   │   ├── base_encoder.py
│   │   ├── builder.py
│   │   ├── dino_encoder.py
│   │   ├── drop.py
│   │   ├── image.py
│   │   ├── logging.py
│   │   ├── loss.py
│   │   ├── registry.py
│   │   ├── siglip_encoder.py
│   │   └── utils.py
│   ├── multimodal_projector
│   │   └── builder.py
│   ├── train.py
│   ├── utils.py
│   └── vision_sampler.py
├── requirements.txt
└── scripts
    ├── train_image_llama3_2.sh
    ├── train_image_qwen.sh
    ├── train_video_llama3_2.sh
    └── train_video_qwen.sh

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/README.md
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/app.py
--------------------------------------------------------------------------------
/checkpoints/PUT_CKPT_HERE:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/docs/eval.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/docs/eval.md
--------------------------------------------------------------------------------
/docs/inference.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/docs/inference.md
--------------------------------------------------------------------------------
/docs/models.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/docs/models.md
--------------------------------------------------------------------------------
/eval/eval_egoschema.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/eval/eval_egoschema.py
--------------------------------------------------------------------------------
/eval/eval_mlvu.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/eval/eval_mlvu.py
--------------------------------------------------------------------------------
/eval/eval_mvbench.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/eval/eval_mvbench.py
--------------------------------------------------------------------------------
/eval/eval_videomme.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/eval/eval_videomme.py
--------------------------------------------------------------------------------
/examples/video1.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/examples/video1.mp4
--------------------------------------------------------------------------------
/examples/video2.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/examples/video2.mp4
--------------------------------------------------------------------------------
/examples/video3.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/examples/video3.mp4
--------------------------------------------------------------------------------
/longvu/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/__init__.py
--------------------------------------------------------------------------------
/longvu/apply_delta.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/apply_delta.py
--------------------------------------------------------------------------------
/longvu/builder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/builder.py
--------------------------------------------------------------------------------
/longvu/cambrian_arch.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/cambrian_arch.py
--------------------------------------------------------------------------------
/longvu/consolidate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/consolidate.py
--------------------------------------------------------------------------------
/longvu/constants.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/constants.py
--------------------------------------------------------------------------------
/longvu/conversation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/conversation.py
--------------------------------------------------------------------------------
/longvu/file_io.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/file_io.py
--------------------------------------------------------------------------------
/longvu/language_model/cambrian_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/language_model/cambrian_llama.py
--------------------------------------------------------------------------------
/longvu/language_model/cambrian_qwen.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/language_model/cambrian_qwen.py
--------------------------------------------------------------------------------
/longvu/make_delta.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/make_delta.py
--------------------------------------------------------------------------------
/longvu/mm_datautils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/mm_datautils.py
--------------------------------------------------------------------------------
/longvu/mm_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/mm_trainer.py
--------------------------------------------------------------------------------
/longvu/mm_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/mm_utils.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/base_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/base_encoder.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/builder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/builder.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/dino_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/dino_encoder.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/drop.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/drop.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/image.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/image.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/logging.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/logging.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/loss.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/loss.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/registry.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/registry.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/siglip_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/siglip_encoder.py
--------------------------------------------------------------------------------
/longvu/multimodal_encoder/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_encoder/utils.py
--------------------------------------------------------------------------------
/longvu/multimodal_projector/builder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/multimodal_projector/builder.py
--------------------------------------------------------------------------------
/longvu/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/train.py
--------------------------------------------------------------------------------
/longvu/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/utils.py
--------------------------------------------------------------------------------
/longvu/vision_sampler.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/longvu/vision_sampler.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/requirements.txt
--------------------------------------------------------------------------------
/scripts/train_image_llama3_2.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/scripts/train_image_llama3_2.sh
--------------------------------------------------------------------------------
/scripts/train_image_qwen.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/scripts/train_image_qwen.sh
--------------------------------------------------------------------------------
/scripts/train_video_llama3_2.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/scripts/train_video_llama3_2.sh
--------------------------------------------------------------------------------
/scripts/train_video_qwen.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Vision-CAIR/LongVU/HEAD/scripts/train_video_qwen.sh
--------------------------------------------------------------------------------