├── .gitignore
├── README.md
├── app.py
├── common_utils.py
├── constaints.py
├── data
    └── examples.json
├── data_utils.py
├── ds_config_first_stage.json
├── ds_config_second_stage.json
├── flash_attention.py
├── inference.py
├── introduction.png
├── main.py
├── mmbench_evaluation.py
├── mme_evaluation.py
├── model.py
├── process_instruction_data.py
├── process_mim.py
├── requirements.txt
├── scripts
    ├── run_demo.sh
    ├── run_first_stage.sh
    ├── run_first_stage_val.sh
    ├── run_inference.sh
    ├── run_mmbench_eval.sh
    ├── run_mme_eval.sh
    ├── run_second_stage.sh
    └── run_second_stage_val.sh
├── stable_diffusion.py
├── templates
    └── index.html
└── utils.py

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TextBind: Multi-turn Interleaved Multimodal Instruction-following
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 | 🌐 Project Page • 🤗 Online Demo • 📃 Paper • ⏬ Data • 🤖 Model 10 |
11 |
12 |
13 | ****
14 |
15 |
16 | ## Content:
17 | * 1. Introduction
18 | * 2. Build Our Demo Locally
19 |     * 2.1. Environment Installation
20 |     * 2.2. Prepare Vision Model
21 |     * 2.3. Prepare TextBind Weights
22 |     * 2.4. Running Demo
23 | * 3. Train Your Own Models Using Our TextBind Recipe
24 |     * 3.1. Data Preparation
25 |     * 3.2. Prepare BLIP-2 Q-Former
26 |     * 3.3. Training Configurations
27 |     * 3.4. Training TextBind
28 | * Usage and License Notices
29 | * Citation
30 |
31 | ****
32 |
33 |
34 |
35 | ### 1. Introduction: [Back to Top]
36 |
37 |
38 |
39 |
What it is: This is the TextBind demo. It supports multi-turn conversations with interleaved text and images, and it can generate appropriate images even when the conversation contains no explicit image description.
187 |How to use:
188 |Tips: (1) To start a new conversation, press Ctrl+R (or Cmd+R) to refresh the webpage. (2) Uploading large images (>1MB) may fail, so please keep images small. (3) Our server handles user requests in FIFO (first in, first out) order, so the waiting time may be very long when there are many users.
200 |