├── .gitignore
├── docs
│   └── opencompass_all.png
├── README.md
└── benchmark.md
/.gitignore:
--------------------------------------------------------------------------------
ckpts/
--------------------------------------------------------------------------------
/docs/opencompass_all.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alpha-VLLM/WeMix-LLM/HEAD/docs/opencompass_all.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## WeMix-LLM

WeMix-LLM is a series of LLMs and multimodal LLMs that follow the same paradigm. It is built on [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory).

### Changelog
* **[2023-10-16]** WeMix-LLM-V2 is now available at [WeMix-LLaMA2-V2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-V2-70B).
* **[2023-08-31]** Released WeMix-LLM!

### Setup

Please follow the [Environment Setup](https://llama2-accessory.readthedocs.io/en/latest/install.html) guide of LLaMA2-Accessory.

### Models

#### WeMix-LLaMA2: An Instruction-Following LLM
* Weights: [WeMix-LLaMA2-7B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-7B), [WeMix-LLaMA2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-70B), [WeMix-LLaMA2-V2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-V2-70B) (a download sketch follows the benchmark table below).
* Demo:
```bash
# [7B/70B] and [1/4] are placeholders: pick one model size and a matching number of GPUs.
wemix_weight=path/to/WeMix-LLaMA2-[7B/70B]/

python demos/multi_turn.py \
--llama_config ${wemix_weight}/params.json --tokenizer_path ${wemix_weight}/tokenizer.model \
--pretrained_path ${wemix_weight} --n_gpus [1/4]
```
* Benchmark (OpenCompass):

| Category      | WeMix-LLaMA2-70B | LLaMA2-70B | Vicuna-33B | WeMix-LLaMA2-7B | LLaMA-2-7B-Chat | Vicuna-7B | LLaMA-2-7B |
|---------------|------------------|------------|------------|-----------------|-----------------|-----------|------------|
| OVERALL       | 58.6             | 57.4       | 50         | 49.6            | 44.8            | 43.4      | 41.6       |
| EXAM          | 62.3             | 57.3       | 49.2       | 45.5            | 40.1            | 40.5      | 35.5       |
| LANGUAGE      | 52.6             | 51.6       | 44.9       | 45.1            | 44              | 39.6      | 44.1       |
| KNOWLEDGE     | 69               | 67.7       | 61.3       | 59.4            | 54.3            | 51.7      | 53.3       |
| UNDERSTANDING | 62.9             | 60.8       | 58.5       | 55.5            | 50.9            | 50.5      | 42.4       |
| REASONING     | 54.1             | 55         | 44.7       | 47.4            | 41.4            | 39.9      | 40.1       |

> Please refer to [benchmark.md](./benchmark.md) for more details.
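
The demo above expects a local checkpoint directory containing `params.json`, `tokenizer.model`, and the weight files. A minimal sketch of one way to fetch a released checkpoint from Hugging Face (assuming `git-lfs` is installed; any of the repositories listed under *Weights* can be substituted):

```bash
# Sketch: fetch the 7B checkpoint from Hugging Face (requires git-lfs).
# The ckpts/ target matches the directory already ignored by this repo's .gitignore.
git lfs install
git clone https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-7B ckpts/WeMix-LLaMA2-7B

# The demo command above can then be run with:
# wemix_weight=ckpts/WeMix-LLaMA2-7B
```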

#### WeMix-LLaMA2-13B-MM: A Multimodal LLM

* Weight: [Alpha-VLLM/WeMix-LLaMA2-13B-MM](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-13B-MM)
* Demo:
```bash
wemix_weight=path/to/WeMix-LLaMA2-13B-MM

torchrun --nproc-per-node=2 demos/single_turn_mm.py \
--llama_config ${wemix_weight}/params.json --tokenizer_path ${wemix_weight}/tokenizer.model \
--pretrained_path ${wemix_weight}
```
* Multimodal Benchmark (CIDEr):

| Model                     | NoCaps | Flickr30K |
|---------------------------|--------|-----------|
| Flamingo-9B               | -      | 61.5      |
| Flamingo-80B              | -      | 67.2      |
| Unified-IO-XL             | 100.0  | -         |
| Kosmos-1                  | -      | 67.1      |
| Kosmos-2                  | -      | 66.7      |
| BLIP-2 (Vicuna-13B)       | 103.9  | 71.6      |
| InstructBLIP (Vicuna-13B) | 121.9  | 82.8      |
| Shikra (Vicuna-13B)       | -      | 73.9      |
| Qwen-VL (Qwen-7B)         | 121.4  | 85.8      |
| Qwen-VL-Chat              | 120.2  | 81.0      |
| WeMix-LLaMA2-13B-MM       | 114.7  | 86.0      |

> More multimodal benchmark results are in progress. Stay tuned! 🎉

### Acknowledgement

[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter), [LLaMA](https://github.com/facebookresearch/llama).

### License

Llama 2 is licensed under the [LLAMA 2 Community License](https://github.com/facebookresearch/llama/blob/main/LICENSE), Copyright (c) Meta Platforms, Inc. All Rights Reserved.
--------------------------------------------------------------------------------
/benchmark.md:
--------------------------------------------------------------------------------
## Benchmark

### OpenCompass Benchmark

> Check the [OpenCompass Leaderboard](https://opencompass.org.cn/leaderboard-llm) for more details.
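
The per-dataset results below follow the OpenCompass category layout. As a rough sketch of how such an evaluation can be launched with OpenCompass (the commands mirror the public OpenCompass README and may differ between versions; the dataset selection and the model path are illustrative assumptions, and the Accessory-format WeMix checkpoints would likely need to be exported to a Hugging Face-compatible format first):

```bash
# Sketch: install OpenCompass and run a small evaluation.
# Flag names follow the OpenCompass README; consult it for the authoritative usage,
# including how to download the benchmark datasets, which are fetched separately.
git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .

# Evaluate a Hugging Face-format causal LM on two of the datasets from the table below.
# "path/to/hf-format-model" is a placeholder, not a released WeMix conversion.
python run.py \
    --datasets ceval_ppl mmlu_ppl \
    --hf-path path/to/hf-format-model
```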

| Category      | Dataset          | Vicuna-33B | WeMix-LLaMA2-7B | LLaMA-2-7B-Chat | Vicuna-7B | LLaMA-2-7B | Alpaca-7B | LLaMA-7B |
|---------------|------------------|------------|-----------------|-----------------|-----------|------------|-----------|----------|
|               | OVERALL          | 50         | 49.6            | 44.8            | 43.4      | 41.6       | 39.9      | 38.5     |
|               | EXAM             | 49.2       | 45.5            | 40.1            | 40.5      | 35.5       | 35.3      | 31.2     |
|               | LANGUAGE         | 44.9       | 45.1            | 44              | 39.6      | 44.1       | 39.5      | 40.5     |
|               | KNOWLEDGE        | 61.3       | 59.4            | 54.3            | 51.7      | 53.3       | 44.6      | 49.6     |
|               | UNDERSTANDING    | 58.5       | 55.5            | 50.9            | 50.5      | 42.4       | 45.1      | 38       |
|               | REASONING        | 44.7       | 47.4            | 41.4            | 39.9      | 40.1       | 38.1      | 38.5     |
| EXAM          | C-Eval           | 37.8       | 37.5            | 31.9            | 33.3      | 32.5       | 29.9      | 27.3     |
|               | AGIEval          | 32.8       | 31.3            | 28.6            | 26.6      | 21.8       | 24.3      | 20.6     |
|               | MMLU             | 59.2       | 54.1            | 46.2            | 48.2      | 46.8       | 41.7      | 35.6     |
|               | CMMLU            | 38.8       | 37.2            | 31.5            | 33.5      | 31.8       | 28.7      | 26.8     |
|               | GAOKAO-Bench     | 23.9       | 17              | 16.1            | 22.4      | 18.9       | 19.6      | 21.3     |
|               | ARC-c            | 69.2       | 65.1            | 54.9            | 51.2      | 40.3       | 41.4      | 34.6     |
|               | ARC-e            | 82.9       | 76.5            | 71.6            | 68.4      | 56.1       | 61.4      | 52.2     |
| LANGUAGE      | WiC              | 67.5       | 62.9            | 55              | 51.2      | 50         | 50.3      | 50.2     |
|               | CHID             | 35.1       | 42.1            | 44.1            | 29.2      | 46.5       | 26.7      | 31.2     |
|               | AFQMC            | 69         | 68.5            | 69              | 69        | 69         | 69        | 68.9     |
|               | WSC              | 68.3       | 66.3            | 69.2            | 61.5      | 66.3       | 65.4      | 65.4     |
|               | TyDiQA           | 22.6       | 24.8            | 21.6            | 21.7      | 26.8       | 20.7      | 22.5     |
|               | Flores           | 6.7        | 5.8             | 5.2             | 5         | 6          | 4.6       | 4.8      |
| KNOWLEDGE     | BoolQ            | 85.7       | 85.6            | 81.2            | 78.3      | 74.9       | 79.5      | 75.4     |
|               | CommonSenseQA    | 71         | 69.9            | 69.9            | 66.4      | 66.5       | 69.4      | 64.9     |
|               | TriviaQA         | 61.2       | 56.1            | 46.4            | 45.6      | 52.8       | 25.1      | 46.3     |
|               | NaturalQuestions | 27.3       | 26.2            | 19.6            | 16.4      | 19.1       | 4.4       | 11.7     |
| REASONING     | CMNLI            | 30.6       | 42              | 36.1            | 40        | 34.9       | 33        | 34.6     |
|               | OCNLI            | 30.8       | 36.2            | 36.4            | 35.4      | 32.1       | 30.3      | 33.6     |
|               | AX-b             | 59.7       | 52.1            | 58.5            | 58.2      | 53.5       | 52.3      | 57.4     |
|               | AX-g             | 53.9       | 50              | 51.7            | 50        | 55.1       | 47.2      | 50       |
|               | RTE              | 52.7       | 56.7            | 55.2            | 48        | 52         | 55.6      | 50.9     |
|               | COPA             | 71         | 67              | 72              | 72        | 67         | 72        | 72       |
|               | ReCoRD           | 3.1        | 45              | 22.5            | 20.3      | 32.7       | 22.9      | 14.9     |
|               | HellaSwag        | 79         | 76.2            | 74.2            | 72.8      | 74         | 73.6      | 74.3     |
|               | PIQA             | 80.6       | 78.6            | 76.2            | 78.7      | 78.3       | 78.3      | 78.6     |
|               | SIQA             | 65.1       | 71.5            | 55.4            | 54.7      | 48.5       | 52.7      | 46.2     |
|               | MATH             | 6.8        | 4.6             | 3.9             | 3.1       | 3.3        | 3         | 2.9      |
|               | GSM8K            | 44         | 33              | 26.3            | 14.9      | 16.7       | 6         | 10       |
|               | DROP             | 44         | 44.4            | 28.7            | 28.2      | 27.2       | 23.1      | 27.9     |
|               | HumanEval        | 15.8       | 29.3            | 12.2            | 10.4      | 12.8       | 9.2       | 12.8     |
|               | MBPP             | 26.6       | 29.8            | 17.6            | 14.4      | 14.8       | 17.8      | 16.8     |
|               | BBH              | 52         | 42.4            | 35.6            | 37.5      | 38.2       | 32.8      | 33.5     |
| UNDERSTANDING | C3               | 48.3       | 47.8            | 49.8            | 45.6      | 42.9       | 39.3      | 39       |
|               | RACE (Middle)    | 79         | 82.3            | 62              | 63.9      | 40.2       | 54.4      | 36.8     |
|               | RACE (High)      | 76.4       | 77.8            | 58.6            | 59.3      | 37.5       | 47.2      | 30.6     |
|               | OpenbookQA       | 80.4       | 84.6            | 74.4            | 73.2      | 57         | 65.4      | 34.6     |
|               | CSL              | 58.1       | 53.8            | 58.8            | 55        | 55.6       | 54.4      | 54.4     |
|               | LCSTS            | 11.1       | 4.2             | 9.6             | 11        | 9.1        | 4.3       | 6.7      |
|               | XSum             | 27.4       | 32.8            | 20.8            | 24.1      | 19.7       | 27.3      | 20.5     |
|               | EPRSTMT          | 73.8       | 46.2            | 57.5            | 53.1      | 46.2       | 46.2      | 46.2     |
|               | LAMBADA          | 72.1       | 69.6            | 66.9            | 69.2      | 73.3       | 67.1      | 73.3     |

--------------------------------------------------------------------------------