├── .gitignore
├── docs
│   └── opencompass_all.png
├── README.md
└── benchmark.md
/.gitignore:
--------------------------------------------------------------------------------
ckpts/
--------------------------------------------------------------------------------
/docs/opencompass_all.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alpha-VLLM/WeMix-LLM/HEAD/docs/opencompass_all.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## WeMix-LLM

WeMix-LLM is a series of LLMs and multimodal LLMs that follow the same paradigm. It is built on [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory).

### Changelog
* **[2023-10-16]** WeMix-LLM-V2 is now available at [WeMix-LLaMA2-V2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-V2-70B).
* **[2023-08-31]** Released WeMix-LLM!

### Setup

Please follow the [Environment Setup](https://llama2-accessory.readthedocs.io/en/latest/install.html) guide of LLaMA2-Accessory.

### Models

#### WeMix-LLaMA2: An Instruction-Following LLM
* Weights: [WeMix-LLaMA2-7B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-7B), [WeMix-LLaMA2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-70B), [WeMix-LLaMA2-V2-70B](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-V2-70B) (a download sketch follows the benchmark table below).
* Demo:
```bash
# [7B/70B] and [1/4] are placeholders: pick one model size and a matching number of GPUs.
wemix_weight=path/to/WeMix-LLaMA2-[7B/70B]/

python demos/multi_turn.py \
--llama_config ${wemix_weight}/params.json --tokenizer_path ${wemix_weight}/tokenizer.model \
--pretrained_path ${wemix_weight} --n_gpus [1/4]
```
* Benchmark (OpenCompass):

| Category      | WeMix-LLaMA2-70B | LLaMA2-70B | Vicuna-33B | WeMix-LLaMA2-7B | LLaMA-2-7B-Chat | Vicuna-7B | LLaMA-2-7B |
|---------------|------------------|------------|------------|-----------------|-----------------|-----------|------------|
| OVERALL       | 58.6             | 57.4       | 50         | 49.6            | 44.8            | 43.4      | 41.6       |
| EXAM          | 62.3             | 57.3       | 49.2       | 45.5            | 40.1            | 40.5      | 35.5       |
| LANGUAGE      | 52.6             | 51.6       | 44.9       | 45.1            | 44              | 39.6      | 44.1       |
| KNOWLEDGE     | 69               | 67.7       | 61.3       | 59.4            | 54.3            | 51.7      | 53.3       |
| UNDERSTANDING | 62.9             | 60.8       | 58.5       | 55.5            | 50.9            | 50.5      | 42.4       |
| REASONING     | 54.1             | 55         | 44.7       | 47.4            | 41.4            | 39.9      | 40.1       |

> Please refer to [benchmark.md](./benchmark.md) for more details.
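
The demo above expects a local checkpoint directory containing `params.json`, `tokenizer.model`, and the weight files. A minimal sketch of one way to fetch a released checkpoint from Hugging Face (assuming `git-lfs` is installed; any of the repositories listed under *Weights* can be substituted):

```bash
# Sketch: fetch the 7B checkpoint from Hugging Face (requires git-lfs).
# The ckpts/ target matches the directory already ignored by this repo's .gitignore.
git lfs install
git clone https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-7B ckpts/WeMix-LLaMA2-7B

# The demo command above can then be run with:
# wemix_weight=ckpts/WeMix-LLaMA2-7B
```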

#### WeMix-LLaMA2-13B-MM: A Multimodal LLM

* Weight: [Alpha-VLLM/WeMix-LLaMA2-13B-MM](https://huggingface.co/Alpha-VLLM/WeMix-LLaMA2-13B-MM)
* Demo:
```bash
wemix_weight=path/to/WeMix-LLaMA2-13B-MM

torchrun --nproc-per-node=2 demos/single_turn_mm.py \
--llama_config ${wemix_weight}/params.json --tokenizer_path ${wemix_weight}/tokenizer.model \
--pretrained_path ${wemix_weight}
```
* Multimodal Benchmark (CIDEr):

| Model                     | NoCaps | Flickr30K |
|---------------------------|--------|-----------|
| Flamingo-9B               | -      | 61.5      |
| Flamingo-80B              | -      | 67.2      |
| Unified-IO-XL             | 100.0  | -         |
| Kosmos-1                  | -      | 67.1      |
| Kosmos-2                  | -      | 66.7      |
| BLIP-2 (Vicuna-13B)       | 103.9  | 71.6      |
| InstructBLIP (Vicuna-13B) | 121.9  | 82.8      |
| Shikra (Vicuna-13B)       | -      | 73.9      |
| Qwen-VL (Qwen-7B)         | 121.4  | 85.8      |
| Qwen-VL-Chat              | 120.2  | 81.0      |
| WeMix-LLaMA2-13B-MM       | 114.7  | 86.0      |

> More multimodal benchmark results are in progress. Stay tuned! 🎉

### Acknowledgement

[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter), [LLaMA](https://github.com/facebookresearch/llama).

### License

Llama 2 is licensed under the [LLAMA 2 Community License](https://github.com/facebookresearch/llama/blob/main/LICENSE), Copyright (c) Meta Platforms, Inc. All Rights Reserved.
--------------------------------------------------------------------------------
/benchmark.md:
--------------------------------------------------------------------------------
## Benchmark

### OpenCompass Benchmark

> Check the [OpenCompass Leaderboard](https://opencompass.org.cn/leaderboard-llm) for more details.
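
The per-dataset results below follow the OpenCompass category layout. As a rough sketch of how such an evaluation can be launched with OpenCompass (the commands mirror the public OpenCompass README and may differ between versions; the dataset selection and the model path are illustrative assumptions, and the Accessory-format WeMix checkpoints would likely need to be exported to a Hugging Face-compatible format first):

```bash
# Sketch: install OpenCompass and run a small evaluation.
# Flag names follow the OpenCompass README; consult it for the authoritative usage,
# including how to download the benchmark datasets, which are fetched separately.
git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .

# Evaluate a Hugging Face-format causal LM on two of the datasets from the table below.
# "path/to/hf-format-model" is a placeholder, not a released WeMix conversion.
python run.py \
    --datasets ceval_ppl mmlu_ppl \
    --hf-path path/to/hf-format-model
```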

| Category      | Dataset          | Vicuna-33B | WeMix-LLaMA2-7B | LLaMA-2-7B-Chat | Vicuna-7B | LLaMA-2-7B | Alpaca-7B | LLaMA-7B |
|---------------|------------------|------------|-----------------|-----------------|-----------|------------|-----------|----------|
|               | OVERALL          | 50         | 49.6            | 44.8            | 43.4      | 41.6       | 39.9      | 38.5     |
|               | EXAM             | 49.2       | 45.5            | 40.1            | 40.5      | 35.5       | 35.3      | 31.2     |
|               | LANGUAGE         | 44.9       | 45.1            | 44              | 39.6      | 44.1       | 39.5      | 40.5     |
|               | KNOWLEDGE        | 61.3       | 59.4            | 54.3            | 51.7      | 53.3       | 44.6      | 49.6     |
|               | UNDERSTANDING    | 58.5       | 55.5            | 50.9            | 50.5      | 42.4       | 45.1      | 38       |
|               | REASONING        | 44.7       | 47.4            | 41.4            | 39.9      | 40.1       | 38.1      | 38.5     |
| EXAM          | C-Eval           | 37.8       | 37.5            | 31.9            | 33.3      | 32.5       | 29.9      | 27.3     |
|               | AGIEval          | 32.8       | 31.3            | 28.6            | 26.6      | 21.8       | 24.3      | 20.6     |
|               | MMLU             | 59.2       | 54.1            | 46.2            | 48.2      | 46.8       | 41.7      | 35.6     |
|               | CMMLU            | 38.8       | 37.2            | 31.5            | 33.5      | 31.8       | 28.7      | 26.8     |
|               | GAOKAO-Bench     | 23.9       | 17              | 16.1            | 22.4      | 18.9       | 19.6      | 21.3     |
|               | ARC-c            | 69.2       | 65.1            | 54.9            | 51.2      | 40.3       | 41.4      | 34.6     |
|               | ARC-e            | 82.9       | 76.5            | 71.6            | 68.4      | 56.1       | 61.4      | 52.2     |
| LANGUAGE      | WiC              | 67.5       | 62.9            | 55              | 51.2      | 50         | 50.3      | 50.2     |
|               | CHID             | 35.1       | 42.1            | 44.1            | 29.2      | 46.5       | 26.7      | 31.2     |
|               | AFQMC            | 69         | 68.5            | 69              | 69        | 69         | 69        | 68.9     |
|               | WSC              | 68.3       | 66.3            | 69.2            | 61.5      | 66.3       | 65.4      | 65.4     |
|               | TyDiQA           | 22.6       | 24.8            | 21.6            | 21.7      | 26.8       | 20.7      | 22.5     |
|               | Flores           | 6.7        | 5.8             | 5.2             | 5         | 6          | 4.6       | 4.8      |
| KNOWLEDGE     | BoolQ            | 85.7       | 85.6            | 81.2            | 78.3      | 74.9       | 79.5      | 75.4     |
|               | CommonSenseQA    | 71         | 69.9            | 69.9            | 66.4      | 66.5       | 69.4      | 64.9     |
|               | TriviaQA         | 61.2       | 56.1            | 46.4            | 45.6      | 52.8       | 25.1      | 46.3     |
|               | NaturalQuestions | 27.3       | 26.2            | 19.6            | 16.4      | 19.1       | 4.4       | 11.7     |
| REASONING     | CMNLI            | 30.6       | 42              | 36.1            | 40        | 34.9       | 33        | 34.6     |
|               | OCNLI            | 30.8       | 36.2            | 36.4            | 35.4      | 32.1       | 30.3      | 33.6     |
|               | AX-b             | 59.7       | 52.1            | 58.5            | 58.2      | 53.5       | 52.3      | 57.4     |
|               | AX-g             | 53.9       | 50              | 51.7            | 50        | 55.1       | 47.2      | 50       |
|               | RTE              | 52.7       | 56.7            | 55.2            | 48        | 52         | 55.6      | 50.9     |
|               | COPA             | 71         | 67              | 72              | 72        | 67         | 72        | 72       |
|               | ReCoRD           | 3.1        | 45              | 22.5            | 20.3      | 32.7       | 22.9      | 14.9     |
|               | HellaSwag        | 79         | 76.2            | 74.2            | 72.8      | 74         | 73.6      | 74.3     |
|               | PIQA             | 80.6       | 78.6            | 76.2            | 78.7      | 78.3       | 78.3      | 78.6     |
|               | SIQA             | 65.1       | 71.5            | 55.4            | 54.7      | 48.5       | 52.7      | 46.2     |
|               | MATH             | 6.8        | 4.6             | 3.9             | 3.1       | 3.3        | 3         | 2.9      |
|               | GSM8K            | 44         | 33              | 26.3            | 14.9      | 16.7       | 6         | 10       |
|               | DROP             | 44         | 44.4            | 28.7            | 28.2      | 27.2       | 23.1      | 27.9     |
|               | HumanEval        | 15.8       | 29.3            | 12.2            | 10.4      | 12.8       | 9.2       | 12.8     |
|               | MBPP             | 26.6       | 29.8            | 17.6            | 14.4      | 14.8       | 17.8      | 16.8     |
|               | BBH              | 52         | 42.4            | 35.6            | 37.5      | 38.2       | 32.8      | 33.5     |
| UNDERSTANDING | C3               | 48.3       | 47.8            | 49.8            | 45.6      | 42.9       | 39.3      | 39       |
|               | RACE (Middle)    | 79         | 82.3            | 62              | 63.9      | 40.2       | 54.4      | 36.8     |
|               | RACE (High)      | 76.4       | 77.8            | 58.6            | 59.3      | 37.5       | 47.2      | 30.6     |
|               | OpenbookQA       | 80.4       | 84.6            | 74.4            | 73.2      | 57         | 65.4      | 34.6     |
|               | CSL              | 58.1       | 53.8            | 58.8            | 55        | 55.6       | 54.4      | 54.4     |
|               | LCSTS            | 11.1       | 4.2             | 9.6             | 11        | 9.1        | 4.3       | 6.7      |
|               | XSum             | 27.4       | 32.8            | 20.8            | 24.1      | 19.7       | 27.3      | 20.5     |
|               | EPRSTMT          | 73.8       | 46.2            | 57.5            | 53.1      | 46.2       | 46.2      | 46.2     |
|               | LAMBADA          | 72.1       | 69.6            | 66.9            | 69.2      | 73.3       | 67.1      | 73.3     |

--------------------------------------------------------------------------------