├── README.md
├── README_zh.md
├── chat.py
├── chat_genai.py
├── convert.py
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
English | [简体中文](README_zh.md)

# Qwen2.openvino Demo

This sample shows how to deploy Qwen2 using OpenVINO.

## 1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows. The
recommended Python version is `3.10+`.

Linux

```
python3 -m venv openvino_env

source openvino_env/bin/activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt
```

Windows PowerShell

```
python -m venv openvino_env

.\openvino_env\Scripts\activate

python -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt
```

> Note:
> If you are using an existing Python environment, we recommend the following command so that all
> dependencies are upgraded to their latest compatible versions:
> `pip install -U --upgrade-strategy eager -r requirements.txt`

## 2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you need to download the model and convert it.

```
python3 convert.py --model_id qwen/Qwen2-7B-Instruct --precision int4 --output {your_path}/Qwen2-7B-Instruct-ov --modelscope
```

### Parameters that can be selected

* `--model_id` - model ID on the Hugging Face Hub (https://huggingface.co/models) or the absolute path to the directory
  where the model is located.
* `--precision` - model precision: fp16, int8 or int4.
* `--output` - the path where the converted model is saved.
* `--modelscope` - download the model from ModelScope instead of the Hugging Face Hub.
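
To sanity-check the conversion before starting the chatbot, you can load the exported IR back with
`optimum-intel` and run a single generation. The snippet below is a minimal sketch rather than a file
shipped in this repo; the path and prompt are placeholders for illustration.

```
# Hypothetical smoke test for the converted model (not part of this repo).
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "{your_path}/Qwen2-7B-Instruct-ov"  # same path as --output above

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir, device="CPU")

inputs = tokenizer("Hello, who are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```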

## 3. Run streaming chatbot

```
python3 chat.py --model_path {your_path}/Qwen2-7B-Instruct-ov --max_sequence_length 4096 --device CPU
```

or

```
python3 chat_genai.py --model_path {your_path}/Qwen2-7B-Instruct-ov --max_sequence_length 4096 --device CPU
```

### Parameters that can be selected

* `--model_path` - the path to the directory where the OpenVINO IR model is located.
* `--max_sequence_length` - maximum number of output tokens.
* `--device` - the device to run inference on, e.g. "CPU", "GPU".

## Example

```
====Starting conversation====
User: hello
Qwen2-OpenVINO: Hello! How can I assist you today?

User: who are you ?
Qwen2-OpenVINO: I am an AI language model created by Alibaba Cloud. My purpose is to help users with their questions and provide them with accurate information. Is there anything specific you would like to know about me?

User: could you tell me a story ?
Qwen2-OpenVINO: Sure, here's a short story for you:

Once upon a time, in a small village nestled in the mountains, there lived a young girl named Lily who loved nature. She spent most of her days exploring the forest and watching the birds singing.

One day, while she was wandering through the woods, she stumbled upon a hidden cave deep within the forest. Inside, she found a beautiful crystal that sparkled with light. She picked it up and held it close to her heart, feeling a sense of joy and wonder.

As she walked away from the cave, she felt a sense of peace wash over her. She realized that sometimes, the things we miss the most are the simple things in life, like the beauty of nature or the warmth of the sun on our skin.

From that day forward, Lily made a habit of spending time in nature whenever she could. She would spend hours walking through the forest, watching the birds sing, and taking in the beauty around her. She knew that these moments were precious and that they would stay with her forever.

And so, Lily continued to live her life with a sense of joy and wonder, always cherishing the simple things in life.

User: please give this story a title
Qwen2-OpenVINO: "Nature's Magic: A Journey Through the Forest Crystal"
```

## FAQ

1. Do I need to install the OpenVINO C++ inference engine?
   - No, it is not required.

2. Do I have to use Intel hardware?
   - We recommend Intel x86 devices, which is where this demo was tested. For example:
   - Intel CPUs, including personal computer CPUs and server CPUs.
   - Intel integrated GPUs, such as the Arc™ and Iris® series.
   - Intel discrete graphics cards, such as the Arc™ A770.

3. Why can't OpenVINO find the GPU in my system (Linux)?
   - Ensure the OpenCL drivers are installed correctly.
   - Ensure you have the right permissions to access the GPU device.
   - More information can be found in [Install GPU drivers](https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu#1-install-python-git-and-gpu-drivers-optional)

4. Is C++ supported?
   - Please refer to this [example](https://github.com/openvinotoolkit/openvino.genai/tree/master/src).

Post your questions [here](https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/bd-p/distribution-openvino-toolkit).
--------------------------------------------------------------------------------
/README_zh.md:
--------------------------------------------------------------------------------
简体中文 | [English](README.md)

# Qwen2.openvino Demo

This is an example of how to deploy Qwen2 using OpenVINO.

## 1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows.
Python 3.10 or later is recommended for running this example.

Linux

```
python3 -m venv openvino_env

source openvino_env/bin/activate

python3 -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt
```

Windows PowerShell

```
python -m venv openvino_env

.\openvino_env\Scripts\activate

python -m pip install --upgrade pip

pip install wheel setuptools

pip install -r requirements.txt
```

> Note:
> If you are using an existing Python environment, update the dependencies as follows:
> `pip install -U --upgrade-strategy eager -r requirements.txt`

## 2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you need to download and convert the model.

```
python3 convert.py --model_id qwen/Qwen2-7B-Instruct --precision int4 --output {your_path}/Qwen2-7B-Instruct-ov --modelscope
```

### Parameters that can be selected

* `--model_id` - model ID on the Hugging Face Hub (https://huggingface.co/models) or the absolute path to the directory where the model is located.
* `--precision` - model precision: fp16, int8 or int4.
* `--output` - the path where the converted model is saved.
* `--modelscope` - download the model from ModelScope.

## 3. Run streaming chatbot

```
python3 chat.py --model_path {your_path}/Qwen2-7B-Instruct-ov --max_sequence_length 4096 --device CPU
```

or

```
python3 chat_genai.py --model_path {your_path}/Qwen2-7B-Instruct-ov --max_sequence_length 4096 --device CPU
```

### Parameters that can be selected

* `--model_path` - the path to the directory where the OpenVINO IR model is located.
* `--max_sequence_length` - maximum number of output tokens.
* `--device` - the device to run inference on, e.g. "CPU", "GPU".

## Example

```
====Starting conversation====
User: Hello
Qwen2-OpenVINO: Hello! Is there anything I can help you with?

User: Who are you?
Qwen2-OpenVINO: I am a large-scale language model from Alibaba Cloud. My name is Tongyi Qianwen.

User: Please tell me a story
Qwen2-OpenVINO: OK, here is a story about a little rabbit and its friends.

One day, the little rabbit and his friends decided to go on an adventure in the forest. They packed food, water and some tools and set off on their journey. Along the way they met all kinds of animals, including squirrels, foxes and little birds. They played together, shared food and helped each other solve problems. In the end, they found a mysterious cave deep in the forest with many treasures hidden inside. They brought all the treasures home and celebrated this happy adventure.

User: Please give this story a title
Qwen2-OpenVINO: "The Adventure of the Little Rabbit and Friends"
```

## FAQ

1. Do I need to install the OpenVINO C++ inference engine?
   - No, it is not required.

2. Do I have to use Intel hardware?
   - We have only tested this on Intel devices, and we recommend Intel devices with the x86 architecture, including but not limited to:
   - Intel CPUs, including personal computer CPUs and server CPUs.
   - Intel integrated GPUs, such as the Arc™ and Iris® series.
   - Intel discrete graphics cards, such as the Arc™ A770.

3. Why doesn't OpenVINO detect the GPU device on my system?
   - Ensure the OpenCL drivers are installed correctly.
   - Ensure you have sufficient permissions to access the GPU device.
   - More information can be found in [Install GPU drivers](https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu#1-install-python-git-and-gpu-drivers-optional)

4. Is C++ supported?
   - Please refer to this C++ [example](https://github.com/openvinotoolkit/openvino.genai/tree/master/src).

You can also post your questions [here](https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/bd-p/distribution-openvino-toolkit).
--------------------------------------------------------------------------------
/chat.py:
--------------------------------------------------------------------------------
import argparse
from typing import List, Tuple
from threading import Thread
import torch
from optimum.intel.openvino import OVModelForCausalLM
from transformers import (AutoTokenizer, AutoConfig,
                          TextIteratorStreamer, StoppingCriteriaList, StoppingCriteria)


class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of `token_ids`."""

    def __init__(self, token_ids):
        self.token_ids = token_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        for stop_id in self.token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. Path to the OpenVINO IR model directory.')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Optional. Maximum number of new tokens to generate.')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Optional. Device for inference, e.g. CPU or GPU.')
    args = parser.parse_args()
    model_dir = args.model_path

    ov_config = {"PERFORMANCE_HINT": "LATENCY",
                 "NUM_STREAMS": "1", "CACHE_DIR": ""}

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    print("====Compiling model====")
    ov_model = OVModelForCausalLM.from_pretrained(
        model_dir,
        device=args.device,
        ov_config=ov_config,
        config=AutoConfig.from_pretrained(model_dir),
    )

    streamer = TextIteratorStreamer(
        tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
    )
    # Qwen2 special tokens: 151643 = <|endoftext|>, 151645 = <|im_end|>
    stop_tokens = [151643, 151645]
    stop_tokens = [StopOnTokens(stop_tokens)]

    def convert_history_to_token(history: List[Tuple[str, str]]):
        # Rebuild the chat-template prompt from the accumulated (user, assistant) pairs.
        messages = []
        for idx, (user_msg, model_msg) in enumerate(history):
            if idx == len(history) - 1 and not model_msg:
                messages.append({"role": "user", "content": user_msg})
                break
            if user_msg:
                messages.append({"role": "user", "content": user_msg})
            if model_msg:
                messages.append({"role": "assistant", "content": model_msg})

        model_inputs = tokenizer.apply_chat_template(messages,
                                                     add_generation_prompt=True,
                                                     tokenize=True,
                                                     return_tensors="pt")
        return model_inputs

    history = []
    print("====Starting conversation====")
    while True:
        input_text = input("User: ")
        if input_text.lower() == 'stop':
            break

        if input_text.lower() == 'clear':
            history = []
            print("AI Assistant: Conversation history cleared")
            continue

        print("Qwen2-OpenVINO:", end=" ")
        history = history + [[input_text, ""]]
        model_inputs = convert_history_to_token(history)
        generate_kwargs = dict(
            input_ids=model_inputs,
            max_new_tokens=args.max_sequence_length,
            temperature=0.1,
            do_sample=True,
            top_p=1.0,
            top_k=50,
            repetition_penalty=1.1,
            streamer=streamer,
            stopping_criteria=StoppingCriteriaList(stop_tokens),
            pad_token_id=151645,
        )

        # Run generation on a background thread so tokens can be streamed as they arrive.
        t1 = Thread(target=ov_model.generate, kwargs=generate_kwargs)
        t1.start()

        partial_text = ""
        for new_text in streamer:
            print(new_text, end="", flush=True)
            partial_text += new_text
        print("\n")
        history[-1][1] = partial_text
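# Note: the stop token ids above are hard-coded for Qwen2. A more portable
# sketch (an assumption for illustration, not part of the original demo) would
# derive them from the tokenizer at runtime:
#
#     stop_ids = [tokenizer.convert_tokens_to_ids(t)
#                 for t in ("<|endoftext|>", "<|im_end|>")]
#     stop_tokens = [StopOnTokens(stop_ids)]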
--------------------------------------------------------------------------------
/chat_genai.py:
--------------------------------------------------------------------------------
import argparse
import openvino_genai


def streamer(subword):
    # Print each generated subword as soon as it arrives; returning False
    # tells the pipeline to continue generation.
    print(subword, end='', flush=True)
    return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. Path to the OpenVINO IR model directory.')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Optional. Maximum number of new tokens to generate.')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Optional. Device for inference, e.g. CPU or GPU.')
    args = parser.parse_args()
    pipe = openvino_genai.LLMPipeline(args.model_path, args.device)

    config = openvino_genai.GenerationConfig()
    config.max_new_tokens = args.max_sequence_length

    pipe.start_chat()
    while True:
        try:
            prompt = input('question:\n')
        except EOFError:
            break
        pipe.generate(prompt, config, streamer)
        print('\n----------')
    pipe.finish_chat()
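# The pipeline above uses the model's default decoding settings. As a hedged
# sketch, sampling can be enabled through the same GenerationConfig before the
# chat loop; the values below are illustrative assumptions, not tuned defaults:
#
#     config.do_sample = True
#     config.temperature = 0.7
#     config.top_p = 0.9
#     config.top_k = 50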
--------------------------------------------------------------------------------
/convert.py:
--------------------------------------------------------------------------------
import argparse
from pathlib import Path

from transformers import AutoTokenizer
from optimum.intel import OVWeightQuantizationConfig
from optimum.intel.openvino import OVModelForCausalLM

if __name__ == '__main__':
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h',
                        '--help',
                        action='help',
                        help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_id',
                        default='Qwen/Qwen1.5-0.5B-Chat',
                        required=False,
                        type=str,
                        help='Original model ID or path.')
    parser.add_argument('-p',
                        '--precision',
                        required=False,
                        default="int4",
                        type=str,
                        choices=["fp16", "int8", "int4"],
                        help='fp16, int8 or int4')
    parser.add_argument('-o',
                        '--output',
                        required=False,
                        type=str,
                        help='Path to save the IR model.')
    parser.add_argument('-ms',
                        '--modelscope',
                        action='store_true',
                        help='Download the model from ModelScope.')
    args = parser.parse_args()

    # Default the output directory to "<model name>-ov" next to the script.
    ir_model_path = Path(args.model_id.split(
        "/")[-1] + '-ov') if args.output is None else Path(args.output)
    ir_model_path.mkdir(parents=True, exist_ok=True)

    # int4 weight-compression settings passed to NNCF via optimum-intel.
    compression_configs = {
        "sym": True,
        "group_size": 128,
        "ratio": 0.8,
    }
    if args.modelscope:
        from modelscope import snapshot_download

        print("====Downloading model from ModelScope=====")
        model_path = snapshot_download(args.model_id, cache_dir='./')
    else:
        model_path = args.model_id

    print("====Exporting IR=====")
    if args.precision == "int4":
        ov_model = OVModelForCausalLM.from_pretrained(model_path, export=True,
                                                      compile=False, quantization_config=OVWeightQuantizationConfig(
                                                          bits=4, **compression_configs))
    elif args.precision == "int8":
        ov_model = OVModelForCausalLM.from_pretrained(model_path, export=True,
                                                      compile=False, load_in_8bit=True)
    else:
        ov_model = OVModelForCausalLM.from_pretrained(model_path, export=True,
                                                      compile=False, load_in_8bit=False)

    print("====Saving IR=====")
    ov_model.save_pretrained(ir_model_path)

    print("====Exporting tokenizer=====")
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    tokenizer.save_pretrained(ir_model_path)

    print("====Exporting IR tokenizer=====")
    from optimum.exporters.openvino.convert import export_tokenizer
    export_tokenizer(tokenizer, ir_model_path)
    print("====Finished=====")
    del ov_model
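# The int4 settings above trade accuracy for size: "ratio" is roughly the
# fraction of weights compressed to 4 bits, with the remainder kept at 8 bits.
# A hedged sketch of a higher-accuracy variant (values are illustrative
# assumptions, not tested defaults):
#
#     quantization_config = OVWeightQuantizationConfig(
#         bits=4, sym=False, group_size=64, ratio=0.6)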
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
--extra-index-url https://download.pytorch.org/whl/cpu
numpy
openvino==2024.4.0
openvino-genai==2024.4.0.0
nncf>=2.11.0
optimum-intel>=1.17.0
transformers>=4.40.0,<4.42.0
onnx>=1.15.0
huggingface-hub>=0.21.3
torch>=2.1
modelscope
--------------------------------------------------------------------------------