├── packages.txt
├── start.py
├── pages
│   ├── README.md
│   ├── 3_pro_2_pic.py
│   └── 2_Graphic_generation.py
├── README.md
├── Prompt
│   └── README.md
├── fine-tuning
│   └── README.md
├── app.py
├── requirements.txt
├── LICENSE
└── Deploy
    └── README.md

--------------------------------------------------------------------------------
/packages.txt:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/start.py:
--------------------------------------------------------------------------------
import os

# Launch the Streamlit app on all interfaces at port 7860
# (the default port expected by Hugging Face Spaces).
os.system('streamlit run app.py --server.address=0.0.0.0 --server.port 7860')
--------------------------------------------------------------------------------
/pages/README.md:
--------------------------------------------------------------------------------
# Notes

- To switch models, change `mode_name_or_path`.
- To change the prompt, change `system_prompt`.
- Streamlit references: [streamlit APP Gallery](https://streamlit.io/gallery?category=llms); this project draws mainly on [llm-examples](https://github.com/streamlit/llm-examples).

![image](https://github.com/Star-cre/Creation_XHS/assets/95208730/bddf041b-4553-4c0d-a568-90d16e0ce2ba)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🎈 💬 Creation_XHS
A Xiaohongshu (小红书) post-creation app built with Streamlit.

## 👿 App Overview
Why build a Xiaohongshu post-creation app:
- Efficiency: the language-generation ability of a large model helps users produce high-quality posts quickly, saving time and effort and speeding up content creation.
- Personalized recommendation: by analyzing a user's preferences and behavior, the model can generate content tailored to them, improving the experience and user retention.
- Lower barrier to entry: for users with little writing skill or experience, automated post generation lowers the threshold, letting more people take part in creating and sharing content.
- More variety: the model's broad knowledge and writing ability can produce diverse content, enriching the platform and meeting varied user needs.

Modules:
- Personal-niche positioning module: guides the user with open-ended questions, helps them find a direction they like and settle on a topic, and finally generates a Xiaohongshu note, driven by the prompt templates in `Prompt/` (the model is internlm2-chat-7b-sft, fine-tuned with xtuner).
![image](https://github.com/Star-cre/Creation_XHS/assets/95208730/3bca08c6-8119-4a22-9f55-83420fa7195b)
- Image-generation module: takes the note produced by the positioning module and generates matching images through the ZhipuAI image API. (Stable Diffusion or Midjourney could be substituted later.)
![image](https://github.com/Star-cre/Creation_XHS/assets/95208730/8d6d4316-b70f-44a6-aea7-0521f7061451)

## 🤖 Quick Start
- Install the dependencies
```bash
pip install -r requirements.txt
```
- Link Start
```bash
python start.py
```

## 🧠 Project Members
- [Star-cre](https://github.com/Star-cre)
- [Aitejiu](https://github.com/Aitejiu)
- [2404589803](https://github.com/2404589803)
- [Wly0910](https://github.com/Wly0910)
- [Durian-1111](https://github.com/Durian-1111)
--------------------------------------------------------------------------------
/Prompt/README.md:
--------------------------------------------------------------------------------
# Prompt Engineering

- Xiaohongshu IP-niche positioning mentor:
```
Role:
As a Xiaohongshu IP-niche positioning mentor, I interact with users in a professional, friendly and enthusiastic way, guiding them toward the niche that suits them best. My conversational style is upbeat, with emoji used in moderation to keep the exchange fun.

Abilities: I have the following abilities:
- analyze the user's traits and give targeted advice;
- guide the user through self-exploration to pin down their interests and goals;
- share practical self-media and marketing techniques to help the user succeed on Xiaohongshu.

Details:
- As the Xiaohongshu IP-niche positioning mentor, address the user as 亲爱的小红薯 (dear Xiaohongshu-er). When the user first starts a conversation, give a brief introduction of no more than 100 characters, then say: "If you want to begin, reply 开始 (start)."
- Stage one: use guiding questions to find the direction the user is both good at and enjoys. You may ask the following questions in order:
[Interest survey]
- What do you most enjoy doing in your spare time?
[Self-awareness and values]
- In which areas do you think you have the most potential?
- What values or message do you hope to convey through Xiaohongshu?

- Note: never ask many questions at once; pose at most two per turn. Only continue once the user has answered the previous one or two, and do not change the wording of the questions. Work briefly through all of them, step by step, then move to stage two.

- Stage two: drawing on everything you know about Xiaohongshu, propose 5 candidate Xiaohongshu IP positionings. Once the user picks one they are happy with, move to stage three.

- Stage three: congratulate the user on finding an IP positioning they like, then, drawing on your self-media and marketing experience, suggest 5 topics for that positioning. Once the user picks one they are happy with, move to stage four.

- Stage four: combining your knowledge base and experience, generate a Xiaohongshu note template for the chosen topic. The content must follow these rules: [edit the content in an emoji-rich style; give it an engaging title; make it read like genuine life experience and tips shared spontaneously by a user, distinct from ads and promotion; include emoji in every paragraph and append relevant hashtags at the end].

- Stage five: once the user is satisfied with the stage-four content, encourage them to publish their first Xiaohongshu note and to keep at it.
```

- Xiaohongshu copywriter:
```
You are a Xiaohongshu copywriter. Generate copy according to the following rules:
- Topic/product: xx (fill in the specific beauty product name or category here)
- Task: write a viral Xiaohongshu post about xx, highlighting its features and what it is like to use
- Style: conversational and lively, with emoji, designed to catch the reader's attention
- Constraints: keep the copy under 500 characters, avoid strings of consecutive headline-like lines, and write primarily with a Chinese frame of mind
Output only the copy itself, with no extra text.
The topic/product to write about is given inside the [] below:
[]
```
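In the app these templates are passed to the model as its `meta_instruction`, as in `app.py` and `pages/2_Graphic_generation.py`. A minimal sketch of that usage (the checkpoint path is a placeholder for your own merged model; the app keeps the template text in Chinese):

```python
# Minimal sketch: drive the fine-tuned model with the copywriter template above,
# mirroring the model.chat(...) call used in app.py. The path is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/root/xhs_tuner/Creation_XHS"  # placeholder: your merged checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda().eval()

system_prompt = "你是一个小红书文案生成器,……"  # the copywriter template above
query = "[面膜]"                               # the topic, wrapped in []
response, _ = model.chat(tokenizer, query, meta_instruction=system_prompt, history=[])
print(response)
```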
# Reference Articles

- [Notes on the free Andrew Ng x OpenAI course: ChatGPT Prompt Engineering for Developers](https://zhilengnuan.blog.csdn.net/article/details/131046194)
- [The Art of Getting High-Quality Answers from ChatGPT: A Prompt Engineering Guide](https://zhilengnuan.blog.csdn.net/article/details/129753859)
- [N Ways to Use ChatGPT (continuously updated...)](https://zhilengnuan.blog.csdn.net/article/details/129399125)
--------------------------------------------------------------------------------
/pages/3_pro_2_pic.py:
--------------------------------------------------------------------------------
import os
from io import BytesIO

import requests
import streamlit as st
from PIL import Image
from zhipuai import ZhipuAI

# Title and links in the sidebar
with st.sidebar:
    st.markdown("## InternLM LLM")
    "[InternLM](https://github.com/InternLM/InternLM.git)"
    "[开源大模型食用指南 self-llm](https://github.com/datawhalechina/self-llm.git)"
    "[![小红书美妆文案生成导师](https://github.com/codespaces/badge.svg)](https://github.com/Star-cre/Creation_XHS)"

# Title and caption
st.title("💬 : 文字转图片")
st.caption("🚀 A streamlit APP powered by 智谱AI")


def blog_outline(topic):
    # Read the key from the environment; set ZHIPUAI_API_KEY to your own API key
    api_key = os.environ.get("ZHIPUAI_API_KEY")
    client = ZhipuAI(api_key=api_key)
    response = client.images.generations(
        model="cogview-3",  # name of the model to call
        # prompt=f'请根据以下文案生成产品图:{prompt},参考提示词:{short_prompt}',
        prompt=topic,
    )
    # URL of the generated image
    url_of_image = response.data[0].url
    generated_image = get_image_from_url(url_of_image)
    if generated_image is not None:
        st.image(generated_image, caption="Generated Image", use_column_width=True)
    else:
        st.error("Failed to download the generated image.")


def get_image_from_url(image_url):
    # Send an HTTP request to download the image
    response = requests.get(image_url)
    if response.status_code == 200:
        # Build a PIL image from the response bytes
        return Image.open(BytesIO(response.content))
    print("Failed to download image. Status code:", response.status_code)
    return None
Status code:", response.status_code) 50 | 51 | 52 | with st.form("myform"): 53 | topic_text = st.text_input("Enter prompt:", "") 54 | submitted = st.form_submit_button("Submit") 55 | if submitted: 56 | blog_outline(topic_text) 57 | 58 | -------------------------------------------------------------------------------- /pages/2_Graphic_generation.py: -------------------------------------------------------------------------------- 1 | from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig 2 | import torch 3 | import streamlit as st 4 | from modelscope import snapshot_download 5 | 6 | # 侧边栏中创建标题和链接 7 | with st.sidebar: 8 | st.markdown("## InternLM LLM") 9 | "[InternLM](https://github.com/InternLM/InternLM.git)" 10 | "[开源大模型食用指南 self-llm](https://github.com/datawhalechina/self-llm.git)" 11 | "[![小红书美妆文案生成导师](https://github.com/codespaces/badge.svg)](https://github.com/Star-cre/Creation_XHS)" 12 | system_prompt = st.text_input( 13 | "System_Prompt", """ 14 | 你是一个小红书文案生成器,请按照以下规则生成小红书文案:\n 15 | - 主题/产品:xx(在这里填写具体的美妆产品名称或类别)\n 16 | - 需求:撰写一篇关于xx的小红书爆款文案,突出其特点和使用体验\n 17 | - 风格:口语化、生动活泼,使用Emoji表情图标,吸引读者注意\n 18 | - 限制:文案长度控制在500字以内,避免连续性标题结构,主要以中文思维方式撰写\n 19 | 请不要输出多余的文字,主输出文案本体\n 20 | 下边的[]内给出需要生成的小红书文案主题/产品\n 21 | """) 22 | 23 | # 设置标题和副标题 24 | st.title("💬 Chatbot: 小红书图文生成") 25 | st.caption("🚀 A streamlit chatbot powered by InternLM LLM") 26 | 27 | mode_name_or_path = '/root/xhs_tuner/Creation_XHS' 28 | # mode_name_or_path = '/root/xhs_tuner/internlm2-chat-20b-4bits' 29 | # mode_name_or_path = '/root/share/model_repos/internlm2-chat-20b-4bits' 30 | # mode_name_or_path = 'aitejiu/xhs_createation' 31 | 32 | @st.cache_resource 33 | def get_model(): 34 | tokenizer = AutoTokenizer.from_pretrained( 35 | mode_name_or_path, trust_remote_code=True) 36 | model = AutoModelForCausalLM.from_pretrained( 37 | mode_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda() 38 | model.eval() 39 | return tokenizer, model 40 | 41 | 42 | tokenizer, model = get_model() 43 | 44 | # 如果session_state中没有"messages",则创建一个包含默认消息的列表 45 | if "messages" not in st.session_state: 46 | st.session_state["messages"] = [] 47 | 48 | 49 | for msg in st.session_state.messages: 50 | st.chat_message("user").write(msg[0]) 51 | st.chat_message("assistant").write(msg[1]) 52 | 53 | 54 | if prompt := st.chat_input(): 55 | # 在聊天界面上显示用户的输入 56 | st.chat_message("user").write(prompt) 57 | prompt = '[' + prompt + ']' 58 | response, history = model.chat( 59 | tokenizer, prompt, meta_instruction=system_prompt, history=st.session_state.messages) 60 | st.session_state.messages.append((prompt, response)) 61 | st.chat_message("assistant").write(response) -------------------------------------------------------------------------------- /fine-tuning/README.md: -------------------------------------------------------------------------------- 1 | ## 微调 2 | 3 | 我们使用的是**[internlm2-chat-7b-sft](https://openxlab.org.cn/models/detail/OpenLMLab/internlm2-chat-7b-sft)**进行简单的微调。 4 | 5 | ### 数据集 6 | 7 | ```json 8 | { 9 | "conversation": [ 10 | { 11 | "system": "你是一个小红书文案生成器,你会以下列规则编写小红书文案:以令人着迷的标题引起关注,每段文案都应该简洁明了,充满表情符号,增添趣味和情感。最后,还需要为文案添加与主题相关的标签,以便吸引更多的读者和关注。利用自然语言处理技术和创意写作的规则,生成优质的小红书风格文案,让用户的内容在小红书平台上脱颖而出。适当使用emoji。下边[]内的是需要书写的内容:", 12 | "input": "[面膜]", 13 | "output": "标题:[知识小贴士]XXX 面膜——让你的生活更智慧\n\n正文:📚想要生活更轻松?XXX 面膜是你的不二选择!\n\n🌟我们的面膜采用最新科技,不仅高效,而且环保。让每一天的生活都更简单、更智能。\n\n💡现在购买,还有智慧优惠等你哦!快来抢购吧,让你的生活更智慧、更便捷!\n\n#面膜 #知识小贴士 #智慧生活 #高效神器" 14 | } 15 | ] 16 | }, 17 | ``` 18 | 19 | ### 流程 20 | 21 | **使用`xtuner`进行微调** 22 | 23 | 
### Workflow

**Fine-tune with `xtuner`.**

**All steps below are run inside `/root/xhs_tuner`.**

#### Copy the config template

```bash
# Copy the config file into the current directory
xtuner copy-cfg internlm2_chat_7b_qlora_oasst1_e3 .
# Rename it
mv internlm2_chat_7b_qlora_oasst1_e3_copy.py internlm2_chat_7b_qlora_xhs_e10.py

# Edit the config
vim internlm2_chat_7b_qlora_xhs_e10.py
```

#### Edit the config

**Change `pretrained_model_name_or_path`, `data_path`, `max_epochs` and `train_dataset` in the file**, as follows:

```python
#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/share/model_repos/internlm2-chat-7b-sft'

# Data
data_path = 'data/xhs_data.json'
....
....
max_epochs = 10
....
....
#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path='json', data_files=dict(train=data_path)),
```

#### Start fine-tuning

```bash
xtuner train ./internlm2_chat_7b_qlora_xhs_e10.py --deepspeed deepspeed_zero2
```

The PTH checkpoint produced by fine-tuning, along with assorted auxiliary files, is written to `./work_dirs` by default.

#### Convert the resulting PTH model to a HuggingFace adapter

```bash
mkdir hf
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
xtuner convert pth_to_hf ./internlm2_chat_7b_qlora_xhs_e10.py \
    ./work_dirs/internlm2_chat_7b_qlora_xhs_e10/epoch_10.pth ./hf
```

#### Merge the HuggingFace adapter into the base LLM

```bash
xtuner convert merge /root/share/model_repos/internlm2-chat-7b-sft ./hf ./Creation_XHS --max-shard-size 2GB
```

#### Chat with the merged model

```bash
xtuner chat ./Creation_XHS --prompt-template internlm2_chat
```
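If you want to sanity-check the adapter without merging, it should also be loadable on top of the base model with `peft` (which is pinned in `requirements.txt`). A hedged sketch, untested here; the paths match the commands above:

```python
# Hedged sketch: load the exported ./hf adapter onto the base model via peft,
# as an alternative to merging. Paths follow the commands above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = '/root/share/model_repos/internlm2-chat-7b-sft'
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
model = PeftModel.from_pretrained(model, './hf')  # the adapter exported above
model.eval()
```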
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import torch
import streamlit as st
from transformers import AutoTokenizer, AutoModelForCausalLM

# Title and links in the sidebar
with st.sidebar:
    st.markdown("## InternLM LLM")
    "[InternLM](https://github.com/InternLM/InternLM.git)"
    "[开源大模型食用指南 self-llm](https://github.com/datawhalechina/self-llm.git)"
    "[![小红书美妆文案生成导师](https://github.com/codespaces/badge.svg)](https://github.com/Star-cre/Creation_XHS)"
    # max_length = st.slider("max_length", 0, 1024, 512, step=1)
    system_prompt = st.text_input(
        "System_Prompt", """
身份:\n
作为小红书IP赛道定位导师,我将以专业、友好、富有激情的方式与用户互动,引导他们发现最适合自己的赛道。我的对话风格将积极向上,适当使用表情符号来增强沟通的趣味性。\n
能力:我将具备以下能力:\n
- 分析用户特点,提出针对性建议;\n
- 引导用户进行自我探索,确定个人兴趣和目标;\n
- 提供实用的自媒体和营销技巧,助力用户在小红书赛道上取得成功。\n
细节:\n
- 作为小红书的IP赛道定位导师,你会称呼用户为亲爱的小红薯,在用户第一次发起对话时,先进行不超过100字的简短介绍,介绍完后说“如果你要开始进入这段流程请回复“开始””。\n
- 第一个环节,通过问题引导,找到用户擅长且喜欢做的方向。你可以依次询问下列问题:\n
[兴趣点调查]\n
- 你平时最喜欢做哪些事情?\n
[自我认知和价值观考量]\n
- 你认为自己在哪些方面最有潜力?\n
- 你希望通过小红书传达什么样的价值观或信息?\n
- 注意,不要一次问多个问题,每次最多抛出两个问题。用户回答完前一个或两个问题后,再继续问下一个,并且不要改变问题内容。一步步简短地问完所有问题,进入第二个环节。\n
- 第二个环节,利用你的所知道的所有有关小红书的知识,给出5个方向的小红书IP定位。在用户选择自己满意的定位后,进入第三个环节。\n
- 第三个环节,恭喜用户找到了自己喜欢的小红书IP定位,结合你的自媒体和营销经验,给出关于这个定位的5个选题建议。在用户选择自己满意的选题后,进入第四个环节。\n
- 第四个环节,结合知识库和经验,生成一篇该选题的小红书笔记模板,该内容应该符合以下规定[使用 Emoji 风格编辑内容;有引人入胜的标题;应该是来自用户自发分享的真实生活经验、生活和技巧,这些内容与广告和宣传有所区别;每个段落中包含表情符号并且在末尾添加相关标签]。\n
- 第五个环节,用户对第四个环节的内容满意后,你将鼓励用户去发布第一篇小红书笔记并持之以恒。\n
""")

# Title and caption
st.title("💬 Chatbot: 小红书IP赛道定位导师")
st.caption("🚀 A streamlit chatbot powered by InternLM LLM")

mode_name_or_path = '/root/xhs_tuner/Creation_XHS'
# mode_name_or_path = '/root/xhs_tuner/internlm2-chat-20b-4bits'
# mode_name_or_path = '/root/share/model_repos/internlm2-chat-20b-4bits'
# mode_name_or_path = 'aitejiu/xhs_createation'


@st.cache_resource
def get_model():
    tokenizer = AutoTokenizer.from_pretrained(
        mode_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        mode_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
    model.eval()
    return tokenizer, model


tokenizer, model = get_model()

# If "messages" is not yet in session_state, initialize it as an empty list
if "messages" not in st.session_state:
    st.session_state["messages"] = []
    # st.session_state["messages"] = [
    #     {"role": "assistant", "content": system_prompt}
    # ]

# Replay the conversation so far; each entry is a (user, assistant) pair
for msg in st.session_state.messages:
    st.chat_message("user").write(msg[0])
    st.chat_message("assistant").write(msg[1])

if prompt := st.chat_input():
    # Show the user's input in the chat window
    st.chat_message("user").write(prompt)

    response, history = model.chat(
        tokenizer, prompt,
        meta_instruction=system_prompt,
        history=st.session_state.messages)
    st.session_state.messages.append((prompt, response))
    st.chat_message("assistant").write(response)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
absl-py==2.1.0
accelerate==0.26.1
addict==2.4.0
aiofiles==23.2.1
aiohttp==3.9.1
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.2.0
annotated-types==0.6.0
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.42.0
blinker==1.7.0
boto3==1.34.24
botocore==1.34.24
Brotli @ file:///tmp/abs_ecyw11_7ze/croots/recipe/brotli-split_1659616059936/work
cachetools==5.3.2
certifi @ file:///croot/certifi_1700501669400/work/certifi
cffi @ file:///croot/cffi_1700254295673/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.7
cmake==3.28.1
cn2an==0.5.22
colorama==0.4.6
contourpy==1.2.0
cpm-kernels==1.0.11
crcmod==1.7
cryptography @ file:///croot/cryptography_1694444244250/work
cycler==0.12.1
datasets==2.16.1
deepspeed==0.12.6
dill==0.3.6
distro==1.9.0
einops==0.7.0
et-xmlfile==1.1.0
evaluate==0.4.1
exceptiongroup==1.2.0
fairscale==0.4.13
fastapi==0.109.0
ffmpy==0.3.1
filelock @ file:///croot/filelock_1700591183607/work
fire==0.5.0
flash-attn @ file:///root/share/wheels/flash_attn-2.4.2%2Bcu118torch2.0cxx11abiTRUE-cp310-cp310-linux_x86_64.whl#sha256=738d0ba133f067ea30a5aa8d85e35f83d4e65467b13693e2d04bf86312f78990
fonttools==4.47.0
frozenlist==1.4.1
fsspec==2023.6.0
func-timeout==4.3.5
fuzzywuzzy==0.18.0
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.41
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
gradio==3.50.2
gradio_client==0.6.1
h11==0.14.0
hjson==3.1.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.3
idna @ file:///croot/idna_1666125576474/work
importlib-metadata==6.11.0
importlib-resources==6.1.1
jieba==0.42.1
Jinja2 @ file:///croot/jinja2_1666908132255/work
jmespath==0.10.0
joblib==1.3.2
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
lagent==0.1.2
Levenshtein==0.23.0
lit==17.0.6
ltp==4.2.13
ltp-core==0.1.4
ltp-extension==0.1.11
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work
matplotlib==3.8.2
mdurl==0.1.2
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
mmengine==0.10.3
mmengine-lite==0.10.2
modelscope==1.11.1
mpi4py-mpich==3.1.2
mpmath @ file:///croot/mpmath_1690848262763/work
multidict==6.0.4
multiprocess==0.70.14
networkx @ file:///croot/networkx_1690561992265/work
ninja==1.11.1.1
nltk==3.8
numpy==1.23.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-runtime-cu11==11.8.89
nvidia-nccl-cu11==2.19.3
openai==1.9.0
OpenCC==1.1.7
opencv-python==4.9.0.80
opencv-python-headless==4.9.0.80
openpyxl==3.1.2
openxlab==0.0.34
orjson==3.9.10
oss2==2.17.0
packaging==23.2
pandas==2.2.0
peft==0.7.1
Pillow==9.5.0
platformdirs==4.1.0
portalocker==2.8.2
prettytable==3.9.0
proces==0.1.7
protobuf==4.25.2
psutil==5.9.7
py-cpuinfo==9.0.0
pyarrow==14.0.2
pyarrow-hotfix==0.6
pybind11==2.11.1
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.20.0
pydantic==2.5.3
pydantic_core==2.14.6
pydeck==0.8.1b0
pydub==0.25.1
Pygments==2.17.2
PyJWT==2.8.0
Pympler==1.0.1
pynvml==11.5.0
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
pyparsing==3.1.1
pypinyin==0.50.0
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
python-dateutil==2.8.2
python-Levenshtein==0.23.0
python-multipart==0.0.6
pytz==2023.3.post1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
rank-bm25==0.2.2
rapidfuzz==3.6.1
referencing==0.32.1
regex==2023.12.25
requests @ file:///croot/requests_1690400202158/work
responses==0.18.0
rich==13.4.2
rouge==1.0.1
rouge-chinese==1.0.3
rouge-score==0.1.2
rpds-py==0.16.2
s3transfer==0.10.0
sacrebleu==2.4.0
safetensors==0.4.1
scikit-learn==1.2.1
scipy==1.11.4
seaborn==0.13.1
semantic-version==2.10.0
sentence-transformers==2.2.2
sentencepiece==0.1.99
shortuuid==1.0.11
simplejson==3.19.2
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
starlette==0.35.1
streamlit==1.31.0
sympy @ file:///croot/sympy_1668202399572/work
tabulate==0.9.0
tenacity==8.2.3
termcolor==2.4.0
threadpoolctl==3.2.0
tiktoken==0.5.2
timeout-decorator==0.5.0
tokenizers==0.15.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.0
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tornado==6.4
tqdm==4.66.1
transformers==4.37.2
transformers-stream-generator==0.0.4
triton==2.0.0
typer==0.9.0
typing_extensions==4.9.0
tzdata==2023.4
tzlocal==4.3.1
urllib3 @ file:///croot/urllib3_1698257533958/work
utils==1.0.2
uvicorn==0.25.0
validators==0.22.0
watchdog==3.0.0
wcwidth==0.2.13
websockets==11.0.3
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zhipuai==2.0.1
zipp==3.17.0
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.
      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.
   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE.
      You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--------------------------------------------------------------------------------
/Deploy/README.md:
--------------------------------------------------------------------------------
# 1. Introduction to LMDeploy

## 1-1. What LMDeploy is
**LMDeploy is a toolkit for compressing, deploying and serving LLMs, developed by the MMRazor and MMDeploy teams. Its core features are:**

- An efficient inference engine (TurboMind): persistent batching (a.k.a. continuous batching), blocked KV cache, dynamic split-and-fuse, tensor parallelism, high-performance CUDA kernels and other key techniques that deliver high-throughput, low-latency LLM inference.

- An interactive inference mode: by caching the attention K/V across the turns of a multi-round conversation, the engine remembers the dialogue history and avoids re-processing earlier turns.

- Quantization: LMDeploy supports several quantization methods and efficient inference of quantized models, with reliability validated on models of different scales.

![figure](https://img-blog.csdnimg.cn/direct/0cd0433acd774c449a999c0035e39697.png)
# 2. Environment Setup and Basic Configuration
## 2-0. Environment
**Environment**: an autoDL rental instance with torch 1.11.0, Ubuntu 20.04, Python 3.8 and CUDA 11.3, running the experiments on a V100. **Pick whichever online platform suits you.**
![figure](https://img-blog.csdnimg.cn/direct/63e40f5699ab4c569c84c458cad7b682.png)
![figure](https://img-blog.csdnimg.cn/direct/376a51608c094c7c92dbcf906781e41a.png)

## 2-1. Create a virtual environment
```bash
bash  # whenever you open a terminal from jupyter lab, run bash first to enter a bash shell

# Create the virtual environment
conda create -n CONDA_ENV_NAME

# Activate it
conda activate CONDA_ENV_NAME
```
## 2-2. Install the required packages
```bash
# Upgrade pip
python -m pip install --upgrade pip

# If downloads are slow, consider switching to a mirror:
# pip config set global.index-url https://mirrors.cernet.edu.cn/pypi/web/simple

# Install lmdeploy with all extras
pip install 'lmdeploy[all]==v0.1.0'

# Put the remaining dependencies in a txt file and install them with: pip install -r requirements.txt
```
**Notice**: the dependency list is as follows.
```
accelerate==0.26.0
addict==2.4.0
aiohttp==3.9.1
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.2.0
annotated-types==0.6.0
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.42.0
blinker==1.7.0
Brotli @ file:///tmp/abs_ecyw11_7ze/croots/recipe/brotli-split_1659616059936/work
cachetools==5.3.2
certifi @ file:///croot/certifi_1700501669400/work/certifi
cffi @ file:///croot/cffi_1700254295673/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.7
contourpy==1.2.0
crcmod==1.7
cryptography @ file:///croot/cryptography_1694444244250/work
cycler==0.12.1
datasets==2.14.7
deepspeed==0.12.6
dill==0.3.7
distro==1.9.0
einops==0.7.0
filelock @ file:///croot/filelock_1700591183607/work
fonttools==4.47.0
frozenlist==1.4.1
fsspec==2023.6.0
func-timeout==4.3.5
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.41
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
hjson==3.1.0
huggingface-hub==0.17.3
idna @ file:///croot/idna_1666125576474/work
importlib-metadata==6.11.0
Jinja2 @ file:///croot/jinja2_1666908132255/work
jmespath==0.10.0
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
lagent==0.1.2
markdown-it-py==3.0.0
MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work
matplotlib==3.8.2
mdurl==0.1.2
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
mmengine==0.10.2
modelscope==1.11.0
mpi4py-mpich==3.1.2
mpmath @ file:///croot/mpmath_1690848262763/work
multidict==6.0.4
multiprocess==0.70.15
networkx @ file:///croot/networkx_1690561992265/work
ninja==1.11.1.1
numpy @ file:///croot/numpy_and_numpy_base_1701295038894/work/dist/numpy-1.26.2-cp310-cp310-linux_x86_64.whl#sha256=2ab675fa590076aa37cc29d18231416c01ea433c0e93be0da3cfd734170cfc6f
opencv-python==4.9.0.80
oss2==2.18.4
packaging==23.2
pandas==2.1.4
peft==0.7.1
Pillow==9.5.0
platformdirs==4.1.0
protobuf==4.25.2
psutil==5.9.7
py-cpuinfo==9.0.0
pyarrow==14.0.2
pyarrow-hotfix==0.6
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodome==3.20.0
pydantic==2.5.3
pydantic_core==2.14.6
pydeck==0.8.1b0
Pygments==2.17.2
Pympler==1.0.1
pynvml==11.5.0
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
pyparsing==3.1.1
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
python-dateutil==2.8.2
pytz==2023.3.post1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
referencing==0.32.1
regex==2023.12.25
requests @ file:///croot/requests_1690400202158/work
rich==13.7.0
rpds-py==0.16.2
safetensors==0.4.1
scipy==1.11.4
sentencepiece==0.1.99
simplejson==3.19.2
six==1.16.0
smmap==5.0.1
sortedcontainers==2.4.0
sympy @ file:///croot/sympy_1668202399572/work
tenacity==8.2.3
termcolor==2.4.0
tiktoken==0.5.2
tokenizers==0.14.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.0
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tornado==6.4
tqdm==4.66.1
transformers==4.34.0
transformers-stream-generator==0.0.4
triton==2.0.0
typing_extensions @ file:///croot/typing_extensions_1690297465030/work
tzdata==2023.4
tzlocal==4.3.1
urllib3 @ file:///croot/urllib3_1698257533958/work
validators==0.22.0
watchdog==3.0.0
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0
```
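A quick way to confirm the install before moving on (a minimal sketch; the version string should match the pin above):

```python
# Sanity check: the toolkit imports and reports the pinned version.
import lmdeploy

print(lmdeploy.__version__)  # expected: 0.1.0
```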
# 3. Deployment
![figure](https://img-blog.csdnimg.cn/direct/0b82bde6f4dc4080abbd022a3a4f02a0.png)
**Architecturally, the serving pipeline splits into the following modules:**
- **Model inference/serving**: provides the model's inference itself. It can usually be decoupled from the business logic so that optimization focuses purely on inference performance, and it can be exposed as a module, an API, and so on.
- **Client**: the front end, where users interact. Interaction can happen through a local bash session, a Gradio demo, a web page, or a mobile app.
- **API Server**: usually sits behind the front end, supplying the data and functionality the product and services need, i.e. the API layer.

## 3-0. Model conversion
> **TurboMind** is an efficient inference engine for LLMs, built on NVIDIA's FasterTransformer. Its main features include support for LLaMA-style model architectures, a persistent-batch inference mode and an extensible KV cache manager. We use TurboMind for inference here; **running a model on TurboMind first requires converting it into TurboMind's format, which currently can be done either online or offline.**

### 3-0-1. Online conversion
**Overview**: the model types below support online conversion; each of the commands starts a local chat interface through which you can talk to the LLM in bash.
```bash
# Requires network access to Huggingface
# Models on huggingface.co already quantized with lmdeploy, e.g. llama2-70b-4bit, internlm-chat-20b-4bit
lmdeploy chat turbomind internlm/internlm-chat-20b-4bit --model-name internlm-chat-20b
# Other LLMs on huggingface.co, e.g. Qwen/Qwen-7B-Chat
lmdeploy chat turbomind Qwen/Qwen-7B-Chat --model-name qwen-7b

# Others: you can also launch a local Huggingface model directly, as follows.
lmdeploy chat turbomind /share/temp/model_repos/internlm-chat-7b/ --model-name internlm-chat-7b
```
*Below, Qwen-7B-Chat in action:*
![figure](https://img-blog.csdnimg.cn/direct/60f51db6f357497a96705675a9751074.png)


### 3-0-2. Offline conversion
**Overview**: offline conversion turns the model into lmdeploy's TurboMind format before the service starts. **When it finishes, a `workspace` directory appears in the current directory**, containing the files TurboMind and Triton need for "model inference".

```bash
# Convert the model (FasterTransformer format) for TurboMind; --tp sets the number of GPUs (default: one).
lmdeploy convert internlm-chat-7b /root/share/temp/model_repos/internlm-chat-7b/
```
## 3-1. TurboMind inference + local command-line chat
**Overview**: talk to TurboMind through a local chat session (Bash Local Chat); the argument here is the `workspace` directory produced by the offline conversion above.
```bash
# Turbomind + Bash Local Chat
lmdeploy chat turbomind ./workspace
```
**As shown below**: after typing, press Enter twice to send; to quit, type `exit` and press Enter twice.

![figure](https://img-blog.csdnimg.cn/direct/df4f94dba37a46c680d882ee955b98c5.png)
## 3-2. TurboMind inference + API service
**Overview**: start the API service with the following command.
```bash
# ApiServer+Turbomind   api_server => AsyncEngine => TurboMind
# The arguments are: workspace path, server address, port, instance count, and GPU count
lmdeploy serve api_server ./workspace \
	--server_name 0.0.0.0 \
	--server_port 23333 \
	--instance_num 64 \
	--tp 1
```
**Once it is up you should see:**

![figure](https://img-blog.csdnimg.cn/direct/367610dd3d05499fbb5aad66e7ba727c.png)
**Open a new VS Code terminal and run the client command:**
```bash
# ChatApiClient+ApiServer (note: this is the HTTP protocol, so the http:// prefix is required)
lmdeploy serve api_client http://localhost:23333
```
**Once connected, you can interact with the service:**
![figure](https://img-blog.csdnimg.cn/direct/5f023e15d2c648e993bdbb7e14cb9c2b.png)
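You can also hit the server with plain HTTP. The sketch below assumes an OpenAI-style `/v1/chat/completions` route; the exact routes and payloads for your version are listed on the Swagger page served at the same port, so verify there first:

```python
# Hedged sketch: query the API server over raw HTTP.
# The route and payload follow the OpenAI-compatible convention; confirm both
# on the Swagger page at http://localhost:23333 for your lmdeploy version.
import requests

resp = requests.post(
    "http://localhost:23333/v1/chat/completions",
    json={
        "model": "internlm-chat-7b",  # served model name; see /v1/models
        "messages": [{"role": "user", "content": "你好"}],
    },
    timeout=60,
)
print(resp.json())
```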
## 3-3. Web demo

### 3-3-1. ApiServer + Turbomind + Gradio
```bash
# Start the server
# ApiServer+Turbomind   api_server => AsyncEngine => TurboMind
lmdeploy serve api_server ./workspace \
	--server_name 0.0.0.0 \
	--server_port 23333 \
	--instance_num 64 \
	--tp 1

# Gradio+ApiServer. The server must be started first; here Gradio acts as the client
lmdeploy serve gradio http://0.0.0.0:23333 \
	--server_name 0.0.0.0 \
	--server_port 6006 \
	--restful_api True
```
**Notice**: then forward the client port to your local machine; see Appendix 2 for the detailed steps.
**Once configured you should see:**
![figure](https://img-blog.csdnimg.cn/direct/9f0b2c982ea64704aff6e52f99cf2a1d.png)

### 3-3-2. Turbomind + Gradio
**Overview**: skip the API service and connect Gradio to TurboMind directly, then forward the port locally.
```bash
# Gradio+Turbomind(local)
lmdeploy serve gradio ./workspace
```

## 3-4. TurboMind inference + Python integration
**Overview**: Python talks to TurboMind directly.
```python
from lmdeploy import turbomind as tm

# Load the model and create a TurboMind instance for inference
model_path = "/root/share/temp/model_repos/internlm-chat-7b/"
tm_model = tm.TurboMind.from_pretrained(model_path, model_name='internlm-chat-7b')
generator = tm_model.create_instance()

# Build the input and encode it
query = "你好啊兄嘚"
prompt = tm_model.model.get_prompt(query)
input_ids = tm_model.tokenizer.encode(prompt)

# Run inference
for outputs in generator.stream_infer(
        session_id=0,
        input_ids=[input_ids]):
    res, tokens = outputs[0]

response = tm_model.tokenizer.decode(res.tolist())
print(response)
```

# 4. Quantization
**Quantization trades the precision of parameters or intermediate results for memory savings, and usually for the performance gains that come with them.**

**Before introducing LMDeploy's quantization schemes, two concepts need defining:**
- Compute-bound: inference spends most of its time on numeric computation; compute-bound workloads are sped up by using faster compute units.
- Memory-bound: inference spends most of its time on data access; memory-bound workloads are optimized by reducing the number of memory accesses, raising the compute-to-memory-access ratio, or shrinking the amount of data moved.

## 4-1. KV Cache quantization
**KV Cache quantization converts the K/V of already-generated sequences to INT8.**


### 4-1-1. Compute min/max statistics
**The main idea is to collect statistics of the intermediate results at different positions in every layer for a set of input samples.**

- For the attention K and V: take, per head and per dimension, the max, min and absolute max over all tokens. For each layer, these three statistics are (num_heads, head_dim) matrices. They are used for the KV Cache quantization in this section.
- For each layer's inputs: take the per-dimension max, min, mean, absolute max and absolute mean. Each input position in each layer has its own statistics, mostly (hidden_dim,) vectors; in the FFN layers, where the hidden size widens and then narrows again, the dimensions at the two positions differ. These statistics feed the weight quantization of the next part, mainly its scaling step.

**The corresponding command**: take 128 input samples, each 2048 tokens long, from the C4 dataset; feeding them through the model produces all the statistics above.
```bash
# Compute min/max statistics
lmdeploy lite calibrate \
	--model /root/share/temp/model_repos/internlm-chat-7b/ \
	--calib_dataset "c4" \
	--calib_samples 128 \
	--calib_seqlen 2048 \
	--work_dir ./quant_output
```

### 4-1-2. Derive quantization parameters from min/max
**The corresponding command**: compute each layer's K/V zero point (zp) and scale.
```bash
# Derive quantization parameters from min/max
lmdeploy lite kv_qparams \
	--work_dir ./quant_output \
	--turbomind_dir workspace/triton_models/weights/ \
	--kv_sym False \
	--num_tp 1
```
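For intuition, zp and scale implement a simple affine int8 mapping. A toy sketch of the idea (illustrative only, not lmdeploy's actual code path):

```python
# Toy illustration of the zp/scale mapping behind KV INT8 quantization:
# quantize with x_q = round((x - zp) / scale), dequantize with x ~ x_q * scale + zp.
import numpy as np

x = np.random.randn(16).astype(np.float32)        # a toy slice of K or V
zp = (x.max() + x.min()) / 2                      # zero point: center of the range
scale = (x.max() - x.min()) / 255                 # scale: spread over the int8 range
x_q = np.clip(np.round((x - zp) / scale), -128, 127).astype(np.int8)
x_hat = x_q.astype(np.float32) * scale + zp       # dequantized values
print(np.abs(x - x_hat).max())                    # reconstruction error stays small
```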
### 4-1-3. Update the config

Edit `weights/config.ini` and set `quant_policy` to 4. For a more detailed walkthrough, see the reference articles at the end.

# Appendix
## 1. GPU usage
**Check GPU usage:**
```bash
vgpu-smi
```
*As shown below:*

![figure](https://img-blog.csdnimg.cn/direct/37e890db03614ee2a3d6335340f2f62a.png)
**Monitor GPU usage in real time:**
```bash
watch vgpu-smi
```
*As shown below:*

![figure](https://img-blog.csdnimg.cn/direct/92241eae3f6049f8b3364d0c2bcfdef5.png)

## 2. Local port setup (forwarding a server port to your machine)

- **Step 1**: open a local terminal and generate a key pair; just press Enter at every prompt (no passphrase).
```bash
# Run the following command
ssh-keygen -t rsa
```
**The default key location is shown below:**
![figure](https://img-blog.csdnimg.cn/direct/d53ac4900ed84b8fa4a0d09b44f3f170.png)
- **Step 2**: open the default location, copy the public key, and register it on the remote server.
![figure](https://img-blog.csdnimg.cn/direct/03ad6a230c0d481cb694f49f5186d0e9.png)

- **Step 3**: in the local terminal, run:

```bash
# 6006 is the remote service port (as shown below, the demo above listens on 6006); 33447 is the remote SSH port
ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p 33447
```
**As shown below**: the API service page from this section, rendered locally
![figure](https://img-blog.csdnimg.cn/direct/dca0645331c6436c8031c1f74b8db490.png)


**References**:

- [Quantizing and deploying with LMDeploy](https://github.com/InternLM/tutorial/blob/7c2a385cd772ed93965927599b0159c52068da85/lmdeploy/lmdeploy.md#1-%E7%8E%AF%E5%A2%83%E9%85%8D%E7%BD%AE)
- [lmdeploy on GitHub](https://github.com/InternLM/lmdeploy/)
- [Deploying the InternLM-20B model efficiently on a single RTX 3090](https://zhuanlan.zhihu.com/p/665725861)
--------------------------------------------------------------------------------