├── README.md
├── README_CN.md
├── api.py
├── prompt.md
└── requirements.txt
/README.md:
--------------------------------------------------------------------------------
# chat2db-sqlcoder-deploy

Languages: English | [中文](README_CN.md)

## 📖 Introduction

This project describes how to deploy the 8-bit quantized sqlcoder model on Alibaba Cloud for free and how to use the model from the Chat2DB client.

!!! Please note that the sqlcoder project targets SQL generation, so it performs well at natural-language-to-SQL but noticeably worse at SQL explanation, optimization, and conversion. Treat it as an experimental reference rather than a finished product.

## 📦 Hardware Requirements

| Model | Minimum GPU Memory (Inference) | Minimum GPU Memory (Parameter-Efficient Tuning) |
|:------|:-------------------------------|:------------------------------------------------|
| sqlcoder-int8 | 20GB | 20GB |
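
As a quick sanity check before deploying, you can confirm that your instance actually exposes enough GPU memory. A minimal sketch using the standard `nvidia-smi` tool that ships with NVIDIA GPU images:

```bash
# Total memory should be at least ~20GB for the int8 model
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```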

## 📦 Deployment

### 📦 Deploy the 8-bit model on Alibaba Cloud DSW

1. Apply for a free trial of [Alibaba Cloud DSW](https://www.alibabacloud.com/).

2. Create a DSW instance: choose a resource group that your free resource package can be deducted from, and select the instance image `pytorch:1.12-gpu-py39-cu113-ubuntu20.04`.

3. Install the dependencies listed in [requirements.txt](requirements.txt):

```bash
pip install -r requirements.txt
```

4. Install the latest bitsandbytes build to support 8-bit models; older releases may be incompatible with the image's CUDA version:

```bash
pip install -i https://test.pypi.org/simple/ bitsandbytes
```

5. In the DSW instance, open a terminal and create two folders, `sqlcoder-model` and `sqlcoder`, under `/mnt/workspace`, as sketched below.
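
One way to do this (a sketch assuming the default DSW workspace mount point from step 5):

```bash
mkdir -p /mnt/workspace/sqlcoder-model /mnt/workspace/sqlcoder
```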

6. Download the sqlcoder model into the `sqlcoder-model` folder, making sure that the model's `.bin` weight files download completely:

```bash
git clone https://huggingface.co/defog/sqlcoder
```
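
If the clone finishes suspiciously quickly, the `.bin` files may be Git LFS pointer stubs rather than the actual multi-gigabyte weights. A quick check, assuming `git-lfs` is available in the image:

```bash
cd /mnt/workspace/sqlcoder-model/sqlcoder
git lfs pull    # fetch the real weights if only LFS pointers were cloned
ls -lh *.bin    # each weight shard should be gigabytes in size
```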

7. Copy `api.py` and `prompt.md` from this repository into the `sqlcoder` folder, for example as shown below.
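
A one-line sketch, assuming your shell is currently inside a checkout of this repository:

```bash
cp api.py prompt.md /mnt/workspace/sqlcoder/
```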

8. Install the FastAPI-related packages:

```bash
pip install fastapi nest-asyncio pyngrok uvicorn
```

9. Start the API service from the `sqlcoder` folder:

```bash
python api.py
```

10. You will get a public API URL such as `https://dfb1-34-87-2-137.ngrok.io` (printed by pyngrok when `api.py` starts).

11. Configure this API URL in the Chat2DB client to start generating SQL with the model. Note that inference can take a while, so expect some visible latency.
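
Before wiring the URL into Chat2DB, you can smoke-test the endpoint directly. Based on the request handler in `api.py`, the service accepts a POST with a JSON body whose `prompt` field carries the question (`history`, `max_length`, `top_p`, and `temperature` are read but currently unused). A sketch:

```bash
# Replace the URL with the one printed when api.py starts
curl -X POST https://dfb1-34-87-2-137.ngrok.io/ \
  -H "Content-Type: application/json" \
  -d '{"prompt": "List the names of all customers"}'
```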

### 📦 Deploy the fp16 model on Alibaba Cloud DSW

* If resources permit, you can try deploying the non-quantized sqlcoder model. It is slightly more accurate at SQL generation than the 8-bit model, but it needs more GPU memory and has longer inference times.

* The steps are the same as above; just change the model loading in `api.py` to the fp16 model:

```python
model = AutoModelForCausalLM.from_pretrained("/mnt/workspace/sqlcoder-model/sqlcoder",
                                             trust_remote_code=True,
                                             torch_dtype=torch.float16,
                                             device_map="auto",
                                             use_cache=True)
```

### 📦 Deploy on other cloud platforms

* Although this tutorial uses Alibaba Cloud DSW as the example, the scripts and commands contain nothing platform-specific (only the `/mnt/workspace` paths assume DSW's default mount point). In principle, sqlcoder can be deployed on any cloud GPU instance by following the steps above.
--------------------------------------------------------------------------------
/README_CN.md:
--------------------------------------------------------------------------------
# chat2db-sqlcoder-deploy

语言:中文 | [English](README.md)

## 📖 简介

这个工程介绍了如何在阿里云上免费部署sqlcoder的8bit量化模型,并将大模型应用到Chat2DB客户端中。

!!!请注意,sqlcoder项目主要是针对SQL生成的,所以在自然语言转SQL方面表现较好,但是在SQL解释、SQL优化和SQL转换方面表现略差,仅供大家实验参考。

## 📦 硬件要求

| 模型 | 最低GPU显存(推理) | 最低GPU显存(高效参数微调) |
|:-------------:|:-----------:|:---------------:|
| sqlcoder-int8 | 20GB | 20GB |
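
部署之前可以先确认实例的GPU显存是否足够。下面是一个简单的检查示例,使用NVIDIA GPU镜像自带的`nvidia-smi`工具:

```bash
# int8模型要求总显存至少约20GB
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```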

## 📦 部署

### 📦 在阿里云DSW中部署8bit模型

1. 在[阿里云免费试用平台](https://free.aliyun.com/)申请DSW免费试用。

2. 创建一个DSW实例:资源组选择可以抵扣资源包的资源组,实例镜像选择`pytorch:1.12-gpu-py39-cu113-ubuntu20.04`。

3. 安装本仓库中[requirements.txt](requirements.txt)中的依赖包:

```bash
pip install -r requirements.txt
```

4. 因为要跑8bit量化模型,还需要安装最新版本的bitsandbytes包;旧版本有可能与镜像中的CUDA不兼容:

```bash
pip install -i https://test.pypi.org/simple/ bitsandbytes
```

5. 在DSW实例中打开一个terminal,在`/mnt/workspace`目录下创建`sqlcoder-model`和`sqlcoder`两个文件夹,示例如下。
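
一种做法如下(假设使用DSW默认的工作目录挂载点):

```bash
mkdir -p /mnt/workspace/sqlcoder-model /mnt/workspace/sqlcoder
```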

6. 在`sqlcoder-model`文件夹下执行下面的命令下载sqlcoder模型,请确保模型里面的几个bin文件下载完整且正确:

```bash
git clone https://huggingface.co/defog/sqlcoder
```
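
如果克隆结束得异常快,`.bin`文件可能只是Git LFS指针文件而不是真正的权重文件。可以按下面的方式检查(假设镜像中已安装git-lfs):

```bash
cd /mnt/workspace/sqlcoder-model/sqlcoder
git lfs pull    # 如果只克隆到了LFS指针文件,则拉取真正的权重
ls -lh *.bin    # 每个权重分片应为GB量级
```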

7. 将本项目下的`api.py`和`prompt.md`文件拷贝到`sqlcoder`文件夹下,示例如下。
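
一行命令的示例(假设当前目录是本仓库的检出目录):

```bash
cp api.py prompt.md /mnt/workspace/sqlcoder/
```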

8. 安装fastapi相关包:

```bash
pip install fastapi nest-asyncio pyngrok uvicorn
```

9. 在`sqlcoder`文件夹下执行下面的命令,启动api服务:

```bash
python api.py
```

10. 执行以上步骤之后,你将得到一个api url,类似于`https://dfb1-34-87-2-137.ngrok.io`(api.py启动时由pyngrok打印)。

11. 将api url配置到Chat2DB客户端中,即可开始使用模型生成SQL了。

* 注意:模型推理时间可能会比较长,会有明显的卡顿。
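
在接入Chat2DB之前,可以先直接调用接口做个冒烟测试。根据`api.py`中的请求处理逻辑,服务接受POST请求,JSON体中的`prompt`字段携带问题(`history`、`max_length`、`top_p`、`temperature`字段会被读取但目前未使用)。示例如下:

```bash
# 请将url替换为api.py启动时打印的url
curl -X POST https://dfb1-34-87-2-137.ngrok.io/ \
  -H "Content-Type: application/json" \
  -d '{"prompt": "查询所有客户的名字"}'
```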

### 📦 在阿里云DSW中部署非量化模型

* 如果机器资源允许,可以尝试部署非量化的sqlcoder模型,在生成SQL的准确率上会比8bit模型高一些,但是需要更多的显存和更长的推理时间。

* 部署非量化模型的步骤同上,只需要将`api.py`文件中的模型加载改成float16的模型即可,具体如下:

```python
model = AutoModelForCausalLM.from_pretrained("/mnt/workspace/sqlcoder-model/sqlcoder",
                                             trust_remote_code=True,
                                             torch_dtype=torch.float16,
                                             # load_in_8bit=True,
                                             device_map="auto",
                                             use_cache=True)
```

### 📦 在其他云资源上部署sqlcoder模型

* 本教程虽然是在阿里云DSW环境上完成的,但是教程中的脚本和命令并没有针对平台做定制(只有`/mnt/workspace`路径假设了DSW默认的挂载点)。理论上遵循以上步骤,可以在任何云GPU资源上完成部署。
--------------------------------------------------------------------------------
/api.py:
--------------------------------------------------------------------------------
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from fastapi import FastAPI, Request
import uvicorn, json
import nest_asyncio
from pyngrok import ngrok

DEVICE = "cuda"
DEVICE_ID = "0"
CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE


def torch_gc():
    """Release cached GPU memory after each request."""
    if torch.cuda.is_available():
        with torch.cuda.device(CUDA_DEVICE):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()


app = FastAPI()


@app.post("/")
async def create_item(request: Request):
    global model, tokenizer, prompt_template
    # Parse the JSON body sent by the Chat2DB client.
    json_post = await request.json()
    question = json_post.get('prompt')
    # Strip '#' characters so the question cannot collide with the
    # '### Instructions:' / '### Response:' markers in the prompt template.
    prompt = prompt_template.format(
        user_question=question.replace("#", "")
    )
    # Chat2DB's natural-language-to-SQL prompts contain this Chinese phrase
    # ("convert natural language into a SQL query"); if it is present, steer
    # the model toward a fenced ```sql block, otherwise append a plain '>>>'
    # marker to delimit the answer.
    sql_type = "自然语言转换成SQL查询"
    if sql_type in prompt:
        prompt += "```sql"
    else:
        prompt += ">>>"
    # Accepted for API compatibility but currently unused: generation below
    # is deterministic beam search (do_sample=False).
    history = json_post.get('history')
    max_length = json_post.get('max_length')
    top_p = json_post.get('top_p')
    temperature = json_post.get('temperature')
    # Stop generating at the closing code fence.
    eos_token_id = tokenizer.convert_tokens_to_ids(["```"])[0]
    print("Generating a SQL query for answering your question...")
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=300,
        do_sample=False,
        num_beams=5,  # beam search with 5 beams for higher-quality results
    )
    print("==========input========")
    print(prompt)
    generated_query = pipe(
        prompt,
        num_return_sequences=1,
        eos_token_id=eos_token_id,
        pad_token_id=eos_token_id,
    )[0]["generated_text"]

    # The pipeline echoes the prompt before the completion; extract only the
    # text after the marker appended above.
    response = generated_query
    if sql_type in prompt:
        # Keep the text inside the ```sql fence, up to the first semicolon.
        response = response.split("`sql")[-1].split("`")[0].split(";")[0].strip() + ";"
    else:
        response = response.split(">>>")[-1].split("`")[0].strip()

    print("========output========")
    print(response)
    torch_gc()
    return response


if __name__ == '__main__':
    # prompt.md wraps the user question in the instruction format expected
    # by sqlcoder.
    with open("prompt.md", "r") as f:
        prompt_template = f.read()
    tokenizer = AutoTokenizer.from_pretrained("/mnt/workspace/sqlcoder-model/sqlcoder", trust_remote_code=True)
    # Load the model with 8-bit quantization (requires bitsandbytes). For the
    # fp16 deployment, uncomment torch_dtype and drop load_in_8bit.
    model = AutoModelForCausalLM.from_pretrained("/mnt/workspace/sqlcoder-model/sqlcoder",
                                                 trust_remote_code=True,
                                                 # torch_dtype=torch.float16,
                                                 load_in_8bit=True,
                                                 device_map="auto",
                                                 use_cache=True)
    # Expose the local FastAPI server through an ngrok tunnel so the Chat2DB
    # client can reach it from outside the DSW instance.
    ngrok_tunnel = ngrok.connect(8000)
    print('Public URL:', ngrok_tunnel.public_url)
    nest_asyncio.apply()
    uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)
--------------------------------------------------------------------------------
/prompt.md:
--------------------------------------------------------------------------------
### Instructions:
{user_question}
### Response:
Based on your instructions, here is the result I have generated to answer the question `{user_question}`:
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
tqdm==4.65.0
transformers==4.28.1
datasets==2.11.0
huggingface-hub==0.13.4
accelerate==0.18.0
--------------------------------------------------------------------------------