├── aia-activity-rag.csv ├── README.md ├── .gitignore ├── Llama-2-example.ipynb ├── LICENSE ├── demo-3-Mistral_7b_FastAPI_Service_Colab_Example_External_Version_v2.ipynb ├── Llama_2_FastAPI_Service_Colab_Example.ipynb └── demo-1-llama_gguf_prediction.ipynb /aia-activity-rag.csv: -------------------------------------------------------------------------------- 1 | data 2 | 台灣人工智慧共識協作工作坊 - AI & Web3 專場 3 | 日期:2023/12/10(日) 4 | 時間:13:00 - 17:00 5 | 地點:松菸 W210 西向會議室 6 | 主辦:台灣人工智慧學校、moda 數位發展部 及 da0 7 | 協辦:Funding the Commons、 AIT 美國在台協會、美國創新中心 AIC 及 松菸創作者工廠 8 | 報名網址:https://forms.gle/gGQk1Ay1suWpf6pQ9 9 | "活動議程: 10 | 13:00-13:30 報到 11 | 13:30-13:40 開場短講(侯宜秀律師) 12 | 13:40-14:20 盤點 AI 目前衝擊跟擔憂 13 | 14:20-14:35 中場休息 14 | 14:35-15:25 盤點針對 AI 衝擊的 web3 解法 15 | 15:25-15:35 中場休息 16 | 15:35-15:50 a16z AIxWeb3 報告摘要(Noah Yeh) 17 | 15:50-16:30 覆盤 Web3 的解法真的可以解決問題嗎? 18 | 16:30-16:40 活動總回顧 10min" 19 | "備註: 20 | 本場活動將遵守「Chatham House Rule」紀錄活動過程產出。如果一個會議,或會議的一部分,是按照Chatham House規則進行的,則與會者可自由使用在會議中獲得的資訊,但不得透露演講者及其他與會者的身份與所屬機構。" -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Llama-2-cpp-example 2 | An example of running Llama 2 with llama-cpp-python in a Colab environment. 3 | 4 | # Links: 5 | 6 | Part 1: How to use Llama 2 7 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/Llama-2-example.ipynb) 8 | 9 | Part 2: How to serve a Llama 2 model as a FastAPI service 10 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/Llama_2_FastAPI_Service_Colab_Example.ipynb) 11 | 12 | Part 3: How to serve a Mistral 7B model as a FastAPI service 13 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/Mistral_7b_FastAPI_Service_Colab_Example_External_Version.ipynb) 14 | 15 | # Course: 16 | 17 | - Demo 1 - GGUF example code: 18 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/demo-1-llama_gguf_prediction.ipynb) 19 | 20 | - Demo 2 - Embedding Vectors and RAG: 21 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/demo-2-rag_example.ipynb) 22 | 23 | - Demo 3 - Serve a Mistral 7B model as a FastAPI service 24 | Colab Link: [Link](https://colab.research.google.com/github/LiuYuWei/Llama-2-cpp-example/blob/main/demo-3-Mistral_7b_FastAPI_Service_Colab_Example_External_Version_v2.ipynb) 25 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # poetry 98 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 102 | #poetry.lock 103 | 104 | # pdm 105 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 106 | #pdm.lock 107 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 108 | # in version control. 109 | # https://pdm.fming.dev/#use-with-ide 110 | .pdm.toml 111 | 112 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 113 | __pypackages__/ 114 | 115 | # Celery stuff 116 | celerybeat-schedule 117 | celerybeat.pid 118 | 119 | # SageMath parsed files 120 | *.sage.py 121 | 122 | # Environments 123 | .env 124 | .venv 125 | env/ 126 | venv/ 127 | ENV/ 128 | env.bak/ 129 | venv.bak/ 130 | 131 | # Spyder project settings 132 | .spyderproject 133 | .spyproject 134 | 135 | # Rope project settings 136 | .ropeproject 137 | 138 | # mkdocs documentation 139 | /site 140 | 141 | # mypy 142 | .mypy_cache/ 143 | .dmypy.json 144 | dmypy.json 145 | 146 | # Pyre type checker 147 | .pyre/ 148 | 149 | # pytype static type analyzer 150 | .pytype/ 151 | 152 | # Cython debug symbols 153 | cython_debug/ 154 | 155 | # PyCharm 156 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 157 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 158 | # and can be added to the global gitignore or merged into this file. For a more nuclear 159 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
160 | #.idea/ 161 | -------------------------------------------------------------------------------- /Llama-2-example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "bba801bf", 6 | "metadata": {}, 7 | "source": [ 8 | "# Llama 2 Introduction\n", 9 | "\n", 10 | "#### Made by SimonLiu\n", 11 | "\n", 12 | "1. My Linkedin: https://www.linkedin.com/in/simonliuyuwei/\n", 13 | "\n", 14 | "2. InfuseAI: https://infuseai.io" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "id": "668cc076", 20 | "metadata": {}, 21 | "source": [ 22 | "### Llama 2\n", 23 | "\n", 24 | "The next generation of our open source large language model\n", 25 | "\n", 26 | "1. Official Website: [Link](https://ai.meta.com/llama/)\n", 27 | "\n", 28 | "2. Download Model: [Link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "id": "617852d9", 34 | "metadata": {}, 35 | "source": [ 36 | "### Related Website: \n", 37 | "\n", 38 | "1. llama.cpp: [Link](https://github.com/ggerganov/llama.cpp)\n", 39 | "\n", 40 | "2. llama-cpp-python: [Link](https://github.com/abetlen/llama-cpp-python)\n", 41 | "\n", 42 | "3. ggml: Tensor library for machine learning - [Link](https://github.com/ggerganov/ggml)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "id": "3a7a1d72", 48 | "metadata": {}, 49 | "source": [ 50 | "# Code" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "id": "cd34727e", 56 | "metadata": {}, 57 | "source": [ 58 | "## Step 1: Install related package" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "id": "b97cf78b", 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "# GPU llama-cpp-python\n", 69 | "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "id": "8b334357", 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "# For download the models\n", 80 | "!pip install huggingface_hub" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "id": "b617a2d2", 86 | "metadata": {}, 87 | "source": [ 88 | "## Step 2: Import python libraries and Variable config" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "id": "33bb302a", 94 | "metadata": {}, 95 | "source": [ 96 | "### Import Python Package" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "id": "c138f0cd", 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "from huggingface_hub import hf_hub_download\n", 107 | "from llama_cpp import Llama" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "id": "4a79589c", 113 | "metadata": {}, 114 | "source": [ 115 | "### Configure Variables" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "id": "f2eaa426", 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "download_model_bool = True" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "id": "0b6e85ad", 131 | "metadata": {}, 132 | "source": [ 133 | "#### HuggingFace Llama-cpp Model Link:\n", 134 | "\n", 135 | "1. TheBloke/Llama-2-7B-chat-GGML: [Link](https://huggingface.co/TheBloke/Llama-2-7B-chat-GGML)\n", 136 | "2. TheBloke/Llama-2-13B-chat-GGML: [Link](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML)\n", 137 | "3. 
TheBloke/Llama-2-70B-chat-GGML: [Link](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGML)\n", 138 | "4. audreyt/Taiwan-LLaMa-v1.0-GGML: [Link](https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGML)" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "id": "0d430952", 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "model_name_or_path = \"TheBloke/Llama-2-13B-chat-GGML\"" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "id": "514dfe3b", 154 | "metadata": {}, 155 | "source": [ 156 | "#### the model is in bin format:\n", 157 | "Please Get the bin file from here: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/tree/main" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "id": "2657226c", 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "model_basename = \"llama-2-13b-chat.ggmlv3.q5_1.bin\"" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "id": "c54db323", 173 | "metadata": {}, 174 | "source": [ 175 | "## Step 3: Download Model" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "id": "920b26e6", 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "id": "cad8e269", 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "print(model_path)" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "id": "17c2c078", 201 | "metadata": {}, 202 | "source": [ 203 | "## Step 4: Loading the Model" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "id": "e62c50a3", 210 | "metadata": {}, 211 | "outputs": [], 212 | "source": [ 213 | "# GPU\n", 214 | "lcpp_llm = None\n", 215 | "lcpp_llm = Llama(\n", 216 | " model_path=model_path,\n", 217 | " n_threads=2, # CPU cores\n", 218 | " n_batch=512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.\n", 219 | " n_gpu_layers=32 # Change this value based on your model and your GPU VRAM pool.\n", 220 | ")" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "id": "c235c196", 226 | "metadata": {}, 227 | "source": [ 228 | "## Step 5: Create a Prompt" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "id": "24d261b8", 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "prompt = \"Write a linear regression in python\"" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "id": "9764a441", 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "prompt_template=f'''SYSTEM: You are a helpful, respectful and honest assistant. 
Always answer as helpfully.\n", 249 | "\n", 250 | "USER: {prompt}\n", 251 | "\n", 252 | "ASSISTANT:\n", 253 | "'''" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "id": "1bcf7cb2", 259 | "metadata": {}, 260 | "source": [ 261 | "## Step 6: Generating the Response" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "id": "879a2c22", 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "# Predict the Result\n", 272 | "response = lcpp_llm(prompt=prompt_template, max_tokens=256, temperature=0.5, top_p=0.95,\n", 273 | " repeat_penalty=1.2, top_k=150,\n", 274 | " echo=True)" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": null, 280 | "id": "0b6506b1", 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "# Print the json content.\n", 285 | "response" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "id": "8a4f367d", 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "# Print the response answer.\n", 296 | "print(response[\"choices\"][0][\"text\"])" 297 | ] 298 | } 299 | ], 300 | "metadata": { 301 | "kernelspec": { 302 | "display_name": "Python 3 (ipykernel)", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.7.12" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 5 321 | } 322 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | -------------------------------------------------------------------------------- /demo-3-Mistral_7b_FastAPI_Service_Colab_Example_External_Version_v2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "VUC9Opfd8Mpt" 7 | }, 8 | "source": [ 9 | "# Llama 2 Fastapi Service Example" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "e9e9QOWD-l7k" 16 | }, 17 | "source": [ 18 | "推薦至少使用 T4 GPU 來作為你的服務啟用" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "h4zPTxsL7rUn" 25 | }, 26 | "source": [ 27 | "## Step 0: Config Setting" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 1, 33 | "metadata": { 34 | "id": "8kZ_1n8U85GQ" 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "# 如果你想要在 Google Colab 長期測試,你可以使用 ngrok 來做服務器代理的處理\n", 39 | "# 到官方網站註冊帳號:https://ngrok.com/\n", 40 | "# 申請token,你就可以將以下變數進行更換。\n", 41 | "NGROK_TOKEN = None" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": { 48 | "id": "Vru9EFFb9lmp" 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "# GGUF Model\n", 53 | "# 你可以到 HuggingFace 去找尋相關的 GGUF 模型\n", 54 | "# Example:\n", 55 | "# Llama: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF\n", 56 | "# Taiwan Llama: https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGUF\n", 57 | "GGUF_HUGGINGFACE_REPO = \"TheBloke/Mistral-7B-Instruct-v0.1-GGUF\"\n", 58 | "GGUF_HUGGINGFACE_BIN_FILE = \"mistral-7b-instruct-v0.1.Q5_0.gguf\"" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "BXt_MjHG9Fw-" 65 | }, 66 | "source": [ 67 | "## Step 1: Install python package" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": { 74 | "colab": { 75 | "base_uri": "https://localhost:8080/" 76 | }, 77 | "id": "iV9rhIbg6K9d", 78 | "outputId": "1bbb1a7c-490e-4762-c7a0-12f09f7a441d" 79 | }, 80 | "outputs": [ 81 | { 82 | "output_type": "stream", 83 | "name": "stdout", 84 | "text": [ 85 | "Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (0.108.0)\n", 86 | "Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.10/dist-packages (1.5.8)\n", 87 | "Requirement already satisfied: pyngrok in /usr/local/lib/python3.10/dist-packages (7.0.4)\n", 88 | "Requirement already satisfied: uvicorn in /usr/local/lib/python3.10/dist-packages (0.25.0)\n", 89 | "Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (0.25.0)\n", 90 | "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.35.2)\n", 91 | "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from fastapi) (1.10.13)\n", 92 | "Requirement already satisfied: starlette<0.33.0,>=0.29.0 in /usr/local/lib/python3.10/dist-packages (from fastapi) (0.32.0.post1)\n", 93 | "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from fastapi) (4.9.0)\n", 94 | "Requirement already satisfied: PyYAML in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.1)\n", 95 | "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn) (8.1.7)\n", 96 | "Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn) (0.14.0)\n", 97 | "Requirement already satisfied: 
numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate) (1.26.2)\n", 98 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (23.2)\n", 99 | "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)\n", 100 | "Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.1.0+cu121)\n", 101 | "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.19.4)\n", 102 | "Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.4.1)\n", 103 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.13.1)\n", 104 | "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2023.6.3)\n", 105 | "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.31.0)\n", 106 | "Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.15.0)\n", 107 | "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.1)\n", 108 | "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->accelerate) (2023.6.0)\n", 109 | "Requirement already satisfied: anyio<5,>=3.4.0 in /usr/local/lib/python3.10/dist-packages (from starlette<0.33.0,>=0.29.0->fastapi) (3.7.1)\n", 110 | "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (1.12)\n", 111 | "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.2.1)\n", 112 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1.2)\n", 113 | "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (2.1.0)\n", 114 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.3.2)\n", 115 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.6)\n", 116 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.7)\n", 117 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2023.11.17)\n", 118 | "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.4.0->starlette<0.33.0,>=0.29.0->fastapi) (1.3.0)\n", 119 | "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.4.0->starlette<0.33.0,>=0.29.0->fastapi) (1.2.0)\n", 120 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.3)\n", 121 | "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)\n" 122 | ] 123 | } 124 | ], 125 | "source": [ 126 | "# 安裝 fastapi, nest-asyncio, pyngrok, uvicorn, accelerate 和 transformers 套件,以支援API開發和深度學習模型的操作。\n", 127 | 
"!pip install fastapi nest-asyncio pyngrok uvicorn accelerate transformers" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 4, 133 | "metadata": { 134 | "colab": { 135 | "base_uri": "https://localhost:8080/" 136 | }, 137 | "id": "4yK3DJgd66El", 138 | "outputId": "2898f6b6-972d-42b8-bc64-e32610cdd1ff" 139 | }, 140 | "outputs": [ 141 | { 142 | "output_type": "stream", 143 | "name": "stdout", 144 | "text": [ 145 | "Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 146 | "Collecting llama-cpp-python==0.2.6\n", 147 | " Downloading llama_cpp_python-0.2.6.tar.gz (1.6 MB)\n", 148 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m10.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 149 | "\u001b[?25h Running command pip subprocess to install build dependencies\n", 150 | " Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 151 | " Collecting scikit-build-core[pyproject]>=0.5.0\n", 152 | " Using cached scikit_build_core-0.7.0-py3-none-any.whl (136 kB)\n", 153 | " Collecting exceptiongroup (from scikit-build-core[pyproject]>=0.5.0)\n", 154 | " Using cached exceptiongroup-1.2.0-py3-none-any.whl (16 kB)\n", 155 | " Collecting packaging>=20.9 (from scikit-build-core[pyproject]>=0.5.0)\n", 156 | " Using cached packaging-23.2-py3-none-any.whl (53 kB)\n", 157 | " Collecting tomli>=1.1 (from scikit-build-core[pyproject]>=0.5.0)\n", 158 | " Using cached tomli-2.0.1-py3-none-any.whl (12 kB)\n", 159 | " Collecting pathspec>=0.10.1 (from scikit-build-core[pyproject]>=0.5.0)\n", 160 | " Using cached pathspec-0.12.1-py3-none-any.whl (31 kB)\n", 161 | " Collecting pyproject-metadata>=0.5 (from scikit-build-core[pyproject]>=0.5.0)\n", 162 | " Using cached pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)\n", 163 | " Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core, pyproject-metadata\n", 164 | " ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", 165 | " lida 0.0.10 requires kaleido, which is not installed.\n", 166 | " lida 0.0.10 requires python-multipart, which is not installed.\n", 167 | " Successfully installed exceptiongroup-1.2.0 packaging-23.2 pathspec-0.12.1 pyproject-metadata-0.7.1 scikit-build-core-0.7.0 tomli-2.0.1\n", 168 | " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", 169 | " Running command Getting requirements to build wheel\n", 170 | " Getting requirements to build wheel ... 
\u001b[?25l\u001b[?25hdone\n", 171 | " Running command pip subprocess to install backend dependencies\n", 172 | " Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 173 | " Collecting cmake>=3.12\n", 174 | " Using cached cmake-3.28.1-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.3 MB)\n", 175 | " Collecting ninja>=1.5\n", 176 | " Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)\n", 177 | " Installing collected packages: ninja, cmake\n", 178 | " Creating /tmp/pip-build-env-77ltovlp/normal/local/bin\n", 179 | " changing mode of /tmp/pip-build-env-77ltovlp/normal/local/bin/ninja to 755\n", 180 | " changing mode of /tmp/pip-build-env-77ltovlp/normal/local/bin/cmake to 755\n", 181 | " changing mode of /tmp/pip-build-env-77ltovlp/normal/local/bin/cpack to 755\n", 182 | " changing mode of /tmp/pip-build-env-77ltovlp/normal/local/bin/ctest to 755\n", 183 | " Successfully installed cmake-3.28.1 ninja-1.11.1.1\n", 184 | " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", 185 | " Running command Preparing metadata (pyproject.toml)\n", 186 | " *** scikit-build-core 0.7.0 using CMake 3.28.1 (metadata_wheel)\n", 187 | " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n", 188 | "Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.2.6)\n", 189 | " Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)\n", 190 | "Collecting numpy>=1.20.0 (from llama-cpp-python==0.2.6)\n", 191 | " Downloading numpy-1.26.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)\n", 192 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m18.2/18.2 MB\u001b[0m \u001b[31m70.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 193 | "\u001b[?25hCollecting diskcache>=5.6.1 (from llama-cpp-python==0.2.6)\n", 194 | " Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n", 195 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.5/45.5 kB\u001b[0m \u001b[31m124.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 196 | "\u001b[?25hBuilding wheels for collected packages: llama-cpp-python\n", 197 | " Running command Building wheel for llama-cpp-python (pyproject.toml)\n", 198 | " *** scikit-build-core 0.7.0 using CMake 3.28.1 (wheel)\n", 199 | " *** Configuring CMake...\n", 200 | " loading initial cache file /tmp/tmp0a32c0oq/build/CMakeInit.txt\n", 201 | " -- The C compiler identification is GNU 11.4.0\n", 202 | " -- The CXX compiler identification is GNU 11.4.0\n", 203 | " -- Detecting C compiler ABI info\n", 204 | " -- Detecting C compiler ABI info - done\n", 205 | " -- Check for working C compiler: /usr/bin/cc - skipped\n", 206 | " -- Detecting C compile features\n", 207 | " -- Detecting C compile features - done\n", 208 | " -- Detecting CXX compiler ABI info\n", 209 | " -- Detecting CXX compiler ABI info - done\n", 210 | " -- Check for working CXX compiler: /usr/bin/c++ - skipped\n", 211 | " -- Detecting CXX compile features\n", 212 | " -- Detecting CXX compile features - done\n", 213 | " -- Found Git: /usr/bin/git (found version \"2.34.1\")\n", 214 | " fatal: not a git repository (or any of the parent directories): .git\n", 215 | " fatal: not a git repository (or any of the parent directories): .git\n", 216 | " CMake Warning at vendor/llama.cpp/CMakeLists.txt:125 (message):\n", 217 | " Git repository not found; to enable automatic generation of build info,\n", 218 | " make sure Git is installed and the project is 
a Git repository.\n", 219 | "\n", 220 | "\n", 221 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD\n", 222 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success\n", 223 | " -- Found Threads: TRUE\n", 224 | " -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version \"12.2.140\")\n", 225 | " -- cuBLAS found\n", 226 | " -- The CUDA compiler identification is NVIDIA 12.2.140\n", 227 | " -- Detecting CUDA compiler ABI info\n", 228 | " -- Detecting CUDA compiler ABI info - done\n", 229 | " -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped\n", 230 | " -- Detecting CUDA compile features\n", 231 | " -- Detecting CUDA compile features - done\n", 232 | " -- Using CUDA architectures: 52;61;70\n", 233 | " -- CMAKE_SYSTEM_PROCESSOR: x86_64\n", 234 | " -- x86 detected\n", 235 | " CMake Warning (dev) at CMakeLists.txt:19 (install):\n", 236 | " Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.\n", 237 | " This warning is for project developers. Use -Wno-dev to suppress it.\n", 238 | "\n", 239 | " CMake Warning (dev) at CMakeLists.txt:28 (install):\n", 240 | " Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.\n", 241 | " This warning is for project developers. Use -Wno-dev to suppress it.\n", 242 | "\n", 243 | " -- Configuring done (3.0s)\n", 244 | " -- Generating done (0.0s)\n", 245 | " -- Build files have been written to: /tmp/tmp0a32c0oq/build\n", 246 | " *** Building project with Ninja...\n", 247 | " Change Dir: '/tmp/tmp0a32c0oq/build'\n", 248 | "\n", 249 | " Run Build Command(s): /tmp/pip-build-env-77ltovlp/normal/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -v\n", 250 | " [1/11] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/ggml-alloc.c\n", 251 | " [2/11] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. 
-isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/k_quants.c\n", 252 | " [3/11] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Wno-unused-function -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/ggml.c\n", 253 | " [4/11] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/. -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/common.cpp\n", 254 | " [5/11] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/. -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/console.cpp\n", 255 | " [6/11] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. 
-isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/llama.cpp\n", 256 | " [7/11] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/. -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/common/grammar-parser.cpp\n", 257 | " [8/11] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 \"--generate-code=arch=compute_52,code=[compute_52,sm_52]\" \"--generate-code=arch=compute_61,code=[compute_61,sm_61]\" \"--generate-code=arch=compute_70,code=[compute_70,sm_70]\" -Xcompiler=-fPIC -mf16c -mfma -mavx -mavx2 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o\n", 258 | " [9/11] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllama.so -o vendor/llama.cpp/libllama.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/usr/local/cuda-12.2/targets/x86_64-linux/lib: /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :\n", 259 | " [10/11] : && /tmp/pip-build-env-77ltovlp/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/libggml_static.a && /usr/bin/ar qc vendor/llama.cpp/libggml_static.a vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o && /usr/bin/ranlib vendor/llama.cpp/libggml_static.a && :\n", 260 | " [11/11] : && 
/usr/bin/g++ -fPIC -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a -lcudadevrt -lcudart_static -lrt -lpthread -ldl -L\"/usr/local/cuda/targets/x86_64-linux/lib/stubs\" -L\"/usr/local/cuda/targets/x86_64-linux/lib\" && :\n", 261 | "\n", 262 | " *** Installing project into wheel...\n", 263 | " -- Install configuration: \"Release\"\n", 264 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/lib/libggml_shared.so\n", 265 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/lib/cmake/Llama/LlamaConfig.cmake\n", 266 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/lib/cmake/Llama/LlamaConfigVersion.cmake\n", 267 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/include/ggml.h\n", 268 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/include/ggml-cuda.h\n", 269 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/include/k_quants.h\n", 270 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/lib/libllama.so\n", 271 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp0a32c0oq/wheel/platlib/lib/libllama.so\" to \"\"\n", 272 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/include/llama.h\n", 273 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/bin/convert.py\n", 274 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/bin/convert-lora-to-ggml.py\n", 275 | " -- Installing: /tmp/tmp0a32c0oq/wheel/platlib/llama_cpp/libllama.so\n", 276 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp0a32c0oq/wheel/platlib/llama_cpp/libllama.so\" to \"\"\n", 277 | " -- Installing: /tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/llama_cpp/libllama.so\n", 278 | " -- Set non-toolchain portion of runtime path of \"/tmp/pip-install-dyvotvmv/llama-cpp-python_9ec4ea79d9674f8db922d78ab341a2ab/llama_cpp/libllama.so\" to \"\"\n", 279 | " *** Making wheel...\n", 280 | " *** Created llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_35_x86_64.whl...\n", 281 | " Building wheel for llama-cpp-python (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", 282 | " Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.6-cp310-cp310-manylinux_2_35_x86_64.whl size=6192165 sha256=1774b011f68125ed03876b4c243e15e8a7f1dde3ee09dcaf6e6aeb4973425b48\n", 283 | " Stored in directory: /tmp/pip-ephem-wheel-cache-b_c8sr65/wheels/6c/ae/75/c2ad88ef0d1e219f981c51367b8533025345d1a14aa2f09662\n", 284 | "Successfully built llama-cpp-python\n", 285 | "Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python\n", 286 | " Attempting uninstall: typing-extensions\n", 287 | " Found existing installation: typing_extensions 4.9.0\n", 288 | " Uninstalling typing_extensions-4.9.0:\n", 289 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/__pycache__/typing_extensions.cpython-310.pyc\n", 290 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions-4.9.0.dist-info/\n", 291 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions.py\n", 292 | " Successfully uninstalled typing_extensions-4.9.0\n", 293 | " Attempting uninstall: numpy\n", 294 | " Found existing installation: numpy 1.26.2\n", 295 | " Uninstalling numpy-1.26.2:\n", 296 | " Removing file or directory /usr/local/bin/f2py\n", 297 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy-1.26.2.dist-info/\n", 298 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy.libs/\n", 299 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy/\n", 300 | " Successfully uninstalled numpy-1.26.2\n", 301 | " changing mode of /usr/local/bin/f2py to 755\n", 302 | " Attempting uninstall: diskcache\n", 303 | " Found existing installation: diskcache 5.6.3\n", 304 | " Uninstalling diskcache-5.6.3:\n", 305 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache-5.6.3.dist-info/\n", 306 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache/\n", 307 | " Successfully uninstalled diskcache-5.6.3\n", 308 | " Attempting uninstall: llama-cpp-python\n", 309 | " Found existing installation: llama_cpp_python 0.2.6\n", 310 | " Uninstalling llama_cpp_python-0.2.6:\n", 311 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/bin/__pycache__/convert-lora-to-ggml.cpython-310.pyc\n", 312 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/bin/__pycache__/convert.cpython-310.pyc\n", 313 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/bin/convert-lora-to-ggml.py\n", 314 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/bin/convert.py\n", 315 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/include/\n", 316 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/lib/\n", 317 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/llama_cpp/\n", 318 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/llama_cpp_python-0.2.6.dist-info/\n", 319 | " Successfully uninstalled llama_cpp_python-0.2.6\n", 320 | "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", 321 | "lida 0.0.10 requires kaleido, which is not installed.\n", 322 | "lida 0.0.10 requires python-multipart, which is not installed.\n", 323 | "llmx 0.0.15a0 requires cohere, which is not installed.\n", 324 | "llmx 0.0.15a0 requires openai, which is not installed.\n", 325 | "llmx 0.0.15a0 requires tiktoken, which is not installed.\n", 326 | "tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.9.0 which is incompatible.\u001b[0m\u001b[31m\n", 327 | "\u001b[0mSuccessfully installed diskcache-5.6.3 llama-cpp-python-0.2.6 numpy-1.26.2 typing-extensions-4.9.0\n" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "# 安裝特定版本的 llama-cpp-python 套件,並啟用 CUDA 的 cuBLAS 功能。\n", 333 | "# `--force-reinstall` 會強制重新安裝,`--upgrade` 會確保安裝最新版本,而 `--no-cache-dir` 會避免使用本地快取,`--verbose` 提供詳細的輸出。\n", 334 | "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.25 --force-reinstall --upgrade --no-cache-dir --verbose" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 5, 340 | "metadata": { 341 | "colab": { 342 | "base_uri": "https://localhost:8080/" 343 | }, 344 | "id": "dRoUDXE666sk", 345 | "outputId": "c769c39a-5989-4a7e-aae3-7591639832be" 346 | }, 347 | "outputs": [ 348 | { 349 | "output_type": "stream", 350 | "name": "stdout", 351 | "text": [ 352 | "Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.10/dist-packages (0.19.4)\n", 353 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (3.13.1)\n", 354 | "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2023.6.0)\n", 355 | "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2.31.0)\n", 356 | "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.66.1)\n", 357 | "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (6.0.1)\n", 358 | "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.9.0)\n", 359 | "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (23.2)\n", 360 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.3.2)\n", 361 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.6)\n", 362 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2.0.7)\n", 363 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2023.11.17)\n" 364 | ] 365 | } 366 | ], 367 | "source": [ 368 | "# 安裝 huggingface_hub 套件,此套件可支援與 Hugging Face Model Hub 進行交互。\n", 369 | "!pip install huggingface_hub" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": { 375 | "id": "k7ye1-jg7x8r" 376 | }, 377 | "source": [ 378 | "## Step 2: Download GGUF Model and predict the result" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 6, 384 | "metadata": { 385 | "id": "u4LMufzw6jUU" 386 | }, 387 | "outputs": [], 388 | "source": [ 389 | "import 
json\n", 390 | "import logging\n", 391 | "from huggingface_hub import hf_hub_download\n", 392 | "from llama_cpp import Llama\n", 393 | "\n", 394 | "# 設定日誌\n", 395 | "logging.basicConfig(level=logging.INFO,\n", 396 | " format='%(asctime)s [%(levelname)s] %(message)s',\n", 397 | " datefmt='%Y-%m-%d %H:%M:%S')\n", 398 | "logger = logging.getLogger(__name__)\n", 399 | "\n", 400 | "class Model:\n", 401 | " def __init__(self):\n", 402 | " self.loaded = False # 模型是否已經加載的標志\n", 403 | " self.lcpp_llm = None # 儲存 Llama 模型的變數\n", 404 | " self.model_path = \"\" # 模型的路徑\n", 405 | "\n", 406 | " def load(self, model_name_or_path = \"TheBloke/Llama-2-13B-chat-GGUF\", model_basename = \"llama-2-13b-chat.Q5_0.gguf\"):\n", 407 | " # 從 Hugging Face Model Hub 下載模型並設定其路徑\n", 408 | " self.model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)\n", 409 | " logger.info(\"Finish: Load Llama 2 model.\") # 輸出模型加載完成的信息\n", 410 | "\n", 411 | " def predict(self, data):\n", 412 | " # 如果模型還沒有被加載,則加載模型\n", 413 | " if not self.loaded:\n", 414 | " self.loaded = True\n", 415 | " self.lcpp_llm = Llama(\n", 416 | " model_path=self.model_path,\n", 417 | " n_threads=2, # 使用的執行緒數量\n", 418 | " n_batch=1024, # 批次大小\n", 419 | " n_gpu_layers=32 # 使用的GPU層數\n", 420 | " )\n", 421 | " logger.info(\"========== Start ==========\")\n", 422 | " # 將 JSON 字符串反序列化成字典\n", 423 | " # data_dict = json.loads(data)\n", 424 | " # logger.info(\"Input: {}.\".format(data_dict))\n", 425 | " # 使用 Llama 模型進行預測\n", 426 | " response = self.lcpp_llm.create_chat_completion(**data)\n", 427 | " # logger.info(\"Response: {}.\".format(response))\n", 428 | " logger.info(\"========== End ==========\")\n", 429 | "\n", 430 | " return response # 返回模型的預測結果" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 7, 436 | "metadata": { 437 | "id": "uiHZFZGf6kSd" 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "model_instance = Model()\n", 442 | "model_instance.load(model_name_or_path = GGUF_HUGGINGFACE_REPO, model_basename = GGUF_HUGGINGFACE_BIN_FILE)" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "source": [ 448 | "data = {\n", 449 | " \"messages\": [\n", 450 | " {\n", 451 | " \"role\": \"system\",\n", 452 | " \"content\": \"你是一個有幫助的問答機器人,請用繁體中文回覆。\"\n", 453 | " },\n", 454 | " {\n", 455 | " \"role\": \"user\",\n", 456 | " \"content\": \"台灣首都在哪裏?\"\n", 457 | " }\n", 458 | " ]\n", 459 | "}" 460 | ], 461 | "metadata": { 462 | "id": "XMAyC2ijkQgr" 463 | }, 464 | "execution_count": 8, 465 | "outputs": [] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "source": [ 470 | "model_instance.predict(data)" 471 | ], 472 | "metadata": { 473 | "colab": { 474 | "base_uri": "https://localhost:8080/" 475 | }, 476 | "id": "2NSVUPVDiy81", 477 | "outputId": "2f22bdbf-e704-4670-e439-f27b87bf153a" 478 | }, 479 | "execution_count": 9, 480 | "outputs": [ 481 | { 482 | "output_type": "stream", 483 | "name": "stderr", 484 | "text": [ 485 | "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | \n" 486 | ] 487 | }, 488 | { 489 | "output_type": "execute_result", 490 | "data": { 491 | "text/plain": [ 492 | "{'id': 'chatcmpl-bdca1b41-963b-4510-adf8-2666fe409b4c',\n", 493 | " 'object': 'chat.completion',\n", 494 | " 'created': 1703734844,\n", 495 | " 'model': 
'/root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.1-GGUF/snapshots/731a9fc8f06f5f5e2db8a0cf9d256197eb6e05d1/mistral-7b-instruct-v0.1.Q5_0.gguf',\n", 496 | " 'choices': [{'index': 0,\n", 497 | " 'message': {'role': 'assistant', 'content': '台灣首都位於台北市中心。'},\n", 498 | " 'finish_reason': 'stop'}],\n", 499 | " 'usage': {'prompt_tokens': 45, 'completion_tokens': 12, 'total_tokens': 57}}" 500 | ] 501 | }, 502 | "metadata": {}, 503 | "execution_count": 9 504 | } 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": { 510 | "id": "-W4IKvhb7-E4" 511 | }, 512 | "source": [ 513 | "## Step 3: Build the fastapi service" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": 10, 519 | "metadata": { 520 | "id": "BoM5mJIe6kP7" 521 | }, 522 | "outputs": [], 523 | "source": [ 524 | "from fastapi import FastAPI\n", 525 | "from fastapi.middleware.cors import CORSMiddleware\n", 526 | "\n", 527 | "# 初始化 FastAPI 應用\n", 528 | "app = FastAPI()\n", 529 | "\n", 530 | "# 為 FastAPI 應用加入 CORS 中間件,允許跨域請求\n", 531 | "app.add_middleware(\n", 532 | " CORSMiddleware,\n", 533 | " allow_origins=['*'], # 允許所有來源的跨域請求\n", 534 | " allow_credentials=True, # 允許憑證(例如 cookies、HTTP認證)的傳遞\n", 535 | " allow_methods=['*'], # 允許所有的 HTTP 方法\n", 536 | " allow_headers=['*'], # 允許所有的 HTTP 頭部\n", 537 | ")\n", 538 | "\n", 539 | "@app.post(\"/predict\") # 定義一個 POST 路由,用於模型預測\n", 540 | "async def predict_text(json_input: dict): # 接收一個字典格式的 JSON 輸入\n", 541 | " result = model_instance.predict(json_input) # 使用模型實例進行預測\n", 542 | " return result # 返回預測結果\n" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": { 548 | "id": "bQ7WeIyg8C73" 549 | }, 550 | "source": [ 551 | "## Step 4: Start the fastapi service" 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 11, 557 | "metadata": { 558 | "colab": { 559 | "base_uri": "https://localhost:8080/" 560 | }, 561 | "id": "sMC363JK82jq", 562 | "outputId": "5d23fe8c-77ac-4131-e824-48075f10eedb" 563 | }, 564 | "outputs": [ 565 | { 566 | "output_type": "stream", 567 | "name": "stderr", 568 | "text": [ 569 | "INFO: Started server process [4371]\n", 570 | "INFO: Waiting for application startup.\n", 571 | "INFO: Application startup complete.\n", 572 | "INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)\n" 573 | ] 574 | }, 575 | { 576 | "output_type": "stream", 577 | "name": "stdout", 578 | "text": [ 579 | "Public URL: https://d415-35-237-125-93.ngrok-free.app\n", 580 | "You can use https://d415-35-237-125-93.ngrok-free.app/predict to get the assistant result.\n" 581 | ] 582 | }, 583 | { 584 | "output_type": "stream", 585 | "name": "stderr", 586 | "text": [ 587 | "WARNING:pyngrok.process.ngrok:t=2023-12-28T03:41:06+0000 lvl=warn msg=\"Stopping forwarder\" name=http-8000-bbf6b256-55b9-40e3-a412-dce1a5e0ded2 acceptErr=\"failed to accept connection: Listener closed\"\n", 588 | "INFO: Shutting down\n", 589 | "INFO: Waiting for application shutdown.\n", 590 | "INFO: Application shutdown complete.\n", 591 | "INFO: Finished server process [4371]\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "import nest_asyncio\n", 597 | "from pyngrok import ngrok\n", 598 | "import uvicorn\n", 599 | "\n", 600 | "# 設定 ngrok 的授權令牌\n", 601 | "if NGROK_TOKEN is not None:\n", 602 | " ngrok.set_auth_token(NGROK_TOKEN)\n", 603 | "\n", 604 | "# 建立與 ngrok 的隧道,使外部可以訪問本地的 8000 端口\n", 605 | "ngrok_tunnel = ngrok.connect(8000)\n", 606 | "public_url = ngrok_tunnel.public_url\n", 607 | "\n", 608 | "print('Public URL:', 
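Once the ngrok tunnel is up, the service can be exercised from any HTTP client. Below is a minimal sketch using the `requests` package (already available in Colab); the base URL is a placeholder for the public ngrok address that this cell prints, and the request body carries the same `messages` list that `create_chat_completion(**data)` expects:

```python
import requests

# Replace with the public ngrok URL printed when the server starts.
BASE_URL = "https://<your-ngrok-subdomain>.ngrok-free.app"

payload = {
    "messages": [
        {"role": "system", "content": "你是一個有幫助的問答機器人,請用繁體中文回覆。"},
        {"role": "user", "content": "台灣首都在哪裏?"},
    ],
    # Any other create_chat_completion keyword (e.g. max_tokens, temperature) passes through as-is.
    "max_tokens": 128,
}

resp = requests.post(f"{BASE_URL}/predict", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Note that the example curl command at the end of this notebook posts a `prompt`/`max_tokens` body; that shape matches the completion-style GGML service in the next notebook, while this endpoint forwards the body to `create_chat_completion` and therefore needs the `messages` form shown here.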
public_url) # 輸出公開的 URL\n", 609 | "print(\"You can use {}/predict to get the assistant result.\".format(public_url))\n", 610 | "\n", 611 | "\n", 612 | "# 使用 nest_asyncio 修正異步事件循環的問題\n", 613 | "nest_asyncio.apply()\n", 614 | "\n", 615 | "# 啟動 uvicorn 伺服器,使 FastAPI 應用運行在 8000 端口\n", 616 | "uvicorn.run(app, port=8000)\n" 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": { 622 | "id": "PuWit7ePf2om" 623 | }, 624 | "source": [ 625 | "### Example CURL command line:\n", 626 | "\n", 627 | "```bash\n", 628 | "curl --location 'https://f1b8-35-184-42-82.ngrok-free.app/predict' \\\n", 629 | "--header 'Content-Type: application/json' \\\n", 630 | "--data '{\"prompt\": \"test\", \"max_tokens\": 2}'\n", 631 | "```" 632 | ] 633 | } 634 | ], 635 | "metadata": { 636 | "accelerator": "GPU", 637 | "colab": { 638 | "gpuType": "T4", 639 | "provenance": [] 640 | }, 641 | "kernelspec": { 642 | "display_name": "Python 3", 643 | "name": "python3" 644 | }, 645 | "language_info": { 646 | "name": "python" 647 | } 648 | }, 649 | "nbformat": 4, 650 | "nbformat_minor": 0 651 | } -------------------------------------------------------------------------------- /Llama_2_FastAPI_Service_Colab_Example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "VUC9Opfd8Mpt" 7 | }, 8 | "source": [ 9 | "# Llama 2 Fastapi Service Example" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "id": "e9e9QOWD-l7k" 16 | }, 17 | "source": [ 18 | "推薦至少使用 T4 GPU 來作為你的服務啟用" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "h4zPTxsL7rUn" 25 | }, 26 | "source": [ 27 | "## Step 0: Config Setting" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 1, 33 | "metadata": { 34 | "id": "8kZ_1n8U85GQ" 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "# 如果你想要在 Google Colab 長期測試,你可以使用 ngrok 來做服務器代理的處理\n", 39 | "# 到官方網站註冊帳號:https://ngrok.com/\n", 40 | "# 申請token,你就可以將以下變數進行更換。\n", 41 | "NGROK_TOKEN = None" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": { 48 | "id": "Vru9EFFb9lmp" 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "# GGML Model\n", 53 | "# 你可以到 HuggingFace 去找尋相關的 GGML 模型\n", 54 | "# Example:\n", 55 | "# Llama: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML\n", 56 | "# Taiwan Llama: https://huggingface.co/audreyt/Taiwan-LLaMa-v1.0-GGML\n", 57 | "GGML_HUGGINGFACE_REPO = \"audreyt/Taiwan-LLaMa-v1.0-GGML\"\n", 58 | "GGML_HUGGINGFACE_BIN_FILE = \"Taiwan-LLaMa-13b-1.0.ggmlv3.q5_1.bin\"" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "BXt_MjHG9Fw-" 65 | }, 66 | "source": [ 67 | "## Step 1: Install python package" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": { 74 | "colab": { 75 | "base_uri": "https://localhost:8080/" 76 | }, 77 | "id": "iV9rhIbg6K9d", 78 | "outputId": "14ed91c8-cc60-4ee4-b0a9-191d9a3a762f" 79 | }, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "Requirement already satisfied: fastapi in /usr/local/lib/python3.10/dist-packages (0.103.1)\n", 86 | "Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.10/dist-packages (1.5.7)\n", 87 | "Requirement already satisfied: pyngrok in /usr/local/lib/python3.10/dist-packages (6.0.0)\n", 88 | "Requirement already satisfied: uvicorn in /usr/local/lib/python3.10/dist-packages 
(0.23.2)\n", 89 | "Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (0.22.0)\n", 90 | "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.33.1)\n", 91 | "Requirement already satisfied: anyio<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from fastapi) (3.7.1)\n", 92 | "Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from fastapi) (1.10.12)\n", 93 | "Requirement already satisfied: starlette<0.28.0,>=0.27.0 in /usr/local/lib/python3.10/dist-packages (from fastapi) (0.27.0)\n", 94 | "Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.10/dist-packages (from fastapi) (4.7.1)\n", 95 | "Requirement already satisfied: PyYAML in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.1)\n", 96 | "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn) (8.1.7)\n", 97 | "Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn) (0.14.0)\n", 98 | "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate) (1.25.2)\n", 99 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (23.1)\n", 100 | "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)\n", 101 | "Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.0.1+cu118)\n", 102 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.12.2)\n", 103 | "Requirement already satisfied: huggingface-hub<1.0,>=0.15.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.16.4)\n", 104 | "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2023.6.3)\n", 105 | "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.31.0)\n", 106 | "Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.13.3)\n", 107 | "Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.3.3)\n", 108 | "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.1)\n", 109 | "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi) (3.4)\n", 110 | "Requirement already satisfied: sniffio>=1.1 in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi) (1.3.0)\n", 111 | "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<4.0.0,>=3.7.1->fastapi) (1.1.3)\n", 112 | "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.15.1->transformers) (2023.6.0)\n", 113 | "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (1.12)\n", 114 | "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1)\n", 115 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1.2)\n", 116 | "Requirement 
already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (2.0.0)\n", 117 | "Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.10.0->accelerate) (3.27.4.1)\n", 118 | "Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.10.0->accelerate) (16.0.6)\n", 119 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.2.0)\n", 120 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.4)\n", 121 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2023.7.22)\n", 122 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.3)\n", 123 | "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "# 安裝 fastapi, nest-asyncio, pyngrok, uvicorn, accelerate 和 transformers 套件,以支援API開發和深度學習模型的操作。\n", 129 | "!pip install fastapi nest-asyncio pyngrok uvicorn accelerate transformers" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 4, 135 | "metadata": { 136 | "colab": { 137 | "base_uri": "https://localhost:8080/" 138 | }, 139 | "id": "4yK3DJgd66El", 140 | "outputId": "02dc946a-23cb-49fa-dacd-067e21499b32" 141 | }, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 148 | "Collecting llama-cpp-python==0.1.77\n", 149 | " Downloading llama_cpp_python-0.1.77.tar.gz (1.6 MB)\n", 150 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 151 | "\u001b[?25h Running command pip subprocess to install build dependencies\n", 152 | " Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 153 | " Collecting setuptools>=42\n", 154 | " Using cached setuptools-68.2.0-py3-none-any.whl (807 kB)\n", 155 | " Collecting scikit-build>=0.13\n", 156 | " Using cached scikit_build-0.17.6-py3-none-any.whl (84 kB)\n", 157 | " Collecting cmake>=3.18\n", 158 | " Using cached cmake-3.27.4.1-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.1 MB)\n", 159 | " Collecting ninja\n", 160 | " Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)\n", 161 | " Collecting distro (from scikit-build>=0.13)\n", 162 | " Using cached distro-1.8.0-py3-none-any.whl (20 kB)\n", 163 | " Collecting packaging (from scikit-build>=0.13)\n", 164 | " Using cached packaging-23.1-py3-none-any.whl (48 kB)\n", 165 | " Collecting tomli (from scikit-build>=0.13)\n", 166 | " Using cached tomli-2.0.1-py3-none-any.whl (12 kB)\n", 167 | " Collecting wheel>=0.32.0 (from scikit-build>=0.13)\n", 168 | " Using cached wheel-0.41.2-py3-none-any.whl (64 kB)\n", 169 | " Installing collected packages: ninja, cmake, wheel, tomli, setuptools, packaging, distro, scikit-build\n", 170 | " Creating /tmp/pip-build-env-48s4me10/overlay/local/bin\n", 171 | " changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/ninja to 755\n", 172 | " 
changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/cmake to 755\n", 173 | " changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/cpack to 755\n", 174 | " changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/ctest to 755\n", 175 | " changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/wheel to 755\n", 176 | " changing mode of /tmp/pip-build-env-48s4me10/overlay/local/bin/distro to 755\n", 177 | " ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", 178 | " ipython 7.34.0 requires jedi>=0.16, which is not installed.\n", 179 | " numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.25.2 which is incompatible.\n", 180 | " tensorflow 2.13.0 requires numpy<=1.24.3,>=1.22, but you have numpy 1.25.2 which is incompatible.\n", 181 | " tensorflow 2.13.0 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.7.1 which is incompatible.\n", 182 | " Successfully installed cmake-3.27.4.1 distro-1.8.0 ninja-1.11.1 packaging-23.1 scikit-build-0.17.6 setuptools-68.2.0 tomli-2.0.1 wheel-0.41.2\n", 183 | " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", 184 | " Running command Getting requirements to build wheel\n", 185 | " running egg_info\n", 186 | " writing llama_cpp_python.egg-info/PKG-INFO\n", 187 | " writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt\n", 188 | " writing requirements to llama_cpp_python.egg-info/requires.txt\n", 189 | " writing top-level names to llama_cpp_python.egg-info/top_level.txt\n", 190 | " reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'\n", 191 | " adding license file 'LICENSE.md'\n", 192 | " writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'\n", 193 | " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", 194 | " Running command Preparing metadata (pyproject.toml)\n", 195 | " running dist_info\n", 196 | " creating /tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info\n", 197 | " writing /tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/PKG-INFO\n", 198 | " writing dependency_links to /tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/dependency_links.txt\n", 199 | " writing requirements to /tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/requires.txt\n", 200 | " writing top-level names to /tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/top_level.txt\n", 201 | " writing manifest file '/tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/SOURCES.txt'\n", 202 | " reading manifest file '/tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/SOURCES.txt'\n", 203 | " adding license file 'LICENSE.md'\n", 204 | " writing manifest file '/tmp/pip-modern-metadata-qm_wd844/llama_cpp_python.egg-info/SOURCES.txt'\n", 205 | " creating '/tmp/pip-modern-metadata-qm_wd844/llama_cpp_python-0.1.77.dist-info'\n", 206 | " Preparing metadata (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", 207 | "Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.1.77)\n", 208 | " Downloading typing_extensions-4.7.1-py3-none-any.whl (33 kB)\n", 209 | "Collecting numpy>=1.20.0 (from llama-cpp-python==0.1.77)\n", 210 | " Downloading numpy-1.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)\n", 211 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m18.2/18.2 MB\u001b[0m \u001b[31m70.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 212 | "\u001b[?25hCollecting diskcache>=5.6.1 (from llama-cpp-python==0.1.77)\n", 213 | " Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n", 214 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.5/45.5 kB\u001b[0m \u001b[31m173.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 215 | "\u001b[?25hBuilding wheels for collected packages: llama-cpp-python\n", 216 | " Running command Building wheel for llama-cpp-python (pyproject.toml)\n", 217 | "\n", 218 | "\n", 219 | " --------------------------------------------------------------------------------\n", 220 | " -- Trying 'Ninja' generator\n", 221 | " --------------------------------\n", 222 | " ---------------------------\n", 223 | " ----------------------\n", 224 | " -----------------\n", 225 | " ------------\n", 226 | " -------\n", 227 | " --\n", 228 | " CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):\n", 229 | " Compatibility with CMake < 3.5 will be removed from a future version of\n", 230 | " CMake.\n", 231 | "\n", 232 | " Update the VERSION argument value or use a ... suffix to tell\n", 233 | " CMake that the project does not need compatibility with older versions.\n", 234 | "\n", 235 | " Not searching for unused variables given on the command line.\n", 236 | "\n", 237 | " -- The C compiler identification is GNU 11.4.0\n", 238 | " -- Detecting C compiler ABI info\n", 239 | " -- Detecting C compiler ABI info - done\n", 240 | " -- Check for working C compiler: /usr/bin/cc - skipped\n", 241 | " -- Detecting C compile features\n", 242 | " -- Detecting C compile features - done\n", 243 | " -- The CXX compiler identification is GNU 11.4.0\n", 244 | " -- Detecting CXX compiler ABI info\n", 245 | " -- Detecting CXX compiler ABI info - done\n", 246 | " -- Check for working CXX compiler: /usr/bin/c++ - skipped\n", 247 | " -- Detecting CXX compile features\n", 248 | " -- Detecting CXX compile features - done\n", 249 | " -- Configuring done (0.9s)\n", 250 | " -- Generating done (0.0s)\n", 251 | " -- Build files have been written to: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_cmake_test_compile/build\n", 252 | " --\n", 253 | " -------\n", 254 | " ------------\n", 255 | " -----------------\n", 256 | " ----------------------\n", 257 | " ---------------------------\n", 258 | " --------------------------------\n", 259 | " -- Trying 'Ninja' generator - success\n", 260 | " --------------------------------------------------------------------------------\n", 261 | "\n", 262 | " Configuring Project\n", 263 | " Working directory:\n", 264 | " /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-build\n", 265 | " Command:\n", 266 | " /tmp/pip-build-env-48s4me10/overlay/local/lib/python3.10/dist-packages/cmake/data/bin/cmake /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682 -G Ninja 
-DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-48s4me10/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.12 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/tmp/pip-build-env-48s4me10/overlay/local/lib/python3.10/dist-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/usr/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPYTHON_LIBRARY:PATH=/usr/lib/x86_64-linux-gnu/libpython3.10.so -DPython_EXECUTABLE:PATH=/usr/bin/python3 -DPython_ROOT_DIR:PATH=/usr -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPython3_EXECUTABLE:PATH=/usr/bin/python3 -DPython3_ROOT_DIR:PATH=/usr -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/usr/include/python3.10 -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-48s4me10/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -DLLAMA_CUBLAS=on -DCMAKE_BUILD_TYPE:STRING=Release -DLLAMA_CUBLAS=on\n", 267 | "\n", 268 | " Not searching for unused variables given on the command line.\n", 269 | " -- The C compiler identification is GNU 11.4.0\n", 270 | " -- The CXX compiler identification is GNU 11.4.0\n", 271 | " -- Detecting C compiler ABI info\n", 272 | " -- Detecting C compiler ABI info - done\n", 273 | " -- Check for working C compiler: /usr/bin/cc - skipped\n", 274 | " -- Detecting C compile features\n", 275 | " -- Detecting C compile features - done\n", 276 | " -- Detecting CXX compiler ABI info\n", 277 | " -- Detecting CXX compiler ABI info - done\n", 278 | " -- Check for working CXX compiler: /usr/bin/c++ - skipped\n", 279 | " -- Detecting CXX compile features\n", 280 | " -- Detecting CXX compile features - done\n", 281 | " -- Found Git: /usr/bin/git (found version \"2.34.1\")\n", 282 | " fatal: not a git repository (or any of the parent directories): .git\n", 283 | " fatal: not a git repository (or any of the parent directories): .git\n", 284 | " CMake Warning at vendor/llama.cpp/CMakeLists.txt:116 (message):\n", 285 | " Git repository not found; to enable automatic generation of build info,\n", 286 | " make sure Git is installed and the project is a Git repository.\n", 287 | "\n", 288 | "\n", 289 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD\n", 290 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success\n", 291 | " -- Found Threads: TRUE\n", 292 | " -- Found CUDAToolkit: /usr/local/cuda/include (found version \"11.8.89\")\n", 293 | " -- cuBLAS found\n", 294 | " -- The CUDA compiler identification is NVIDIA 11.8.89\n", 295 | " -- Detecting CUDA compiler ABI info\n", 296 | " -- Detecting CUDA compiler ABI info - done\n", 297 | " -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped\n", 298 | " -- Detecting CUDA compile features\n", 299 | " -- Detecting CUDA compile features - done\n", 300 | " -- Using CUDA architectures: 52;61\n", 301 | " -- CMAKE_SYSTEM_PROCESSOR: x86_64\n", 302 | " -- x86 detected\n", 303 | " -- Configuring done (4.3s)\n", 304 | " -- Generating done (0.0s)\n", 305 | " -- Build files have been written to: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-build\n", 306 | " [1/8] Building C object vendor/llama.cpp/CMakeFiles/ggml.dir/k_quants.c.o\n", 307 | " [2/8] Building CUDA object vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o\n", 308 | " [3/8] Building C 
object vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o\n", 309 | " [4/8] Linking CUDA shared library vendor/llama.cpp/libggml_shared.so\n", 310 | " [5/8] Linking CUDA static library vendor/llama.cpp/libggml_static.a\n", 311 | " [6/8] Building CXX object vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o\n", 312 | " [7/8] Linking CXX shared library vendor/llama.cpp/libllama.so\n", 313 | " [7/8] Install the project...\n", 314 | " -- Install configuration: \"Release\"\n", 315 | " -- Installing: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/lib/libggml_shared.so\n", 316 | " -- Installing: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/lib/libllama.so\n", 317 | " -- Set runtime path of \"/tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/lib/libllama.so\" to \"\"\n", 318 | " -- Installing: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/bin/convert.py\n", 319 | " -- Installing: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/bin/convert-lora-to-ggml.py\n", 320 | " -- Installing: /tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so\n", 321 | " -- Set runtime path of \"/tmp/pip-install-7e36n2_7/llama-cpp-python_cb12d93090b04ba89682ace3a2944682/_skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so\" to \"\"\n", 322 | "\n", 323 | " copying llama_cpp/llama_types.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama_types.py\n", 324 | " copying llama_cpp/llama.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama.py\n", 325 | " copying llama_cpp/llama_cpp.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama_cpp.py\n", 326 | " copying llama_cpp/__init__.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/__init__.py\n", 327 | " creating directory _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server\n", 328 | " copying llama_cpp/server/__main__.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/__main__.py\n", 329 | " copying llama_cpp/server/app.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/app.py\n", 330 | " copying llama_cpp/server/__init__.py -> _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/__init__.py\n", 331 | "\n", 332 | " running bdist_wheel\n", 333 | " running build\n", 334 | " running build_py\n", 335 | " creating _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310\n", 336 | " creating _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 337 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama_types.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 338 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 339 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/llama_cpp.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 340 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/__init__.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 341 | " creating 
_skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server\n", 342 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/__main__.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server\n", 343 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/app.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server\n", 344 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/server/__init__.py -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server\n", 345 | " copying _skbuild/linux-x86_64-3.10/cmake-install/llama_cpp/libllama.so -> _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp\n", 346 | " copied 7 files\n", 347 | " running build_ext\n", 348 | " installing to _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel\n", 349 | " running install\n", 350 | " running install_lib\n", 351 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64\n", 352 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel\n", 353 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 354 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/libllama.so -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 355 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/llama_types.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 356 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp/server\n", 357 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server/__main__.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp/server\n", 358 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server/app.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp/server\n", 359 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/server/__init__.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp/server\n", 360 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/llama.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 361 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/llama_cpp.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 362 | " copying _skbuild/linux-x86_64-3.10/setuptools/lib.linux-x86_64-cpython-310/llama_cpp/__init__.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp\n", 363 | " copied 8 files\n", 364 | " running install_data\n", 365 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data\n", 366 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data\n", 367 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/lib\n", 368 | " copying _skbuild/linux-x86_64-3.10/cmake-install/lib/libllama.so -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/lib\n", 369 | " copying _skbuild/linux-x86_64-3.10/cmake-install/lib/libggml_shared.so -> 
_skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/lib\n", 370 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/bin\n", 371 | " copying _skbuild/linux-x86_64-3.10/cmake-install/bin/convert.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/bin\n", 372 | " copying _skbuild/linux-x86_64-3.10/cmake-install/bin/convert-lora-to-ggml.py -> _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.data/data/bin\n", 373 | " running install_egg_info\n", 374 | " running egg_info\n", 375 | " writing llama_cpp_python.egg-info/PKG-INFO\n", 376 | " writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt\n", 377 | " writing requirements to llama_cpp_python.egg-info/requires.txt\n", 378 | " writing top-level names to llama_cpp_python.egg-info/top_level.txt\n", 379 | " reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'\n", 380 | " adding license file 'LICENSE.md'\n", 381 | " writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'\n", 382 | " Copying llama_cpp_python.egg-info to _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77-py3.10.egg-info\n", 383 | " running install_scripts\n", 384 | " copied 0 files\n", 385 | " creating _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel/llama_cpp_python-0.1.77.dist-info/WHEEL\n", 386 | " creating '/tmp/pip-wheel-fu7bsbze/.tmp-vfrdieph/llama_cpp_python-0.1.77-cp310-cp310-linux_x86_64.whl' and adding '_skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel' to it\n", 387 | " adding 'llama_cpp/__init__.py'\n", 388 | " adding 'llama_cpp/libllama.so'\n", 389 | " adding 'llama_cpp/llama.py'\n", 390 | " adding 'llama_cpp/llama_cpp.py'\n", 391 | " adding 'llama_cpp/llama_types.py'\n", 392 | " adding 'llama_cpp/server/__init__.py'\n", 393 | " adding 'llama_cpp/server/__main__.py'\n", 394 | " adding 'llama_cpp/server/app.py'\n", 395 | " adding 'llama_cpp_python-0.1.77.data/data/bin/convert-lora-to-ggml.py'\n", 396 | " adding 'llama_cpp_python-0.1.77.data/data/bin/convert.py'\n", 397 | " adding 'llama_cpp_python-0.1.77.data/data/lib/libggml_shared.so'\n", 398 | " adding 'llama_cpp_python-0.1.77.data/data/lib/libllama.so'\n", 399 | " adding 'llama_cpp_python-0.1.77.dist-info/LICENSE.md'\n", 400 | " adding 'llama_cpp_python-0.1.77.dist-info/METADATA'\n", 401 | " adding 'llama_cpp_python-0.1.77.dist-info/WHEEL'\n", 402 | " adding 'llama_cpp_python-0.1.77.dist-info/top_level.txt'\n", 403 | " adding 'llama_cpp_python-0.1.77.dist-info/RECORD'\n", 404 | " removing _skbuild/linux-x86_64-3.10/setuptools/bdist.linux-x86_64/wheel\n", 405 | " Building wheel for llama-cpp-python (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", 406 | " Created wheel for llama-cpp-python: filename=llama_cpp_python-0.1.77-cp310-cp310-linux_x86_64.whl size=1368844 sha256=47ac034b15811c29178985ef36f54386cc9292d179dec7ad6bb4787e069b790d\n", 407 | " Stored in directory: /tmp/pip-ephem-wheel-cache-nuedusa9/wheels/aa/ed/39/87f2ad350dbbf13b600ac744899186b8647c5323c62e2bb348\n", 408 | "Successfully built llama-cpp-python\n", 409 | "Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python\n", 410 | " Attempting uninstall: typing-extensions\n", 411 | " Found existing installation: typing_extensions 4.7.1\n", 412 | " Uninstalling typing_extensions-4.7.1:\n", 413 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/__pycache__/typing_extensions.cpython-310.pyc\n", 414 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions-4.7.1.dist-info/\n", 415 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions.py\n", 416 | " Successfully uninstalled typing_extensions-4.7.1\n", 417 | " Attempting uninstall: numpy\n", 418 | " Found existing installation: numpy 1.25.2\n", 419 | " Uninstalling numpy-1.25.2:\n", 420 | " Removing file or directory /usr/local/bin/f2py\n", 421 | " Removing file or directory /usr/local/bin/f2py3\n", 422 | " Removing file or directory /usr/local/bin/f2py3.10\n", 423 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy-1.25.2.dist-info/\n", 424 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy.libs/\n", 425 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy/\n", 426 | " Successfully uninstalled numpy-1.25.2\n", 427 | " changing mode of /usr/local/bin/f2py to 755\n", 428 | " changing mode of /usr/local/bin/f2py3 to 755\n", 429 | " changing mode of /usr/local/bin/f2py3.10 to 755\n", 430 | " Attempting uninstall: diskcache\n", 431 | " Found existing installation: diskcache 5.6.3\n", 432 | " Uninstalling diskcache-5.6.3:\n", 433 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache-5.6.3.dist-info/\n", 434 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache/\n", 435 | " Successfully uninstalled diskcache-5.6.3\n", 436 | " Attempting uninstall: llama-cpp-python\n", 437 | " Found existing installation: llama-cpp-python 0.1.77\n", 438 | " Uninstalling llama-cpp-python-0.1.77:\n", 439 | " Removing file or directory /usr/local/bin/__pycache__/convert-lora-to-ggml.cpython-310.pyc\n", 440 | " Removing file or directory /usr/local/bin/__pycache__/convert.cpython-310.pyc\n", 441 | " Removing file or directory /usr/local/bin/convert-lora-to-ggml.py\n", 442 | " Removing file or directory /usr/local/bin/convert.py\n", 443 | " Removing file or directory /usr/local/lib/libggml_shared.so\n", 444 | " Removing file or directory /usr/local/lib/libllama.so\n", 445 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/llama_cpp/\n", 446 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/llama_cpp_python-0.1.77.dist-info/\n", 447 | " Successfully uninstalled llama-cpp-python-0.1.77\n", 448 | "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", 449 | "numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.25.2 which is incompatible.\n", 450 | "tensorflow 2.13.0 requires numpy<=1.24.3,>=1.22, but you have numpy 1.25.2 which is incompatible.\n", 451 | "tensorflow 2.13.0 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.7.1 which is incompatible.\u001b[0m\u001b[31m\n", 452 | "\u001b[0mSuccessfully installed diskcache-5.6.3 llama-cpp-python-0.1.77 numpy-1.25.2 typing-extensions-4.7.1\n" 453 | ] 454 | } 455 | ], 456 | "source": [ 457 | "# 安裝特定版本的 llama-cpp-python 套件,並啟用 CUDA 的 cuBLAS 功能。\n", 458 | "# `--force-reinstall` 會強制重新安裝,`--upgrade` 會確保安裝最新版本,而 `--no-cache-dir` 會避免使用本地快取,`--verbose` 提供詳細的輸出。\n", 459 | "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.77 --force-reinstall --upgrade --no-cache-dir --verbose" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 5, 465 | "metadata": { 466 | "colab": { 467 | "base_uri": "https://localhost:8080/" 468 | }, 469 | "id": "dRoUDXE666sk", 470 | "outputId": "a0d532e9-2f64-470c-94a7-d3d9f416501a" 471 | }, 472 | "outputs": [ 473 | { 474 | "name": "stdout", 475 | "output_type": "stream", 476 | "text": [ 477 | "Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.10/dist-packages (0.16.4)\n", 478 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (3.12.2)\n", 479 | "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2023.6.0)\n", 480 | "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2.31.0)\n", 481 | "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.66.1)\n", 482 | "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (6.0.1)\n", 483 | "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.7.1)\n", 484 | "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (23.1)\n", 485 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.2.0)\n", 486 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.4)\n", 487 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2.0.4)\n", 488 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2023.7.22)\n" 489 | ] 490 | } 491 | ], 492 | "source": [ 493 | "# 安裝 huggingface_hub 套件,此套件可支援與 Hugging Face Model Hub 進行交互。\n", 494 | "!pip install huggingface_hub" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": { 500 | "id": "k7ye1-jg7x8r" 501 | }, 502 | "source": [ 503 | "## Step 2: Download GGML Model and predict the result" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": 6, 509 | "metadata": { 510 | "id": "u4LMufzw6jUU" 511 | }, 512 | "outputs": [], 513 | "source": [ 514 | "import json\n", 515 | "import logging\n", 516 | "from huggingface_hub import hf_hub_download\n", 517 | "from llama_cpp import Llama\n", 518 | "\n", 
519 | "# 設定日誌\n", 520 | "logging.basicConfig(level=logging.INFO,\n", 521 | " format='%(asctime)s [%(levelname)s] %(message)s',\n", 522 | " datefmt='%Y-%m-%d %H:%M:%S')\n", 523 | "logger = logging.getLogger(__name__)\n", 524 | "\n", 525 | "class Model:\n", 526 | " def __init__(self):\n", 527 | " self.loaded = False # 模型是否已經加載的標志\n", 528 | " self.lcpp_llm = None # 儲存 Llama 模型的變數\n", 529 | " self.model_path = \"\" # 模型的路徑\n", 530 | "\n", 531 | " def load(self, model_name_or_path = \"TheBloke/Llama-2-13B-chat-GGML\", model_basename = \"llama-2-13b-chat.ggmlv3.q5_1.bin\"):\n", 532 | " # 從 Hugging Face Model Hub 下載模型並設定其路徑\n", 533 | " self.model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)\n", 534 | " logger.info(\"Finish: Load Llama 2 model.\") # 輸出模型加載完成的信息\n", 535 | "\n", 536 | " def predict(self, data):\n", 537 | " # 如果模型還沒有被加載,則加載模型\n", 538 | " if not self.loaded:\n", 539 | " self.loaded = True\n", 540 | " self.lcpp_llm = Llama(\n", 541 | " model_path=self.model_path,\n", 542 | " n_threads=2, # 使用的執行緒數量\n", 543 | " n_batch=1024, # 批次大小\n", 544 | " n_gpu_layers=32 # 使用的GPU層數\n", 545 | " )\n", 546 | " logger.info(\"========== Start ==========\")\n", 547 | " # 將 JSON 字符串反序列化成字典\n", 548 | " data_dict = json.loads(data)\n", 549 | " logger.info(\"Input: {}.\".format(data_dict))\n", 550 | " # 使用 Llama 模型進行預測\n", 551 | " response = self.lcpp_llm(prompt=data_dict['prompt'], max_tokens=data_dict['max_tokens'], temperature=0.5, top_p=0.95, repeat_penalty=1.2, top_k=150, echo=True)\n", 552 | " logger.info(\"Response: {}.\".format(response))\n", 553 | " logger.info(\"========== End ==========\")\n", 554 | "\n", 555 | " return {\"answer\": response[\"choices\"][0][\"text\"]} # 返回模型的預測結果" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 7, 561 | "metadata": { 562 | "colab": { 563 | "base_uri": "https://localhost:8080/" 564 | }, 565 | "id": "uiHZFZGf6kSd", 566 | "outputId": "08c5abec-4bbf-4022-9899-c7dc006aabaa" 567 | }, 568 | "outputs": [ 569 | { 570 | "name": "stderr", 571 | "output_type": "stream", 572 | "text": [ 573 | "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | \n" 574 | ] 575 | }, 576 | { 577 | "data": { 578 | "text/plain": [ 579 | "{'answer': 'test 1'}" 580 | ] 581 | }, 582 | "execution_count": 7, 583 | "metadata": {}, 584 | "output_type": "execute_result" 585 | } 586 | ], 587 | "source": [ 588 | "model_instance = Model()\n", 589 | "model_instance.load(model_name_or_path = GGML_HUGGINGFACE_REPO, model_basename = GGML_HUGGINGFACE_BIN_FILE)\n", 590 | "model_instance.predict(json.dumps({\"prompt\": \"test\", \"max_tokens\": 2}))" 591 | ] 592 | }, 593 | { 594 | "cell_type": "markdown", 595 | "metadata": { 596 | "id": "-W4IKvhb7-E4" 597 | }, 598 | "source": [ 599 | "## Step 3: Build the fastapi service" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 8, 605 | "metadata": { 606 | "id": "BoM5mJIe6kP7" 607 | }, 608 | "outputs": [], 609 | "source": [ 610 | "from fastapi import FastAPI\n", 611 | "from fastapi.middleware.cors import CORSMiddleware\n", 612 | "\n", 613 | "# 初始化 FastAPI 應用\n", 614 | "app = FastAPI()\n", 615 | "\n", 616 | "# 為 FastAPI 應用加入 CORS 中間件,允許跨域請求\n", 617 | "app.add_middleware(\n", 618 | " CORSMiddleware,\n", 619 | " allow_origins=['*'], # 允許所有來源的跨域請求\n", 620 | " allow_credentials=True, # 允許憑證(例如 cookies、HTTP認證)的傳遞\n", 621 | " allow_methods=['*'], # 允許所有的 HTTP 方法\n", 622 
| " allow_headers=['*'], # 允許所有的 HTTP 頭部\n", 623 | ")\n", 624 | "\n", 625 | "@app.post(\"/predict\") # 定義一個 POST 路由,用於模型預測\n", 626 | "async def predict_text(json_input: dict): # 接收一個字典格式的 JSON 輸入\n", 627 | " result = model_instance.predict(json.dumps(json_input)) # 使用模型實例進行預測\n", 628 | " return result # 返回預測結果\n" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": { 634 | "id": "bQ7WeIyg8C73" 635 | }, 636 | "source": [ 637 | "## Step 4: Start the fastapi service" 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": null, 643 | "metadata": { 644 | "colab": { 645 | "base_uri": "https://localhost:8080/" 646 | }, 647 | "id": "sMC363JK82jq", 648 | "outputId": "5ad9e348-8445-4e58-ea28-a861ce64c446" 649 | }, 650 | "outputs": [ 651 | { 652 | "name": "stderr", 653 | "output_type": "stream", 654 | "text": [ 655 | "WARNING:pyngrok.process.ngrok:t=2023-09-10T03:10:08+0000 lvl=warn msg=\"ngrok config file found at legacy location, move to XDG location\" xdg_path=/root/.config/ngrok/ngrok.yml legacy_path=/root/.ngrok2/ngrok.yml\n", 656 | "INFO: Started server process [3121]\n", 657 | "INFO: Waiting for application startup.\n", 658 | "INFO: Application startup complete.\n", 659 | "INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)\n" 660 | ] 661 | }, 662 | { 663 | "name": "stdout", 664 | "output_type": "stream", 665 | "text": [ 666 | "Public URL: https://f1b8-35-184-42-82.ngrok-free.app\n", 667 | "You can use https://f1b8-35-184-42-82.ngrok-free.app/predict to get the assistant result.\n" 668 | ] 669 | }, 670 | { 671 | "name": "stderr", 672 | "output_type": "stream", 673 | "text": [ 674 | "Llama.generate: prefix-match hit\n" 675 | ] 676 | }, 677 | { 678 | "name": "stdout", 679 | "output_type": "stream", 680 | "text": [ 681 | "INFO: 112.104.26.9:0 - \"POST /predict HTTP/1.1\" 200 OK\n" 682 | ] 683 | } 684 | ], 685 | "source": [ 686 | "import nest_asyncio\n", 687 | "from pyngrok import ngrok\n", 688 | "import uvicorn\n", 689 | "\n", 690 | "# 設定 ngrok 的授權令牌\n", 691 | "if NGROK_TOKEN is not None:\n", 692 | " ngrok.set_auth_token(NGROK_TOKEN)\n", 693 | "\n", 694 | "# 建立與 ngrok 的隧道,使外部可以訪問本地的 8000 端口\n", 695 | "ngrok_tunnel = ngrok.connect(8000)\n", 696 | "public_url = ngrok_tunnel.public_url\n", 697 | "\n", 698 | "print('Public URL:', public_url) # 輸出公開的 URL\n", 699 | "print(\"You can use {}/predict to get the assistant result.\".format(public_url))\n", 700 | "\n", 701 | "\n", 702 | "# 使用 nest_asyncio 修正異步事件循環的問題\n", 703 | "nest_asyncio.apply()\n", 704 | "\n", 705 | "# 啟動 uvicorn 伺服器,使 FastAPI 應用運行在 8000 端口\n", 706 | "uvicorn.run(app, port=8000)\n" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "### Example CURL command line:\n", 714 | "\n", 715 | "```bash\n", 716 | "curl --location 'https://f1b8-35-184-42-82.ngrok-free.app/predict' \\\n", 717 | "--header 'Content-Type: application/json' \\\n", 718 | "--data '{\"prompt\": \"test\", \"max_tokens\": 2}'\n", 719 | "```" 720 | ] 721 | } 722 | ], 723 | "metadata": { 724 | "accelerator": "GPU", 725 | "colab": { 726 | "gpuType": "T4", 727 | "provenance": [] 728 | }, 729 | "kernelspec": { 730 | "display_name": "Python 3", 731 | "name": "python3" 732 | }, 733 | "language_info": { 734 | "name": "python" 735 | } 736 | }, 737 | "nbformat": 4, 738 | "nbformat_minor": 0 739 | } 740 | -------------------------------------------------------------------------------- /demo-1-llama_gguf_prediction.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "gpuType": "T4" 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | }, 16 | "accelerator": "GPU", 17 | "widgets": { 18 | "application/vnd.jupyter.widget-state+json": { 19 | "4bb480c16b9a41bfa0227ec094e1415c": { 20 | "model_module": "@jupyter-widgets/controls", 21 | "model_name": "HBoxModel", 22 | "model_module_version": "1.5.0", 23 | "state": { 24 | "_dom_classes": [], 25 | "_model_module": "@jupyter-widgets/controls", 26 | "_model_module_version": "1.5.0", 27 | "_model_name": "HBoxModel", 28 | "_view_count": null, 29 | "_view_module": "@jupyter-widgets/controls", 30 | "_view_module_version": "1.5.0", 31 | "_view_name": "HBoxView", 32 | "box_style": "", 33 | "children": [ 34 | "IPY_MODEL_f145b14de79340328e6ae0c561deea89", 35 | "IPY_MODEL_5131d58c260842ccbbf663b493a34c81", 36 | "IPY_MODEL_b1d81c8ad8bd40358f5f9b0f731d14df" 37 | ], 38 | "layout": "IPY_MODEL_e5c6918b22af4407ad5a4498342ed8f0" 39 | } 40 | }, 41 | "f145b14de79340328e6ae0c561deea89": { 42 | "model_module": "@jupyter-widgets/controls", 43 | "model_name": "HTMLModel", 44 | "model_module_version": "1.5.0", 45 | "state": { 46 | "_dom_classes": [], 47 | "_model_module": "@jupyter-widgets/controls", 48 | "_model_module_version": "1.5.0", 49 | "_model_name": "HTMLModel", 50 | "_view_count": null, 51 | "_view_module": "@jupyter-widgets/controls", 52 | "_view_module_version": "1.5.0", 53 | "_view_name": "HTMLView", 54 | "description": "", 55 | "description_tooltip": null, 56 | "layout": "IPY_MODEL_56a77d51814a4d418ed6b8c148808ac6", 57 | "placeholder": "​", 58 | "style": "IPY_MODEL_2c7ad5881e684b9894c6e6a6f573fd22", 59 | "value": "Taiwan-LLM-7B-v2.1-chat-Q5_1.gguf: 100%" 60 | } 61 | }, 62 | "5131d58c260842ccbbf663b493a34c81": { 63 | "model_module": "@jupyter-widgets/controls", 64 | "model_name": "FloatProgressModel", 65 | "model_module_version": "1.5.0", 66 | "state": { 67 | "_dom_classes": [], 68 | "_model_module": "@jupyter-widgets/controls", 69 | "_model_module_version": "1.5.0", 70 | "_model_name": "FloatProgressModel", 71 | "_view_count": null, 72 | "_view_module": "@jupyter-widgets/controls", 73 | "_view_module_version": "1.5.0", 74 | "_view_name": "ProgressView", 75 | "bar_style": "success", 76 | "description": "", 77 | "description_tooltip": null, 78 | "layout": "IPY_MODEL_a6b32ef03b5344e6bc9b3cb76647dba3", 79 | "max": 5064634208, 80 | "min": 0, 81 | "orientation": "horizontal", 82 | "style": "IPY_MODEL_15a5e9800661418c85f7edba054fa0bc", 83 | "value": 5064634208 84 | } 85 | }, 86 | "b1d81c8ad8bd40358f5f9b0f731d14df": { 87 | "model_module": "@jupyter-widgets/controls", 88 | "model_name": "HTMLModel", 89 | "model_module_version": "1.5.0", 90 | "state": { 91 | "_dom_classes": [], 92 | "_model_module": "@jupyter-widgets/controls", 93 | "_model_module_version": "1.5.0", 94 | "_model_name": "HTMLModel", 95 | "_view_count": null, 96 | "_view_module": "@jupyter-widgets/controls", 97 | "_view_module_version": "1.5.0", 98 | "_view_name": "HTMLView", 99 | "description": "", 100 | "description_tooltip": null, 101 | "layout": "IPY_MODEL_77b4ce39574449e2a6d635bb536d943f", 102 | "placeholder": "​", 103 | "style": "IPY_MODEL_33dba6865d7b4d38b0c6fa09f8a443e8", 104 | "value": " 5.06G/5.06G [01:00<00:00, 85.1MB/s]" 105 | } 106 | }, 107 | "e5c6918b22af4407ad5a4498342ed8f0": { 
108 | "model_module": "@jupyter-widgets/base", 109 | "model_name": "LayoutModel", 110 | "model_module_version": "1.2.0", 111 | "state": { 112 | "_model_module": "@jupyter-widgets/base", 113 | "_model_module_version": "1.2.0", 114 | "_model_name": "LayoutModel", 115 | "_view_count": null, 116 | "_view_module": "@jupyter-widgets/base", 117 | "_view_module_version": "1.2.0", 118 | "_view_name": "LayoutView", 119 | "align_content": null, 120 | "align_items": null, 121 | "align_self": null, 122 | "border": null, 123 | "bottom": null, 124 | "display": null, 125 | "flex": null, 126 | "flex_flow": null, 127 | "grid_area": null, 128 | "grid_auto_columns": null, 129 | "grid_auto_flow": null, 130 | "grid_auto_rows": null, 131 | "grid_column": null, 132 | "grid_gap": null, 133 | "grid_row": null, 134 | "grid_template_areas": null, 135 | "grid_template_columns": null, 136 | "grid_template_rows": null, 137 | "height": null, 138 | "justify_content": null, 139 | "justify_items": null, 140 | "left": null, 141 | "margin": null, 142 | "max_height": null, 143 | "max_width": null, 144 | "min_height": null, 145 | "min_width": null, 146 | "object_fit": null, 147 | "object_position": null, 148 | "order": null, 149 | "overflow": null, 150 | "overflow_x": null, 151 | "overflow_y": null, 152 | "padding": null, 153 | "right": null, 154 | "top": null, 155 | "visibility": null, 156 | "width": null 157 | } 158 | }, 159 | "56a77d51814a4d418ed6b8c148808ac6": { 160 | "model_module": "@jupyter-widgets/base", 161 | "model_name": "LayoutModel", 162 | "model_module_version": "1.2.0", 163 | "state": { 164 | "_model_module": "@jupyter-widgets/base", 165 | "_model_module_version": "1.2.0", 166 | "_model_name": "LayoutModel", 167 | "_view_count": null, 168 | "_view_module": "@jupyter-widgets/base", 169 | "_view_module_version": "1.2.0", 170 | "_view_name": "LayoutView", 171 | "align_content": null, 172 | "align_items": null, 173 | "align_self": null, 174 | "border": null, 175 | "bottom": null, 176 | "display": null, 177 | "flex": null, 178 | "flex_flow": null, 179 | "grid_area": null, 180 | "grid_auto_columns": null, 181 | "grid_auto_flow": null, 182 | "grid_auto_rows": null, 183 | "grid_column": null, 184 | "grid_gap": null, 185 | "grid_row": null, 186 | "grid_template_areas": null, 187 | "grid_template_columns": null, 188 | "grid_template_rows": null, 189 | "height": null, 190 | "justify_content": null, 191 | "justify_items": null, 192 | "left": null, 193 | "margin": null, 194 | "max_height": null, 195 | "max_width": null, 196 | "min_height": null, 197 | "min_width": null, 198 | "object_fit": null, 199 | "object_position": null, 200 | "order": null, 201 | "overflow": null, 202 | "overflow_x": null, 203 | "overflow_y": null, 204 | "padding": null, 205 | "right": null, 206 | "top": null, 207 | "visibility": null, 208 | "width": null 209 | } 210 | }, 211 | "2c7ad5881e684b9894c6e6a6f573fd22": { 212 | "model_module": "@jupyter-widgets/controls", 213 | "model_name": "DescriptionStyleModel", 214 | "model_module_version": "1.5.0", 215 | "state": { 216 | "_model_module": "@jupyter-widgets/controls", 217 | "_model_module_version": "1.5.0", 218 | "_model_name": "DescriptionStyleModel", 219 | "_view_count": null, 220 | "_view_module": "@jupyter-widgets/base", 221 | "_view_module_version": "1.2.0", 222 | "_view_name": "StyleView", 223 | "description_width": "" 224 | } 225 | }, 226 | "a6b32ef03b5344e6bc9b3cb76647dba3": { 227 | "model_module": "@jupyter-widgets/base", 228 | "model_name": "LayoutModel", 229 | "model_module_version": "1.2.0", 
230 | "state": { 231 | "_model_module": "@jupyter-widgets/base", 232 | "_model_module_version": "1.2.0", 233 | "_model_name": "LayoutModel", 234 | "_view_count": null, 235 | "_view_module": "@jupyter-widgets/base", 236 | "_view_module_version": "1.2.0", 237 | "_view_name": "LayoutView", 238 | "align_content": null, 239 | "align_items": null, 240 | "align_self": null, 241 | "border": null, 242 | "bottom": null, 243 | "display": null, 244 | "flex": null, 245 | "flex_flow": null, 246 | "grid_area": null, 247 | "grid_auto_columns": null, 248 | "grid_auto_flow": null, 249 | "grid_auto_rows": null, 250 | "grid_column": null, 251 | "grid_gap": null, 252 | "grid_row": null, 253 | "grid_template_areas": null, 254 | "grid_template_columns": null, 255 | "grid_template_rows": null, 256 | "height": null, 257 | "justify_content": null, 258 | "justify_items": null, 259 | "left": null, 260 | "margin": null, 261 | "max_height": null, 262 | "max_width": null, 263 | "min_height": null, 264 | "min_width": null, 265 | "object_fit": null, 266 | "object_position": null, 267 | "order": null, 268 | "overflow": null, 269 | "overflow_x": null, 270 | "overflow_y": null, 271 | "padding": null, 272 | "right": null, 273 | "top": null, 274 | "visibility": null, 275 | "width": null 276 | } 277 | }, 278 | "15a5e9800661418c85f7edba054fa0bc": { 279 | "model_module": "@jupyter-widgets/controls", 280 | "model_name": "ProgressStyleModel", 281 | "model_module_version": "1.5.0", 282 | "state": { 283 | "_model_module": "@jupyter-widgets/controls", 284 | "_model_module_version": "1.5.0", 285 | "_model_name": "ProgressStyleModel", 286 | "_view_count": null, 287 | "_view_module": "@jupyter-widgets/base", 288 | "_view_module_version": "1.2.0", 289 | "_view_name": "StyleView", 290 | "bar_color": null, 291 | "description_width": "" 292 | } 293 | }, 294 | "77b4ce39574449e2a6d635bb536d943f": { 295 | "model_module": "@jupyter-widgets/base", 296 | "model_name": "LayoutModel", 297 | "model_module_version": "1.2.0", 298 | "state": { 299 | "_model_module": "@jupyter-widgets/base", 300 | "_model_module_version": "1.2.0", 301 | "_model_name": "LayoutModel", 302 | "_view_count": null, 303 | "_view_module": "@jupyter-widgets/base", 304 | "_view_module_version": "1.2.0", 305 | "_view_name": "LayoutView", 306 | "align_content": null, 307 | "align_items": null, 308 | "align_self": null, 309 | "border": null, 310 | "bottom": null, 311 | "display": null, 312 | "flex": null, 313 | "flex_flow": null, 314 | "grid_area": null, 315 | "grid_auto_columns": null, 316 | "grid_auto_flow": null, 317 | "grid_auto_rows": null, 318 | "grid_column": null, 319 | "grid_gap": null, 320 | "grid_row": null, 321 | "grid_template_areas": null, 322 | "grid_template_columns": null, 323 | "grid_template_rows": null, 324 | "height": null, 325 | "justify_content": null, 326 | "justify_items": null, 327 | "left": null, 328 | "margin": null, 329 | "max_height": null, 330 | "max_width": null, 331 | "min_height": null, 332 | "min_width": null, 333 | "object_fit": null, 334 | "object_position": null, 335 | "order": null, 336 | "overflow": null, 337 | "overflow_x": null, 338 | "overflow_y": null, 339 | "padding": null, 340 | "right": null, 341 | "top": null, 342 | "visibility": null, 343 | "width": null 344 | } 345 | }, 346 | "33dba6865d7b4d38b0c6fa09f8a443e8": { 347 | "model_module": "@jupyter-widgets/controls", 348 | "model_name": "DescriptionStyleModel", 349 | "model_module_version": "1.5.0", 350 | "state": { 351 | "_model_module": "@jupyter-widgets/controls", 352 | 
"_model_module_version": "1.5.0", 353 | "_model_name": "DescriptionStyleModel", 354 | "_view_count": null, 355 | "_view_module": "@jupyter-widgets/base", 356 | "_view_module_version": "1.2.0", 357 | "_view_name": "StyleView", 358 | "description_width": "" 359 | } 360 | } 361 | } 362 | } 363 | }, 364 | "cells": [ 365 | { 366 | "cell_type": "markdown", 367 | "source": [ 368 | "# Llama 2 gguf Example\n", 369 | "推薦至少使用 T4 GPU 來作為你的服務啟用" 370 | ], 371 | "metadata": { 372 | "id": "sBv4jxXPJKha" 373 | } 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "source": [ 378 | "## Step 0: Config Setting" 379 | ], 380 | "metadata": { 381 | "id": "SUIz08O8NPsp" 382 | } 383 | }, 384 | { 385 | "cell_type": "code", 386 | "source": [ 387 | "GGUF_HUGGINGFACE_REPO = \"audreyt/Taiwan-LLM-7B-v2.1-chat-GGUF\"\n", 388 | "GGUF_HUGGINGFACE_BIN_FILE = \"Taiwan-LLM-7B-v2.1-chat-Q5_1.gguf\"" 389 | ], 390 | "metadata": { 391 | "id": "64oUtFgwGko8" 392 | }, 393 | "execution_count": 1, 394 | "outputs": [] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "source": [ 399 | "## Step 1: Install python package\n" 400 | ], 401 | "metadata": { 402 | "id": "Wz_9tYIBNYwj" 403 | } 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": 2, 408 | "metadata": { 409 | "colab": { 410 | "base_uri": "https://localhost:8080/" 411 | }, 412 | "id": "fSwCrQsEEAtm", 413 | "outputId": "205fbf85-95b9-4d51-e3fb-caa75c0448dd" 414 | }, 415 | "outputs": [ 416 | { 417 | "output_type": "stream", 418 | "name": "stdout", 419 | "text": [ 420 | "Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 421 | "Collecting llama-cpp-python==0.2.26\n", 422 | " Downloading llama_cpp_python-0.2.26.tar.gz (8.8 MB)\n", 423 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.8/8.8 MB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 424 | "\u001b[?25h Running command pip subprocess to install build dependencies\n", 425 | " Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 426 | " Collecting scikit-build-core[pyproject]>=0.5.1\n", 427 | " Downloading scikit_build_core-0.7.0-py3-none-any.whl (136 kB)\n", 428 | " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 136.6/136.6 kB 2.2 MB/s eta 0:00:00\n", 429 | " Collecting exceptiongroup (from scikit-build-core[pyproject]>=0.5.1)\n", 430 | " Downloading exceptiongroup-1.2.0-py3-none-any.whl (16 kB)\n", 431 | " Collecting packaging>=20.9 (from scikit-build-core[pyproject]>=0.5.1)\n", 432 | " Downloading packaging-23.2-py3-none-any.whl (53 kB)\n", 433 | " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.0/53.0 kB 3.6 MB/s eta 0:00:00\n", 434 | " Collecting tomli>=1.1 (from scikit-build-core[pyproject]>=0.5.1)\n", 435 | " Downloading tomli-2.0.1-py3-none-any.whl (12 kB)\n", 436 | " Collecting pathspec>=0.10.1 (from scikit-build-core[pyproject]>=0.5.1)\n", 437 | " Downloading pathspec-0.12.1-py3-none-any.whl (31 kB)\n", 438 | " Collecting pyproject-metadata>=0.5 (from scikit-build-core[pyproject]>=0.5.1)\n", 439 | " Downloading pyproject_metadata-0.7.1-py3-none-any.whl (7.4 kB)\n", 440 | " Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core, pyproject-metadata\n", 441 | " ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", 442 | " lida 0.0.10 requires fastapi, which is not installed.\n", 443 | " lida 0.0.10 requires kaleido, which is not installed.\n", 444 | " lida 0.0.10 requires python-multipart, which is not installed.\n", 445 | " lida 0.0.10 requires uvicorn, which is not installed.\n", 446 | " Successfully installed exceptiongroup-1.2.0 packaging-23.2 pathspec-0.12.1 pyproject-metadata-0.7.1 scikit-build-core-0.7.0 tomli-2.0.1\n", 447 | " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", 448 | " Running command Getting requirements to build wheel\n", 449 | " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", 450 | " Running command pip subprocess to install backend dependencies\n", 451 | " Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)\n", 452 | " Collecting cmake>=3.21\n", 453 | " Downloading cmake-3.28.1-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.3 MB)\n", 454 | " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.3/26.3 MB 17.6 MB/s eta 0:00:00\n", 455 | " Collecting ninja>=1.5\n", 456 | " Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)\n", 457 | " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 15.9 MB/s eta 0:00:00\n", 458 | " Installing collected packages: ninja, cmake\n", 459 | " Creating /tmp/pip-build-env-mmxt792p/normal/local/bin\n", 460 | " changing mode of /tmp/pip-build-env-mmxt792p/normal/local/bin/ninja to 755\n", 461 | " changing mode of /tmp/pip-build-env-mmxt792p/normal/local/bin/cmake to 755\n", 462 | " changing mode of /tmp/pip-build-env-mmxt792p/normal/local/bin/cpack to 755\n", 463 | " changing mode of /tmp/pip-build-env-mmxt792p/normal/local/bin/ctest to 755\n", 464 | " Successfully installed cmake-3.28.1 ninja-1.11.1.1\n", 465 | " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", 466 | " Running command Preparing metadata (pyproject.toml)\n", 467 | " *** scikit-build-core 0.7.0 using CMake 3.28.1 (metadata_wheel)\n", 468 | " Preparing metadata (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", 469 | "Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.2.26)\n", 470 | " Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)\n", 471 | "Collecting numpy>=1.20.0 (from llama-cpp-python==0.2.26)\n", 472 | " Downloading numpy-1.26.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)\n", 473 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m18.2/18.2 MB\u001b[0m \u001b[31m146.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 474 | "\u001b[?25hCollecting diskcache>=5.6.1 (from llama-cpp-python==0.2.26)\n", 475 | " Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n", 476 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.5/45.5 kB\u001b[0m \u001b[31m163.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 477 | "\u001b[?25hBuilding wheels for collected packages: llama-cpp-python\n", 478 | " Running command Building wheel for llama-cpp-python (pyproject.toml)\n", 479 | " *** scikit-build-core 0.7.0 using CMake 3.28.1 (wheel)\n", 480 | " *** Configuring CMake...\n", 481 | " loading initial cache file /tmp/tmp6qatxtsg/build/CMakeInit.txt\n", 482 | " -- The C compiler identification is GNU 11.4.0\n", 483 | " -- The CXX compiler identification is GNU 11.4.0\n", 484 | " -- Detecting C compiler ABI info\n", 485 | " -- Detecting C compiler ABI info - done\n", 486 | " -- Check for working C compiler: /usr/bin/cc - skipped\n", 487 | " -- Detecting C compile features\n", 488 | " -- Detecting C compile features - done\n", 489 | " -- Detecting CXX compiler ABI info\n", 490 | " -- Detecting CXX compiler ABI info - done\n", 491 | " -- Check for working CXX compiler: /usr/bin/c++ - skipped\n", 492 | " -- Detecting CXX compile features\n", 493 | " -- Detecting CXX compile features - done\n", 494 | " -- Found Git: /usr/bin/git (found version \"2.34.1\")\n", 495 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD\n", 496 | " -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success\n", 497 | " -- Found Threads: TRUE\n", 498 | " -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version \"12.2.140\")\n", 499 | " -- cuBLAS found\n", 500 | " -- The CUDA compiler identification is NVIDIA 12.2.140\n", 501 | " -- Detecting CUDA compiler ABI info\n", 502 | " -- Detecting CUDA compiler ABI info - done\n", 503 | " -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped\n", 504 | " -- Detecting CUDA compile features\n", 505 | " -- Detecting CUDA compile features - done\n", 506 | " -- Using CUDA architectures: 52;61;70\n", 507 | " -- CUDA host compiler is GNU 11.4.0\n", 508 | "\n", 509 | " -- CMAKE_SYSTEM_PROCESSOR: x86_64\n", 510 | " -- x86 detected\n", 511 | " CMake Warning (dev) at CMakeLists.txt:21 (install):\n", 512 | " Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.\n", 513 | " This warning is for project developers. Use -Wno-dev to suppress it.\n", 514 | "\n", 515 | " CMake Warning (dev) at CMakeLists.txt:30 (install):\n", 516 | " Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.\n", 517 | " This warning is for project developers. 
Use -Wno-dev to suppress it.\n", 518 | "\n", 519 | " -- Configuring done (4.2s)\n", 520 | " -- Generating done (0.0s)\n", 521 | " -- Build files have been written to: /tmp/tmp6qatxtsg/build\n", 522 | " *** Building project with Ninja...\n", 523 | " Change Dir: '/tmp/tmp6qatxtsg/build'\n", 524 | "\n", 525 | " Run Build Command(s): /tmp/pip-build-env-mmxt792p/normal/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -v\n", 526 | " [1/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/ggml-alloc.c\n", 527 | " [2/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/ggml-backend.c\n", 528 | " [3/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/ggml-quants.c\n", 529 | " [4/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. 
-isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/ggml.c\n", 530 | " [5/23] cd /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp && /tmp/pip-build-env-mmxt792p/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake\n", 531 | " -- Found Git: /usr/bin/git (found version \"2.34.1\")\n", 532 | " [6/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/build-info.cpp\n", 533 | " [7/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/common.cpp\n", 534 | " [8/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. 
-O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/sampling.cpp\n", 535 | " [9/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/console.cpp\n", 536 | " [10/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/grammar-parser.cpp\n", 537 | " [11/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/llama.cpp\n", 538 | " [12/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. 
-I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/train.cpp\n", 539 | " [13/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/llava.cpp\n", 540 | " [14/23] /usr/bin/c++ -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/common/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/llava-cli.cpp\n", 541 | " [15/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. 
-isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/examples/llava/clip.cpp\n", 542 | " [16/23] : && /tmp/pip-build-env-mmxt792p/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/examples/llava/libllava_static.a && /usr/bin/ar qc vendor/llama.cpp/examples/llava/libllava_static.a vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o && /usr/bin/ranlib vendor/llama.cpp/examples/llava/libllava_static.a && :\n", 543 | " [17/23] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++11 \"--generate-code=arch=compute_52,code=[compute_52,sm_52]\" \"--generate-code=arch=compute_61,code=[compute_61,sm_61]\" \"--generate-code=arch=compute_70,code=[compute_70,sm_70]\" -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler \"-Wno-array-bounds -Wno-format-truncation -Wextra-semi\" -march=native -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o\n", 544 | " [18/23] : && /usr/bin/g++ -fPIC -shared -Wl,-soname,libggml_shared.so -o vendor/llama.cpp/libggml_shared.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a -lcudadevrt -lcudart_static -lrt -lpthread -ldl -L\"/usr/local/cuda/targets/x86_64-linux/lib/stubs\" -L\"/usr/local/cuda/targets/x86_64-linux/lib\" && :\n", 545 | " [19/23] : && /tmp/pip-build-env-mmxt792p/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/libggml_static.a && /usr/bin/ar qc vendor/llama.cpp/libggml_static.a vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o && /usr/bin/ranlib vendor/llama.cpp/libggml_static.a && :\n", 546 | " [20/23] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllama.so -o vendor/llama.cpp/libllama.so vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o 
vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/usr/local/cuda-12.2/targets/x86_64-linux/lib: /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :\n", 547 | " [21/23] : && /tmp/pip-build-env-mmxt792p/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -E rm -f vendor/llama.cpp/common/libcommon.a && /usr/bin/ar qc vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o && /usr/bin/ranlib vendor/llama.cpp/common/libcommon.a && :\n", 548 | " [22/23] : && /usr/bin/c++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllava.so -o vendor/llama.cpp/examples/llava/libllava.so vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -Wl,-rpath,/tmp/tmp6qatxtsg/build/vendor/llama.cpp:/usr/local/cuda-12.2/targets/x86_64-linux/lib: vendor/llama.cpp/libllama.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so && :\n", 549 | " [23/23] : && /usr/bin/c++ -O3 -DNDEBUG vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -o vendor/llama.cpp/examples/llava/llava-cli -Wl,-rpath,/tmp/tmp6qatxtsg/build/vendor/llama.cpp:/usr/local/cuda-12.2/targets/x86_64-linux/lib: vendor/llama.cpp/common/libcommon.a vendor/llama.cpp/libllama.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-12.2/targets/x86_64-linux/lib/libculibos.a /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so && :\n", 550 | "\n", 551 | " *** Installing project into wheel...\n", 552 | " -- Install configuration: \"Release\"\n", 553 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/lib/libggml_shared.so\n", 554 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/lib/cmake/Llama/LlamaConfig.cmake\n", 555 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/lib/cmake/Llama/LlamaConfigVersion.cmake\n", 556 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/include/ggml.h\n", 557 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/include/ggml-cuda.h\n", 
558 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/lib/libllama.so\n", 559 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp6qatxtsg/wheel/platlib/lib/libllama.so\" to \"\"\n", 560 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/include/llama.h\n", 561 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/bin/convert.py\n", 562 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/bin/convert-lora-to-ggml.py\n", 563 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/llama_cpp/libllama.so\n", 564 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp6qatxtsg/wheel/platlib/llama_cpp/libllama.so\" to \"\"\n", 565 | " -- Installing: /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/llama_cpp/libllama.so\n", 566 | " -- Set non-toolchain portion of runtime path of \"/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/llama_cpp/libllama.so\" to \"\"\n", 567 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/lib/libllava.so\n", 568 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp6qatxtsg/wheel/platlib/lib/libllava.so\" to \"\"\n", 569 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/bin/llava-cli\n", 570 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp6qatxtsg/wheel/platlib/bin/llava-cli\" to \"\"\n", 571 | " -- Installing: /tmp/tmp6qatxtsg/wheel/platlib/llama_cpp/libllava.so\n", 572 | " -- Set non-toolchain portion of runtime path of \"/tmp/tmp6qatxtsg/wheel/platlib/llama_cpp/libllava.so\" to \"\"\n", 573 | " -- Installing: /tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/llama_cpp/libllava.so\n", 574 | " -- Set non-toolchain portion of runtime path of \"/tmp/pip-install-wast9ri8/llama-cpp-python_098752e1fe944b4c88f36b19fee61ca6/llama_cpp/libllava.so\" to \"\"\n", 575 | " *** Making wheel...\n", 576 | " *** Created llama_cpp_python-0.2.26-cp310-cp310-manylinux_2_35_x86_64.whl...\n", 577 | " Building wheel for llama-cpp-python (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", 578 | " Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.26-cp310-cp310-manylinux_2_35_x86_64.whl size=8129912 sha256=6241094aacd8dc73f7a0133d15c419695f48e0639fa2858590975822dd3a059d\n", 579 | " Stored in directory: /tmp/pip-ephem-wheel-cache-mhfgccsy/wheels/91/80/ce/ac6afea8c1d6fbcec7e14183033a5b2796c742d4f470010c72\n", 580 | "Successfully built llama-cpp-python\n", 581 | "Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python\n", 582 | " Attempting uninstall: typing-extensions\n", 583 | " Found existing installation: typing_extensions 4.5.0\n", 584 | " Uninstalling typing_extensions-4.5.0:\n", 585 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/__pycache__/typing_extensions.cpython-310.pyc\n", 586 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions-4.5.0.dist-info/\n", 587 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/typing_extensions.py\n", 588 | " Successfully uninstalled typing_extensions-4.5.0\n", 589 | " Attempting uninstall: numpy\n", 590 | " Found existing installation: numpy 1.23.5\n", 591 | " Uninstalling numpy-1.23.5:\n", 592 | " Removing file or directory /usr/local/bin/f2py\n", 593 | " Removing file or directory /usr/local/bin/f2py3\n", 594 | " Removing file or directory /usr/local/bin/f2py3.10\n", 595 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy-1.23.5.dist-info/\n", 596 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy.libs/\n", 597 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/numpy/\n", 598 | " Successfully uninstalled numpy-1.23.5\n", 599 | " changing mode of /usr/local/bin/f2py to 755\n", 600 | " Attempting uninstall: diskcache\n", 601 | " Found existing installation: diskcache 5.6.3\n", 602 | " Uninstalling diskcache-5.6.3:\n", 603 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache-5.6.3.dist-info/\n", 604 | " Removing file or directory /usr/local/lib/python3.10/dist-packages/diskcache/\n", 605 | " Successfully uninstalled diskcache-5.6.3\n", 606 | "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", 607 | "lida 0.0.10 requires fastapi, which is not installed.\n", 608 | "lida 0.0.10 requires kaleido, which is not installed.\n", 609 | "lida 0.0.10 requires python-multipart, which is not installed.\n", 610 | "lida 0.0.10 requires uvicorn, which is not installed.\n", 611 | "llmx 0.0.15a0 requires cohere, which is not installed.\n", 612 | "llmx 0.0.15a0 requires openai, which is not installed.\n", 613 | "llmx 0.0.15a0 requires tiktoken, which is not installed.\n", 614 | "tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.9.0 which is incompatible.\u001b[0m\u001b[31m\n", 615 | "\u001b[0mSuccessfully installed diskcache-5.6.3 llama-cpp-python-0.2.26 numpy-1.26.2 typing-extensions-4.9.0\n" 616 | ] 617 | } 618 | ], 619 | "source": [ 620 | "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.26 --force-reinstall --upgrade --no-cache-dir --verbose" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "source": [ 626 | "!pip install huggingface_hub" 627 | ], 628 | "metadata": { 629 | "colab": { 630 | "base_uri": "https://localhost:8080/" 631 | }, 632 | "id": "WI4udGDKF9Pe", 633 | "outputId": "a24476f8-7d73-4511-ac72-b9da8132e996" 634 | }, 635 | "execution_count": 3, 636 | "outputs": [ 637 | { 638 | "output_type": "stream", 639 | "name": "stdout", 640 | "text": [ 641 | "Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.10/dist-packages (0.19.4)\n", 642 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (3.13.1)\n", 643 | "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2023.6.0)\n", 644 | "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (2.31.0)\n", 645 | "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.66.1)\n", 646 | "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (6.0.1)\n", 647 | "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (4.9.0)\n", 648 | "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface_hub) (23.2)\n", 649 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.3.2)\n", 650 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (3.6)\n", 651 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2.0.7)\n", 652 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface_hub) (2023.11.17)\n" 653 | ] 654 | } 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "source": [ 660 | "## Step 2: Download the GGUF LLM Model" 661 | ], 662 | "metadata": { 663 | "id": "pbNa1w_SNcaj" 664 | } 665 | }, 666 | { 667 | "cell_type": "code", 668 | "source": [ 669 | "from huggingface_hub import hf_hub_download\n", 670 | "\n", 671 | "model_path = hf_hub_download(repo_id=GGUF_HUGGINGFACE_REPO, filename=GGUF_HUGGINGFACE_BIN_FILE)\n", 672 | "model_path" 673 | ], 674 | "metadata": { 675 | "colab": { 676 | 
"base_uri": "https://localhost:8080/", 677 | "height": 85, 678 | "referenced_widgets": [ 679 | "4bb480c16b9a41bfa0227ec094e1415c", 680 | "f145b14de79340328e6ae0c561deea89", 681 | "5131d58c260842ccbbf663b493a34c81", 682 | "b1d81c8ad8bd40358f5f9b0f731d14df", 683 | "e5c6918b22af4407ad5a4498342ed8f0", 684 | "56a77d51814a4d418ed6b8c148808ac6", 685 | "2c7ad5881e684b9894c6e6a6f573fd22", 686 | "a6b32ef03b5344e6bc9b3cb76647dba3", 687 | "15a5e9800661418c85f7edba054fa0bc", 688 | "77b4ce39574449e2a6d635bb536d943f", 689 | "33dba6865d7b4d38b0c6fa09f8a443e8" 690 | ] 691 | }, 692 | "id": "kzes1srfF9K9", 693 | "outputId": "8ef73338-79ce-4b01-a344-71bc7f86d7aa" 694 | }, 695 | "execution_count": 4, 696 | "outputs": [ 697 | { 698 | "output_type": "display_data", 699 | "data": { 700 | "text/plain": [ 701 | "Taiwan-LLM-7B-v2.1-chat-Q5_1.gguf: 0%| | 0.00/5.06G [00:00