├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Rabinovich

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# llama.cpp Android Tutorial (Adreno OpenCL Backend)

## Updates

- 09/08/2023: First version of the llama.cpp Android tutorial is available.
- **(NEW!)** 04/27/2025: The 2025 version of the tutorial is available.
  - Now also covers using `llama-cpp-python` with a custom-built `llama.cpp`.

## Highlights

- Deploying `llama.cpp` on an Android device and running it on the Adreno GPU.
- Utilizing `llama-cpp-python` with a custom-built `llama.cpp` that supports the Adreno GPU via OpenCL:
  - Enables large-scale inference evaluation directly on Android.
  - Provides a solid foundation for developing your own Android LLM applications.

## Hardware Prerequisites

- An Android device with a **Qualcomm Snapdragon SoC** (Snapdragon 8 Gen 1, Gen 2, Gen 3, or 8 Elite)
- Recommended (but not required): more than 12 GB of RAM
- Optional: an Ubuntu or Mac computer for building

## Hardware Information

This tutorial is based on:

- Android phone: OnePlus 13
  - SoC: Qualcomm Snapdragon 8 Elite
  - RAM: 16 GB
- Ubuntu laptop: Kubuntu 24.04
  - Note #1: This guide is based on Ubuntu, but the steps are similar on macOS and have been tested on a MacBook Pro.
  - Note #2: If you don't have a Linux laptop, a virtual machine also works.

## Reference Links

- [Qualcomm Official Tutorial](https://www.qualcomm.com/developer/blog/2024/11/introducing-new-opn-cl-gpu-backend-llama-cpp-for-qualcomm-adreno-gpu)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [llama.cpp OpenCL Backend](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/OPENCL.md)
- [llama-cpp-python: how to use custom-built llama.cpp](https://github.com/abetlen/llama-cpp-python/issues/1070)
- [Build llama.cpp locally](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

## Software Requirements

### Termux (Android Device)

Install Termux via F-Droid:

- [Official Termux Website](https://termux.dev/en/index.html)
- [F-Droid](https://f-droid.org/en/)

(Optional) To improve download speeds, you can change the package repository:

```bash
termux-change-repo
```

**Important:** Avoid installing Termux from the Google Play Store; that build is outdated and no longer receives updates.
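Once Termux is installed, a quick way to confirm that a shell is actually running inside Termux (rather than, say, a plain `adb shell`) is to check the `$PREFIX` variable that Termux exports. This is a minimal sketch; the helper name `in_termux` is hypothetical:

```shell
# in_termux: report whether this shell looks like a Termux session.
# Termux exports $PREFIX as its installation prefix, usually
# /data/data/com.termux/files/usr; a plain adb shell does not.
in_termux() {
  case "${1:-${PREFIX:-}}" in
    */com.termux/*) echo "running inside Termux" ;;
    *)              echo "not a Termux shell" ;;
  esac
}

in_termux "${PREFIX:-}"
```

This matters because the package and library paths used throughout this guide (`$PREFIX/lib`, `pkg`) only exist inside the Termux environment.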

### Ubuntu Laptop

Required software:

- `python`, `cmake`, `make`, and `ninja`
  - Recommended: install Python via `miniconda`
  - Guide: [Installing Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install)
- A C/C++ compiler (e.g., `clang`)
- A Java Development Kit (in this case, `openjdk-21-jdk`)

Install them with:

```bash
sudo apt install cmake make ninja-build clang openjdk-21-jdk
```

## Enable Termux Support for OpenCL + Adreno GPU

First, update the Termux packages:

```bash
pkg update && pkg upgrade
```

Then install the required libraries:

```bash
pkg install clinfo ocl-icd opencl-headers
```

### Option 1: For most Android phones

Add the directory containing the dynamic library `libOpenCL.so` to the `LD_LIBRARY_PATH` variable:

```bash
vim ~/.bashrc
```

```bash
# In .bashrc, add:
export LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib
```

On some older phones, `libOpenCL.so` may live elsewhere, in which case use:

```bash
export LD_LIBRARY_PATH=/system/vendor/lib64:$PREFIX/lib
```

After that:

```bash
source ~/.bashrc
```

Test whether it works:

```bash
clinfo
```

If it returns:

```
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
...
```

the OpenCL driver is loaded **successfully**.

If it returns:

```
Number of platforms 0
...
```

the OpenCL driver is **not detected**, and you can follow Option 2 below.

### Option 2: For the OnePlus 13 (and other phones where Option 1 fails)

Copy both `libOpenCL.so` and `libOpenCL_adreno.so` into Termux:

```bash
# The destination can be any directory inside Termux
# (note: no spaces are allowed inside the brace expansion)
cp /vendor/lib64/{libOpenCL.so,libOpenCL_adreno.so} ~/
```

Add the directory containing these two libraries to the `LD_LIBRARY_PATH` environment variable:

```bash
vim ~/.bashrc
```

```bash
# Here, $HOME is the directory containing the two .so files
# In .bashrc, add:
export LD_LIBRARY_PATH=$HOME:$LD_LIBRARY_PATH
```

```bash
source ~/.bashrc
```

Now test again:

```bash
clinfo
```

If it returns:

```
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
...
```

the OpenCL driver is loaded **successfully**.

## Building llama.cpp with the OpenCL Backend

There are two options:

1. Option 1: Build on a laptop and copy the binaries to the Android phone.
2. Option 2: Build directly on the Android phone.
   1. As of April 27, 2025, `llama-cpp-python` does not natively support building `llama.cpp` with OpenCL for Android. This means you have to compile `llama.cpp` separately on the phone and then point `llama-cpp-python` at it.
   2. Note that `llama-cpp-python` is a Python wrapper around the `llama.cpp` library, so this option lets you customize and replace the underlying `llama.cpp` implementation to suit your needs.
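Whichever option you choose, the resulting binaries must be able to locate `libOpenCL.so` through `LD_LIBRARY_PATH` at runtime; that is exactly what the previous section set up. A small sketch for checking this, with a hypothetical helper name `find_in_ld_path`:

```shell
# find_in_ld_path: print the first directory from a colon-separated search
# path that contains the given library file; fail if none does.
find_in_ld_path() {
  lib=$1
  search=$2
  old_ifs=$IFS
  IFS=:
  for dir in $search; do
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# On the phone you would run, for example:
#   find_in_ld_path libOpenCL.so "$LD_LIBRARY_PATH"
```

If this prints nothing for `libOpenCL.so`, revisit the "Enable Termux Support" section before building.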

### Option 1: Building from an Ubuntu/Mac Laptop

#### Install the Android NDK

Run these commands first:

```bash
cd ~ && \
wget https://dl.google.com/android/repository/commandlinetools-linux-8512546_latest.zip && \
unzip commandlinetools-linux-8512546_latest.zip && \
mkdir -p ~/android-sdk/cmdline-tools && \
mv cmdline-tools latest && \
mv latest ~/android-sdk/cmdline-tools/ && \
rm -rf commandlinetools-linux-8512546_latest.zip
```

Check your OpenJDK location (make sure you installed OpenJDK 21):

```bash
readlink -f $(which java)
```

Set `JAVA_HOME`:

```bash
vim ~/.bashrc
```

```bash
export JAVA_HOME=/your/openjdk/directory
```

```bash
source ~/.bashrc
```

Finally, run:

```bash
yes | ~/android-sdk/cmdline-tools/latest/bin/sdkmanager "ndk;26.3.11579264"
```

#### Install the OpenCL headers and ICD loader

```bash
# You can choose your own directory; this example follows Qualcomm's official tutorial:
mkdir -p ~/dev/llm && cd ~/dev/llm
```

```bash
git clone https://github.com/KhronosGroup/OpenCL-Headers
```

```bash
cd OpenCL-Headers && cp -r CL ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
```

On macOS, the destination is:

```
~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/darwin-x86_64/sysroot/usr/include
```

Then:

```bash
cd ~/dev/llm
```

```bash
# On macOS, you may need to change OPENCL_ICD_LOADER_HEADERS_DIR accordingly
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && \
cd OpenCL-ICD-Loader && \
mkdir build_ndk26 && cd build_ndk26 && \
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=~/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=24 \
  -DANDROID_STL=c++_shared
```

```bash
ninja && cp libOpenCL.so ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
```

#### Build llama.cpp with the Adreno OpenCL backend

```bash
cd ~/dev/llm
```

```bash
git clone https://github.com/ggerganov/llama.cpp && \
cd llama.cpp && \
mkdir build-android && cd build-android
```

```bash
cmake .. -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON
```

```bash
ninja
```

Finally, copy the `dev` folder to the phone (into Termux).

- Make sure you have run `termux-setup-storage` in Termux first.

The executables are now available under `llama.cpp/build-android/bin`.

Run `llama-bench` to test:

```bash
./llama-bench -m /path/to/your/model.gguf -ngl 99
```

The output should look like this:

```
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 830'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.47.18.21
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
...
```

### Option 2: Building on the Android Phone (Recommended)

**Note: As of 04/27/2025, building the newest version of llama.cpp causes a segmentation fault. Please select a version before *b5028*. For more details, see: https://github.com/ggml-org/llama.cpp/issues/9289#issuecomment-2788276223**

Clone the llama.cpp source code:

```bash
git clone https://github.com/ggml-org/llama.cpp.git
```

```bash
cd llama.cpp
```

If the segmentation fault has not been fixed yet, switch to an older version:

```bash
git reset --hard b5026
```

Note: set `BUILD_SHARED_LIBS` to `ON` if you want to use `llama-cpp-python` later:

```bash
cmake -B build-android \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_OPENCL=ON \
  -DGGML_OPENCL_EMBED_KERNELS=ON \
  -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
```

Build llama.cpp:

```bash
cmake --build build-android --config Release
```

As in the previous section, test with:

```bash
./build-android/bin/llama-bench -m /path/to/your/model.gguf -ngl 99
```

## `llama-cpp-python` with the Adreno OpenCL Backend

**Note: Make sure you set `BUILD_SHARED_LIBS=ON` when building `llama.cpp`.**

First, set the environment variable:

```bash
vim ~/.bashrc
```

Export the following path:

```bash
# libllama.so is under llama.cpp/build-android/bin
export LLAMA_CPP_LIB_PATH=/home/.../llama.cpp/build-android/bin
```

```bash
source ~/.bashrc
```

Now install `llama-cpp-python` with `LLAMA_BUILD=OFF`, so that it uses the prebuilt library instead of compiling its own:

```bash
CMAKE_ARGS="-DLLAMA_BUILD=OFF" pip install llama-cpp-python --force-reinstall
```

To test it, run:

```bash
python -c "from llama_cpp import Llama; llm = Llama(model_path='your/model/path', n_gpu_layers=99, verbose=True)"
```

If it prints verbose output like this:

```
...
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: CPU_Mapped model buffer size = 434.29 MiB
load_tensors: OpenCL model buffer size = 1780.40 MiB
...
```

then `llama-cpp-python` is using the custom-built `llama.cpp`.

## Testing Results

Device: OnePlus 13

| model               | size     | params | backend | ngl | test  | t/s            |
| :-----------------: | :------: | :----: | :-----: | :-: | :---: | :------------: |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | pp512 | 168.47 ± 0.98  |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg128 | 17.57 ± 1.40   |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg256 | 15.45 ± 0.59   |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg512 | 13.60 ± 0.79   |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | pp512 | 187.26 ± 0.49  |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg128 | 8.11 ± 0.13    |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg256 | 8.06 ± 0.07    |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg512 | 8.02 ± 0.11    |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | pp512 | 110.60 ± 11.96 |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg128 | 16.76 ± 0.37   |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg256 | 16.10 ± 0.53   |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg512 | 14.89 ± 0.16   |
--------------------------------------------------------------------------------
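The rows above come from `llama-bench`; the test names correspond to its prompt-processing (`-p`) and token-generation (`-n`) settings. A sketch for collecting similar rows for your own models; `MODEL_DIR`, the `.gguf` filenames, and the binary location are placeholders to adjust for your setup:

```shell
# Benchmark several models with the same tests as the table above
# (pp512 via -p 512; tg128/tg256/tg512 via -n 128,256,512).
BENCH=./build-android/bin/llama-bench
MODEL_DIR="$HOME/models"

if [ -x "$BENCH" ]; then
  for model in llama-3.2-3b-q4_0.gguf gemma-2-2b-q4_0.gguf; do
    "$BENCH" -m "$MODEL_DIR/$model" -ngl 99 -p 512 -n 128,256,512
  done
else
  echo "llama-bench not found at $BENCH; build it first"
fi
```

`llama-bench` prints its results as a Markdown table, which is how the table in this section was produced.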