├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Rabinovich

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# llama.cpp Android Tutorial (Adreno OpenCL Backend)

## Updates

- 09/08/2023: First version of the llama.cpp Android tutorial is available.
- **(NEW!)** 04/27/2025: The 2025 version of the tutorial is available.
  - Now also covers using `llama-cpp-python` with a custom-built `llama.cpp`.

## Highlights

- Deploying `llama.cpp` on an Android device and running it on the Adreno GPU.
- Utilizing `llama-cpp-python` with a custom-built `llama.cpp` that supports the Adreno GPU via OpenCL:
  - Enables large-scale inference evaluation directly on Android.
  - Provides a solid foundation for developing your own Android LLM applications.

## Hardware Prerequisites

- An Android device with a **Qualcomm Snapdragon SoC** (Snapdragon 8 Gen 1, Gen 2, Gen 3, or 8 Elite)
- Recommended (but not required): more than 12 GB of RAM
- Optional: an Ubuntu or Mac computer for building

## Hardware Information

This tutorial is based on:

- Android phone: OnePlus 13
  - SoC: Qualcomm Snapdragon 8 Elite
  - RAM: 16 GB
- Ubuntu laptop: Kubuntu 24.04
  - Note #1: This guide is based on Ubuntu, but the steps are similar on macOS and have been tested on a MacBook Pro.
  - Note #2: If you don't have a Linux laptop, a virtual machine also works.

## Reference Links

- [Qualcomm Official Tutorial](https://www.qualcomm.com/developer/blog/2024/11/introducing-new-opn-cl-gpu-backend-llama-cpp-for-qualcomm-adreno-gpu)
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [llama.cpp OpenCL Backend](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/OPENCL.md)
- [llama-cpp-python: how to use custom-built llama.cpp](https://github.com/abetlen/llama-cpp-python/issues/1070)
- [Build llama.cpp locally](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)

## Software Requirements

### Termux (Android Device)

Install Termux via F-Droid:

- [Official Termux Website](https://termux.dev/en/index.html)
- [F-Droid](https://f-droid.org/en/)

(Optional) To improve download speeds, you can change the package repository:

```bash
termux-change-repo
```

**Important:** Avoid installing Termux from the Google Play Store; that build is outdated and no longer receives updates.
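Once Termux is installed, a quick way to confirm that a shell is actually running inside Termux (rather than, say, a plain `adb shell`) is to check the `$PREFIX` variable that Termux exports. This is a minimal sketch; the helper name `in_termux` is hypothetical:

```shell
# in_termux: report whether this shell looks like a Termux session.
# Termux exports $PREFIX as its installation prefix, usually
# /data/data/com.termux/files/usr; a plain adb shell does not.
in_termux() {
  case "${1:-${PREFIX:-}}" in
    */com.termux/*) echo "running inside Termux" ;;
    *)              echo "not a Termux shell" ;;
  esac
}

in_termux "${PREFIX:-}"
```

This matters because the package and library paths used throughout this guide (`$PREFIX/lib`, `pkg`) only exist inside the Termux environment.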

### Ubuntu Laptop

Required software:

- `python`, `cmake`, `make`, and `ninja`
  - Recommended: install Python via `miniconda`
  - Guide: [Installing Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/install)
- A C/C++ compiler (e.g., `clang`)
- A Java Development Kit (in this case, `openjdk-21-jdk`)

Install them with:

```bash
sudo apt install cmake make ninja-build clang openjdk-21-jdk
```

## Enable Termux Support for OpenCL + Adreno GPU

First, update the Termux packages:

```bash
pkg update && pkg upgrade
```

Then install the required libraries:

```bash
pkg install clinfo ocl-icd opencl-headers
```

### Option 1: For most Android phones

Add the directory containing the dynamic library `libOpenCL.so` to the `LD_LIBRARY_PATH` variable:

```bash
vim ~/.bashrc
```

```bash
# In .bashrc, add:
export LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib
```

On some older phones, `libOpenCL.so` may live elsewhere, in which case use:

```bash
export LD_LIBRARY_PATH=/system/vendor/lib64:$PREFIX/lib
```

After that:

```bash
source ~/.bashrc
```

Test whether it works:

```bash
clinfo
```

If it returns:

```
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
...
```

the OpenCL driver is loaded **successfully**.

If it returns:

```
Number of platforms 0
...
```

the OpenCL driver is **not detected**, and you can follow Option 2 below.

### Option 2: For the OnePlus 13 (and other phones where Option 1 fails)

Copy both `libOpenCL.so` and `libOpenCL_adreno.so` into Termux:

```bash
# The destination can be any directory inside Termux
# (note: no spaces are allowed inside the brace expansion)
cp /vendor/lib64/{libOpenCL.so,libOpenCL_adreno.so} ~/
```

Add the directory containing these two libraries to the `LD_LIBRARY_PATH` environment variable:

```bash
vim ~/.bashrc
```

```bash
# Here, $HOME is the directory containing the two .so files
# In .bashrc, add:
export LD_LIBRARY_PATH=$HOME:$LD_LIBRARY_PATH
```

```bash
source ~/.bashrc
```

Now test again:

```bash
clinfo
```

If it returns:

```
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
...
```

the OpenCL driver is loaded **successfully**.

## Building llama.cpp with the OpenCL Backend

There are two options:

1. Option 1: Build on a laptop and copy the binaries to the Android phone.
2. Option 2: Build directly on the Android phone.
   1. As of April 27, 2025, `llama-cpp-python` does not natively support building `llama.cpp` with OpenCL for Android. This means you have to compile `llama.cpp` separately on the phone and then point `llama-cpp-python` at it.
   2. Note that `llama-cpp-python` is a Python wrapper around the `llama.cpp` library, so this option lets you customize and replace the underlying `llama.cpp` implementation to suit your needs.
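Whichever option you choose, the resulting binaries must be able to locate `libOpenCL.so` through `LD_LIBRARY_PATH` at runtime; that is exactly what the previous section set up. A small sketch for checking this, with a hypothetical helper name `find_in_ld_path`:

```shell
# find_in_ld_path: print the first directory from a colon-separated search
# path that contains the given library file; fail if none does.
find_in_ld_path() {
  lib=$1
  search=$2
  old_ifs=$IFS
  IFS=:
  for dir in $search; do
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# On the phone you would run, for example:
#   find_in_ld_path libOpenCL.so "$LD_LIBRARY_PATH"
```

If this prints nothing for `libOpenCL.so`, revisit the "Enable Termux Support" section before building.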

### Option 1: Building from an Ubuntu/Mac Laptop

#### Install the Android NDK

Run these commands first:

```bash
cd ~ && \
wget https://dl.google.com/android/repository/commandlinetools-linux-8512546_latest.zip && \
unzip commandlinetools-linux-8512546_latest.zip && \
mkdir -p ~/android-sdk/cmdline-tools && \
mv cmdline-tools latest && \
mv latest ~/android-sdk/cmdline-tools/ && \
rm -rf commandlinetools-linux-8512546_latest.zip
```

Check your OpenJDK location (make sure you installed OpenJDK 21):

```bash
readlink -f $(which java)
```

Set `JAVA_HOME`:

```bash
vim ~/.bashrc
```

```bash
export JAVA_HOME=/your/openjdk/directory
```

```bash
source ~/.bashrc
```

Finally, run:

```bash
yes | ~/android-sdk/cmdline-tools/latest/bin/sdkmanager "ndk;26.3.11579264"
```

#### Install the OpenCL headers and ICD loader

```bash
# You can choose your own directory; this example follows Qualcomm's official tutorial:
mkdir -p ~/dev/llm && cd ~/dev/llm
```

```bash
git clone https://github.com/KhronosGroup/OpenCL-Headers
```

```bash
cd OpenCL-Headers && cp -r CL ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
```

On macOS, the destination is:

```
~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/darwin-x86_64/sysroot/usr/include
```

Then:

```bash
cd ~/dev/llm
```

```bash
# On macOS, you may need to change OPENCL_ICD_LOADER_HEADERS_DIR accordingly
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && \
cd OpenCL-ICD-Loader && \
mkdir build_ndk26 && cd build_ndk26 && \
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_TOOLCHAIN_FILE=~/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DOPENCL_ICD_LOADER_HEADERS_DIR=~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=24 \
  -DANDROID_STL=c++_shared
```

```bash
ninja && cp libOpenCL.so ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
```

#### Build llama.cpp with the Adreno OpenCL backend

```bash
cd ~/dev/llm
```

```bash
git clone https://github.com/ggerganov/llama.cpp && \
cd llama.cpp && \
mkdir build-android && cd build-android
```

```bash
cmake .. -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON
```

```bash
ninja
```

Finally, copy the `dev` folder to the phone (into Termux).

- Make sure you have run `termux-setup-storage` in Termux first.

The executables are now available under `llama.cpp/build-android/bin`.

Run `llama-bench` to test:

```bash
./llama-bench -m /path/to/your/model.gguf -ngl 99
```

The output should look like this:

```
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) 830'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.47.18.21
ggml_opencl: vector subgroup broadcast support: true
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 1024 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: true
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: true
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
...
```

### Option 2: Building on the Android Phone (Recommended)

**Note: As of 04/27/2025, building the newest version of llama.cpp causes a segmentation fault. Please select a version before *b5028*. For more details, see: https://github.com/ggml-org/llama.cpp/issues/9289#issuecomment-2788276223**

Clone the llama.cpp source code:

```bash
git clone https://github.com/ggml-org/llama.cpp.git
```

```bash
cd llama.cpp
```

If the segmentation fault has not been fixed yet, switch to an older version:

```bash
git reset --hard b5026
```

Note: set `BUILD_SHARED_LIBS` to `ON` if you want to use `llama-cpp-python` later:

```bash
cmake -B build-android \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_OPENCL=ON \
  -DGGML_OPENCL_EMBED_KERNELS=ON \
  -DGGML_OPENCL_USE_ADRENO_KERNELS=ON
```

Build llama.cpp:

```bash
cmake --build build-android --config Release
```

As in the previous section, test with:

```bash
./build-android/bin/llama-bench -m /path/to/your/model.gguf -ngl 99
```

## `llama-cpp-python` with the Adreno OpenCL Backend

**Note: Make sure you set `BUILD_SHARED_LIBS=ON` when building `llama.cpp`.**

First, set the environment variable:

```bash
vim ~/.bashrc
```

Export the following path:

```bash
# libllama.so is under llama.cpp/build-android/bin
export LLAMA_CPP_LIB_PATH=/home/.../llama.cpp/build-android/bin
```

```bash
source ~/.bashrc
```

Now install `llama-cpp-python` with `LLAMA_BUILD=OFF`, so that it uses the prebuilt library instead of compiling its own:

```bash
CMAKE_ARGS="-DLLAMA_BUILD=OFF" pip install llama-cpp-python --force-reinstall
```

To test it, run:

```bash
python -c "from llama_cpp import Llama; llm = Llama(model_path='your/model/path', n_gpu_layers=99, verbose=True)"
```

If it prints verbose output like this:

```
...
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: CPU_Mapped model buffer size = 434.29 MiB
load_tensors: OpenCL model buffer size = 1780.40 MiB
...
```

then `llama-cpp-python` is using the custom-built `llama.cpp`.

## Testing Results

Device: OnePlus 13

| model               | size     | params | backend | ngl | test  | t/s            |
| :-----------------: | :------: | :----: | :-----: | :-: | :---: | :------------: |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | pp512 | 168.47 ± 0.98  |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg128 | 17.57 ± 1.40   |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg256 | 15.45 ± 0.59   |
| llama 3.2 3B Q4_0   | 1.78 GiB | 3.21 B | OpenCL  | 99  | tg512 | 13.60 ± 0.79   |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | pp512 | 187.26 ± 0.49  |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg128 | 8.11 ± 0.13    |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg256 | 8.06 ± 0.07    |
| gemma 2 2B Q4_0     | 1.51 GiB | 2.61 B | OpenCL  | 99  | tg512 | 8.02 ± 0.11    |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | pp512 | 110.60 ± 11.96 |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg128 | 16.76 ± 0.37   |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg256 | 16.10 ± 0.53   |
| phi 3 3.5 mini Q4_0 | 2.03 GiB | 3.82 B | OpenCL  | 99  | tg512 | 14.89 ± 0.16   |
--------------------------------------------------------------------------------
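The rows above come from `llama-bench`; the test names correspond to its prompt-processing (`-p`) and token-generation (`-n`) settings. A sketch for collecting similar rows for your own models; `MODEL_DIR`, the `.gguf` filenames, and the binary location are placeholders to adjust for your setup:

```shell
# Benchmark several models with the same tests as the table above
# (pp512 via -p 512; tg128/tg256/tg512 via -n 128,256,512).
BENCH=./build-android/bin/llama-bench
MODEL_DIR="$HOME/models"

if [ -x "$BENCH" ]; then
  for model in llama-3.2-3b-q4_0.gguf gemma-2-2b-q4_0.gguf; do
    "$BENCH" -m "$MODEL_DIR/$model" -ngl 99 -p 512 -n 128,256,512
  done
else
  echo "llama-bench not found at $BENCH; build it first"
fi
```

`llama-bench` prints its results as a Markdown table, which is how the table in this section was produced.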