├── .github └── CODEOWNERS ├── star.jpeg ├── resources ├── Architectures.png └── Summary_of_on-device_LLMs_evolution.jpeg ├── LICENSE └── README.md /.github/CODEOWNERS: -------------------------------------------------------------------------------- 1 | @zhiyuan8 @alexchen4ai 2 | -------------------------------------------------------------------------------- /star.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NexaAI/Awesome-LLMs-on-device/HEAD/star.jpeg -------------------------------------------------------------------------------- /resources/Architectures.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NexaAI/Awesome-LLMs-on-device/HEAD/resources/Architectures.png -------------------------------------------------------------------------------- /resources/Summary_of_on-device_LLMs_evolution.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NexaAI/Awesome-LLMs-on-device/HEAD/resources/Summary_of_on-device_LLMs_evolution.jpeg -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Jiajun Xu 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 🚀 Awesome LLMs on Device: A Must-Read Comprehensive Hub by Nexa AI 2 | 3 |
4 | 5 | [![Discord](https://dcbadge.limes.pink/api/server/thRu2HaK4D?style=flat&compact=true)](https://discord.gg/thRu2HaK4D) 6 | 7 | [On-device Model Hub](https://model-hub.nexa4ai.com/) / [Nexa SDK Documentation](https://docs.nexaai.com/) 8 | 9 | [release-url]: https://github.com/NexaAI/nexa-sdk/releases 10 | [Windows-image]: https://img.shields.io/badge/windows-0078D4?logo=windows 11 | [MacOS-image]: https://img.shields.io/badge/-MacOS-black?logo=apple 12 | [Linux-image]: https://img.shields.io/badge/-Linux-333?logo=ubuntu 13 | 14 |
15 | 16 | 17 | 18 | 19 | ![Summary of On-device LLMs’ Evolution](resources/Summary_of_on-device_LLMs_evolution.jpeg) 20 | 21 |
22 | 23 | 24 | 25 | ## 🌟 About This Hub 26 | Welcome to the ultimate hub for on-device Large Language Models (LLMs)! This repository is your go-to resource for all things related to LLMs designed for on-device deployment. Whether you're a seasoned researcher, an innovative developer, or an enthusiastic learner, this comprehensive collection of cutting-edge knowledge is your gateway to understanding, leveraging, and contributing to the exciting world of on-device LLMs. 27 | 28 | ## 🚀 Why This Hub is a Must-Read 29 | - 📊 Comprehensive overview of on-device LLM evolution with easy-to-understand visualizations 30 | - 🧠 In-depth analysis of groundbreaking architectures and optimization techniques 31 | - 📱 Curated list of state-of-the-art models and frameworks ready for on-device deployment 32 | - 💡 Practical examples and case studies to inspire your next project 33 | - 🔄 Regular updates to keep you at the forefront of rapid advancements in the field 34 | - 🤝 Active community of researchers and practitioners sharing insights and experiences 35 | 36 | 37 | 38 | # 📚 What's Inside Our Hub 39 | - [Awesome LLMs on Device: A Must-Read Comprehensive Hub by Nexa AI](#-awesome-llms-on-device-a-must-read-comprehensive-hub-by-nexa-ai) 40 | - [Contents](#-whats-inside-our-hub) 41 | - [Foundations and Preliminaries](#foundations-and-preliminaries) 42 | - [Evolution of On-Device LLMs](#evolution-of-on-device-llms) 43 | - [LLM Architecture Foundations](#llm-architecture-foundations) 44 | - [On-Device LLMs Training](#on-device-llms-training) 45 | - [Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference](#limitations-of-cloud-based-llm-inference-and-advantages-of-on-device-inference) 46 | - [The Performance Indicator of On-Device LLMs](#the-performance-indicator-of-on-device-llms) 47 | - [Efficient Architectures for On-Device LLMs](#efficient-architectures-for-on-device-llms) 48 | - [Model Compression and Parameter Sharing](#model-compression-and-parameter-sharing) 49 | - [Collaborative and Hierarchical Model Approaches](#collaborative-and-hierarchical-model-approaches) 50 | - [Memory and Computational Efficiency](#memory-and-computational-efficiency) 51 | - [Mixture-of-Experts (MoE) Architectures](#mixture-of-experts-moe-architectures) 52 | - [Hybrid Architectures](#hybrid-architectures) 53 | - [General Efficiency and Performance Improvements](#general-efficiency-and-performance-improvements) 54 | - [Model Compression and Optimization Techniques for On-Device LLMs](#model-compression-and-optimization-techniques-for-on-device-llms) 55 | - [Quantization](#quantization) 56 | - [Pruning](#pruning) 57 | - [Knowledge Distillation](#knowledge-distillation) 58 | - [Low-Rank Factorization](#low-rank-factorization) 59 | - [Hardware Acceleration and Deployment Strategies](#hardware-acceleration-and-deployment-strategies) 60 | - [Popular On-Device LLMs Framework](#popular-on-device-llms-framework) 61 | - [Hardware Acceleration](#hardware-acceleration) 62 | - [Applications](#applications) - [Model Reference](#model-reference) 63 | - [Tutorials and Learning Resources](#tutorials-and-learning-resources) 64 | - [Citation](#-cite-our-work) 65 | 66 | ## Foundations and Preliminaries 67 | 68 | ### Evolution of On-Device LLMs 69 | 70 | - TinyLlama: An open-source small language model
arXiv 2024 [[Paper]](https://arxiv.org/abs/2401.02385) [[Github]](https://github.com/jzhang38/TinyLlama) 71 | - MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
arXiv 2024 [[Paper]](https://arxiv.org/abs/2402.03766) [[Github]](https://github.com/Meituan-AutoML/MobileVLM) 72 | - MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
arXiv 2024 [[Paper]](https://arxiv.org/abs/2406.10290) 73 | - Octopus series papers
arXiv 2024 [[Octopus]](https://arxiv.org/abs/2404.01549) [[Octopus v2]](https://arxiv.org/abs/2404.01744) [[Octopus v3]](https://arxiv.org/abs/2404.11459) [[Octopus v4]](https://arxiv.org/abs/2404.19296) [[Github]](https://github.com/NexaAI) 74 | - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
arXiv 2024 [[Paper]](https://arxiv.org/abs/2402.17764) 75 | - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2023 [[Paper]](https://arxiv.org/abs/2306.00978) [[Github]](https://github.com/mit-han-lab/llm-awq) 76 | - Small Language Models: Survey, Measurements, and Insights
arXiv 2024 [[Paper]](https://arxiv.org/pdf/2409.15790) 77 | 78 | 79 | ### LLM Architecture Foundations 80 | 81 | - The case for 4-bit precision: k-bit inference scaling laws
ICML 2023 [[Paper]](https://arxiv.org/abs/2212.09720) 82 | - Challenges and applications of large language models
arXiv 2023 [[Paper]](https://arxiv.org/abs/2307.10169) 83 | - MiniLLM: Knowledge distillation of large language models
ICLR 2024 [[Paper]](https://arxiv.org/abs/2306.08543) [[Github]](https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs) 84 | - GPTQ: Accurate post-training quantization for generative pre-trained transformers
ICLR 2023 [[Paper]](https://arxiv.org/abs/2210.17323) [[Github]](https://github.com/IST-DASLab/gptq) 85 | - GPT3.int8(): 8-bit matrix multiplication for transformers at scale
NeurIPS 2022 [[Paper]](https://arxiv.org/abs/2208.07339) 86 | 87 | ### On-Device LLMs Training 88 | 89 | - OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
ICML 2024 [[Paper]](https://arxiv.org/abs/2404.14619) [[Github]](https://github.com/apple/corenet) 90 | 91 | ### Limitations of Cloud-Based LLM Inference and Advantages of On-Device Inference 92 | 93 | - Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
arXiv 2024 [[Paper]](https://arxiv.org/abs/2404.07973) 94 | - Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
arXiv 2024 [[Paper]](https://arxiv.org/abs/2404.14219) 95 | - Exploring post-training quantization in LLMs from comprehensive study to low rank compensation
AAAI 2024 [[Paper]](https://arxiv.org/abs/2303.08302) 96 | - Matrix compression via randomized low rank and low precision factorization
NeurIPS 2023 [[Paper]](https://arxiv.org/abs/2310.11028) [[Github]](https://github.com/pilancilab/matrix-compressor) 97 | 98 | ### The Performance Indicator of On-Device LLMs 99 | 100 | - MNN: A lightweight deep neural network inference engine
2024 [[Github]](https://github.com/alibaba/MNN) 101 | - PowerInfer-2: Fast Large Language Model Inference on a Smartphone
arXiv 2024 [[Paper]](https://arxiv.org/abs/2406.06282) [[Github]](https://github.com/SJTU-IPADS/PowerInfer) 102 | - llama.cpp: Lightweight library for efficient LLM inference on various hardware with minimal setup
2023 [[Github]](https://github.com/ggerganov/llama.cpp) 103 | - PowerInfer: Fast large language model serving with a consumer-grade GPU arXiv 2023 [[Paper]](https://arxiv.org/abs/2312.12456) [[Github]](https://github.com/SJTU-IPADS/PowerInfer) 104 |
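The memory side of these indicators can be estimated before ever loading a model. The sketch below is a back-of-the-envelope sizing of weights and KV cache in plain Python; the 3B-parameter model, layer count, head sizes, and context length are illustrative placeholders, not figures from the papers above.

```python
# Rough, illustrative sizing of on-device LLM memory (weights + KV cache).
# All model dimensions below are hypothetical placeholders.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights at a given quantization width."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size: 2 (K and V) x layers x heads x head_dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

if __name__ == "__main__":
    params = 3e9  # a 3B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_bytes(params, bits) / 1e9:.1f} GB")
    # 32 layers, 8 KV heads (grouped-query attention), head_dim 128, 4k context, fp16 cache
    print(f"KV cache: ~{kv_cache_bytes(32, 8, 128, 4096) / 1e9:.2f} GB")
```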
105 | ## Efficient Architectures for On-Device LLMs 106 | 107 | | Model | Performance | Computational Efficiency | Memory Requirements | 108 | |---|---|---|---| 109 | | **[MobileLLM](https://arxiv.org/abs/2402.14905)** | High accuracy, optimized for sub-billion parameter models | Embedding sharing, grouped-query attention | Reduced model size due to deep and thin structures | 110 | | **[EdgeShard](https://arxiv.org/abs/2405.14371)** | Up to 50% latency reduction, 2× throughput improvement | Collaborative edge-cloud computing, optimal shard placement | Distributed model components reduce individual device load | 111 | | **[LLMCad](https://arxiv.org/abs/2309.04255)** | Up to 9.3× speedup in token generation | Generate-then-verify, token tree generation | Smaller LLM for token generation, larger LLM for verification | 112 | | **[Any-Precision LLM](https://arxiv.org/abs/2402.10517)** | Supports multiple precisions efficiently | Post-training quantization, memory-efficient design | Substantial memory savings with versatile model precisions | 113 | | **[Breakthrough Memory](https://ieeexplore.ieee.org/abstract/document/10477465)** | Up to 4.5× performance improvement | PIM and PNM technologies enhance memory processing | Enhanced memory bandwidth and capacity | 114 | | **[MELTing Point](https://arxiv.org/abs/2403.12844)** | Provides systematic performance evaluation | Analyzes impacts of quantization, efficient model evaluation | Evaluates memory and computational efficiency trade-offs | 115 | | **[LLMaaS on device](https://arxiv.org/abs/2403.11805)** | Reduces context switching latency significantly | Stateful execution, fine-grained KV cache compression | Efficient memory management with tolerance-aware compression and swapping | 116 | | **[LocMoE](https://arxiv.org/abs/2401.13920)** | Reduces training time per epoch by up to 22.24% | Orthogonal gating weights, locality-based expert regularization | Minimizes communication overhead with group-wise All-to-All and recompute pipeline | 117 | | **[EdgeMoE](https://arxiv.org/abs/2308.14352)** | Significant performance improvements on edge devices | Expert-wise bitwidth adaptation, preloading experts | Efficient memory management through expert-by-expert computation reordering | 118 | |**[JetMoE](https://arxiv.org/abs/2404.07413)**| Outperforms Llama2-7B and Llama2-13B-Chat with fewer parameters | Reduces inference computation by 70% using sparse activation | 8B total parameters, only 2B activated per input token | 119 | |**[Pangu-$`\pi`$ Pro](https://arxiv.org/abs/2402.02791)**| Neural architecture, parameter initialization, and optimization strategy for billion-level parameter models | Embedding sharing, tokenizer compression | Reduced model size via architecture tweaking | 120 | |**[Zamba2](https://www.zyphra.com/post/zamba2-small)**| 2× faster time-to-first-token, a 27% reduction in memory overhead, and a 1.29× lower generation latency compared to Phi3-3.8B.
| Hybrid Mamba2/Attention architecture and shared transformer block | 2.7B parameters, fewer KV-states due to reduced attention | 121 | 122 | 123 | ### Model Compression and Parameter Sharing 124 | 125 | - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2024 [[Paper]](https://arxiv.org/abs/2306.00978) [[Github]](https://github.com/mit-han-lab/llm-awq) 126 | - MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
arXiv 2024 [[Paper]](https://arxiv.org/abs/2402.14905) [[Github]](https://github.com/facebookresearch/MobileLLM) 127 | 128 | ### Collaborative and Hierarchical Model Approaches 129 | 130 | - EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
arXiv 2024 [[Paper]](https://arxiv.org/abs/2405.14371) 131 | - LLMCad: Fast and scalable on-device large language model inference
arXiv 2023 [[Paper]](https://arxiv.org/abs/2309.04255) 132 | 133 | ### Memory and Computational Efficiency 134 | 135 | - The Breakthrough Memory Solutions for Improved Performance on LLM Inference
IEEE Micro 2024 [[Paper]](https://ieeexplore.ieee.org/document/10477465) 136 | - MELTing point: Mobile Evaluation of Language Transformers
arXiv 2024 [[Paper]](https://arxiv.org/abs/2403.12844) [[Github]](https://github.com/brave-experiments/MELT-public) 137 | 138 | ### Mixture-of-Experts (MoE) Architectures 139 | 140 | - LLM as a system service on mobile devices
arXiv 2024 [[Paper]](https://arxiv.org/abs/2403.11805) 141 | - LocMoE: A low-overhead MoE for large language model training
arXiv 2024 [[Paper]](https://arxiv.org/abs/2401.13920) 142 | - EdgeMoE: Fast on-device inference of MoE-based large language models
arXiv 2023 [[Paper]](https://arxiv.org/abs/2308.14352) 143 |
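The common thread in the MoE papers above is sparse activation: a router sends each token to a small subset of experts, so only a fraction of the total parameters does work per token (for example, JetMoE activates roughly 2B of its 8B parameters per input token). The toy top-k router below is a generic PyTorch illustration of that mechanism, not the specific gating of LocMoE or EdgeMoE; all sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: each token is processed by only k experts."""
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # loop form for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(16, 64)).shape)    # torch.Size([16, 64])
```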
144 | ### Hybrid Architectures 145 | 146 | - Zamba2: Hybrid Mamba2 and attention models for on-device deployment 2024 [[Zamba2-2.7B]](https://www.zyphra.com/post/zamba2-small) [[Zamba2-1.2B]](https://www.zyphra.com/post/zamba2-mini) 147 | 148 | ### General Efficiency and Performance Improvements 149 | 150 | - Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
arXiv 2024 [[Paper]](https://www.arxiv.org/pdf/2402.10517) [[Github]](https://github.com/SNU-ARC/any-precision-llm) 151 | - On the viability of using LLMs for SW/HW co-design: An example in designing CIM DNN accelerators
IEEE SOCC 2023 [[Paper]](https://arxiv.org/abs/2306.06923) 152 | 153 | ## Model Compression and Optimization Techniques for On-Device LLMs 154 | 155 | ### Quantization 156 | 157 | - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
arXiv 2024 [[Paper]](https://arxiv.org/abs/2402.17764) 158 | - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
arXiv 2024 [[Paper]](https://arxiv.org/abs/2306.00978) [[Github]](https://github.com/mit-han-lab/llm-awq) 159 | - GPTQ: Accurate post-training quantization for generative pre-trained transformers
ICLR 2023 [[Paper]](https://arxiv.org/abs/2210.17323) [[Github]](https://github.com/IST-DASLab/gptq) 160 | - GPT3.int8(): 8-bit matrix multiplication for transformers at scale
NeurIPS 2022 [[Paper]](https://arxiv.org/abs/2208.07339) 161 |
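For intuition about what these methods build on, the sketch below shows plain round-to-nearest symmetric quantization of a weight tensor in NumPy. It is a generic illustration of the quantize/dequantize step, which the papers above refine with calibration data, activation-aware scaling (AWQ), or error compensation (GPTQ); it is not a reimplementation of any of them.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Per-tensor symmetric round-to-nearest quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for int8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)    # toy weight matrix
q, scale = quantize_symmetric(w, bits=8)
w_hat = dequantize(q, scale)
print("max abs error:", float(np.abs(w - w_hat).max()))
print("bytes: fp32", w.nbytes, "-> int8", q.nbytes)
```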
162 | ### Pruning 163 | 164 | - Challenges and applications of large language models arXiv 2023 [[Paper]](https://arxiv.org/abs/2307.10169) 165 |
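The simplest pruning baseline is unstructured magnitude pruning: zero out the smallest-magnitude weights and keep a sparse mask. The NumPy sketch below illustrates only that baseline; practical pipelines usually prune gradually and fine-tune to recover accuracy.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.7)
print("fraction of weights kept:", float(mask.mean()))   # ~0.3
```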
166 | ### Knowledge Distillation 167 | 168 | - MiniLLM: Knowledge distillation of large language models ICLR 2024 [[Paper]](https://arxiv.org/abs/2306.08543) 169 |
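Classic knowledge distillation trains a small student to match a larger teacher's output distribution with a temperature-scaled KL term mixed into the usual cross-entropy loss. The PyTorch sketch below shows that textbook formulation; note that MiniLLM above argues for a reverse-KL objective instead, so this is background rather than its method.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend temperature-scaled KL (student vs. teacher) with hard-label cross-entropy."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # standard temperature rescaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(8, 32000, requires_grad=True)   # (batch, vocab) toy logits
teacher = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
print(distillation_loss(student, teacher, labels))
```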
170 | ### Low-Rank Factorization 171 | 172 | - Exploring post-training quantization in LLMs from comprehensive study to low rank compensation AAAI 2024 [[Paper]](https://arxiv.org/abs/2303.08302) 173 | - Matrix compression via randomized low rank and low precision factorization NeurIPS 2023 [[Paper]](https://arxiv.org/abs/2310.11028) [[Github]](https://github.com/pilancilab/matrix-compressor) 174 |
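Low-rank factorization replaces a large weight matrix W with the product of two thin matrices, shrinking both storage and multiply-accumulate cost. The NumPy sketch below uses a plain truncated SVD on a synthetic, nearly low-rank matrix; the papers above go further by combining such factorizations with randomization and low-precision storage.

```python
import numpy as np

def low_rank_factorize(w: np.ndarray, rank: int):
    """Approximate w (m x n) as A @ B with A: (m, rank) and B: (rank, n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    A = u[:, :rank] * s[:rank]          # fold singular values into the left factor
    B = vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
# Synthetic, nearly rank-64 "weight" matrix (placeholder for a real layer).
w = rng.standard_normal((1024, 64)) @ rng.standard_normal((64, 1024)) \
    + 0.01 * rng.standard_normal((1024, 1024))
A, B = low_rank_factorize(w, rank=64)
rel_err = np.linalg.norm(w - A @ B) / np.linalg.norm(w)
print(f"relative error {rel_err:.4f}, params {w.size} -> {A.size + B.size}")
```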
175 | ## Hardware Acceleration and Deployment Strategies 176 | 177 | ### Popular On-Device LLMs Framework 178 | 179 | - llama.cpp: A lightweight library for efficient LLM inference on various hardware with minimal setup. [[Github]](https://github.com/ggerganov/llama.cpp) 180 | - MNN: A blazing-fast, lightweight deep learning framework. [[Github]](https://github.com/alibaba/MNN) 181 | - PowerInfer: A CPU/GPU LLM inference engine that exploits activation locality for on-device serving. [[Github]](https://github.com/SJTU-IPADS/PowerInfer) 182 | - ExecuTorch: PyTorch's platform for on-device AI across mobile, embedded, and edge devices. [[Github]](https://github.com/pytorch/executorch) 183 | - MediaPipe: A suite of tools and libraries that enables quick application of AI and ML techniques. [[Github]](https://github.com/google-ai-edge/mediapipe) 184 | - MLC-LLM: A machine learning compiler and high-performance deployment engine for large language models. [[Github]](https://github.com/mlc-ai/mlc-llm) 185 | - vLLM: A fast and easy-to-use library for LLM inference and serving. [[Github]](https://github.com/vllm-project/vllm) 186 | - OpenLLM: An open platform for operating large language models (LLMs) in production. [[Github]](https://github.com/bentoml/OpenLLM) 187 | - mllm: Fast and lightweight multimodal LLM inference engine for mobile and edge devices. [[Github]](https://github.com/UbiquitousLearning/mllm) 188 |
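As a concrete starting point with one of the frameworks above, the sketch below runs a local, quantized GGUF model through the community llama-cpp-python bindings for llama.cpp. The model path, context size, and thread count are placeholders to adjust for your device; most of the other engines listed expose a similarly small API.

```python
# pip install llama-cpp-python   (Python bindings for llama.cpp)
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model-q4_k_m.gguf",  # placeholder: any local GGUF file
    n_ctx=2048,       # context window
    n_threads=4,      # CPU threads; tune for the target device
)

output = llm(
    "Q: Why run language models on-device?\nA:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```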
189 | 190 | ### Hardware Acceleration 191 | 192 | - The Breakthrough Memory Solutions for Improved Performance on LLM Inference IEEE Micro 2024 [[Paper]](https://ieeexplore.ieee.org/document/10477465) 193 | - Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond
IEEE Hot Chips 2021 [[Paper]](https://ieeexplore.ieee.org/abstract/document/9567191) 194 | 195 | ## Applications 196 | - Text Generation for Messaging: [Gboard smart reply](https://developer.android.com/ai/aicore#gboard-smart) 197 | - Translation: [LLMCad](https://arxiv.org/abs/2309.04255) 198 | - Meeting Summarization 199 | - Healthcare Applications: [BioMistral-7B](https://arxiv.org/abs/2402.10373), [HuatuoGPT](https://arxiv.org/abs/2311.09774) 200 | - Research Support 201 | - Companion Robot 202 | - Disability Support: [Octopus v3](https://arxiv.org/abs/2404.11459), [TalkBack with Gemini Nano](https://store.google.com/intl/en/ideas/articles/gemini-nano-google-pixel/) 203 | - Autonomous Vehicles: [DriveVLM](https://arxiv.org/abs/2402.12289) 204 | 205 | ## Model Reference 206 | 207 | | Model | Institute | Paper | 208 | | :---: | :---: | --- | 209 | | Gemini Nano | Google | [Gemini: A Family of Highly Capable Multimodal Models](https://arxiv.org/pdf/2312.11805.pdf) | 210 | | Octopus series model | Nexa AI | [Octopus v2: On-device language model for super agent](https://arxiv.org/pdf/2404.01744.pdf)
[Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent](https://arxiv.org/pdf/2404.11459.pdf)
[Octopus v4: Graph of language models](https://arxiv.org/pdf/2404.19296.pdf)
[Octopus: On-device language model for function calling of software APIs](https://arxiv.org/pdf/2404.01549.pdf) | 211 | | OpenELM and Ferret-v2 | Apple | [OpenELM: An Efficient Language Model Family with Open Training and Inference Framework](https://arxiv.org/abs/2404.14619)
[Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models](https://arxiv.org/abs/2404.07973) | 212 | | Phi series | Microsoft | [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/pdf/2404.14219.pdf) | 213 | | MiniCPM | Tsinghua University | [A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone](https://huggingface.co/openbmb/MiniCPM-V-2_6) | 214 | | Gemma2-9B | Google | [Gemma 2: Improving Open Language Models at a Practical Size](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf) | 215 | | Qwen2-0.5B | Alibaba Group | [Qwen Technical Report](https://arxiv.org/pdf/2309.16609.pdf) | 216 | | GLM-Edge | THUDM | [GLM-Edge GitHub Page](https://github.com/THUDM/GLM-Edge) | 217 | 218 | ## Tutorials and Learning Resources 219 | 220 | - MIT: [TinyML and Efficient Deep Learning Computing](https://efficientml.ai) 221 | - Harvard: [Machine Learning Systems](https://mlsysbook.ai/) 222 | - DeepLearning.AI: [Introduction to on-device AI](https://www.deeplearning.ai/short-courses/introduction-to-on-device-ai/) 223 | 224 | # 🤝 Join the On-Device LLM Revolution 225 | 226 | We believe in the power of community! If you're passionate about on-device AI and want to contribute to this ever-growing knowledge hub, here's how you can get involved: 227 | 1. Fork the repository 228 | 2. Create a new branch for your brilliant additions 229 | 3. Make your updates and push your changes 230 | 4. Submit a pull request and become part of the on-device LLM movement 231 | 232 | # ⭐ Star History ⭐ 233 | 234 | [![Star History Chart](https://api.star-history.com/svg?repos=NexaAI/Awesome-LLMs-on-device&type=Timeline)](https://star-history.com/#NexaAI/Awesome-LLMs-on-device&Timeline) 235 | 236 | # 📖 Cite Our Work 237 | If our hub fuels your research or powers your projects, we'd be thrilled if you could cite our paper [here](https://arxiv.org/abs/2409.00088): 238 | 239 | ```bibtex 240 | @article{xu2024device, 241 | title={On-Device Language Models: A Comprehensive Review}, 242 | author={Xu, Jiajun and Li, Zhiyuan and Chen, Wei and Wang, Qun and Gao, Xin and Cai, Qi and Ling, Ziyuan}, 243 | journal={arXiv preprint arXiv:2409.00088}, 244 | year={2024} 245 | } 246 | ``` 247 | 248 | # 📄 License 249 | 250 | This project is open-source and available under the MIT License. See the [LICENSE](LICENSE) file for more details. 251 | 252 | Don't just read about the future of AI – be part of it. Star this repo, spread the word, and let's push the boundaries of on-device LLMs together! 🚀🌟 253 | --------------------------------------------------------------------------------