# Awesome Small Language Models

A curated list of awesome resources, tools, and projects related to small language models. This list focuses on modern, efficient language models designed for various applications, from research to production deployment.

## Table of Contents

- [Awesome Small Language Models](#awesome-small-language-models)
  - [Table of Contents](#table-of-contents)
  - [Some famous Small Language Models](#some-famous-small-language-models)
  - [Frameworks and Tools](#frameworks-and-tools)
  - [Fine-tuning Techniques](#fine-tuning-techniques)
    - [Fine-tuning Guide](#fine-tuning-guide)
  - [Hardware Requirements](#hardware-requirements)
  - [Inference Optimization](#inference-optimization)
  - [Applications and Use Cases](#applications-and-use-cases)
  - [Research Papers and Articles](#research-papers-and-articles)
  - [Tutorials and Guides](#tutorials-and-guides)
  - [Community Projects](#community-projects)
  - [Contributing](#contributing)
  - [License](#license)

## Some famous Small Language Models

- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) - A fine-tuned version of LLaMA, optimized for instruction following
- [Vicuna](https://github.com/lm-sys/FastChat) - An open-source chatbot trained by fine-tuning LLaMA
- [FLAN-T5 Small](https://huggingface.co/google/flan-t5-small) - A smaller version of the FLAN-T5 model
- [DistilGPT2](https://huggingface.co/distilgpt2) - A distilled version of GPT-2
- [BERT-Mini](https://huggingface.co/prajjwal1/bert-mini) - A smaller BERT model with 4 layers

## Frameworks and Tools

- [Hugging Face Transformers](https://github.com/huggingface/transformers) - State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
- [PEFT](https://github.com/huggingface/peft) - Parameter-Efficient Fine-Tuning (PEFT) methods
- [Periflow](https://github.com/periflow/periflow) - A framework for deploying large language models
- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) - 8-bit CUDA functions for PyTorch
- [TensorFlow Lite](https://www.tensorflow.org/lite) - A set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices
- [ONNX Runtime](https://github.com/microsoft/onnxruntime) - Cross-platform, high-performance ML inferencing and training accelerator

## Fine-tuning Techniques

- [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685): Efficient fine-tuning method that significantly reduces the number of trainable parameters
- [QLoRA](https://arxiv.org/abs/2305.14314): Quantized Low-Rank Adaptation for even more efficient fine-tuning
- [P-tuning v2](https://arxiv.org/abs/2110.07602): Prompt tuning method for adapting pre-trained language models
- [Adapter Tuning](https://arxiv.org/abs/1902.00751): Adding small trainable modules to frozen pre-trained models

### Fine-tuning Guide

1. Choose a base model (e.g., FLAN-T5 Small, DistilGPT2)
2. Prepare your dataset for the specific task
3. Select a fine-tuning technique (e.g., LoRA, QLoRA)
4. Use Hugging Face's Transformers and PEFT libraries for implementation (see the sketch after this guide)
5. Train on your data, monitoring for overfitting
6. Evaluate the fine-tuned model on a test set
7. Optimize for inference (see the quantization sketch below; also pruning, etc.)
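The snippet below is a minimal sketch of steps 3-5, assuming DistilGPT2 as the base model, a toy in-memory dataset, and illustrative hyperparameters; swap in your own data, base model, and LoRA settings for real use.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face Transformers + PEFT.
# Assumptions: distilgpt2 as the base model, a toy dataset, illustrative hyperparameters.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the frozen base model with LoRA adapters; only the adapter weights are trained.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,     # scaling factor applied to the adapter output
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters

# Toy dataset; in practice, load and tokenize your task-specific corpus here (step 2).
texts = [
    "Instruction: Greet the user.\nResponse: Hello! How can I help you today?",
    "Instruction: Summarize: The cat sat on the mat.\nResponse: A cat sat on a mat.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-distilgpt2",
        per_device_train_batch_size=2,
        num_train_epochs=3,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-distilgpt2-adapter")  # stores only the small adapter weights
```

For step 7, one common option is post-training dynamic quantization, which converts the linear layers to INT8 for faster CPU inference. The sketch below uses PyTorch's dynamic quantization; the model directory and layer selection are illustrative assumptions, not part of this list.

```python
# Post-training dynamic quantization sketch: INT8 linear layers for CPU inference.
# Assumption: "lora-distilgpt2-merged" is a hypothetical directory holding the
# fine-tuned model with the LoRA weights already merged into the base model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lora-distilgpt2-merged")
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # quantize only the Linear layers
    dtype=torch.qint8,
)
```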
## Hardware Requirements

RAM requirements vary based on model size and fine-tuning technique:

- Small models (e.g., BERT-Mini, DistilGPT2): 4-8 GB RAM
- Medium models (e.g., FLAN-T5 Small): 8-16 GB RAM
- Larger models with efficient fine-tuning (e.g., Alpaca with LoRA): 16-32 GB RAM

For training, GPU memory requirements are typically higher. Using techniques like LoRA or QLoRA can significantly reduce memory needs.

## Inference Optimization

- Quantization: Reducing model precision (e.g., INT8, FP16)
- Pruning: Removing unnecessary weights
- Knowledge Distillation: Training a smaller model to mimic a larger one
- Caching: Storing intermediate results for faster inference
- Frameworks for optimization:
  - [ONNX Runtime](https://github.com/microsoft/onnxruntime)
  - [TensorRT](https://developer.nvidia.com/tensorrt)
  - [OpenVINO](https://github.com/openvinotoolkit/openvino)

## Applications and Use Cases

- On-device natural language processing
- Chatbots and conversational AI
- Text summarization and generation
- Sentiment analysis
- Named Entity Recognition (NER)
- Question Answering systems

## Research Papers and Articles

- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/abs/2110.07602)
- [Alpaca: A Strong, Replicable Instruction-Following Model](https://crfm.stanford.edu/2023/03/13/alpaca.html)

## Tutorials and Guides

- [Fine-tuning with LoRA using Hugging Face Transformers](https://huggingface.co/blog/lora)
- [Quantization for Transformers with ONNX Runtime](https://huggingface.co/blog/onnx-quantize-transformers)
- [Deploying Hugging Face Models on CPU with ONNX Runtime](https://huggingface.co/blog/onnx-runtime-inference)
- [Optimizing Inference with TensorFlow Lite](https://www.tensorflow.org/lite/performance/best_practices)

## Community Projects

- [Add your awesome community projects here!]

## Contributing

Your contributions are always welcome! Please read the contribution guidelines first.

## License

This awesome list is under the MIT License.