# Awesome Small Language Models

A curated list of awesome resources, tools, and projects related to small language models. This list focuses on modern, efficient language models designed for various applications, from research to production deployment.

## Table of Contents

- [Awesome Small Language Models](#awesome-small-language-models)
  - [Table of Contents](#table-of-contents)
  - [Some famous Small Language Models](#some-famous-small-language-models)
  - [Frameworks and Tools](#frameworks-and-tools)
  - [Fine-tuning Techniques](#fine-tuning-techniques)
    - [Fine-tuning Guide](#fine-tuning-guide)
  - [Hardware Requirements](#hardware-requirements)
  - [Inference Optimization](#inference-optimization)
  - [Applications and Use Cases](#applications-and-use-cases)
  - [Research Papers and Articles](#research-papers-and-articles)
  - [Tutorials and Guides](#tutorials-and-guides)
  - [Community Projects](#community-projects)
  - [Contributing](#contributing)
  - [License](#license)

## Some famous Small Language Models

- [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) - A fine-tuned version of LLaMA, optimized for instruction following
- [Vicuna](https://github.com/lm-sys/FastChat) - An open-source chatbot trained by fine-tuning LLaMA
- [FLAN-T5 Small](https://huggingface.co/google/flan-t5-small) - A smaller version of the FLAN-T5 model
- [DistilGPT2](https://huggingface.co/distilgpt2) - A distilled version of GPT-2
- [BERT-Mini](https://huggingface.co/prajjwal1/bert-mini) - A smaller BERT model with 4 layers

## Frameworks and Tools

- [Hugging Face Transformers](https://github.com/huggingface/transformers) - State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
- [PEFT](https://github.com/huggingface/peft) - Parameter-Efficient Fine-Tuning (PEFT) methods
- [Periflow](https://github.com/periflow/periflow) - A framework for deploying large language models
- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) - 8-bit CUDA functions for PyTorch
- [TensorFlow Lite](https://www.tensorflow.org/lite) - A set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices
- [ONNX Runtime](https://github.com/microsoft/onnxruntime) - Cross-platform, high-performance ML inferencing and training accelerator

## Fine-tuning Techniques

- [LoRA (Low-Rank Adaptation)](https://arxiv.org/abs/2106.09685): Efficient fine-tuning method that significantly reduces the number of trainable parameters
- [QLoRA](https://arxiv.org/abs/2305.14314): Quantized Low-Rank Adaptation for even more efficient fine-tuning
- [P-tuning v2](https://arxiv.org/abs/2110.07602): Prompt tuning method for adapting pre-trained language models
- [Adapter Tuning](https://arxiv.org/abs/1902.00751): Adding small trainable modules to frozen pre-trained models

### Fine-tuning Guide

1. Choose a base model (e.g., FLAN-T5 Small, DistilGPT2)
2. Prepare your dataset for the specific task
3. Select a fine-tuning technique (e.g., LoRA, QLoRA)
4. Use Hugging Face's Transformers and PEFT libraries for implementation (see the sketch after this guide)
5. Train on your data, monitoring for overfitting
6. Evaluate the fine-tuned model on a test set
7. Optimize for inference (see the quantization sketch below; also pruning, etc.)
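The snippet below is a minimal sketch of steps 3-5, assuming DistilGPT2 as the base model, a toy in-memory dataset, and illustrative hyperparameters; swap in your own data, base model, and LoRA settings for real use.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face Transformers + PEFT.
# Assumptions: distilgpt2 as the base model, a toy dataset, illustrative hyperparameters.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the frozen base model with LoRA adapters; only the adapter weights are trained.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,     # scaling factor applied to the adapter output
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters

# Toy dataset; in practice, load and tokenize your task-specific corpus here (step 2).
texts = [
    "Instruction: Greet the user.\nResponse: Hello! How can I help you today?",
    "Instruction: Summarize: The cat sat on the mat.\nResponse: A cat sat on a mat.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-distilgpt2",
        per_device_train_batch_size=2,
        num_train_epochs=3,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-distilgpt2-adapter")  # stores only the small adapter weights
```

For step 7, one common option is post-training dynamic quantization, which converts the linear layers to INT8 for faster CPU inference. The sketch below uses PyTorch's dynamic quantization; the model directory and layer selection are illustrative assumptions, not part of this list.

```python
# Post-training dynamic quantization sketch: INT8 linear layers for CPU inference.
# Assumption: "lora-distilgpt2-merged" is a hypothetical directory holding the
# fine-tuned model with the LoRA weights already merged into the base model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lora-distilgpt2-merged")
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},  # quantize only the Linear layers
    dtype=torch.qint8,
)
```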
## Hardware Requirements

RAM requirements vary based on model size and fine-tuning technique:

- Small models (e.g., BERT-Mini, DistilGPT2): 4-8 GB RAM
- Medium models (e.g., FLAN-T5 Small): 8-16 GB RAM
- Larger models with efficient fine-tuning (e.g., Alpaca with LoRA): 16-32 GB RAM

For training, GPU memory requirements are typically higher. Using techniques like LoRA or QLoRA can significantly reduce memory needs.

## Inference Optimization

- Quantization: Reducing model precision (e.g., INT8, FP16)
- Pruning: Removing unnecessary weights
- Knowledge Distillation: Training a smaller model to mimic a larger one
- Caching: Storing intermediate results for faster inference
- Frameworks for optimization:
  - [ONNX Runtime](https://github.com/microsoft/onnxruntime)
  - [TensorRT](https://developer.nvidia.com/tensorrt)
  - [OpenVINO](https://github.com/openvinotoolkit/openvino)

## Applications and Use Cases

- On-device natural language processing
- Chatbots and conversational AI
- Text summarization and generation
- Sentiment analysis
- Named Entity Recognition (NER)
- Question Answering systems

## Research Papers and Articles

- [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/abs/2110.07602)
- [Alpaca: A Strong, Replicable Instruction-Following Model](https://crfm.stanford.edu/2023/03/13/alpaca.html)

## Tutorials and Guides

- [Fine-tuning with LoRA using Hugging Face Transformers](https://huggingface.co/blog/lora)
- [Quantization for Transformers with ONNX Runtime](https://huggingface.co/blog/onnx-quantize-transformers)
- [Deploying Hugging Face Models on CPU with ONNX Runtime](https://huggingface.co/blog/onnx-runtime-inference)
- [Optimizing Inference with TensorFlow Lite](https://www.tensorflow.org/lite/performance/best_practices)

## Community Projects

- [Add your awesome community projects here!]

## Contributing

Your contributions are always welcome! Please read the contribution guidelines first.

## License

This awesome list is under the MIT License.