├── requirements.txt
├── readme.md
└── quantize.py

/requirements.txt:
--------------------------------------------------------------------------------
--extra-index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
torch==2.0.1
auto-gptq
transformers
sentencepiece
protobuf
optimum
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
# transformers-gptq-quant

A Python-based utility for quantizing causal language models with GPTQ using the Hugging Face 'transformers' library.

## Description

This repository provides a Python script that quantizes models to 4-bit or 8-bit precision using the Hugging Face 'transformers' library with the AutoGPTQ backend. Quantization reduces the memory footprint of a model, which can make deployment on resource-constrained devices practical.

## Features

- Supports 4-bit and 8-bit GPTQ quantization.
- Allows command-line or direct script usage.
- Compatible with a range of models and calibration datasets.

## Prerequisites

To use this utility, ensure that you've installed the dependencies listed in the `requirements.txt` file. You can install them using pip:

```bash
pip install -r requirements.txt
```

## Usage

You can run the script in two ways:

1. By setting hardcoded values for the parameters at the top of the script under 'Hardcoded default values'.
2. By providing command-line arguments to override the hardcoded values.

## Parameters

- model_id: A Hugging Face repository id or a local directory containing a model. Example: 'teknium/OpenHermes-2-Mistral-7B', './hermes2', etc. - Default: `teknium/OpenHermes-2-Mistral-7B`
- bits: The number of bits to use for quantization. Options: 4 and 8 - Default: `4`
- dataset: The name of the calibration dataset to use. Example: 'wikitext2', 'c4', etc. - Default: `wikitext2`
- group_size: The group size for quantization. Usually a power of 2. - Default: `128`
- device_map: Device mapping configuration for loading the model. Example: 'auto', 'cpu', 'cuda:0', etc. - Default: `"auto"`

## Running the script

1. **Using hardcoded values**:
   - Edit the values of `MODEL_ID`, `BITS`, `DATASET`, `GROUP_SIZE`, and `DEVICE_MAP` at the top of the script.
   - Run the script without additional command-line arguments.

2. **Using command-line arguments**:
   - Use the following syntax:
   ```bash
   python quantize.py --model_id 'teknium/OpenHermes-2-Mistral-7B' --bits 8 --dataset 'wikitext2' --group_size 32 --device_map 'auto'
   ```

**Note**: Command-line arguments will take precedence over hardcoded values. The quantized model and tokenizer are saved to a local directory named after the model id and bit width, e.g. `teknium_OpenHermes-2-Mistral-7B_8bit`.

## Contributing

Feel free to contribute to this project by creating issues or submitting pull requests. Ensure that your contributions are in line with the project's aim and structure.

## License

This project is open source, under the MIT license.
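## Loading the quantized model

After quantization you can load the saved model for inference with the standard `transformers` API. The following is a minimal sketch, not part of the script itself: the directory name assumes the script's defaults (adjust it to wherever the script saved your model), and `auto-gptq` and `optimum` must be installed for the GPTQ weights to load.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path written by quantize.py; adjust to your own output directory.
out_dir = "teknium_OpenHermes-2-Mistral-7B_4bit"

# The GPTQ quantization settings are stored in the saved config,
# so from_pretrained picks them up automatically.
model = AutoModelForCausalLM.from_pretrained(out_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(out_dir)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```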
--------------------------------------------------------------------------------
/quantize.py:
--------------------------------------------------------------------------------
"""
README:

This Python script quantizes a causal language model with GPTQ using the Hugging Face 'transformers' library.
You can run this script in two ways:
1. By setting hardcoded values for the parameters at the top of this script under 'Hardcoded default values'.
2. By providing command-line arguments to override the hardcoded values.

Parameters:

- model_id: A Hugging Face repository id or a local directory containing a model. Example: 'teknium/OpenHermes-2-Mistral-7B', './hermes2', etc.
- bits: The number of bits to use for quantization. Options: 4 and 8.
- dataset: The name of the calibration dataset to use. Example: 'wikitext2', 'c4', etc.
- group_size: The group size for quantization. Usually a power of 2.
- device_map: Device mapping configuration for loading the model. Example: 'auto', 'cpu', 'cuda:0', etc.

Usage:

1. To use hardcoded values:
   - Edit the values of MODEL_ID, BITS, DATASET, GROUP_SIZE, and DEVICE_MAP at the top of this script.
   - Run the script.

2. To use command-line arguments:
   - Use the following syntax:
     python quantize.py --model_id 'teknium/OpenHermes-2-Mistral-7B' --bits 8 --dataset 'wikitext2' --group_size 32 --device_map 'auto'

Note: Command-line arguments will take precedence over hardcoded values.
"""

import argparse

import auto_gptq  # not used directly, but fails fast at import time if the AutoGPTQ backend is missing
from transformers import GPTQConfig, AutoModelForCausalLM, AutoTokenizer

# Hardcoded default values
MODEL_ID = "teknium/OpenHermes-2-Mistral-7B"
BITS = 4
DATASET = "wikitext2"
GROUP_SIZE = 128
DEVICE_MAP = "auto"


def main(model_id, bits, dataset, group_size, device_map):
    # The tokenizer is needed both for calibration and for saving alongside the model.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    gptq_config = GPTQConfig(bits=bits, dataset=dataset, tokenizer=tokenizer, group_size=group_size, desc_act=True)
    # Passing a quantization_config makes from_pretrained run GPTQ calibration while loading.
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=gptq_config, device_map=device_map)
    # Move to CPU before saving so the checkpoint is written device-agnostically.
    model.to("cpu")
    # Flatten hub-style ids like 'org/name' into a single local directory name.
    out_dir = f"{model_id.replace('/', '_')}_{bits}bit"
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Quantize a GPT model. Run with --help for more info.")
    parser.add_argument("--model_id", type=str, help="The pretrained model id or local path.")
    parser.add_argument("--bits", type=int, help="Number of bits for quantization (4 or 8).")
    parser.add_argument("--dataset", type=str, help="The calibration dataset to use.")
    parser.add_argument("--group_size", type=int, help="Group size for quantization.")
    parser.add_argument("--device_map", type=str, help="Device map for loading the model.")
    # Note: argparse already provides -h/--help; registering a custom --help
    # argument would raise a "conflicting option string" error at startup.

    args = parser.parse_args()

    # Use command-line arguments if provided, otherwise fall back to the hardcoded values.
    model_id = args.model_id or MODEL_ID
    bits = args.bits or BITS
    dataset = args.dataset or DATASET
    group_size = args.group_size or GROUP_SIZE
    device_map = args.device_map or DEVICE_MAP

    # Dictionary mapping of argument names to their resolved values
    arg_dict = {
        "model_id": model_id,
        "bits": bits,
        "dataset": dataset,
        "group_size": group_size,
        "device_map": device_map,
    }

    # Check for unset values and report them by name (empty string/None count as
    # unset; 0 is excluded so a numeric value of 0 is not flagged here).
    missing_args = [key for key, value in arg_dict.items() if not value and value != 0]

    if missing_args:
        missing_str = ", ".join(missing_args)
        print(f"Error: Unset values for: {missing_str}. Use --help for more information.")
        exit(1)

    main(model_id=model_id, bits=bits, dataset=dataset, group_size=group_size, device_map=device_map)
--------------------------------------------------------------------------------
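The readme mentions direct script usage as an alternative to the command line. A minimal sketch of what that could look like, assuming `quantize.py` is on the Python path (the parameter values below are simply the script's own defaults):

```python
# Call quantize.py's main() directly instead of going through argparse.
from quantize import main

main(
    model_id="teknium/OpenHermes-2-Mistral-7B",  # any hub id or local path
    bits=4,
    dataset="wikitext2",
    group_size=128,
    device_map="auto",
)
```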