├── requirements.txt
├── readme.md
└── quantize.py

/requirements.txt:
--------------------------------------------------------------------------------
--extra-index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
torch==2.0.1
auto-gptq
transformers
sentencepiece
protobuf
optimum
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
# transformers-gptq-quant

A Python-based utility for quantizing causal language models with GPTQ using the Hugging Face 'transformers' library.

## Description

This repository provides a Python script that quantizes models to 4-bit or 8-bit precision using the Hugging Face 'transformers' library with the AutoGPTQ backend. Quantization reduces the memory footprint of a model, which can make deployment on resource-constrained devices practical.

## Features

- Supports 4-bit and 8-bit GPTQ quantization.
- Allows command-line or direct script usage.
- Compatible with a range of models and calibration datasets.

## Prerequisites

To use this utility, ensure that you've installed the dependencies listed in the `requirements.txt` file. You can install them using pip:

```bash
pip install -r requirements.txt
```

## Usage

You can run the script in two ways:

1. By setting hardcoded values for the parameters at the top of the script under 'Hardcoded default values'.
2. By providing command-line arguments to override the hardcoded values.

## Parameters

- model_id: A Hugging Face repository id or a local directory containing a model. Example: 'teknium/OpenHermes-2-Mistral-7B', './hermes2', etc. - Default: `teknium/OpenHermes-2-Mistral-7B`
- bits: The number of bits to use for quantization. Options: 4 and 8 - Default: `4`
- dataset: The name of the calibration dataset to use. Example: 'wikitext2', 'c4', etc. - Default: `wikitext2`
- group_size: The group size for quantization. Usually a power of 2. - Default: `128`
- device_map: Device mapping configuration for loading the model. Example: 'auto', 'cpu', 'cuda:0', etc. - Default: `"auto"`

## Running the script

1. **Using hardcoded values**:
   - Edit the values of `MODEL_ID`, `BITS`, `DATASET`, `GROUP_SIZE`, and `DEVICE_MAP` at the top of the script.
   - Run the script without additional command-line arguments.

2. **Using command-line arguments**:
   - Use the following syntax:
   ```bash
   python quantize.py --model_id 'teknium/OpenHermes-2-Mistral-7B' --bits 8 --dataset 'wikitext2' --group_size 32 --device_map 'auto'
   ```

**Note**: Command-line arguments will take precedence over hardcoded values. The quantized model and tokenizer are saved to a local directory named after the model id and bit width, e.g. `teknium_OpenHermes-2-Mistral-7B_8bit`.

## Contributing

Feel free to contribute to this project by creating issues or submitting pull requests. Ensure that your contributions are in line with the project's aim and structure.

## License

This project is open source, under the MIT license.
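## Loading the quantized model

After quantization you can load the saved model for inference with the standard `transformers` API. The following is a minimal sketch, not part of the script itself: the directory name assumes the script's defaults (adjust it to wherever the script saved your model), and `auto-gptq` and `optimum` must be installed for the GPTQ weights to load.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path written by quantize.py; adjust to your own output directory.
out_dir = "teknium_OpenHermes-2-Mistral-7B_4bit"

# The GPTQ quantization settings are stored in the saved config,
# so from_pretrained picks them up automatically.
model = AutoModelForCausalLM.from_pretrained(out_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(out_dir)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```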
--------------------------------------------------------------------------------
/quantize.py:
--------------------------------------------------------------------------------
"""
README:

This Python script quantizes a causal language model with GPTQ using the Hugging Face 'transformers' library.
You can run this script in two ways:
1. By setting hardcoded values for the parameters at the top of this script under 'Hardcoded default values'.
2. By providing command-line arguments to override the hardcoded values.

Parameters:

- model_id: A Hugging Face repository id or a local directory containing a model. Example: 'teknium/OpenHermes-2-Mistral-7B', './hermes2', etc.
- bits: The number of bits to use for quantization. Options: 4 and 8.
- dataset: The name of the calibration dataset to use. Example: 'wikitext2', 'c4', etc.
- group_size: The group size for quantization. Usually a power of 2.
- device_map: Device mapping configuration for loading the model. Example: 'auto', 'cpu', 'cuda:0', etc.

Usage:

1. To use hardcoded values:
   - Edit the values of MODEL_ID, BITS, DATASET, GROUP_SIZE, and DEVICE_MAP at the top of this script.
   - Run the script.

2. To use command-line arguments:
   - Use the following syntax:
     python quantize.py --model_id 'teknium/OpenHermes-2-Mistral-7B' --bits 8 --dataset 'wikitext2' --group_size 32 --device_map 'auto'

Note: Command-line arguments will take precedence over hardcoded values.
"""

import argparse

import auto_gptq  # not used directly, but fails fast at import time if the AutoGPTQ backend is missing
from transformers import GPTQConfig, AutoModelForCausalLM, AutoTokenizer

# Hardcoded default values
MODEL_ID = "teknium/OpenHermes-2-Mistral-7B"
BITS = 4
DATASET = "wikitext2"
GROUP_SIZE = 128
DEVICE_MAP = "auto"


def main(model_id, bits, dataset, group_size, device_map):
    # The tokenizer is needed both for calibration and for saving alongside the model.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    gptq_config = GPTQConfig(bits=bits, dataset=dataset, tokenizer=tokenizer, group_size=group_size, desc_act=True)
    # Passing a quantization_config makes from_pretrained run GPTQ calibration while loading.
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=gptq_config, device_map=device_map)
    # Move to CPU before saving so the checkpoint is written device-agnostically.
    model.to("cpu")
    # Flatten hub-style ids like 'org/name' into a single local directory name.
    out_dir = f"{model_id.replace('/', '_')}_{bits}bit"
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Quantize a GPT model. Run with --help for more info.")
    parser.add_argument("--model_id", type=str, help="The pretrained model id or local path.")
    parser.add_argument("--bits", type=int, help="Number of bits for quantization (4 or 8).")
    parser.add_argument("--dataset", type=str, help="The calibration dataset to use.")
    parser.add_argument("--group_size", type=int, help="Group size for quantization.")
    parser.add_argument("--device_map", type=str, help="Device map for loading the model.")
    # Note: argparse already provides -h/--help; registering a custom --help
    # argument would raise a "conflicting option string" error at startup.

    args = parser.parse_args()

    # Use command-line arguments if provided, otherwise fall back to the hardcoded values.
    model_id = args.model_id or MODEL_ID
    bits = args.bits or BITS
    dataset = args.dataset or DATASET
    group_size = args.group_size or GROUP_SIZE
    device_map = args.device_map or DEVICE_MAP

    # Dictionary mapping of argument names to their resolved values
    arg_dict = {
        "model_id": model_id,
        "bits": bits,
        "dataset": dataset,
        "group_size": group_size,
        "device_map": device_map,
    }

    # Check for unset values and report them by name (empty string/None count as
    # unset; 0 is excluded so a numeric value of 0 is not flagged here).
    missing_args = [key for key, value in arg_dict.items() if not value and value != 0]

    if missing_args:
        missing_str = ", ".join(missing_args)
        print(f"Error: Unset values for: {missing_str}. Use --help for more information.")
        exit(1)

    main(model_id=model_id, bits=bits, dataset=dataset, group_size=group_size, device_map=device_map)
--------------------------------------------------------------------------------
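The readme mentions direct script usage as an alternative to the command line. A minimal sketch of what that could look like, assuming `quantize.py` is on the Python path (the parameter values below are simply the script's own defaults):

```python
# Call quantize.py's main() directly instead of going through argparse.
from quantize import main

main(
    model_id="teknium/OpenHermes-2-Mistral-7B",  # any hub id or local path
    bits=4,
    dataset="wikitext2",
    group_size=128,
    device_map="auto",
)
```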