├── .gitignore
├── README.md
├── environment.yml
├── minhash.py
├── model_eval
│   ├── gen.sh
│   └── generate.py
└── model_finetune
    ├── MG-Verilog
    │   └── data_format_conversion.py
    ├── README.md
    └── run_peft.sh

/.gitignore:
--------------------------------------------------------------------------------
1 | **/results/**
2 | **.jsonl**
3 | **/CodeLlama-7b/**
4 | **/CodeLlama-7b.nemo
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LLM4HWDesign Starting Toolkit
2 | ## Introduction
3 | This repository provides a starting toolkit for participants of the [LLM4HWDesign Contest at ICCAD 2024](https://nvlabs.github.io/LLM4HWDesign/). The toolkit includes scripts and utilities for deduplicating training data, fine-tuning models, and evaluating model performance. Participants can use this toolkit to kickstart their work and streamline their development process.
4 | 
5 | ## Base Dataset
6 | The base dataset used in the contest is the [MG-Verilog dataset](https://huggingface.co/datasets/GaTech-EIC/MG-Verilog). For your submitted data, please follow the same format as the [MG-Verilog dataset](https://huggingface.co/datasets/GaTech-EIC/MG-Verilog). Please note that you can provide either multiple levels or a single level of description for each code sequence, but we will **concatenate all descriptions at different levels into one string** for each code sequence, following the script below.
7 | 
8 | ```python
9 | instructions_dict = {
10 |     "summary": "xxx",
11 |     "detailed explanation": "yyy"
12 | }
13 | 
14 | result = ";\n".join([f"{key}: {value}" for key, value in instructions_dict.items()]) + "\n"
15 | 
16 | '''
17 | result should be
18 | summary: xxx;
19 | detailed explanation: yyy
20 | '''
21 | ```
22 | ## Toolkit Release Progress
23 | - [x] **Deduplication**: Scripts to identify and remove duplicate samples from the dataset.
24 | - [x] **Fine-tuning**: Scripts to fine-tune a pretrained language model on the MG-Verilog dataset.
25 | - [x] **Evaluation**: Tools to evaluate the performance of the fine-tuned model using standard metrics.
26 | 
27 | 
28 | ## Setup Environment
29 | 
30 | We assume CUDA 12.1. (This is only needed if you want to run fine-tuning and evaluation on your own.)
31 | 
32 | `conda env create -f environment.yml`
33 | 
34 | 
35 | ## Deduplication
36 | The toolkit includes a deduplication script, which will be used to deduplicate each participant's data against the base dataset during the evaluation of Phase I.
37 | To run the deduplication script:
38 | ```bash
39 | python minhash.py
40 | ```
41 | 
42 | 
43 | ## Evaluation
44 | 
45 | The following shows an example of how to evaluate your fine-tuned model.
46 | 
47 | **Prerequisites**:
48 | 
49 | `export HF_TOKEN=your_huggingface_token`
50 | 
51 | Prepare your fine-tuned model and tokenizer in HuggingFace format.
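
A quick, optional sanity check that the checkpoint directory is laid out as expected before running the evaluation scripts (a minimal sketch; `finetuned_model/` is a placeholder for your own checkpoint directory, matching the example invocation further below):

```python
from transformers import AutoConfig, AutoTokenizer

# The checkpoint folder is expected to contain config.json (CodeLlama-7B-Instruct's config)
# and the generated pytorch_model.bin; see the NOTE under "Evaluation Scripts" below.
config = AutoConfig.from_pretrained("finetuned_model/")

# The evaluation script loads the tokenizer from the base model, not from the checkpoint folder.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf", use_fast=False)

print(config.model_type, config.vocab_size, len(tokenizer))
```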
52 | 
53 | Install [Iverilog](https://steveicarus.github.io/iverilog/usage/installation.html)
54 | 
55 | Install VerilogEval as the benchmark:
56 | 
57 | **Please read the WARNINGS in the [VerilogEval](https://github.com/NVlabs/verilog-eval/tree/main?tab=readme-ov-file#usage) before proceeding**
58 | 
59 | **Only version 1.0 is supported:** https://github.com/NVlabs/verilog-eval/tree/release/1.0.0
60 | 
61 | Pay attention to the "verilog-eval" vs. "verilog_eval" naming: the clone below produces a `verilog-eval` directory, while MG-Verilog's own modified VerilogEval (and some paths in this toolkit, e.g. in `model_eval/gen.sh`) use `verilog_eval`.
62 | 
63 | ```bash
64 | git clone https://github.com/NVlabs/verilog-eval.git
65 | pip install -e verilog-eval
66 | ```
67 | 
68 | **Evaluation Scripts**:
69 | 
70 | ```bash
71 | cd model_eval
72 | ./gen.sh
73 | #example: ./gen.sh "finetuned_model/" "hf-xxxxxxxxxx"
74 | ```
75 | 
76 | NOTE: The folder with your model and config should include two files: (1) the generated `pytorch_model.bin` and (2) the [model config](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf/blob/main/config.json) of CodeLlama-7B-Instruct from HuggingFace.
77 | 
78 | The results will be printed and logged in `./model_eval/data/gen.jsonl`
79 | 
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: iccad_challenge_2024
2 | channels:
3 |   - conda-forge
4 |   - pytorch
5 |   - nvidia
6 | dependencies:
7 |   - python=3.8
8 |   - pytorch=2.1.1
9 |   - torchvision
10 |   - pytorch-cuda=12.1
11 |   - gperf
12 |   - flex
13 |   - bison
14 |   - pip:
15 |     - jsonlines
16 |     - accelerate==0.24.1
17 |     - pandas
18 |     - fire
19 |     - transformers[torch]==4.35.0
20 |     - peft==0.6.1
21 |     - bitsandbytes==0.41.2.post2
22 |     - scipy
23 |     - sentencepiece
24 |     - transformers[dev]
25 |     - setuptools==69.5.1
26 |     - deepspeed==0.12.2
27 |     - transformers[deepspeed]
28 |     - datasketch==1.6.4
--------------------------------------------------------------------------------
/minhash.py:
--------------------------------------------------------------------------------
1 | import json
2 | import multiprocessing as mp
3 | import re
4 | from collections import defaultdict
5 | from functools import partial
6 | from typing import Dict, List, Optional, Set, Tuple, Type
7 | 
8 | from datasets import Dataset
9 | from datasketch import MinHash, MinHashLSH
10 | from tqdm import tqdm
11 | 
12 | 
13 | 
14 | from functools import partial
15 | import sys
16 | import queue
17 | import threading
18 | from typing import TypeVar, Iterator, List, Optional, Tuple
19 | 
20 | T = TypeVar('T')
21 | 
22 | class ThreadedIterator(Iterator[T]):
23 |     """An iterator object that computes its elements in a single parallel thread to be ready to be consumed.
24 |     The iterator should *not* return `None`. Elements of the original iterable will be shuffled arbitrarily."""
25 |     def __init__(self, original_iterator: Iterator[T], max_queue_size: int = 2, enabled: bool = True):
26 |         self.__is_enabled = enabled
27 |         if enabled:
28 |             self.__queue = queue.Queue(maxsize=max_queue_size)  # type: queue.Queue[Optional[T]]
29 |             self.__thread = threading.Thread(target=lambda: self.__worker(self.__queue, original_iterator), daemon=True)
30 |             self.__thread.start()
31 |         else:
32 |             self.__original_iterator = original_iterator
33 | 
34 |     @staticmethod
35 |     def __worker(queue: queue.Queue, original_iterator: Iterator[T]) -> None:
36 |         try:
37 |             for element in original_iterator:
38 |                 assert element is not None, 'By convention, Iterables wrapped in ThreadedIterator may not contain None.'
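                # Hand each element to the consumer through the bounded queue; once the loop
                # finishes, a None sentinel is enqueued so __next__ knows the iterator is exhausted.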
39 | queue.put(element, block=True) 40 | queue.put(None, block=True) 41 | except Exception as e: 42 | _, __, tb = sys.exc_info() 43 | queue.put((e, tb), block=True) 44 | 45 | def __next__(self) -> T: 46 | next_element = self.__queue.get(block=True) 47 | if next_element is None: 48 | self.__thread.join() 49 | self.__queue.put(None) # Make sure that we remember that we are done if we are called once more... 50 | raise StopIteration 51 | if isinstance(next_element, tuple) and isinstance(next_element[0], Exception): 52 | raise next_element[0].with_traceback(next_element[1]) 53 | return next_element 54 | 55 | def __iter__(self): 56 | if self.__is_enabled: 57 | return self 58 | else: 59 | return iter(self.__original_iterator) 60 | 61 | 62 | 63 | 64 | 65 | NON_ALPHA = re.compile("[^A-Za-z_0-9]") 66 | # parameters used in DuplicationIndex 67 | MIN_NUM_TOKENS = 10 68 | NUM_PERM = 256 69 | 70 | def get_min_hash(tokens: List[str]) -> Optional[MinHash]: 71 | """Compute the MinHash of a code snippet.""" 72 | if len(tokens) < MIN_NUM_TOKENS: 73 | return None 74 | min_hash = MinHash(num_perm=NUM_PERM) 75 | for token in set(tokens): 76 | min_hash.update(token.encode()) 77 | return min_hash 78 | 79 | 80 | def get_tokens(code: str) -> Set[str]: 81 | """Tokenize a code snippet.""" 82 | return {t for t in NON_ALPHA.split(code) if len(t.strip()) > 0} 83 | 84 | 85 | class DuplicationIndex: 86 | def __init__( 87 | self, 88 | *, 89 | duplication_jaccard_threshold: float = 0.85, 90 | ): 91 | self._duplication_jaccard_threshold = duplication_jaccard_threshold 92 | self._num_perm = NUM_PERM 93 | self._index = MinHashLSH(threshold=self._duplication_jaccard_threshold, num_perm=self._num_perm) 94 | 95 | self._duplicate_clusters = defaultdict(set) 96 | 97 | def add(self, code_key: Tuple, min_hash: MinHash) -> None: 98 | """Add a key to _index (MinHashLSH) 99 | the min_hash is used to query closest matches based on the jaccard_threshold. 100 | The new key is either added to a existing cluster of one close match, 101 | or a new cluster is created. The clusters created in this way, depend on the order of add. 102 | 103 | Args: 104 | code_key (Tuple of (index, repo_name, path)): 105 | Theoritically any hasbale key. Here we use a tuple to retrieve the information later. 106 | min_hash: MinHash of the code_key. 107 | """ 108 | close_duplicates = self._index.query(min_hash) 109 | if code_key in self._index.keys: 110 | print(f"Duplicate key {code_key}") 111 | return 112 | 113 | self._index.insert(code_key, min_hash) 114 | if len(close_duplicates) > 0: 115 | for base_duplicate in close_duplicates: 116 | if base_duplicate in self._duplicate_clusters: 117 | self._duplicate_clusters[base_duplicate].add(code_key) 118 | break 119 | else: 120 | self._duplicate_clusters[close_duplicates[0]].add(code_key) 121 | 122 | def get_duplicate_clusters(self) -> List[List[Dict]]: 123 | """Export the duplicate clusters. 124 | For each cluster, the first element is the base element of the cluster. 125 | The base element has an estimation jaccard similarity higher than the threshold with all the other elements. 126 | 127 | Returns: 128 | duplicate_clusters (List[List[Dict]]): 129 | List of duplicate clusters. 
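                Each cluster is a list of dicts with the keys ``base_index``, ``repo_name`` and ``path``;
                with this script, ``repo_name`` and ``path`` are the placeholders "norepo" and "nopath".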
130 | """ 131 | duplicate_clusters = [] 132 | for base, duplicates in self._duplicate_clusters.items(): 133 | cluster = [base] + list(duplicates) 134 | # reformat the cluster to be a list of dict 135 | cluster = [{"base_index": el[0], "repo_name": el[1], "path": el[2]} for el in cluster] 136 | duplicate_clusters.append(cluster) 137 | return duplicate_clusters 138 | 139 | def save(self, filepath) -> None: 140 | duplicate_clusters = self.get_duplicate_clusters() 141 | with open(filepath, "w") as f: 142 | json.dump(duplicate_clusters, f) 143 | 144 | 145 | def _compute_min_hash(element): 146 | index, data = element 147 | min_hash = get_min_hash([t for t in NON_ALPHA.split(data["code"]) if len(t.strip()) > 0]) 148 | if min_hash is not None: 149 | #can supply later and bookkept by get_duplicate_clusters 150 | return (index, "norepo", "nopath"), min_hash 151 | 152 | 153 | def minhash_iter(dataset_iterator: Type[Dataset]): 154 | with mp.Pool() as pool: 155 | for data in pool.imap_unordered( 156 | _compute_min_hash, 157 | ThreadedIterator(dataset_iterator, max_queue_size=10000), 158 | chunksize=100, 159 | ): 160 | if data is not None: 161 | yield data 162 | 163 | 164 | def make_duplicate_clusters(dataset_iterator: Type[Dataset], jaccard_threshold: float): 165 | """Find duplicate clusters in the dataset in two steps: 166 | 1. Compute MinHash for each code snippet. MinHash is a tool for fast jaccard similarity estimation. 167 | This step is computed using an asynchronous multiprocessing pool, minhash_iter 168 | 2. Find duplicate clusters. The computed MinHash is added sequentially to the DuplicationIndex. 169 | This step cannot be parallelized. So using asynchronous thread in the previous step helps to speed up the process. 170 | """ 171 | di = DuplicationIndex(duplication_jaccard_threshold=jaccard_threshold) 172 | 173 | for filename, min_hash in tqdm(ThreadedIterator(minhash_iter(enumerate(dataset_iterator)), max_queue_size=100)): 174 | di.add(filename, min_hash) 175 | 176 | # Returns a List[Cluster] where Cluster is List[str] with the filenames. 177 | return di.get_duplicate_clusters() 178 | 179 | 180 | def jaccard_similarity(code1: str, code2: str) -> float: 181 | """Compute the Jaccard similarity of two code snippets.""" 182 | tokens1 = get_tokens(code1) 183 | tokens2 = get_tokens(code2) 184 | return len(tokens1 & tokens2) / len(tokens1 | tokens2) 185 | 186 | 187 | _shared_dataset = None 188 | 189 | 190 | def _find_cluster_extremes_shared(cluster, jaccard_threshold): 191 | """Find a reduced cluster such that each code in the origin cluster is similar to at least one code in the reduced cluster. 192 | Two codes are similar if their Jaccard similarity is above the threshold. 193 | 194 | Args: 195 | cluster (List[dict]): 196 | cluster is a list of dict, each dict contains the following keys: 197 | - base_index 198 | - repo_name 199 | - path 200 | This is a typical output of DuplicationIndex.get_duplicate_clusters() 201 | jaccard_threshold (float): 202 | threshold for Jaccard similarity. 203 | Two codes are similar if their Jaccard similarity is above the threshold. 204 | 205 | Returns: 206 | extremes (List[dict]): 207 | A reduced representation of the cluster. The field copies is added to each dict. 208 | The copies field indicates the number of similar codes in the cluster for a extreme. 
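            For example, a cluster of three near-identical snippets is typically reduced to one extreme with copies == 3.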
209 | """ 210 | extremes = [] 211 | for element1 in cluster: 212 | code1 = _shared_dataset[element1["base_index"]]["code"] 213 | for element2 in extremes: 214 | code2 = _shared_dataset[element2["base_index"]]["code"] 215 | if jaccard_similarity(code1, code2) >= jaccard_threshold: 216 | element2["copies"] += 1 217 | break 218 | else: 219 | element1["copies"] = 1 220 | extremes.append(element1) 221 | return extremes 222 | 223 | 224 | def find_extremes(cluster_list, dataset, jaccard_threshold): 225 | """Call the _find_cluster_extremes_shared function in a parallel fashion. 226 | 227 | Args: 228 | cluster_list (List[List[Dict]]): 229 | each cluster is a list of dicts with the key base_index, 230 | referring to the index of the base code in the dataset. 231 | dataset (Type[Dataset]): 232 | dataset is used to access the content of the code snippets, 233 | using the base_index from the cluster_list. 234 | dataset is shared between all the processes using a glabal variable (any other way to share the dataset?), 235 | otherwise the multi processing is not speeded up. 236 | jaccard_threshold (float): 237 | the threshold for the jaccard similarity. The default value is 0.85 238 | 239 | Returns: 240 | extremes_list (List[Dict]): 241 | Each cluster is reduced to extremes. 242 | See _find_cluster_extremes_shared for the definition of extremes. 243 | """ 244 | global _shared_dataset 245 | _shared_dataset = dataset 246 | extremes_list = [] 247 | f = partial(_find_cluster_extremes_shared, jaccard_threshold=jaccard_threshold) 248 | with mp.Pool() as pool: 249 | for extremes in tqdm( 250 | pool.imap_unordered( 251 | f, 252 | cluster_list, 253 | ), 254 | total=len(cluster_list), 255 | ): 256 | extremes_list.append(extremes) 257 | return extremes_list 258 | 259 | 260 | def deduplicate_dataset( 261 | dataset: Type[Dataset], jaccard_threshold: float = 0.85 262 | ) -> Tuple[Type[Dataset], List[List[Dict]]]: 263 | """Deduplicate the dataset using minhash and jaccard similarity. 264 | This function first generate duplicate clusters, then each cluster 265 | is reduced to the extremes that are similar to the other elements in the cluster. 266 | Codes are called similar if their Jaccard similarity is greater than jaccard_threshold (0.85 default). 267 | 268 | Args: 269 | dataset (Type[Dataset]): 270 | The dataset to deduplicate. 271 | jaccard_threshold (float, default=0.85): 272 | jaccard threshold to determine if two codes are similar 273 | 274 | Returns: 275 | ds_dedup (Type[Dataset]): 276 | The deduplicated dataset. 277 | duplicate_clusters (List[List[Dict]]): 278 | The list of duplicate clusters. 279 | Each cluster is a list of dicts with the following keys: 280 | - base_index : int 281 | The index of the code in the original dataset. 282 | - repo_name : str 283 | - path : str 284 | - copies : int 285 | The number of copies of the code in the cluster. (find_cluster_extremes) 286 | - is_extreme : bool 287 | Whether the code is an extreme in the cluster. 288 | All the codes in the cluster are removed from the dataset except the extremes. 
289 | 290 | Example: 291 | >>> from datasets import load_dataset 292 | >>> from minhash_deduplication import deduplicate_dataset 293 | >>> ds = load_dataset("lvwerra/codeparrot-clean", split="train") 294 | >>> ds_dedup, duplicate_clusters = deduplicate_dataset(ds, jaccard_threshold=0.85) 295 | """ 296 | duplicate_clusters = make_duplicate_clusters(dataset, jaccard_threshold) 297 | duplicate_indices = {x["base_index"] for cluster in duplicate_clusters for x in cluster} 298 | extreme_dict = {} 299 | extremes_clusters = find_extremes(duplicate_clusters, dataset, jaccard_threshold) 300 | for extremes in extremes_clusters: 301 | for element in extremes: 302 | extreme_dict[element["base_index"]] = element 303 | remove_indices = duplicate_indices - set(extreme_dict.keys()) 304 | ds_filter = dataset.filter(lambda x, idx: idx not in remove_indices, with_indices=True) 305 | 306 | # update duplicate_clusters 307 | for cluster in duplicate_clusters: 308 | for element in cluster: 309 | element["is_extreme"] = element["base_index"] in extreme_dict 310 | if element["is_extreme"]: 311 | element["copies"] = extreme_dict[element["base_index"]]["copies"] 312 | 313 | print(f"Original dataset size: {len(dataset)}") 314 | print(f"Number of duplicate clusters: {len(duplicate_clusters)}") 315 | print(f"Files in duplicate cluster: {len(duplicate_indices)}") 316 | print(f"Unique files in duplicate cluster: {len(extreme_dict)}") 317 | print(f"Filtered dataset size: {len(ds_filter)}") 318 | 319 | return ds_filter, duplicate_clusters 320 | 321 | 322 | 323 | if __name__ == "__main__": 324 | import argparse 325 | from datasets import load_dataset 326 | parser = argparse.ArgumentParser() 327 | parser.add_argument("--dataset", type=str, default="GaTech-EIC/MG-Verilog") 328 | args = parser.parse_args() 329 | 330 | ds = load_dataset(args.dataset) 331 | #deduplicate based on the "code" key 332 | ds_test = ds["train"] 333 | ds_dedup, duplicate_clusters = deduplicate_dataset(ds_test, jaccard_threshold=0.85) 334 | # print(duplicate_clusters) 335 | print("Duplicate clusters: ", len(duplicate_clusters)) 336 | print("Original dataset size: ", len(ds_test)) 337 | print("Deduplicated dataset size: ", len(ds_dedup)) -------------------------------------------------------------------------------- /model_eval/gen.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES=0,1,2,3 2 | 3 | ckpt_dir=$1 4 | token=$2 5 | 6 | export 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64' 7 | 8 | accelerate launch --multi_gpu generate.py \ 9 | --checkpoint_dir $ckpt_dir \ 10 | --model_type "qlora" \ 11 | --base_model "codellama/CodeLlama-7b-Instruct-hf" \ 12 | --tokenizer_type "code_llama" \ 13 | --cache_dir "./HF_cache/" \ 14 | --hf_token $token \ 15 | --max_new_tokens 1024 \ 16 | --temperature 0.7 \ 17 | --desc_file ../verilog_eval/descriptions/VerilogDescription_Machine.jsonl \ 18 | --desc_key "detail_description" \ 19 | --prompt_type "baseline" \ 20 | --eval_file ../verilog_eval/data/VerilogEval_Machine.jsonl \ 21 | --output_file ./data/gen.jsonl \ 22 | --fp16 \ 23 | --sample_k 20 \ 24 | --result_name Test \ 25 | --batch_size 1 26 | -------------------------------------------------------------------------------- /model_eval/generate.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import sys 4 | import time 5 | from pathlib import Path 6 | from typing import Tuple 7 | 8 | import fire 9 | import torch 10 | import numpy as 
np 11 | import math 12 | from datetime import timedelta 13 | 14 | 15 | from transformers import AutoTokenizer, CodeGenForCausalLM, GenerationConfig, AutoModelForCausalLM, DataCollatorForLanguageModeling 16 | import pandas as pd 17 | 18 | import jsonlines 19 | 20 | #only support https://github.com/NVlabs/verilog-eval/tree/release/1.0.0 21 | #pay attention to the "verilog-eval" or "verilog_eval" which is used in mg-verilog's own midified VerilogEval 22 | sys.path.append("../verilog-eval/verilog_eval") 23 | from evaluation import evaluate_functional_correctness 24 | from datetime import datetime 25 | from tqdm import tqdm 26 | 27 | from accelerate import PartialState, Accelerator, DeepSpeedPlugin 28 | from accelerate.utils import InitProcessGroupKwargs 29 | import torch.distributed as dist 30 | import torch.multiprocessing as mp 31 | 32 | from transformers import ( 33 | AutoTokenizer, 34 | AutoModelForCausalLM, 35 | set_seed, 36 | Seq2SeqTrainer, 37 | BitsAndBytesConfig, 38 | LlamaTokenizer 39 | 40 | ) 41 | from tokenizers import AddedToken 42 | 43 | import warnings 44 | from packaging import version 45 | from packaging.version import parse 46 | 47 | from transformers import LlamaForCausalLM, LlamaConfig 48 | from accelerate import init_empty_weights, load_checkpoint_and_dispatch 49 | 50 | 51 | # PROMPT = ( 52 | # "You only complete chats with syntax correct Verilog code. End the Verilog module code completion with 'endmodule'. Do not include module, input and output definitions." 53 | # "\n\nImplement the Verilog module based on the following description. Assume that signals are positive clock/clk edge triggered unless otherwise stated.\n\n" 54 | # "\n\n{description}\n\nModule header:\n{module_header}\n" 55 | # ) 56 | 57 | 58 | llama2_prompt_with_memory =""" 59 | [INST] <> 60 | {system_message} 61 | <> 62 | 63 | {chat_history} {human_input} [/INST] 64 | """ 65 | 66 | llama2_prompt_without_memory =""" 67 | [INST] <> 68 | {system_message} 69 | <> 70 | 71 | {human_input} [/INST] 72 | """ 73 | 74 | llama2_prompt_without_memory_without_sys =""" 75 | [INST] {human_input} [/INST] 76 | """ 77 | 78 | llama2_pompt_with_memory_without_sys =""" 79 | [INST] {chat_history} {human_input} [/INST] 80 | """ 81 | 82 | llama2_memory_prompt ="""{human_input} [/INST] {model_reply}[INST]""" 83 | 84 | system_prompt = "You only complete chats with syntax correct Verilog code. End the Verilog module code completion with 'endmodule'. Do not include module, input and output definitions." 85 | question_prompt = "Implement the Verilog module based on the following description. Assume that signals are positive clock/clk edge triggered unless otherwise stated." 86 | problem_description = "\n\n {description} \n\n Module header:\n\n {module_header}\n" 87 | PROMPT_BASELINE = llama2_prompt_without_memory.format(system_message = system_prompt, human_input = question_prompt + problem_description) 88 | 89 | system_prompt = "You only expand the high level summary of a Verilog code module design to block level summaries." 90 | question_prompt = "Come up with correct functional blocks for to implement the Verilog module in the high level summary, and expand the high level summary to block level summaries." 
91 | problem_description = "\n Here is the high level summary:\n\n {description} \n\n" 92 | problem_description += "\n Here is the Verilog module header:\n\n {module_header} \n\n" 93 | LLM1_PROMPT = llama2_prompt_without_memory.format(system_message = system_prompt, human_input = question_prompt + problem_description) 94 | 95 | 96 | system_prompt = "You only complete chats with syntax correct Verilog code. End the Verilog module code completion with 'endmodule'. Do not include module, input and output definitions." 97 | question_prompt = "Implement the Verilog module based on the following block level summaries. Assume that signals are positive clock/clk edge triggered unless otherwise stated." 98 | problem_description = "\nHere are block level summaries:\n\n {description} \n\n Module header:\n\n {module_header} \n" 99 | LLM2_BLOCK_TO_CODE_PROMPT = llama2_prompt_without_memory.format(system_message = system_prompt, human_input = question_prompt + problem_description) 100 | 101 | system_prompt = "You only complete chats with syntax correct Verilog code. End the Verilog module code completion with 'endmodule'. Do not include module, input and output definitions." 102 | question_prompt = "Implement the Verilog module based on the following block level summaries. Assume that signals are positive clock/clk edge triggered unless otherwise stated." 103 | problem_description = "{description} \n\n Module header:\n\n {module_header} \n" 104 | LLM2_BLOCK_GLOBAL_TO_CODE_PROMPT = llama2_prompt_without_memory.format(system_message = system_prompt, human_input = question_prompt + problem_description) 105 | 106 | def mem(): 107 | # print("torch.cuda.memory_allocated: %fGB"%(torch.cuda.memory_allocated(0)/1024/1024/1024)) 108 | # print("torch.cuda.memory_reserved: %fGB"%(torch.cuda.memory_reserved(0)/1024/1024/1024)) 109 | # print("torch.cuda.max_memory_reserved: %fGB"%(torch.cuda.max_memory_reserved(0)/1024/1024/1024)) 110 | return torch.cuda.memory_allocated(0)/1024/1024/1024 111 | 112 | def timestamp(): 113 | return f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}]" 114 | 115 | import importlib 116 | from peft import ( 117 | prepare_model_for_kbit_training, 118 | LoraConfig, 119 | get_peft_model, 120 | PeftModel 121 | ) 122 | from peft.tuners.lora import LoraLayer 123 | from os.path import exists, join, isdir 124 | 125 | 126 | def is_ipex_available(): 127 | def get_major_and_minor_from_version(full_version): 128 | return str(version.parse(full_version).major) + "." + str(version.parse(full_version).minor) 129 | 130 | _torch_version = importlib.metadata.version("torch") 131 | if importlib.util.find_spec("intel_extension_for_pytorch") is None: 132 | return False 133 | _ipex_version = "N/A" 134 | try: 135 | _ipex_version = importlib.metadata.version("intel_extension_for_pytorch") 136 | except importlib.metadata.PackageNotFoundError: 137 | return False 138 | torch_major_and_minor = get_major_and_minor_from_version(_torch_version) 139 | ipex_major_and_minor = get_major_and_minor_from_version(_ipex_version) 140 | if torch_major_and_minor != ipex_major_and_minor: 141 | warnings.warn( 142 | f"Intel Extension for PyTorch {ipex_major_and_minor} needs to work with PyTorch {ipex_major_and_minor}.*," 143 | f" but PyTorch {_torch_version} is found. Please switch to the matching version and run again." 144 | ) 145 | return False 146 | return True 147 | 148 | def smart_tokenizer_and_embedding_resize( 149 | special_tokens_dict, 150 | tokenizer, 151 | model, 152 | ): 153 | """Resize tokenizer and embedding. 
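    Adds any missing special tokens (e.g. a [PAD] token), resizes the model's token embeddings,
    and initializes the newly added embedding rows to the mean of the existing ones.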
154 | 155 | Note: This is the unoptimized version that may make your embedding size not be divisible by 64. 156 | """ 157 | if "codellama" in model.name_or_path: 158 | for key, value in special_tokens_dict.items(): 159 | special_tokens_dict[key] = AddedToken(value, lstrip=True, rstrip=True, normalized=False) 160 | 161 | num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict) 162 | model.resize_token_embeddings(len(tokenizer)) 163 | 164 | if num_new_tokens > 0: 165 | input_embeddings_data = model.get_input_embeddings().weight.data 166 | output_embeddings_data = model.get_output_embeddings().weight.data 167 | 168 | input_embeddings_avg = input_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True) 169 | output_embeddings_avg = output_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True) 170 | 171 | input_embeddings_data[-num_new_tokens:] = input_embeddings_avg 172 | output_embeddings_data[-num_new_tokens:] = output_embeddings_avg 173 | 174 | def get_accelerate_model(model_name, tokenizer_type, checkpoint, bf16, fp16, cache_dir, hf_token=None): 175 | 176 | if torch.cuda.is_available(): 177 | n_gpus = torch.cuda.device_count() 178 | if is_ipex_available() and torch.xpu.is_available(): 179 | n_gpus = torch.xpu.device_count() 180 | 181 | max_memory = f'{24000}MB' 182 | max_memory = {i: max_memory for i in range(n_gpus)} 183 | device_map = "auto" 184 | 185 | # if we are in a distributed setting, we need to set the device map and max memory per device 186 | if os.environ.get('LOCAL_RANK') is not None: 187 | local_rank = int(os.environ.get('LOCAL_RANK', '0')) 188 | print("distributed", local_rank) 189 | device_map = {'': local_rank} 190 | max_memory = {'': max_memory[local_rank]} 191 | 192 | compute_dtype = (torch.float16 if fp16 else (torch.bfloat16 if bf16 else torch.float32)) 193 | 194 | ckpt_path = checkpoint 195 | 196 | config = LlamaConfig.from_pretrained(os.path.join(ckpt_path, 'config.json')) 197 | state_dict = torch.load(os.path.join(ckpt_path, 'pytorch_model.bin')) 198 | with init_empty_weights(): 199 | model = LlamaForCausalLM(config) 200 | model = load_checkpoint_and_dispatch( 201 | model, 202 | checkpoint=os.path.join(ckpt_path, 'pytorch_model.bin'), 203 | device_map=device_map, 204 | max_memory=max_memory, 205 | dtype=torch.float16 # Change dtype if needed 206 | ) 207 | model.eval() 208 | 209 | # model = AutoModelForCausalLM.from_pretrained( 210 | # model_name, 211 | # cache_dir=checkpoint, 212 | # load_in_4bit=False, 213 | # load_in_8bit=True, 214 | # device_map=device_map, 215 | # max_memory= max_memory, 216 | # quantization_config=BitsAndBytesConfig( 217 | # load_in_4bit=False, 218 | # load_in_8bit=True, 219 | # llm_int8_threshold=6.0, 220 | # llm_int8_has_fp16_weight=False, 221 | # bnb_4bit_compute_dtype=compute_dtype, 222 | # bnb_4bit_use_double_quant=True, 223 | # bnb_4bit_quant_type="nf4", 224 | # ), 225 | # torch_dtype=(torch.float32 if fp16 else (torch.bfloat16 if bf16 else torch.float32)), 226 | # trust_remote_code=False, 227 | # token=hf_token 228 | # ) 229 | 230 | setattr(model, 'model_parallel', True) 231 | setattr(model, 'is_parallelizable', True) 232 | 233 | model.config.torch_dtype=(torch.float32 if fp16 else (torch.bfloat16 if bf16 else torch.float32)) 234 | 235 | # Tokenizer 236 | print(model_name) 237 | tokenizer = AutoTokenizer.from_pretrained( 238 | model_name, 239 | cache_dir=cache_dir, 240 | padding_side="left", 241 | use_fast=False, # Fast tokenizer giving issues. 
242 | tokenizer_type=tokenizer_type, 243 | trust_remote_code=False, 244 | token=hf_token 245 | ) 246 | if tokenizer._pad_token is None: 247 | smart_tokenizer_and_embedding_resize( 248 | special_tokens_dict=dict(pad_token="[PAD]"), 249 | tokenizer=tokenizer, 250 | model=model, 251 | ) 252 | if "codellama" in model_name: 253 | tokenizer.add_special_tokens({ 254 | "eos_token": AddedToken(tokenizer.convert_ids_to_tokens(model.config.eos_token_id), lstrip=True, rstrip=True, normalized=False), 255 | "bos_token": AddedToken(tokenizer.convert_ids_to_tokens(model.config.bos_token_id), lstrip=True, rstrip=True, normalized=False), 256 | "unk_token": AddedToken(tokenizer.convert_ids_to_tokens( 257 | tokenizer.pad_token_id 258 | ), lstrip=False, rstrip=False, normalized=False), 259 | 260 | }) 261 | tokenizer.add_tokens( 262 | [AddedToken(tokenizer.eot_token, lstrip=True, rstrip=True, normalized=False), 263 | AddedToken(tokenizer.prefix_token, lstrip=True, rstrip=True, normalized=False), 264 | AddedToken(tokenizer.suffix_token, lstrip=True, rstrip=True, normalized=False), 265 | AddedToken(tokenizer.middle_token, lstrip=True, rstrip=True, normalized=False), 266 | AddedToken(tokenizer.pad_token, lstrip=True, rstrip=True, normalized=False), 267 | ], 268 | special_tokens=True 269 | ) 270 | elif 'llama' in model_name or isinstance(tokenizer, LlamaTokenizer): 271 | # LLaMA tokenizer may not have correct special tokens set. 272 | # Check and add them if missing to prevent them from being parsed into different tokens. 273 | # Note that these are present in the vocabulary. 274 | # Note also that `model.config.pad_token_id` is 0 which corresponds to `` token. 275 | print('Adding special tokens.') 276 | tokenizer.add_special_tokens({ 277 | "eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id), 278 | "bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id), 279 | "unk_token": tokenizer.convert_ids_to_tokens( 280 | tokenizer.pad_token_id 281 | ), 282 | }) 283 | 284 | # if not args.full_finetune: 285 | if True: 286 | # model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=args.gradient_checkpointing) 287 | model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True) 288 | 289 | # if not args.full_finetune: 290 | if True: 291 | if checkpoint is not None: 292 | pass 293 | # print("Loading adapters from checkpoint.") 294 | # model = PeftModel.from_pretrained(model, join(checkpoint, 'adapter_model'), is_trainable=True) 295 | else: 296 | raise Exception("Only loading checkpoint is allowed.") 297 | # print(f'adding LoRA modules...') 298 | # modules = find_all_linear_names(args, model) 299 | # config = LoraConfig( 300 | # r=args.lora_r, 301 | # lora_alpha=args.lora_alpha, 302 | # target_modules=modules, 303 | # lora_dropout=args.lora_dropout, 304 | # bias="none", 305 | # task_type="CAUSAL_LM", 306 | # ) 307 | # model = get_peft_model(model, config) 308 | 309 | for name, module in model.named_modules(): 310 | if isinstance(module, LoraLayer): 311 | if bf16: 312 | module = module.to(torch.bfloat16) 313 | if 'norm' in name: 314 | module = module.to(torch.float32) 315 | if 'lm_head' in name or 'embed_tokens' in name: 316 | if hasattr(module, 'weight'): 317 | if bf16 and module.weight.dtype == torch.float32: 318 | module = module.to(torch.bfloat16) 319 | return model, tokenizer 320 | 321 | def load( 322 | checkpoint_dir: str, 323 | tokenizer_type: str = "code_llama", 324 | model_type: str = "hf", 325 | base_model: str = None, 326 | bf16: bool = False, 327 | 
fp16: bool = False, 328 | cache_dir: str = "/home/user_name/HF_cache/", 329 | hf_token: str = "your_hf_token_if_you_want_to_use_it", 330 | dev: str=None 331 | ): 332 | start_time = time.time() 333 | 334 | if model_type == "hf": 335 | model = CodeGenForCausalLM.from_pretrained(checkpoint_dir, cache_dir="/home/user_name/HF_cache") 336 | 337 | if dev is not None: 338 | model = model.to(dev) 339 | 340 | tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir, use_fast=False, padding_side="left") 341 | tokenizer.pad_token_id = tokenizer.eos_token_id 342 | 343 | else: 344 | print("Loading base model", base_model, "Checkpoint", checkpoint_dir) 345 | model, tokenizer = get_accelerate_model(base_model, tokenizer_type, checkpoint_dir, bf16, fp16, cache_dir, hf_token=hf_token) 346 | 347 | 348 | print(f"Loaded in {time.time() - start_time:.2f} seconds") 349 | return model, tokenizer 350 | 351 | class VerilogDataset(torch.utils.data.Dataset): 352 | def __init__(self, descriptions, headers, tokenizer, device, prompt_type= "baseline", desc_key = "detail_description", k=20): 353 | self.device = device 354 | self.tokenizer = tokenizer 355 | self.prompt_type = prompt_type 356 | 357 | self.prompts = [] 358 | for desc in descriptions: 359 | for i in range(k): 360 | self.prompts.append({ 361 | "task_id": desc["task_id"], 362 | "description": desc[desc_key], 363 | "module_header": headers[desc["task_id"]] 364 | }) 365 | 366 | def __len__(self): 367 | return len(self.prompts) 368 | 369 | def __getitem__(self, index): 370 | if self.prompt_type == "baseline": 371 | PROMPT = PROMPT_BASELINE 372 | elif self.prompt_type == "llm1": 373 | PROMPT = LLM1_PROMPT 374 | elif self.prompt_type == "llm2_block_to_code": 375 | PROMPT = LLM2_BLOCK_TO_CODE_PROMPT 376 | elif self.prompt_type == "llm2_block_global_to_code": 377 | PROMPT = LLM2_BLOCK_GLOBAL_TO_CODE_PROMPT 378 | else: 379 | raise Exception("Invalid prompt type.") 380 | prompt = PROMPT.format_map(self.prompts[index]) 381 | return torch.tensor(self.tokenizer(prompt).input_ids).to(self.device) 382 | 383 | def main( 384 | checkpoint_dir: str, 385 | model_type: str = "hf", 386 | tokenizer_type="code_llama", 387 | base_model: str = None, 388 | bf16: bool = False, 389 | fp16: bool = False, 390 | cache_dir: str = "/home/user_name/HF_cache/", 391 | hf_token: str = "your_hf_token_if_you_want_to_use_it", 392 | max_new_tokens: int = 1024, 393 | temperature = None, 394 | top_p = None, 395 | top_k = None, 396 | repetition_penalty = None, 397 | sample_k: int = 1, 398 | desc_file: str = "../verilog_eval/descriptions/VerilogDescription_Machine.jsonl", 399 | # desc_file: str = "/home/user_name/DAC_2024/chatgpt4_auto_accel/fine_tune_dataset/auto_doc_part_dataset/hdlbits_for_llm2_eval.jsonl", 400 | desc_key: str = "detail_description", 401 | prompt_type: str = "baseline", 402 | eval_file: str = "../verilog_eval/data/VerilogEval_Machine.jsonl", 403 | output_file: str = "./data/gen.jsonl", 404 | result_name: str = None, 405 | batch_size: int = 1, 406 | skip_iverilog: bool = False 407 | ): 408 | 409 | with torch.no_grad(): 410 | # deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=2) 411 | # accel = Accelerator(deepspeed_plugin=deepspeed_plugin) 412 | kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=3600)) 413 | accel = Accelerator(kwargs_handlers=[kwargs]) 414 | accel.print("Starting at", timestamp()) 415 | 416 | accel.print("Loading model...") 417 | device = accel.device 418 | model, tokenizer = load(checkpoint_dir, tokenizer_type, model_type, 
base_model=base_model, bf16=bf16, fp16=fp16, cache_dir=cache_dir, hf_token=hf_token) 419 | model.eval() 420 | 421 | accel.print("Loading descriptions...") 422 | tasks = pd.read_json(path_or_buf=desc_file, lines=True) 423 | 424 | headers = {} 425 | with jsonlines.open(eval_file) as file: 426 | for obj in file: 427 | headers[obj["task_id"]] = obj["prompt"] 428 | 429 | 430 | # clear temp file 431 | with open(output_file, "w") as file: 432 | pass 433 | 434 | outputs = [] 435 | accel.print("Starting...") 436 | 437 | with accel.split_between_processes(tasks.to_dict(orient="records")) as prompts: 438 | # print("Distr", len(prompt), [r["task_id"] for r in prompt]) 439 | tasks = prompts 440 | 441 | # create new dataset for each process to help with prompt batch balancing 442 | task_ids = [t["task_id"] for t in tasks] 443 | dataset = VerilogDataset(tasks, headers, tokenizer, device, desc_key=desc_key, prompt_type=prompt_type, k=sample_k) 444 | collator = DataCollatorForLanguageModeling(tokenizer, mlm=False) 445 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, collate_fn=collator) 446 | 447 | accel.print("Starting generation...") 448 | 449 | for i, prompts in enumerate(tqdm(dataloader, total=len(dataloader), desc="Process " + str(accel.process_index))): 450 | prompts = prompts["input_ids"] 451 | output = model.generate( 452 | inputs=prompts, 453 | max_new_tokens=max_new_tokens, 454 | do_sample=True, 455 | temperature=temperature, 456 | top_p=top_p, 457 | top_k=top_k, 458 | repetition_penalty=repetition_penalty, 459 | pad_token_id=tokenizer.pad_token_id 460 | ) 461 | 462 | strings = tokenizer.batch_decode(output[:, prompts.shape[1]:], skip_special_tokens=True) 463 | full_responses = tokenizer.batch_decode(output, skip_special_tokens=True) 464 | prompt_strings = tokenizer.batch_decode(prompts[:, 30:], skip_special_tokens=True) 465 | 466 | for j, comp in enumerate(strings): 467 | tid = (i * batch_size + j) // sample_k 468 | obj = { 469 | "process": accel.process_index, 470 | "task_id": task_ids[tid], 471 | "completion": comp, 472 | "prompt": prompt_strings[j], 473 | "full_response": full_responses[j], 474 | } 475 | outputs.append(obj) 476 | 477 | with jsonlines.open(output_file, mode="a") as writer: 478 | print(timestamp(), "starting write", accel.process_index, "length", len(outputs)) 479 | writer.write_all(outputs) 480 | print(timestamp(), "done writing", accel.process_index) 481 | 482 | # evaluate 483 | accel.wait_for_everyone() 484 | 485 | if accel.is_main_process and not skip_iverilog: 486 | accel.print("running eval") 487 | res = evaluate_functional_correctness(output_file, problem_file=eval_file, k=[1,5,10]) 488 | accel.print("Eval Results:", res) 489 | with open("./data/results.txt", mode="a") as f: 490 | ts = f"[{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}{' | ' + result_name if result_name is not None else ''}] " 491 | f.write(ts + str(res) + "temperature:{}".format(temperature) + "top_p:{}".format(top_p) + "top_k:{}".format(top_k) + "repetition_penalty:{}".format(repetition_penalty) + "\n") 492 | 493 | if __name__ == "__main__": 494 | fire.Fire(main) 495 | -------------------------------------------------------------------------------- /model_finetune/MG-Verilog/data_format_conversion.py: -------------------------------------------------------------------------------- 1 | from datasets import load_dataset 2 | import json 3 | 4 | # Load the dataset 5 | dataset = load_dataset('GaTech-EIC/MG-Verilog') 6 | 7 | # Function to rename keys and save to JSONL 8 | def 
save_to_jsonl(split, split_name):
9 |     jsonl_file_path = 'dataset.jsonl'
10 |     with open(jsonl_file_path, 'a') as jsonl_file:
11 |         for example in split:
12 |             # Flatten the description dict into a single string and rename the keys to input/output
13 |             current_sample_dict = example['description']
14 |             current_sample = ', '.join(f'{k}: {v}' for k, v in current_sample_dict.items())
15 |             new_example = {
16 |                 'input': current_sample,
17 |                 'output': example['code']
18 |             }
19 |             jsonl_file.write(json.dumps(new_example) + '\n')
20 |     print(f"Dataset split '{split_name}' saved to {jsonl_file_path}")
21 | 
22 | # Save each split
23 | for split_name in dataset:
24 |     save_to_jsonl(dataset[split_name], split_name)
25 | 
--------------------------------------------------------------------------------
/model_finetune/README.md:
--------------------------------------------------------------------------------
1 | # Fine-Tuning in NeMo
2 | 
3 | In our evaluation process for each submission, we will conduct fine-tuning with the following steps. To reproduce this process, we suggest using a machine with 4x A100 GPUs (or any NVIDIA GPUs with more than 24GB VRAM each). If you plan to use a number of GPUs other than 4, please adjust the `TP_SIZE` and `GPU_COUNT` parameters in the `run_peft.sh` script accordingly.
4 | 
5 | 
6 | ## NeMo docker setup
7 | 
8 | ```bash
9 | # In Host Machine
10 | docker pull nvcr.io/nvidia/nemo:24.05
11 | ```
12 | If you encounter a permission error, please use sudo to run the above command.
13 | 
14 | ## Run NeMo docker
15 | ```bash
16 | # In Host Machine
17 | docker run --gpus all --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvidia/nemo:24.05 bash
18 | ```
19 | As with the previous step, if you encounter a permission error, please use sudo to run the above command.
20 | 
21 | ## Prepare dataset
22 | 
23 | Please prepare the dataset to be submitted in .jsonl format. Here we provide an example of converting the [MG-Verilog dataset](https://huggingface.co/datasets/GaTech-EIC/MG-Verilog) on Hugging Face into .jsonl format and using it as the fine-tuning dataset.
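
For reference, each line of the resulting `dataset.jsonl` is a single JSON object with an `input` field (the description levels joined into one string by `data_format_conversion.py`) and an `output` field (the Verilog code). An illustrative record (values shortened; the names inside `input` depend on the description levels in your dataset):

```json
{"input": "summary: An 8-bit adder..., detailed explanation: The module takes two 8-bit operands...", "output": "module adder(input [7:0] a, b, output [7:0] sum);\n  assign sum = a + b;\nendmodule"}
```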
24 | 
25 | ```bash
26 | # In Host Machine
27 | cd MG-Verilog
28 | python data_format_conversion.py
29 | ```
30 | 
31 | ## Download CodeLlama-7b weights
32 | ```bash
33 | # In Docker Container
34 | huggingface-cli login
35 | huggingface-cli download codellama/CodeLlama-7b-Instruct-hf --local-dir CodeLlama-7b
36 | ```
37 | 
38 | ## Convert the Hugging Face model to NeMo format
39 | ```bash
40 | # In Docker Container
41 | python /opt/NeMo/scripts/checkpoint_converters/convert_llama_hf_to_nemo.py --input_name_or_path=./CodeLlama-7b/ --output_path=CodeLlama-7b.nemo
42 | ```
43 | 
44 | ## Run LoRA fine-tuning
45 | ```bash
46 | # In Docker Container
47 | mkdir /workspace/results/MG-Verilog
48 | bash run_peft.sh <MAX_STEPS>  # the script requires the number of training steps; see the note after run_peft.sh below
49 | ```
50 | 
51 | ## Convert the NeMo model to Hugging Face format
52 | ```bash
53 | # In Docker Container
54 | cd /opt/NeMo/
55 | python scripts/nlp_language_modeling/merge_lora_weights/merge.py \
56 |     trainer.accelerator=cpu \
57 |     gpt_model_file=/workspace/CodeLlama-7b.nemo \
58 |     lora_model_path=/workspace/results/MG-Verilog/checkpoints/megatron_gpt_peft_lora_tuning.nemo \
59 |     merged_model_path=/workspace/results/MG-Verilog/checkpoints/CodeLlama-7b-lora-merged.nemo
60 | 
61 | python /opt/NeMo/scripts/checkpoint_converters/convert_llama_nemo_to_hf.py \
62 |     --input_name_or_path /workspace/results/MG-Verilog/checkpoints/CodeLlama-7b-lora-merged.nemo \
63 |     --output_path /workspace/results/MG-Verilog/checkpoints/CodeLlama-7b-lora-merged/pytorch_model.bin \
64 |     --cpu-only
65 | ```
66 | 
--------------------------------------------------------------------------------
/model_finetune/run_peft.sh:
--------------------------------------------------------------------------------
1 | if [ -z "$1" ]
2 | then
3 |     echo "No argument supplied for max_steps. Please provide an integer value."
4 |     exit 1
5 | fi
6 | MAX_STEPS=$1
7 | 
8 | # This is the nemo model we are finetuning
9 | MODEL="./CodeLlama-7b.nemo"
10 | 
11 | # These are the training datasets (in our case we only have one)
12 | TRAIN_DS="[MG-Verilog/dataset.jsonl]"
13 | 
14 | # These are the validation datasets (in our case we only have one)
15 | VALID_DS="[MG-Verilog/dataset.jsonl]"
16 | 
17 | # These are the test datasets (in our case we only have one)
18 | TEST_DS="[MG-Verilog/dataset.jsonl]"
19 | 
20 | # These are the names of the test datasets
21 | TEST_NAMES="[MG-Verilog]"
22 | 
23 | # This is the PEFT scheme that we will be using. Set to "ptuning" for P-Tuning instead of LoRA
24 | PEFT_SCHEME="lora"
25 | 
26 | # This is the concat sampling probability. This depends on the number of files being passed in the train set
27 | # and the sampling probability for each file. In our case, we have one training file.
28 | CONCAT_SAMPLING_PROBS="[1.0]"
29 | 
30 | 
31 | # This is the tensor parallel size (splitting tensors among GPUs horizontally)
32 | # See the NeMo documentation for the proper value for the given model size
33 | TP_SIZE=4
34 | 
35 | # This is the pipeline parallel size (splitting layers among GPUs vertically)
36 | # See the NeMo documentation for the proper value for the given model size
37 | PP_SIZE=1
38 | 
39 | # The number of nodes to run this on
40 | # See the NeMo documentation for the proper value for the given model size
41 | NODE_COUNT=1
42 | 
43 | # The number of total GPUs used
44 | GPU_COUNT=4
45 | 
46 | # Where to store the finetuned model and training artifacts
47 | OUTPUT_DIR="./results/MG-Verilog"
48 | 
49 | 
50 | # Run the PEFT command by appropriately setting the values for the parameters such as the number of steps,
51 | # model checkpoint path, batch sizes etc. For a full reference of parameter
52 | # settings refer to the config at https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml
53 | # the value of trainer.max_steps should be scaled according to the size of the training data to ensure that the total number of epochs equals 3
54 | python /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py \
55 |     trainer.log_every_n_steps=1 \
56 |     trainer.precision=bf16 \
57 |     trainer.devices=${GPU_COUNT} \
58 |     trainer.num_nodes=1 \
59 |     trainer.val_check_interval=20 \
60 |     trainer.max_steps=${MAX_STEPS} \
61 |     model.restore_from_path=${MODEL} \
62 |     model.peft.peft_scheme=${PEFT_SCHEME} \
63 |     model.micro_batch_size=1 \
64 |     model.global_batch_size=128 \
65 |     model.tensor_model_parallel_size=${TP_SIZE} \
66 |     model.pipeline_model_parallel_size=${PP_SIZE} \
67 |     model.megatron_amp_O2=True \
68 |     model.activations_checkpoint_granularity=selective \
69 |     model.activations_checkpoint_num_layers=null \
70 |     model.activations_checkpoint_method=uniform \
71 |     model.optim.name=fused_adam \
72 |     model.optim.lr=1e-4 \
73 |     model.answer_only_loss=True \
74 |     model.data.train_ds.file_names=${TRAIN_DS} \
75 |     model.data.validation_ds.file_names=${VALID_DS} \
76 |     model.data.test_ds.file_names=${TEST_DS} \
77 |     model.data.train_ds.concat_sampling_probabilities=${CONCAT_SAMPLING_PROBS} \
78 |     model.data.train_ds.max_seq_length=2048 \
79 |     model.data.validation_ds.max_seq_length=2048 \
80 |     model.data.train_ds.micro_batch_size=1 \
81 |     model.data.train_ds.global_batch_size=128 \
82 |     model.data.validation_ds.micro_batch_size=1 \
83 |     model.data.validation_ds.global_batch_size=128 \
84 |     model.data.train_ds.num_workers=0 \
85 |     model.data.validation_ds.num_workers=0 \
86 |     model.data.test_ds.num_workers=0 \
87 |     model.data.validation_ds.metric.name=loss \
88 |     model.data.test_ds.metric.name=loss \
89 |     exp_manager.create_wandb_logger=False \
90 |     exp_manager.checkpoint_callback_params.mode=min \
91 |     exp_manager.explicit_log_dir=${OUTPUT_DIR} \
92 |     exp_manager.resume_if_exists=True \
93 |     exp_manager.resume_ignore_no_checkpoint=True \
94 |     exp_manager.create_checkpoint_callback=True \
95 |     exp_manager.checkpoint_callback_params.monitor=validation_loss \
96 |     ++exp_manager.checkpoint_callback_params.save_best_model=False \
97 |     exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True \
98 |     model.save_nemo_on_validation_end=False
99 | 
--------------------------------------------------------------------------------
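
The comment above notes that `trainer.max_steps` should correspond to roughly three epochs over the training data. A minimal sketch of one way to pick the value passed to `run_peft.sh` (illustrative only; it assumes the `MG-Verilog/dataset.jsonl` file produced earlier and the `global_batch_size=128` set in the script):

```bash
# Choose MAX_STEPS so that MAX_STEPS * global_batch_size ~= 3 * number of training samples.
NUM_SAMPLES=$(wc -l < MG-Verilog/dataset.jsonl)
GLOBAL_BATCH_SIZE=128
MAX_STEPS=$(( (3 * NUM_SAMPLES + GLOBAL_BATCH_SIZE - 1) / GLOBAL_BATCH_SIZE ))
bash run_peft.sh "${MAX_STEPS}"
```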