├── .gitignore ├── INSTALLATION.md ├── README.md ├── configs ├── base_fp16.yaml ├── base_fp32.yaml ├── dynamo_fp16.yaml └── dynamo_fp32.yaml ├── experiments ├── accelerate_script.py ├── base.py ├── base_fp16.py ├── dynamic.py ├── dynamic_fp16.py ├── dynamic_optimized.py ├── dynamic_optimized_fp16.py ├── generate_script.py ├── optimize_forward.py ├── optimize_forward_fp16.py ├── optimize_model.py ├── optimize_model_fp16.py ├── optimize_train_step.py └── optimize_train_step_fp16.py ├── requirements.txt ├── run_experiments.sh ├── scripts ├── cv_classification.py ├── language_modeling.py ├── text_classification.py └── translation.py └── tools ├── summarize.py └── verify_dynamo.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | trained-resnet 3 | -------------------------------------------------------------------------------- /INSTALLATION.md: -------------------------------------------------------------------------------- 1 | # Installation guide on a new instance 2 | 3 | Jump to the last section if you already have CUDA installed. 4 | 5 | ## Install drivers: 6 | 7 | ```bash 8 | sudo apt install ubuntu-drivers-common 9 | ``` 10 | 11 | Run 12 | 13 | ```bash 14 | ubuntu-drivers devices 15 | ``` 16 | 17 | Output 18 | 19 | ``` 20 | == /sys/devices/pci0000:00/0000:00:04.0== 21 | modalias : pci:v000010DEd000020B0sv000010DEsd0000134Fbc03sc02i00 22 | vendor : NVIDIA Corporation 23 | driver : nvidia-driver-470-server - distro non-free 24 | driver : nvidia-driver-515-open - distro non-free recommended 25 | driver : nvidia-driver-515 - distro non-free 26 | driver : nvidia-driver-450-server - distro non-free 27 | driver : nvidia-driver-510 - distro non-free 28 | driver : nvidia-driver-510-server - distro non-free 29 | driver : nvidia-driver-515-server - distro non-free 30 | driver : nvidia-driver-470 - distro non-free 31 | driver : xserver-xorg-video-nouveau - distro free builtin 32 | ``` 33 | 34 | Pick the recommended driver version 35 | 36 | ```bash 37 | sudo apt install nvidia-headless-515-server nvidia-utils-515-server 38 | ``` 39 | 40 | Reboot 41 | ```bash 42 | sudo reboot 43 | ``` 44 | 45 | ## Install CUDA 46 | 47 | ```bash 48 | wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin 49 | sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 50 | ``` 51 | 52 | Add the public key and repository 53 | ```bash 54 | sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub 55 | sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /" 56 | ``` 57 | 58 | Install the CUDA toolkit. 59 | If you want to use the conda setup, refer to the corresponding section below and skip these steps. 60 | 61 | ```bash 62 | sudo apt update 63 | sudo apt install cuda-toolkit-11-7 64 | ``` 65 | 66 | (you can press Tab after `cuda-toolkit-` to list all available versions.) 67 | 68 | Download [cuDNN](https://developer.nvidia.com/cudnn) and `scp` it to the instance.
69 | 70 | Extract and install it 71 | 72 | ```bash 73 | tar -xf cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz 74 | sudo cp cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include/cudnn*.h /usr/local/cuda/include 75 | sudo cp cudnn-linux-x86_64-8.6.0.163_cuda11-archive/lib/libcudnn* /usr/local/cuda/lib64 76 | sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* 77 | ``` 78 | 79 | Add to .bashrc 80 | 81 | ```bash 82 | export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64" 83 | export CUDA_HOME=/usr/local/cuda 84 | export PATH="/usr/local/cuda/bin:$PATH" 85 | ``` 86 | 87 | then 88 | 89 | ```bash 90 | source ~/.bashrc 91 | ``` 92 | 93 | Check that everything is working 94 | 95 | ```bash 96 | nvidia-smi 97 | ``` 98 | 99 | ## Python 100 | 101 | ```bash 102 | sudo apt-get install python3-pip 103 | sudo apt install python3.8-venv 104 | python3 -m venv dynamo 105 | ``` 106 | 107 | Add to .bashrc 108 | ```bash 109 | source dynamo/bin/activate 110 | ``` 111 | 112 | then 113 | 114 | ```bash 115 | source ~/.bashrc 116 | ``` 117 | 118 | Install the PyTorch nightlies with dynamo 119 | 120 | ```bash 121 | pip install numpy 122 | pip install --pre torch[dynamo] --extra-index-url https://download.pytorch.org/whl/nightly/cu117/ 123 | ``` 124 | 125 | # Conda Installation instructions 126 | 127 | 1. Install miniconda 128 | ```bash 129 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh 130 | bash Miniconda3-latest-Linux-x86_64.sh 131 | ``` 132 | 133 | 2. Create a conda Python env and then activate it 134 | ```bash 135 | conda create --name dynamo python 136 | conda activate dynamo 137 | ``` 138 | 139 | 3. Install cuda-toolkit 11.7. Please refer to [cuda-toolkit](https://anaconda.org/nvidia/cuda-toolkit) 140 | for more information 141 | ```bash 142 | conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit 143 | ``` 144 | 145 | 4. Install PyTorch along with Torch Dynamo dependencies 146 | ```bash 147 | pip install numpy 148 | pip install --pre torch[dynamo] --extra-index-url https://download.pytorch.org/whl/nightly/cu117/ 149 | ``` 150 | 151 | 5. Verify torch-dynamo with the command below, assuming you are in the top-level folder 152 | ```bash 153 | python tools/verify_dynamo.py 154 | ``` -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Repo to test torchdynamo 2 | 3 | `pip install -r requirements.txt` 4 | 5 | ## Experiments 6 | 7 | This folder contains quick reproducers to observe different behaviors. The base scripts give a benchmark without any optimization. Then we can observe what happens when optimizing different things.
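The `optimize_model`, `optimize_forward`, and `optimize_train_step` rows in the tables below correspond to three different ways of applying TorchDynamo. Since the `optimize_*` scripts are not reproduced in this excerpt, here is only a minimal sketch of the three strategies, assuming they follow the same `dynamo.optimize("inductor")` pattern as `dynamic_optimized.py` further down; the actual scripts may differ in details.

```python
import torch._dynamo as dynamo

# Illustrative sketch only -- not the exact contents of the optimize_* scripts.
def apply_strategy(model, optimizer, strategy):
    if strategy == "optimize_model":
        # Compile the whole nn.Module; the backward pass goes through AOT Autograd.
        return dynamo.optimize("inductor")(model), None
    if strategy == "optimize_forward":
        # Compile only the forward function, leaving the module object as-is.
        model.forward = dynamo.optimize("inductor")(model.forward)
        return model, None
    if strategy == "optimize_train_step":
        # Compile forward + backward + optimizer update as a single function.
        # This is the variant that triggers the AOT Autograd warnings quoted below.
        def train_step(batch):
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            return loss
        return model, dynamo.optimize("inductor")(train_step)
    return model, None
```

The FP16 variants additionally wrap the forward call in `torch.cuda.amp.autocast()`, as in `base_fp16.py`.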
The tables below report the average iteration time (all runs executed on an A100): 8 | 9 | Batch size 16: 10 | 11 | | Script | FP32 | FP16 | 12 | |:--|:-:|:-:| 13 | | base | 54.44ms | 62.24ms | 14 | | optimize_model | 38.20ms | 29.85ms | 15 | | optimize_forward | 38.36ms | 29.49ms | 16 | | train_step | x | x | 17 | 18 | Batch size 8: 19 | 20 | | Script | FP32 | FP16 | 21 | |:--|:-:|:-:| 22 | | base | 53.47ms | 59.68ms | 23 | | optimize_model | 28.34ms | 23.80ms | 24 | | optimize_forward | 28.16ms | 29.34ms | 25 | | train_step | 1754.47ms | 1740.21ms | 26 | 27 | Using torchdynamo to optimize the train step does not really work: it produces lots of warnings like the following, and the reported times are not right: 28 | 29 | ``` 30 | [2022-11-04 15:32:56,201] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation 31 | [2022-11-04 15:32:56,201] torch._inductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager 32 | ``` 33 | 34 | Reproducer: `python experiments/optimize_train_step_fp16.py` 35 | 36 | Dynamic sequence lengths: 37 | 38 | | Script | FP32 | FP16 | 39 | |:--|:-:|:-:| 40 | | dynamic | 59.23ms | 63.53ms | 41 | | dynamic_optimized | OOM? | OOM? | 42 | 43 | ## Scripts 44 | 45 | ### Text classification 46 | 47 | Average iteration time for training/evaluation (excluding the first iteration) when fine-tuning BERT on MRPC. Final results of the models are within the variance of fine-tuning; no particular performance drop is observed, except for FP16 + torchdynamo, which seems to underperform a bit (82%-84% accuracy compared to 86%-87% for the other runs, and 0.86/0.87 F1 score instead of 0.89/0.90). 48 | 49 | | Dynamo | FP32 | FP16 | 50 | |:--|:-:|:-:| 51 | | no | 57.9ms/15.65ms | 65.87ms/18.52ms | 52 | | inductor | 36.24ms/10.55ms | 39.43ms/9.09ms | 53 | 54 | To reproduce: 55 | 56 | ```bash 57 | accelerate launch --config_file configs/base_fp32.yaml scripts/text_classification.py --task_name mrpc 58 | ``` 59 | 60 | Change the config file to one of the four options in `configs` to get the four table cells.
61 | 62 | 63 | ### Language Modeling 64 | 65 | ```bash 66 | accelerate launch scripts/language_modeling.py \ 67 | --dataset_name wikitext \ 68 | --dataset_config_name wikitext-2-raw-v1 \ 69 | --model_name_or_path gpt2 \ 70 | --dynamo_backend inductor \ 71 | --mixed_precision fp16 72 | ``` 73 | 74 | ### Vision Classification 75 | 76 | ```bash 77 | accelerate launch scripts/cv_classification.py \ 78 | --model_name_or_path microsoft/resnet-18 \ 79 | --dataset_name beans \ 80 | --dynamo_backend inductor \ 81 | --mixed_precision no 82 | ``` -------------------------------------------------------------------------------- /configs/base_fp16.yaml: -------------------------------------------------------------------------------- 1 | command_file: null 2 | commands: null 3 | compute_environment: LOCAL_MACHINE 4 | deepspeed_config: {} 5 | distributed_type: 'NO' 6 | downcast_bf16: 'no' 7 | dynamo_backend: 'NO' 8 | fsdp_config: {} 9 | gpu_ids: all 10 | machine_rank: 0 11 | main_process_ip: null 12 | main_process_port: null 13 | main_training_function: main 14 | megatron_lm_config: {} 15 | mixed_precision: fp16 16 | num_machines: 1 17 | num_processes: 1 18 | rdzv_backend: static 19 | same_network: true 20 | tpu_name: null 21 | tpu_zone: null 22 | use_cpu: false 23 | -------------------------------------------------------------------------------- /configs/base_fp32.yaml: -------------------------------------------------------------------------------- 1 | command_file: null 2 | commands: null 3 | compute_environment: LOCAL_MACHINE 4 | deepspeed_config: {} 5 | distributed_type: 'NO' 6 | downcast_bf16: 'no' 7 | dynamo_backend: 'NO' 8 | fsdp_config: {} 9 | gpu_ids: all 10 | machine_rank: 0 11 | main_process_ip: null 12 | main_process_port: null 13 | main_training_function: main 14 | megatron_lm_config: {} 15 | mixed_precision: 'no' 16 | num_machines: 1 17 | num_processes: 1 18 | rdzv_backend: static 19 | same_network: true 20 | tpu_name: null 21 | tpu_zone: null 22 | use_cpu: false 23 | -------------------------------------------------------------------------------- /configs/dynamo_fp16.yaml: -------------------------------------------------------------------------------- 1 | command_file: null 2 | commands: null 3 | compute_environment: LOCAL_MACHINE 4 | deepspeed_config: {} 5 | distributed_type: 'NO' 6 | downcast_bf16: 'no' 7 | dynamo_backend: INDUCTOR 8 | fsdp_config: {} 9 | gpu_ids: all 10 | machine_rank: 0 11 | main_process_ip: null 12 | main_process_port: null 13 | main_training_function: main 14 | megatron_lm_config: {} 15 | mixed_precision: fp16 16 | num_machines: 1 17 | num_processes: 1 18 | rdzv_backend: static 19 | same_network: true 20 | tpu_name: null 21 | tpu_zone: null 22 | use_cpu: false 23 | -------------------------------------------------------------------------------- /configs/dynamo_fp32.yaml: -------------------------------------------------------------------------------- 1 | command_file: null 2 | commands: null 3 | compute_environment: LOCAL_MACHINE 4 | deepspeed_config: {} 5 | distributed_type: 'NO' 6 | downcast_bf16: 'no' 7 | dynamo_backend: INDUCTOR 8 | fsdp_config: {} 9 | gpu_ids: all 10 | machine_rank: 0 11 | main_process_ip: null 12 | main_process_port: null 13 | main_training_function: main 14 | megatron_lm_config: {} 15 | mixed_precision: 'no' 16 | num_machines: 1 17 | num_processes: 1 18 | rdzv_backend: static 19 | same_network: true 20 | tpu_name: null 21 | tpu_zone: null 22 | use_cpu: false 23 | -------------------------------------------------------------------------------- 
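The four configs above differ only in `mixed_precision` (`'no'` vs `fp16`) and `dynamo_backend` (`'NO'` vs `INDUCTOR`); the latter is what turns TorchDynamo/inductor on for the `accelerate launch` commands shown earlier. The same switch can also be made in code rather than through a config file; a minimal sketch, assuming a recent Accelerate release in which `Accelerator` accepts a `dynamo_backend` argument (check your installed version):

```python
from accelerate import Accelerator

# Roughly equivalent to launching with configs/dynamo_fp16.yaml
# (dynamo_backend: INDUCTOR, mixed_precision: fp16). The dynamo_backend
# keyword is assumed to exist in the installed Accelerate version.
accelerator = Accelerator(mixed_precision="fp16", dynamo_backend="inductor")

# The model, optimizer, etc. are then passed through accelerator.prepare(...)
# exactly as in experiments/accelerate_script.py below.
```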
/experiments/accelerate_script.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | from torch.optim import AdamW 6 | from accelerate import Accelerator 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + self.seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | accelerator = Accelerator() 86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 89 | optimizer = AdamW(model.parameters(), lr=1e-4) 90 | 91 | model = model.train() 92 | model, optimizer = accelerator.prepare(model, optimizer) 93 | 94 | start_time = time.time() 95 | for step, batch in enumerate(train_dl): 96 | batch = {k: v.to(accelerator.device) for k, v in batch.items()} 97 | output = model(**batch) 98 | loss = output.loss 99 | loss.backward() 100 | optimizer.step() 101 | optimizer.zero_grad() 102 | if step == 0: 103 | first_step_time = time.time() - start_time 104 | 105 | total_training_time = time.time() - start_time 106 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 107 | print("Training finished.") 108 | print(f"First iteration took: {first_step_time:.2f}s") 109 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 110 | 111 | if __name__ == "__main__": 112 | main() 113 | -------------------------------------------------------------------------------- /experiments/base.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | from torch.optim import AdamW 6 | from transformers import AutoModelForMaskedLM, AutoTokenizer 7 | 8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 9 | 10 | == History == 11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 12 | 13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 14 | 15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 16 | 17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 18 | 19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 20 | 21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 22 | 23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 24 | 25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 26 | 27 | == Services and technologies == 28 | === Transformers Library === 29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 30 | 31 | 32 | === Hugging Face Hub === 33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
34 | 35 | == References == 36 | {{Reflist}} 37 | 38 | {{Portal bar|Companies}} 39 | 40 | {{DEFAULTSORT:Hugging Face}} 41 | [[Category:Machine learning]] 42 | [[Category:Open-source artificial intelligence]] 43 | 44 | """ 45 | 46 | torch.backends.cuda.matmul.allow_tf32 = True 47 | 48 | def parse_args(): 49 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 50 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 51 | parser.add_argument("--batch_size", type=int, default=16) 52 | parser.add_argument("--num_batches", type=int, default=100) 53 | 54 | args = parser.parse_args() 55 | return args 56 | 57 | class DataLoader(): 58 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 59 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 60 | self.batch_size = batch_size 61 | self.num_batches = num_batches 62 | self.seq_len = seq_len 63 | self.mask_token_id = tokenizer.mask_token_id 64 | 65 | def __iter__(self): 66 | for _ in range(self.num_batches): 67 | masked_samples = [] 68 | samples = [] 69 | for _ in range(self.batch_size): 70 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 71 | tokens = self.tokenized_corpus[start: start + self.seq_len] 72 | samples.append(tokens) 73 | 74 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 75 | masked_samples.append(masked_tokens) 76 | 77 | 78 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 79 | 80 | def __len__(self): 81 | return self.num_batches 82 | 83 | 84 | def main(): 85 | args = parse_args() 86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 89 | optimizer = AdamW(model.parameters(), lr=1e-4) 90 | 91 | device = "cuda" if torch.cuda.is_available() else "cpu" 92 | model = model.to(device).train() 93 | 94 | start_time = time.time() 95 | for step, batch in enumerate(train_dl): 96 | batch = {k: v.to(device) for k, v in batch.items()} 97 | output = model(**batch) 98 | loss = output.loss 99 | loss.backward() 100 | optimizer.step() 101 | optimizer.zero_grad() 102 | if step == 0: 103 | first_step_time = time.time() - start_time 104 | 105 | total_training_time = time.time() - start_time 106 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 107 | print("Training finished.") 108 | print(f"First iteration took: {first_step_time:.2f}s") 109 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 110 | 111 | if __name__ == "__main__": 112 | main() 113 | -------------------------------------------------------------------------------- /experiments/base_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | from torch.optim import AdamW 6 | from transformers import AutoModelForMaskedLM, AutoTokenizer 7 | 8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 9 | 10 | == History == 11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 12 | 13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 14 | 15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 16 | 17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 18 | 19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 20 | 21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 22 | 23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 24 | 25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 26 | 27 | == Services and technologies == 28 | === Transformers Library === 29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 30 | 31 | 32 | === Hugging Face Hub === 33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
34 | 35 | == References == 36 | {{Reflist}} 37 | 38 | {{Portal bar|Companies}} 39 | 40 | {{DEFAULTSORT:Hugging Face}} 41 | [[Category:Machine learning]] 42 | [[Category:Open-source artificial intelligence]] 43 | 44 | """ 45 | 46 | def parse_args(): 47 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 48 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 49 | parser.add_argument("--batch_size", type=int, default=16) 50 | parser.add_argument("--num_batches", type=int, default=100) 51 | 52 | args = parser.parse_args() 53 | return args 54 | 55 | class DataLoader(): 56 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 57 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 58 | self.batch_size = batch_size 59 | self.num_batches = num_batches 60 | self.seq_len = seq_len 61 | self.mask_token_id = tokenizer.mask_token_id 62 | 63 | def __iter__(self): 64 | for _ in range(self.num_batches): 65 | masked_samples = [] 66 | samples = [] 67 | for _ in range(self.batch_size): 68 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 69 | tokens = self.tokenized_corpus[start: start + self.seq_len] 70 | samples.append(tokens) 71 | 72 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 73 | masked_samples.append(masked_tokens) 74 | 75 | 76 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 77 | 78 | def __len__(self): 79 | return self.num_batches 80 | 81 | 82 | def main(): 83 | args = parse_args() 84 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 85 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 86 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 87 | optimizer = AdamW(model.parameters(), lr=1e-4) 88 | 89 | device = "cuda" if torch.cuda.is_available() else "cpu" 90 | model = model.to(device).train() 91 | 92 | start_time = time.time() 93 | for step, batch in enumerate(train_dl): 94 | batch = {k: v.to(device) for k, v in batch.items()} 95 | with torch.cuda.amp.autocast(): 96 | output = model(**batch) 97 | loss = output.loss 98 | loss.backward() 99 | optimizer.step() 100 | optimizer.zero_grad() 101 | if step == 0: 102 | first_step_time = time.time() - start_time 103 | 104 | total_training_time = time.time() - start_time 105 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 106 | print("Training finished.") 107 | print(f"First iteration took: {first_step_time:.2f}s") 108 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 109 | 110 | if __name__ == "__main__": 111 | main() 112 | -------------------------------------------------------------------------------- /experiments/dynamic.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | from torch.optim import AdamW 6 | from transformers import AutoModelForMaskedLM, AutoTokenizer 7 | 8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 9 | 10 | == History == 11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 12 | 13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 14 | 15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 16 | 17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 18 | 19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 20 | 21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 22 | 23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 24 | 25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 26 | 27 | == Services and technologies == 28 | === Transformers Library === 29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 30 | 31 | 32 | === Hugging Face Hub === 33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
34 | 35 | == References == 36 | {{Reflist}} 37 | 38 | {{Portal bar|Companies}} 39 | 40 | {{DEFAULTSORT:Hugging Face}} 41 | [[Category:Machine learning]] 42 | [[Category:Open-source artificial intelligence]] 43 | 44 | """ 45 | 46 | torch.backends.cuda.matmul.allow_tf32 = True 47 | 48 | def parse_args(): 49 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 50 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 51 | parser.add_argument("--batch_size", type=int, default=16) 52 | parser.add_argument("--num_batches", type=int, default=100) 53 | 54 | args = parser.parse_args() 55 | return args 56 | 57 | class DataLoader(): 58 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 59 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 60 | self.batch_size = batch_size 61 | self.num_batches = num_batches 62 | self.seq_len = seq_len 63 | self.mask_token_id = tokenizer.mask_token_id 64 | 65 | def __iter__(self): 66 | for _ in range(self.num_batches): 67 | masked_samples = [] 68 | samples = [] 69 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8 70 | for _ in range(self.batch_size): 71 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1) 72 | tokens = self.tokenized_corpus[start: start + seq_len] 73 | samples.append(tokens) 74 | 75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 76 | masked_samples.append(masked_tokens) 77 | 78 | 79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 80 | 81 | def __len__(self): 82 | return self.num_batches 83 | 84 | 85 | def main(): 86 | args = parse_args() 87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 90 | optimizer = AdamW(model.parameters(), lr=1e-4) 91 | 92 | device = "cuda" if torch.cuda.is_available() else "cpu" 93 | model = model.to(device).train() 94 | 95 | start_time = time.time() 96 | for step, batch in enumerate(train_dl): 97 | batch = {k: v.to(device) for k, v in batch.items()} 98 | output = model(**batch) 99 | loss = output.loss 100 | loss.backward() 101 | optimizer.step() 102 | optimizer.zero_grad() 103 | 104 | if step == 0: 105 | first_step_time = time.time() - start_time 106 | 107 | total_training_time = time.time() - start_time 108 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 109 | print("Training finished.") 110 | print(f"First iteration took: {first_step_time:.2f}s") 111 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 112 | 113 | 114 | 115 | 116 | 117 | 118 | if __name__ == "__main__": 119 | main() 120 | -------------------------------------------------------------------------------- /experiments/dynamic_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | from torch.optim import AdamW 6 | from transformers import AutoModelForMaskedLM, AutoTokenizer 7 | 8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 9 | 10 | == History == 11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 12 | 13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 14 | 15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 16 | 17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 18 | 19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 20 | 21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 22 | 23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 24 | 25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 26 | 27 | == Services and technologies == 28 | === Transformers Library === 29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 30 | 31 | 32 | === Hugging Face Hub === 33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
34 | 35 | == References == 36 | {{Reflist}} 37 | 38 | {{Portal bar|Companies}} 39 | 40 | {{DEFAULTSORT:Hugging Face}} 41 | [[Category:Machine learning]] 42 | [[Category:Open-source artificial intelligence]] 43 | 44 | """ 45 | 46 | def parse_args(): 47 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 48 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 49 | parser.add_argument("--batch_size", type=int, default=16) 50 | parser.add_argument("--num_batches", type=int, default=100) 51 | 52 | args = parser.parse_args() 53 | return args 54 | 55 | class DataLoader(): 56 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 57 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 58 | self.batch_size = batch_size 59 | self.num_batches = num_batches 60 | self.seq_len = seq_len 61 | self.mask_token_id = tokenizer.mask_token_id 62 | 63 | def __iter__(self): 64 | for _ in range(self.num_batches): 65 | masked_samples = [] 66 | samples = [] 67 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 88 | optimizer = AdamW(model.parameters(), lr=1e-4) 89 | 90 | device = "cuda" if torch.cuda.is_available() else "cpu" 91 | model = model.to(device).train() 92 | 93 | start_time = time.time() 94 | for step, batch in enumerate(train_dl): 95 | batch = {k: v.to(device) for k, v in batch.items()} 96 | with torch.cuda.amp.autocast(): 97 | output = model(**batch) 98 | loss = output.loss 99 | loss.backward() 100 | optimizer.step() 101 | optimizer.zero_grad() 102 | 103 | if step == 0: 104 | first_step_time = time.time() - start_time 105 | 106 | total_training_time = time.time() - start_time 107 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 108 | print("Training finished.") 109 | print(f"First iteration took: {first_step_time:.2f}s") 110 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 111 | 112 | 113 | 114 | 115 | 116 | 117 | if __name__ == "__main__": 118 | main() 119 | -------------------------------------------------------------------------------- /experiments/dynamic_optimized.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | torch.backends.cuda.matmul.allow_tf32 = True 48 | 49 | def parse_args(): 50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 51 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 52 | parser.add_argument("--batch_size", type=int, default=16) 53 | parser.add_argument("--num_batches", type=int, default=100) 54 | 55 | args = parser.parse_args() 56 | return args 57 | 58 | class DataLoader(): 59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 61 | self.batch_size = batch_size 62 | self.num_batches = num_batches 63 | self.seq_len = seq_len 64 | self.mask_token_id = tokenizer.mask_token_id 65 | 66 | def __iter__(self): 67 | for _ in range(self.num_batches): 68 | masked_samples = [] 69 | samples = [] 70 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8 71 | for _ in range(self.batch_size): 72 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1) 73 | tokens = self.tokenized_corpus[start: start + seq_len] 74 | samples.append(tokens) 75 | 76 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 77 | masked_samples.append(masked_tokens) 78 | 79 | 80 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 81 | 82 | def __len__(self): 83 | return self.num_batches 84 | 85 | 86 | def main(): 87 | args = parse_args() 88 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 89 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 90 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 91 | optimizer = AdamW(model.parameters(), lr=1e-4) 92 | 93 | device = "cuda" if torch.cuda.is_available() else "cpu" 94 | model = model.to(device).train() 95 | 96 | model = dynamo.optimize("inductor")(model) 97 | 98 | start_time = time.time() 99 | for step, batch in enumerate(train_dl): 100 | batch = {k: v.to(device) for k, v in batch.items()} 101 | output = model(**batch) 102 | loss = output.loss 103 | loss.backward() 104 | optimizer.step() 105 | optimizer.zero_grad() 106 | 107 | if step == 0: 108 | first_step_time = time.time() - start_time 109 | 110 | total_training_time = time.time() - start_time 111 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 112 | print("Training finished.") 113 | print(f"First iteration took: {first_step_time:.2f}s") 114 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 115 | 116 | 117 | 118 | 119 | 120 | 121 | if __name__ == "__main__": 122 | main() 123 | -------------------------------------------------------------------------------- /experiments/dynamic_optimized_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8 69 | for _ in range(self.batch_size): 70 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1) 71 | tokens = self.tokenized_corpus[start: start + seq_len] 72 | samples.append(tokens) 73 | 74 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 75 | masked_samples.append(masked_tokens) 76 | 77 | 78 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 79 | 80 | def __len__(self): 81 | return self.num_batches 82 | 83 | 84 | def main(): 85 | args = parse_args() 86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 89 | optimizer = AdamW(model.parameters(), lr=1e-4) 90 | 91 | device = "cuda" if torch.cuda.is_available() else "cpu" 92 | model = model.to(device).train() 93 | 94 | model = dynamo.optimize("inductor")(model) 95 | 96 | start_time = time.time() 97 | for step, batch in enumerate(train_dl): 98 | batch = {k: v.to(device) for k, v in batch.items()} 99 | with torch.cuda.amp.autocast(): 100 | output = model(**batch) 101 | loss = output.loss 102 | loss.backward() 103 | optimizer.step() 104 | optimizer.zero_grad() 105 | 106 | if step == 0: 107 | first_step_time = time.time() - start_time 108 | 109 | total_training_time = time.time() - start_time 110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 111 | print("Training finished.") 112 | print(f"First iteration took: {first_step_time:.2f}s") 113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 114 | 115 | 116 | 117 | 118 | 119 | 120 | if __name__ == "__main__": 121 | main() 122 | -------------------------------------------------------------------------------- /experiments/generate_script.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | 6 | from accelerate import Accelerator 7 | from transformers import AutoModelForCausalLM, AutoTokenizer 8 | 9 | torch.backends.cuda.matmul.allow_tf32 = True 10 | 11 | 12 | def parse_args(): 13 | parser = argparse.ArgumentParser(description="Make a couple of generations") 14 | parser.add_argument("--model_name", type=str, default="gpt2") 15 | args = parser.parse_args() 16 | 
return args 17 | 18 | 19 | def main(): 20 | args = parse_args() 21 | accelerator = Accelerator() 22 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 23 | inputs = tokenizer(["Once upon a time,"] * 8, return_tensors="pt") 24 | model = AutoModelForCausalLM.from_pretrained(args.model_name) 25 | 26 | model = model.eval() 27 | model = accelerator.prepare(model) 28 | 29 | start_time = time.time() 30 | for step in range(50): 31 | batch = {k: v.to(accelerator.device) for k, v in inputs.items()} 32 | output = model.generate(**batch) 33 | if step == 0: 34 | first_step_time = time.time() - start_time 35 | 36 | total_training_time = time.time() - start_time 37 | avg_iteration_time = (total_training_time - first_step_time) / (50 - 1) 38 | print("Generations finished.") 39 | print(f"First iteration took: {first_step_time:.2f}s") 40 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 41 | 42 | 43 | if __name__ == "__main__": 44 | main() 45 | -------------------------------------------------------------------------------- /experiments/optimize_forward.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 
13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. 
It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | torch.backends.cuda.matmul.allow_tf32 = True 48 | 49 | def parse_args(): 50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 51 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 52 | parser.add_argument("--batch_size", type=int, default=16) 53 | parser.add_argument("--num_batches", type=int, default=100) 54 | 55 | args = parser.parse_args() 56 | return args 57 | 58 | class DataLoader(): 59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 61 | self.batch_size = batch_size 62 | self.num_batches = num_batches 63 | self.seq_len = seq_len 64 | self.mask_token_id = tokenizer.mask_token_id 65 | 66 | def __iter__(self): 67 | for _ in range(self.num_batches): 68 | masked_samples = [] 69 | samples = [] 70 | for _ in range(self.batch_size): 71 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 72 | tokens = self.tokenized_corpus[start: start + self.seq_len] 73 | samples.append(tokens) 74 | 75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 76 | masked_samples.append(masked_tokens) 77 | 78 | 79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 80 | 81 | def __len__(self): 82 | return self.num_batches 83 | 84 | 85 | def main(): 86 | args = parse_args() 87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 90 | optimizer = AdamW(model.parameters(), lr=1e-4) 91 | 92 | device = "cuda" if torch.cuda.is_available() else "cpu" 93 | model = model.to(device).train() 94 | 95 | model.forward = dynamo.optimize("inductor")(model.forward) 96 | 97 | start_time = time.time() 98 | for step, batch in enumerate(train_dl): 99 | batch = {k: v.to(device) for k, v in batch.items()} 100 | output = model(**batch) 101 | loss = output.loss 102 | loss.backward() 103 | optimizer.step() 104 | optimizer.zero_grad() 105 | 106 | if step == 0: 107 | first_step_time = time.time() - start_time 108 | 109 | total_training_time = time.time() - start_time 110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 111 | print("Training 
finished.") 112 | print(f"First iteration took: {first_step_time:.2f}s") 113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 114 | 115 | 116 | 117 | 118 | 119 | 120 | if __name__ == "__main__": 121 | main() 122 | -------------------------------------------------------------------------------- /experiments/optimize_forward_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 
21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + self.seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 88 | optimizer = AdamW(model.parameters(), lr=1e-4) 89 | 90 | device = "cuda" if torch.cuda.is_available() else "cpu" 91 | model = model.to(device).train() 92 | 93 | model.forward = dynamo.optimize("inductor")(model.forward) 94 | 95 | start_time = time.time() 96 | for step, batch in enumerate(train_dl): 97 | batch = {k: v.to(device) for k, v in batch.items()} 98 | with torch.cuda.amp.autocast(): 99 | output = model(**batch) 100 | loss = output.loss 101 | loss.backward() 102 | optimizer.step() 103 | optimizer.zero_grad() 104 | 105 | if step == 0: 106 | first_step_time = time.time() - start_time 107 | 108 | total_training_time = time.time() - start_time 109 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 110 | print("Training finished.") 111 | print(f"First iteration took: {first_step_time:.2f}s") 112 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 113 | 114 | 115 | 116 | 117 | 118 | 119 | if __name__ == "__main__": 120 | main() 121 | -------------------------------------------------------------------------------- /experiments/optimize_model.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | torch.backends.cuda.matmul.allow_tf32 = True 48 | 49 | def parse_args(): 50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 51 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 52 | parser.add_argument("--batch_size", type=int, default=16) 53 | parser.add_argument("--num_batches", type=int, default=100) 54 | 55 | args = parser.parse_args() 56 | return args 57 | 58 | class DataLoader(): 59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 61 | self.batch_size = batch_size 62 | self.num_batches = num_batches 63 | self.seq_len = seq_len 64 | self.mask_token_id = tokenizer.mask_token_id 65 | 66 | def __iter__(self): 67 | for _ in range(self.num_batches): 68 | masked_samples = [] 69 | samples = [] 70 | for _ in range(self.batch_size): 71 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 72 | tokens = self.tokenized_corpus[start: start + self.seq_len] 73 | samples.append(tokens) 74 | 75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 76 | masked_samples.append(masked_tokens) 77 | 78 | 79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 80 | 81 | def __len__(self): 82 | return self.num_batches 83 | 84 | 85 | def main(): 86 | args = parse_args() 87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 90 | optimizer = AdamW(model.parameters(), lr=1e-4) 91 | 92 | device = "cuda" if torch.cuda.is_available() else "cpu" 93 | model = model.to(device).train() 94 | 95 | model = dynamo.optimize("inductor")(model) 96 | 97 | start_time = time.time() 98 | for step, batch in enumerate(train_dl): 99 | batch = {k: v.to(device) for k, v in batch.items()} 100 | output = model(**batch) 101 | loss = output.loss 102 | loss.backward() 103 | optimizer.step() 104 | optimizer.zero_grad() 105 | 106 | if step == 0: 107 | first_step_time = time.time() - start_time 108 | 109 | total_training_time = time.time() - start_time 110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 111 | print("Training finished.") 112 | print(f"First iteration took: {first_step_time:.2f}s") 113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 114 | 115 | 116 | 117 | 118 | 119 | 120 | if __name__ == "__main__": 121 | main() 122 | -------------------------------------------------------------------------------- /experiments/optimize_model_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + self.seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 88 | optimizer = AdamW(model.parameters(), lr=1e-4) 89 | 90 | device = "cuda" if torch.cuda.is_available() else "cpu" 91 | model = model.to(device).train() 92 | 93 | model = dynamo.optimize("inductor")(model) 94 | 95 | start_time = time.time() 96 | for step, batch in enumerate(train_dl): 97 | batch = {k: v.to(device) for k, v in batch.items()} 98 | with torch.cuda.amp.autocast(): 99 | output = model(**batch) 100 | loss = output.loss 101 | loss.backward() 102 | optimizer.step() 103 | optimizer.zero_grad() 104 | 105 | if step == 0: 106 | first_step_time = time.time() - start_time 107 | 108 | total_training_time = time.time() - start_time 109 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 110 | print("Training finished.") 111 | print(f"First iteration took: {first_step_time:.2f}s") 112 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 113 | 114 | 115 | 116 | 117 | 118 | 119 | if __name__ == "__main__": 120 | main() 121 | -------------------------------------------------------------------------------- /experiments/optimize_train_step.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + self.seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 88 | optimizer = AdamW(model.parameters(), lr=1e-4) 89 | 90 | device = "cuda" if torch.cuda.is_available() else "cpu" 91 | model = model.to(device).train() 92 | 93 | @dynamo.optimize("inductor") 94 | def train_step(batch): 95 | output = model(**batch) 96 | loss = output.loss 97 | loss.backward() 98 | optimizer.step() 99 | 100 | start_time = time.time() 101 | for step, batch in enumerate(train_dl): 102 | batch = {k: v.to(device) for k, v in batch.items()} 103 | train_step(batch) 104 | optimizer.zero_grad() 105 | 106 | if step == 0: 107 | first_step_time = time.time() - start_time 108 | 109 | total_training_time = time.time() - start_time 110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 111 | print("Training finished.") 112 | print(f"First iteration took: {first_step_time:.2f}s") 113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 114 | 115 | 116 | 117 | 118 | 119 | 120 | if __name__ == "__main__": 121 | main() 122 | -------------------------------------------------------------------------------- /experiments/optimize_train_step_fp16.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import random 3 | import time 4 | import torch 5 | import torch._dynamo as dynamo 6 | from torch.optim import AdamW 7 | from transformers import AutoModelForMaskedLM, AutoTokenizer 8 | 9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].{{Cite web |title=Hugging Face – The AI community building the future. 
|url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}} It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets. 10 | 11 | == History == 12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}} After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning. 13 | 14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}} 15 | 16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}} In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}} 17 | 18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}} 19 | 20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}} The company received a $2 billion valuation. 21 | 22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.{{Cite web |title=Student Ambassador Program’s call for applications is open! 
|url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}} 23 | 24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}} 25 | 26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}} 27 | 28 | == Services and technologies == 29 | === Transformers Library === 30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}} 31 | 32 | 33 | === Hugging Face Hub === 34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}} The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit. 
35 | 36 | == References == 37 | {{Reflist}} 38 | 39 | {{Portal bar|Companies}} 40 | 41 | {{DEFAULTSORT:Hugging Face}} 42 | [[Category:Machine learning]] 43 | [[Category:Open-source artificial intelligence]] 44 | 45 | """ 46 | 47 | def parse_args(): 48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM") 49 | parser.add_argument("--model_name", type=str, default="bert-base-cased") 50 | parser.add_argument("--batch_size", type=int, default=16) 51 | parser.add_argument("--num_batches", type=int, default=100) 52 | 53 | args = parser.parse_args() 54 | return args 55 | 56 | class DataLoader(): 57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128): 58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids 59 | self.batch_size = batch_size 60 | self.num_batches = num_batches 61 | self.seq_len = seq_len 62 | self.mask_token_id = tokenizer.mask_token_id 63 | 64 | def __iter__(self): 65 | for _ in range(self.num_batches): 66 | masked_samples = [] 67 | samples = [] 68 | for _ in range(self.batch_size): 69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1) 70 | tokens = self.tokenized_corpus[start: start + self.seq_len] 71 | samples.append(tokens) 72 | 73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens] 74 | masked_samples.append(masked_tokens) 75 | 76 | 77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)} 78 | 79 | def __len__(self): 80 | return self.num_batches 81 | 82 | 83 | def main(): 84 | args = parse_args() 85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name) 86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name) 87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches) 88 | optimizer = AdamW(model.parameters(), lr=1e-4) 89 | 90 | device = "cuda" if torch.cuda.is_available() else "cpu" 91 | model = model.to(device).train() 92 | 93 | @dynamo.optimize("inductor") 94 | def train_step(batch): 95 | output = model(**batch) 96 | loss = output.loss 97 | loss.backward() 98 | optimizer.step() 99 | 100 | start_time = time.time() 101 | for step, batch in enumerate(train_dl): 102 | batch = {k: v.to(device) for k, v in batch.items()} 103 | train_step(batch) 104 | optimizer.zero_grad() 105 | 106 | if step == 0: 107 | first_step_time = time.time() - start_time 108 | 109 | total_training_time = time.time() - start_time 110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1) 111 | print("Training finished.") 112 | print(f"First iteration took: {first_step_time:.2f}s") 113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 114 | 115 | 116 | 117 | 118 | 119 | 120 | if __name__ == "__main__": 121 | main() 122 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | transformers 2 | datasets 3 | evaluate 4 | scikit-learn 5 | git+https://github.com/huggingface/accelerate@main -------------------------------------------------------------------------------- /run_experiments.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [[ -z $1 ]]; 4 | then 5 | echo "model_name_or_path not passed" 6 | exit 1 7 | else 8 | echo "model_name_or_path = $1" 9 | fi 10 | 11 | if [[ -z $2 ]]; 12 | then 13 | echo "num_runs not passed" 14 | exit 1 15 
| else 16 | echo "num_runs = $2" 17 | fi 18 | 19 | if [[ -z $3 ]]; 20 | then 21 | echo "task_name not passed" 22 | exit 1 23 | else 24 | echo "task_name = $3" 25 | fi 26 | 27 | model_name_or_path=$1 28 | num_runs=$2 29 | task_name=$3 30 | 31 | for ((i = 1; i <= $num_runs; i++)); 32 | do 33 | echo "experiment run $i" 34 | 35 | case $task_name in 36 | "text_classification") 37 | echo "Running text_classification" 38 | echo "inductor backend with fp32" 39 | accelerate launch scripts/text_classification.py \ 40 | --task_name mrpc \ 41 | --seed $i \ 42 | --model_name_or_path $model_name_or_path \ 43 | --dynamo_backend inductor 44 | echo "no backend with fp32" 45 | accelerate launch scripts/text_classification.py \ 46 | --task_name mrpc \ 47 | --seed $i \ 48 | --model_name_or_path $model_name_or_path 49 | echo "inductor backend with fp16" 50 | accelerate launch scripts/text_classification.py \ 51 | --task_name mrpc \ 52 | --seed $i \ 53 | --model_name_or_path $model_name_or_path \ 54 | --dynamo_backend inductor \ 55 | --mixed_precision fp16 56 | echo "no backend with fp16" 57 | accelerate launch scripts/text_classification.py \ 58 | --task_name mrpc \ 59 | --seed $i \ 60 | --model_name_or_path $model_name_or_path \ 61 | --mixed_precision fp16 62 | ;; 63 | "language_modeling") 64 | echo "Running language_modeling" 65 | echo "inductor backend with fp32" 66 | accelerate launch scripts/language_modeling.py \ 67 | --dataset_name wikitext \ 68 | --dataset_config_name wikitext-2-raw-v1 \ 69 | --seed $i \ 70 | --model_name_or_path $model_name_or_path \ 71 | --dynamo_backend inductor 72 | echo "no backend with fp32" 73 | accelerate launch scripts/language_modeling.py \ 74 | --dataset_name wikitext \ 75 | --dataset_config_name wikitext-2-raw-v1 \ 76 | --seed $i \ 77 | --model_name_or_path $model_name_or_path 78 | echo "inductor backend with fp16" 79 | accelerate launch scripts/language_modeling.py \ 80 | --dataset_name wikitext \ 81 | --dataset_config_name wikitext-2-raw-v1 \ 82 | --seed $i \ 83 | --model_name_or_path $model_name_or_path \ 84 | --dynamo_backend inductor \ 85 | --mixed_precision fp16 86 | echo "no backend with fp16" 87 | accelerate launch scripts/language_modeling.py \ 88 | --dataset_name wikitext \ 89 | --dataset_config_name wikitext-2-raw-v1 \ 90 | --seed $i \ 91 | --model_name_or_path $model_name_or_path \ 92 | --mixed_precision fp16 93 | ;; 94 | "cv_classification") 95 | echo "Running cv_classification" 96 | echo "inductor backend with fp32" 97 | accelerate launch scripts/cv_classification.py \ 98 | --dataset_name beans \ 99 | --seed $i \ 100 | --model_name_or_path $model_name_or_path \ 101 | --dynamo_backend inductor 102 | echo "no backend with fp32" 103 | accelerate launch scripts/cv_classification.py \ 104 | --dataset_name beans \ 105 | --seed $i \ 106 | --model_name_or_path $model_name_or_path 107 | echo "inductor backend with fp16" 108 | accelerate launch scripts/cv_classification.py \ 109 | --dataset_name beans \ 110 | --seed $i \ 111 | --model_name_or_path $model_name_or_path \ 112 | --dynamo_backend inductor \ 113 | --mixed_precision fp16 114 | echo "no backend with fp16" 115 | accelerate launch scripts/cv_classification.py \ 116 | --dataset_name beans \ 117 | --seed $i \ 118 | --model_name_or_path $model_name_or_path \ 119 | --mixed_precision fp16 120 | ;; 121 | *) 122 | echo "Invalid task_name" 123 | exit 1 124 | ;; 125 | esac 126 | done -------------------------------------------------------------------------------- /scripts/cv_classification.py:
-------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Inc. team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """ Finetuning any 🤗 Transformers model for image classification leveraging 🤗 Accelerate.""" 16 | import argparse 17 | import json 18 | import logging 19 | import math 20 | import os 21 | from pathlib import Path 22 | import time 23 | 24 | import datasets 25 | import torch 26 | from datasets import load_dataset 27 | from torch.utils.data import DataLoader 28 | from torchvision.transforms import ( 29 | CenterCrop, 30 | Compose, 31 | Normalize, 32 | RandomHorizontalFlip, 33 | RandomResizedCrop, 34 | Resize, 35 | ToTensor, 36 | ) 37 | from tqdm.auto import tqdm 38 | 39 | import evaluate 40 | import transformers 41 | from accelerate import Accelerator 42 | from accelerate.logging import get_logger 43 | from accelerate.utils import set_seed 44 | from huggingface_hub import Repository 45 | from transformers import ( 46 | AutoFeatureExtractor, 47 | AutoModelForImageClassification, 48 | get_scheduler, 49 | ) 50 | 51 | 52 | torch.backends.cuda.matmul.allow_tf32 = True 53 | logger = get_logger(__name__) 54 | 55 | 56 | def parse_args(): 57 | parser = argparse.ArgumentParser(description="Fine-tune a Transformers model on an image classification dataset") 58 | parser.add_argument( 59 | "--dataset_name", 60 | type=str, 61 | default="cifar10", 62 | help=( 63 | "The name of the Dataset (from the HuggingFace hub) to train on (could be your own, possibly private," 64 | " dataset)." 65 | ), 66 | ) 67 | parser.add_argument( 68 | "--model_name_or_path", 69 | type=str, 70 | help="Path to pretrained model or model identifier from huggingface.co/models.", 71 | default="google/vit-base-patch16-224-in21k", 72 | ) 73 | parser.add_argument( 74 | "--batch_size", 75 | type=int, 76 | default=8, 77 | help="Batch size (per device) for the training dataloader.", 78 | ) 79 | parser.add_argument( 80 | "--learning_rate", 81 | type=float, 82 | default=5e-5, 83 | help="Initial learning rate (after the potential warmup period) to use.", 84 | ) 85 | parser.add_argument("--num_epochs", type=int, default=3, help="Total number of training epochs to perform.") 86 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.") 87 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend") 88 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`") 89 | args = parser.parse_args() 90 | return args 91 | 92 | 93 | def main(): 94 | args = parse_args() 95 | set_seed(args.seed) 96 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision) 97 | 98 | logger.info(accelerator.state) 99 | # Make one log on every process with the configuration for debugging. 
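# The basicConfig call below sets a shared log format, and accelerator.state is then logged from every process, so each benchmark run records the device, distributed setup and mixed-precision mode it actually used.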
100 | logging.basicConfig( 101 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 102 | datefmt="%m/%d/%Y %H:%M:%S", 103 | level=logging.INFO, 104 | ) 105 | logger.info(accelerator.state, main_process_only=False) 106 | if accelerator.is_local_main_process: 107 | datasets.utils.logging.set_verbosity_warning() 108 | transformers.utils.logging.set_verbosity_info() 109 | else: 110 | datasets.utils.logging.set_verbosity_error() 111 | transformers.utils.logging.set_verbosity_error() 112 | 113 | dataset = load_dataset(args.dataset_name, task="image-classification") 114 | feature_extractor = AutoFeatureExtractor.from_pretrained(args.model_name_or_path) 115 | model = AutoModelForImageClassification.from_pretrained( 116 | args.model_name_or_path, 117 | num_labels=len(dataset["train"].features["labels"].names), 118 | ignore_mismatched_sizes=True, 119 | ) 120 | 121 | # Preprocessing the datasets 122 | 123 | # Define torchvision transforms to be applied to each image. 124 | if "shortest_edge" in feature_extractor.size: 125 | size = feature_extractor.size["shortest_edge"] 126 | else: 127 | size = (feature_extractor.size["height"], feature_extractor.size["width"]) 128 | normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std) 129 | train_transforms = Compose( 130 | [ 131 | RandomResizedCrop(size), 132 | RandomHorizontalFlip(), 133 | ToTensor(), 134 | normalize, 135 | ] 136 | ) 137 | val_transforms = Compose( 138 | [ 139 | Resize(size), 140 | CenterCrop(size), 141 | ToTensor(), 142 | normalize, 143 | ] 144 | ) 145 | 146 | def preprocess_train(example_batch): 147 | """Apply _train_transforms across a batch.""" 148 | example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]] 149 | return example_batch 150 | 151 | def preprocess_val(example_batch): 152 | """Apply _val_transforms across a batch.""" 153 | example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]] 154 | return example_batch 155 | 156 | with accelerator.main_process_first(): 157 | dataset["train"] = dataset["train"].shuffle(seed=args.seed) 158 | # Set the training transforms 159 | train_dataset = dataset["train"].with_transform(preprocess_train) 160 | dataset["validation"] = dataset["validation"].shuffle(seed=args.seed) 161 | # Set the validation transforms 162 | eval_dataset = dataset["validation"].with_transform(preprocess_val) 163 | 164 | # DataLoaders creation: 165 | def collate_fn(examples): 166 | pixel_values = torch.stack([example["pixel_values"] for example in examples]) 167 | labels = torch.tensor([example["labels"] for example in examples]) 168 | return {"pixel_values": pixel_values, "labels": labels} 169 | 170 | train_dataloader = DataLoader( 171 | train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=args.batch_size, drop_last=True 172 | ) 173 | eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=args.batch_size, drop_last=True) 174 | 175 | # Optimizer 176 | optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate) 177 | 178 | # Scheduler. 179 | lr_scheduler = get_scheduler( 180 | name="linear", 181 | optimizer=optimizer, 182 | num_warmup_steps=0, 183 | num_training_steps=len(train_dataloader) * args.num_epochs, 184 | ) 185 | 186 | # Prepare everything with our `accelerator`. 
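# prepare() moves the model, optimizer and dataloaders to the selected device, wraps them for the requested mixed precision, and applies the chosen dynamo_backend to the model, which is why these scripts never call dynamo.optimize themselves.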
187 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare( 188 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler 189 | ) 190 | 191 | # Get the metric function 192 | metric = evaluate.load("accuracy") 193 | # Train! 194 | # Only show the progress bar once on each machine. 195 | train_steps = len(train_dataloader) * args.num_epochs 196 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process) 197 | 198 | start_time = time.time() 199 | for epoch in range(args.num_epochs): 200 | model.train() 201 | for step, batch in enumerate(train_dataloader): 202 | outputs = model(**batch) 203 | loss = outputs.loss 204 | predictions, references = accelerator.gather_for_metrics((outputs.logits.argmax(dim=-1), batch["labels"])) 205 | metric.add_batch(predictions=predictions, references=references) 206 | accelerator.backward(loss) 207 | optimizer.step() 208 | lr_scheduler.step() 209 | optimizer.zero_grad() 210 | progress_bar.update(1) 211 | if step == 0 and epoch == 0: 212 | first_step_time = time.time() - start_time 213 | 214 | eval_train_metric = metric.compute() 215 | print(f"Training Accuracy for backend {args.dynamo_backend} at epoch {epoch}: {eval_train_metric}") 216 | 217 | total_training_time = time.time() - start_time 218 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1) 219 | print("Training finished.") 220 | print(f"First iteration took: {first_step_time:.2f}s") 221 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms") 222 | 223 | model.eval() 224 | start_time = time.time() 225 | for step, batch in enumerate(eval_dataloader): 226 | with torch.no_grad(): 227 | outputs = model(**batch) 228 | predictions = outputs.logits.argmax(dim=-1) 229 | predictions, references = accelerator.gather_for_metrics((predictions, batch["labels"])) 230 | metric.add_batch(predictions=predictions, references=references) 231 | 232 | if step == 0: 233 | first_step_time = time.time() - start_time 234 | total_eval_time = time.time() - start_time 235 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1) 236 | print("Evaluation finished.") 237 | print(f"First iteration took: {first_step_time:.2f}s") 238 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms") 239 | 240 | eval_test_metric = metric.compute() 241 | print(f"Test Accuracy for backend {args.dynamo_backend}: {eval_test_metric}") 242 | 243 | out_dict = { 244 | "backend": args.dynamo_backend, 245 | "mixed_precision": args.mixed_precision, 246 | "num_epochs": str(args.num_epochs), 247 | "seed": str(args.seed), 248 | "train_acc": str(eval_train_metric["accuracy"]), 249 | "avg_train_time": str(avg_train_iteration_time * 1000), 250 | "test_acc": str(eval_test_metric["accuracy"]), 251 | "avg_test_time": str(avg_test_iteration_time * 1000), 252 | } 253 | prefix = args.model_name_or_path.split("/")[-1] 254 | with open(f"{prefix}_cv_classification_results.csv", "a+") as fd: 255 | fd.seek(0) 256 | if len(fd.read(1)) == 0: 257 | fd.write(",".join(out_dict.keys()) + "\n") 258 | else: 259 | fd.write("\n") 260 | fd.write(",".join(out_dict.values())) 261 | 262 | 263 | if __name__ == "__main__": 264 | main() 265 | -------------------------------------------------------------------------------- /scripts/language_modeling.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # 
Copyright 2021 The HuggingFace Inc. team. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ 17 | Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...) 18 | on a text file or a dataset without using HuggingFace Trainer. 19 | 20 | Here is the full list of checkpoints on the hub that can be fine-tuned by this script: 21 | https://huggingface.co/models?filter=text-generation 22 | """ 23 | # You can also adapt this script on your own causal language modeling task. Pointers for this are left as comments. 24 | 25 | import argparse 26 | import logging 27 | import math 28 | import os 29 | from itertools import chain 30 | import time 31 | 32 | import datasets 33 | import torch 34 | from datasets import load_dataset 35 | from torch.utils.data import DataLoader 36 | from tqdm.auto import tqdm 37 | 38 | import transformers 39 | from accelerate import Accelerator 40 | from accelerate.logging import get_logger 41 | from accelerate.utils import set_seed 42 | from transformers import ( 43 | AutoModelForCausalLM, 44 | AutoTokenizer, 45 | default_data_collator, 46 | get_scheduler, 47 | ) 48 | 49 | torch.backends.cuda.matmul.allow_tf32 = True 50 | logger = get_logger(__name__) 51 | 52 | 53 | def parse_args(): 54 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a causal language modeling task") 55 | parser.add_argument( 56 | "--dataset_name", 57 | type=str, 58 | default=None, 59 | help="The name of the dataset to use (via the datasets library).", 60 | ) 61 | parser.add_argument( 62 | "--dataset_config_name", 63 | type=str, 64 | default=None, 65 | help="The configuration name of the dataset to use (via the datasets library).", 66 | ) 67 | parser.add_argument( 68 | "--model_name_or_path", 69 | type=str, 70 | help="Path to pretrained model or model identifier from huggingface.co/models.", 71 | required=False, 72 | ) 73 | parser.add_argument( 74 | "--batch_size", 75 | type=int, 76 | default=8, 77 | help="Batch size (per device) for the training dataloader.", 78 | ) 79 | parser.add_argument("--num_epochs", type=int, default=3, help="Total number of training epochs to perform.") 80 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.") 81 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend") 82 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`") 83 | args = parser.parse_args() 84 | return args 85 | 86 | 87 | def main(): 88 | args = parse_args() 89 | set_seed(args.seed) 90 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision) 91 | 92 | # Make one log on every process with the configuration for debugging. 
93 | logging.basicConfig( 94 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 95 | datefmt="%m/%d/%Y %H:%M:%S", 96 | level=logging.INFO, 97 | ) 98 | logger.info(accelerator.state, main_process_only=False) 99 | if accelerator.is_local_main_process: 100 | datasets.utils.logging.set_verbosity_warning() 101 | transformers.utils.logging.set_verbosity_info() 102 | else: 103 | datasets.utils.logging.set_verbosity_error() 104 | transformers.utils.logging.set_verbosity_error() 105 | 106 | if args.dataset_name is not None: 107 | # Downloading and loading a dataset from the hub. 108 | raw_datasets = load_dataset(args.dataset_name, args.dataset_config_name) 109 | if "validation" not in raw_datasets.keys(): 110 | raw_datasets["validation"] = load_dataset( 111 | args.dataset_name, 112 | args.dataset_config_name, 113 | split=f"train[:{args.validation_split_percentage}%]", 114 | ) 115 | raw_datasets["train"] = load_dataset( 116 | args.dataset_name, 117 | args.dataset_config_name, 118 | split=f"train[{args.validation_split_percentage}%:]", 119 | ) 120 | 121 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path) 122 | model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path) 123 | 124 | # Preprocessing the datasets. 125 | # First we tokenize all the texts. 126 | column_names = raw_datasets["train"].column_names 127 | text_column_name = "text" if "text" in column_names else column_names[0] 128 | 129 | def tokenize_function(examples): 130 | return tokenizer(examples[text_column_name]) 131 | 132 | with accelerator.main_process_first(): 133 | tokenized_datasets = raw_datasets.map( 134 | tokenize_function, 135 | batched=True, 136 | num_proc=4, 137 | remove_columns=column_names, 138 | load_from_cache_file=False, 139 | desc="Running tokenizer on dataset", 140 | ) 141 | block_size = tokenizer.model_max_length 142 | 143 | # Main data processing function that will concatenate all texts from our dataset and generate chunks of block_size. 144 | def group_texts(examples): 145 | # Concatenate all texts. 146 | concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()} 147 | total_length = len(concatenated_examples[list(examples.keys())[0]]) 148 | # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can 149 | # customize this part to your needs. 150 | if total_length >= block_size: 151 | total_length = (total_length // block_size) * block_size 152 | # Split by chunks of max_len. 153 | result = { 154 | k: [t[i : i + block_size] for i in range(0, total_length, block_size)] 155 | for k, t in concatenated_examples.items() 156 | } 157 | result["labels"] = result["input_ids"].copy() 158 | return result 159 | 160 | with accelerator.main_process_first(): 161 | lm_datasets = tokenized_datasets.map( 162 | group_texts, 163 | batched=True, 164 | num_proc=4, 165 | load_from_cache_file=False, 166 | desc=f"Grouping texts in chunks of {block_size}", 167 | ) 168 | 169 | train_dataset = lm_datasets["train"] 170 | eval_dataset = lm_datasets["validation"] 171 | 172 | # DataLoaders creation: 173 | train_dataloader = DataLoader( 174 | train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=args.batch_size, drop_last=True 175 | ) 176 | eval_dataloader = DataLoader( 177 | eval_dataset, collate_fn=default_data_collator, batch_size=args.batch_size, drop_last=True 178 | ) 179 | 180 | # Optimizer 181 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5) 182 | 183 | # Scheduler. 
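# Linear decay from the initial learning rate to zero over all training steps, with no warmup.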
184 | lr_scheduler = get_scheduler( 185 | name="linear", 186 | optimizer=optimizer, 187 | num_warmup_steps=0, 188 | num_training_steps=len(train_dataloader) * args.num_epochs, 189 | ) 190 | 191 | # Prepare everything with our `accelerator`. 192 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare( 193 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler 194 | ) 195 | 196 | # Train! 197 | # Only show the progress bar once on each machine. 198 | train_steps = len(train_dataloader) * args.num_epochs 199 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process) 200 | 201 | start_time = time.time() 202 | for epoch in range(args.num_epochs): 203 | model.train() 204 | total_loss = 0 205 | for step, batch in enumerate(train_dataloader): 206 | outputs = model(**batch) 207 | loss = outputs.loss 208 | total_loss += loss.detach().float() 209 | accelerator.backward(loss) 210 | optimizer.step() 211 | lr_scheduler.step() 212 | optimizer.zero_grad() 213 | progress_bar.update(1) 214 | if step == 0 and epoch == 0: 215 | first_step_time = time.time() - start_time 216 | train_perplexity = torch.exp(total_loss / len(train_dataloader)) 217 | print(f"Training Perplexity for backend {args.dynamo_backend} at epoch {epoch}: {train_perplexity}") 218 | 219 | total_training_time = time.time() - start_time 220 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1) 221 | print("Training finished.") 222 | print(f"First iteration took: {first_step_time:.2f}s") 223 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms") 224 | model.eval() 225 | total_loss = 0 226 | start_time = time.time() 227 | for step, batch in enumerate(eval_dataloader): 228 | with torch.no_grad(): 229 | outputs = model(**batch) 230 | loss = outputs.loss 231 | total_loss += loss.detach().float() 232 | if step == 0: 233 | first_step_time = time.time() - start_time 234 | 235 | total_eval_time = time.time() - start_time 236 | total_eval_time = time.time() - start_time 237 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1) 238 | print("Evaluation finished.") 239 | print(f"First iteration took: {first_step_time:.2f}s") 240 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms") 241 | test_perplexity = torch.exp(total_loss / len(eval_dataloader)) 242 | print(f"Test Perplexity for backend {args.dynamo_backend}: {test_perplexity}") 243 | out_dict = { 244 | "backend": args.dynamo_backend, 245 | "mixed_precision": args.mixed_precision, 246 | "num_epochs": str(args.num_epochs), 247 | "seed": str(args.seed), 248 | "train_perplexity": str(train_perplexity.item()), 249 | "avg_train_time": str(avg_train_iteration_time * 1000), 250 | "test_perplexity": str(test_perplexity.item()), 251 | "avg_test_time": str(avg_test_iteration_time * 1000), 252 | } 253 | prefix = args.model_name_or_path.split("/")[-1] 254 | with open(f"{prefix}_language_modeling_task_results.csv", "a+") as fd: 255 | fd.seek(0) 256 | if len(fd.read(1)) == 0: 257 | fd.write(",".join(out_dict.keys()) + "\n") 258 | else: 259 | fd.write("\n") 260 | fd.write(",".join(out_dict.values())) 261 | 262 | 263 | if __name__ == "__main__": 264 | main() 265 | -------------------------------------------------------------------------------- /scripts/text_classification.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2021 
The HuggingFace Inc. team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """ Finetuning a 🤗 Transformers model for sequence classification on GLUE.""" 16 | import argparse 17 | import logging 18 | import time 19 | 20 | import datasets 21 | import torch 22 | from datasets import load_dataset 23 | from torch.utils.data import DataLoader 24 | from tqdm.auto import tqdm 25 | 26 | import evaluate 27 | import transformers 28 | from accelerate import Accelerator 29 | from accelerate.logging import get_logger 30 | from transformers import ( 31 | AutoModelForSequenceClassification, 32 | AutoTokenizer, 33 | DataCollatorWithPadding, 34 | default_data_collator, 35 | get_scheduler, 36 | ) 37 | 38 | torch.backends.cuda.matmul.allow_tf32 = True 39 | logger = get_logger(__name__) 40 | 41 | task_to_keys = { 42 | "cola": ("sentence", None), 43 | "mnli": ("premise", "hypothesis"), 44 | "mrpc": ("sentence1", "sentence2"), 45 | "qnli": ("question", "sentence"), 46 | "qqp": ("question1", "question2"), 47 | "rte": ("sentence1", "sentence2"), 48 | "sst2": ("sentence", None), 49 | "stsb": ("sentence1", "sentence2"), 50 | "wnli": ("sentence1", "sentence2"), 51 | } 52 | 53 | 54 | def parse_args(): 55 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a text classification task") 56 | parser.add_argument( 57 | "--task_name", 58 | type=str, 59 | default=None, 60 | help="The name of the glue task to train on.", 61 | choices=list(task_to_keys.keys()), 62 | ) 63 | parser.add_argument( 64 | "--max_length", 65 | type=int, 66 | default=128, 67 | help=( 68 | "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated," 69 | " sequences shorter will be padded unless `--dynamic_lengh` is passed." 70 | ), 71 | ) 72 | parser.add_argument( 73 | "--dynamic_length", 74 | action="store_true", 75 | help="If passed, pad all samples to `max_length`. 
Otherwise, dynamic padding is used.", 76 | ) 77 | parser.add_argument( 78 | "--model_name_or_path", 79 | type=str, 80 | help="Path to pretrained model or model identifier from huggingface.co/models.", 81 | default="bert-base-cased", 82 | ) 83 | parser.add_argument( 84 | "--batch_size", 85 | type=int, 86 | default=16, 87 | help="Batch size (per device) for the dataloaders.", 88 | ) 89 | parser.add_argument( 90 | "--num_epochs", 91 | type=int, 92 | default=3, 93 | help="Number of training epochs.", 94 | ) 95 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend") 96 | parser.add_argument("--seed", type=int, default=0, help="random seed for torch") 97 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`") 98 | args = parser.parse_args() 99 | return args 100 | 101 | 102 | def main(): 103 | args = parse_args() 104 | torch.manual_seed(args.seed) 105 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision) 106 | 107 | # Make one log on every process with the configuration for debugging. 108 | logging.basicConfig( 109 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 110 | datefmt="%m/%d/%Y %H:%M:%S", 111 | level=logging.INFO, 112 | ) 113 | logger.info(accelerator.state, main_process_only=False) 114 | if accelerator.is_local_main_process: 115 | datasets.utils.logging.set_verbosity_warning() 116 | transformers.utils.logging.set_verbosity_info() 117 | else: 118 | datasets.utils.logging.set_verbosity_error() 119 | transformers.utils.logging.set_verbosity_error() 120 | 121 | # Load data 122 | raw_datasets = load_dataset("glue", args.task_name) 123 | 124 | is_regression = args.task_name == "stsb" 125 | if not is_regression: 126 | label_list = raw_datasets["train"].features["label"].names 127 | num_labels = len(label_list) 128 | else: 129 | num_labels = 1 130 | 131 | # Load pretrained model and tokenizer 132 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path) 133 | model = AutoModelForSequenceClassification.from_pretrained(args.model_name_or_path, num_labels=num_labels) 134 | 135 | # Preprocessing the datasets 136 | sentence1_key, sentence2_key = task_to_keys[args.task_name] 137 | padding = False if args.dynamic_length else "max_length" 138 | 139 | def preprocess_function(examples): 140 | # Tokenize the texts 141 | texts = ( 142 | (examples[sentence1_key],) if sentence2_key is None else (examples[sentence1_key], examples[sentence2_key]) 143 | ) 144 | result = tokenizer(*texts, padding=padding, max_length=args.max_length, truncation=True) 145 | result["labels"] = examples["label"] 146 | return result 147 | 148 | with accelerator.main_process_first(): 149 | processed_datasets = raw_datasets.map( 150 | preprocess_function, 151 | batched=True, 152 | remove_columns=raw_datasets["train"].column_names, 153 | desc="Running tokenizer on dataset", 154 | ) 155 | 156 | train_dataset = processed_datasets["train"] 157 | eval_dataset = processed_datasets["validation_matched" if args.task_name == "mnli" else "validation"] 158 | 159 | # DataLoaders creation: 160 | if not args.dynamic_length: 161 | data_collator = default_data_collator 162 | else: 163 | data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8) 164 | 165 | train_dataloader = DataLoader( 166 | train_dataset, shuffle=True, collate_fn=data_collator, batch_size=args.batch_size, drop_last=True 167 | ) 168 | eval_dataloader = DataLoader( 169 | eval_dataset, collate_fn=data_collator, 
batch_size=args.batch_size, drop_last=not args.dynamic_length 170 | ) 171 | 172 | # Optimizer 173 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5) 174 | 175 | # Scheduler. 176 | lr_scheduler = get_scheduler( 177 | name="linear", 178 | optimizer=optimizer, 179 | num_warmup_steps=0, 180 | num_training_steps=len(train_dataloader) * args.num_epochs, 181 | ) 182 | 183 | # Prepare everything with our `accelerator`. 184 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare( 185 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler 186 | ) 187 | 188 | # Get the metric function 189 | metric = evaluate.load("glue", args.task_name) 190 | # Train! 191 | # Only show the progress bar once on each machine. 192 | train_steps = len(train_dataloader) * args.num_epochs 193 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process) 194 | start_time = time.time() 195 | for epoch in range(args.num_epochs): 196 | model.train() 197 | for step, batch in enumerate(train_dataloader): 198 | # We need to skip steps until we reach the resumed step 199 | outputs = model(**batch) 200 | loss = outputs.loss 201 | predictions, references = accelerator.gather_for_metrics((outputs.logits.argmax(dim=-1), batch["labels"])) 202 | metric.add_batch(predictions=predictions, references=references) 203 | accelerator.backward(loss) 204 | optimizer.step() 205 | lr_scheduler.step() 206 | optimizer.zero_grad() 207 | progress_bar.update(1) 208 | if step == 0 and epoch == 0: 209 | first_step_time = time.time() - start_time 210 | 211 | eval_train_metric = metric.compute() 212 | print(f"Training Accuracy for backend {args.dynamo_backend} at epoch {epoch}: {eval_train_metric}") 213 | 214 | total_training_time = time.time() - start_time 215 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1) 216 | print("Training finished.") 217 | print(f"First iteration took: {first_step_time:.2f}s") 218 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms") 219 | 220 | model.eval() 221 | start_time = time.time() 222 | for step, batch in enumerate(eval_dataloader): 223 | with torch.no_grad(): 224 | outputs = model(**batch) 225 | predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze() 226 | predictions, references = accelerator.gather_for_metrics((predictions, batch["labels"])) 227 | metric.add_batch(predictions=predictions, references=references) 228 | 229 | if step == 0: 230 | first_step_time = time.time() - start_time 231 | total_eval_time = time.time() - start_time 232 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1) 233 | print("Evaluation finished.") 234 | print(f"First iteration took: {first_step_time:.2f}s") 235 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms") 236 | 237 | eval_test_metric = metric.compute() 238 | print(f"Test Accuracy for backend {args.dynamo_backend}: {eval_test_metric}") 239 | 240 | out_dict = { 241 | "backend": args.dynamo_backend, 242 | "mixed_precision": args.mixed_precision, 243 | "num_epochs": str(args.num_epochs), 244 | "seed": str(args.seed), 245 | "train_acc": str(eval_train_metric["accuracy"]), 246 | "train_f1": str(eval_train_metric["f1"]), 247 | "avg_train_time": str(avg_train_iteration_time * 1000), 248 | "test_acc": str(eval_test_metric["accuracy"]), 249 | "test_f1": str(eval_test_metric["f1"]), 250 | "avg_test_time": 
str(avg_test_iteration_time * 1000), 251 | } 252 | prefix = args.model_name_or_path.split("/")[-1] 253 | with open(f"{prefix}_text_classification_results.csv", "a+") as fd: 254 | fd.seek(0) 255 | if len(fd.read(1)) == 0: 256 | fd.write(",".join(out_dict.keys()) + "\n") 257 | else: 258 | fd.write("\n") 259 | fd.write(",".join(out_dict.values())) 260 | 261 | 262 | if __name__ == "__main__": 263 | main() 264 | -------------------------------------------------------------------------------- /scripts/translation.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ 17 | Fine-tuning a 🤗 Transformers model on text translation. 18 | """ 19 | # You can also adapt this script on your own text translation task. Pointers for this are left as comments. 20 | 21 | import argparse 22 | import logging 23 | import random 24 | import time 25 | 26 | import datasets 27 | import numpy as np 28 | import torch 29 | from datasets import load_dataset 30 | from torch.utils.data import DataLoader 31 | from tqdm.auto import tqdm 32 | 33 | import evaluate 34 | import transformers 35 | from accelerate import Accelerator 36 | from accelerate.logging import get_logger 37 | from transformers import ( 38 | AutoModelForSeq2SeqLM, 39 | AutoTokenizer, 40 | DataCollatorForSeq2Seq, 41 | MBartTokenizer, 42 | MBartTokenizerFast, 43 | default_data_collator, 44 | get_scheduler, 45 | ) 46 | 47 | torch.backends.cuda.matmul.allow_tf32 = True 48 | logger = get_logger(__name__) 49 | 50 | 51 | # Parsing input arguments 52 | def parse_args(): 53 | 54 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a text classification task") 55 | parser.add_argument( 56 | "--model_name_or_path", 57 | type=str, 58 | help="Path to pretrained model or model identifier from huggingface.co/models.", 59 | default="t5-small", 60 | ) 61 | parser.add_argument( 62 | "--max_length", 63 | type=int, 64 | default=128, 65 | help=( 66 | "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated," 67 | " sequences shorter will be padded unless `--dynamic_lengh` is passed." 68 | ), 69 | ) 70 | parser.add_argument( 71 | "--dynamic_length", 72 | action="store_true", 73 | help="If passed, pad all samples to `max_length`. 
Otherwise, dynamic padding is used.", 74 | ) 75 | parser.add_argument( 76 | "--batch_size", 77 | type=int, 78 | default=16, 79 | help="Batch size (per device) for the dataloaders.", 80 | ) 81 | parser.add_argument( 82 | "--num_epochs", 83 | type=int, 84 | default=1, 85 | help="Number of training epochs.", 86 | ) 87 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.") 88 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend") 89 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`") 90 | return parser.parse_args() 91 | 92 | 93 | def main(): 94 | args = parse_args() 95 | torch.manual_seed(args.seed) 96 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision) 97 | 98 | # Make one log on every process with the configuration for debugging. 99 | logging.basicConfig( 100 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 101 | datefmt="%m/%d/%Y %H:%M:%S", 102 | level=logging.INFO, 103 | ) 104 | logger.info(accelerator.state, main_process_only=False) 105 | if accelerator.is_local_main_process: 106 | datasets.utils.logging.set_verbosity_warning() 107 | transformers.utils.logging.set_verbosity_info() 108 | else: 109 | datasets.utils.logging.set_verbosity_error() 110 | transformers.utils.logging.set_verbosity_error() 111 | 112 | # Load data 113 | raw_datasets = load_dataset("wmt16", "ro-en") 114 | 115 | # Load pretrained model and tokenizer 116 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path) 117 | model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name_or_path) 118 | 119 | # MBART requires some language codes 120 | if isinstance(tokenizer, (MBartTokenizer, MBartTokenizerFast)): 121 | tokenizer.src_lang = "en_XX" 122 | tokenizer.tgt_lang = "ro_RO" 123 | if model.config.decoder_start_token_id is None: 124 | if isinstance(tokenizer, MBartTokenizer): 125 | model.config.decoder_start_token_id = tokenizer.lang_code_to_id["ro_RO"] 126 | else: 127 | model.config.decoder_start_token_id = tokenizer.convert_tokens_to_ids("ro_RO") 128 | 129 | # T5 requires a prefix 130 | if args.model_name_or_path in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]: 131 | prefix = "translate English to Romanian: " 132 | else: 133 | prefix = "" 134 | 135 | # Preprocessing the datasets. 136 | padding = False if args.dynamic_length else "max_length" 137 | 138 | def preprocess_function(examples): 139 | inputs = [ex["en"] for ex in examples["translation"]] 140 | targets = [ex["ro"] for ex in examples["translation"]] 141 | inputs = [prefix + inp for inp in inputs] 142 | model_inputs = tokenizer( 143 | inputs, text_target=targets, max_length=args.max_length, padding=padding, truncation=True 144 | ) 145 | 146 | # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore 147 | # padding in the loss. 
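# (-100 is the ignore_index of the cross-entropy loss computed by the model, so padded label positions do not contribute to it.)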
148 | if padding == "max_length": 149 | model_inputs["labels"] = [ 150 | [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in model_inputs["labels"] 151 | ] 152 | 153 | return model_inputs 154 | 155 | with accelerator.main_process_first(): 156 | processed_datasets = raw_datasets.map( 157 | preprocess_function, 158 | batched=True, 159 | remove_columns=raw_datasets["train"].column_names, 160 | desc="Running tokenizer on dataset", 161 | ) 162 | 163 | train_dataset = processed_datasets["train"] 164 | eval_dataset = processed_datasets["validation"] 165 | 166 | # Log a few random samples from the training set: 167 | for index in random.sample(range(len(train_dataset)), 3): 168 | logger.info(f"Sample {index} of the training set: {train_dataset[index]}.") 169 | 170 | # DataLoaders creation: 171 | if not args.dynamic_length: 172 | data_collator = default_data_collator 173 | else: 174 | data_collator = DataCollatorForSeq2Seq( 175 | tokenizer, 176 | model=model, 177 | label_pad_token_id=-100, 178 | pad_to_multiple_of=8, 179 | ) 180 | 181 | train_dataloader = DataLoader( 182 | train_dataset, shuffle=True, collate_fn=data_collator, batch_size=args.batch_size, drop_last=True 183 | ) 184 | eval_dataloader = DataLoader( 185 | eval_dataset, collate_fn=data_collator, batch_size=args.batch_size, drop_last=not args.dynamic_length 186 | ) 187 | 188 | # Optimizer 189 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5) 190 | 191 | # Scheduler. 192 | lr_scheduler = get_scheduler( 193 | name="linear", 194 | optimizer=optimizer, 195 | num_warmup_steps=0, 196 | num_training_steps=len(train_dataloader) * args.num_epochs, 197 | ) 198 | 199 | # Prepare everything with our `accelerator`. 200 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare( 201 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler 202 | ) 203 | 204 | # Metric 205 | metric = evaluate.load("sacrebleu") 206 | 207 | def postprocess_text(preds, labels): 208 | preds = [pred.strip() for pred in preds] 209 | labels = [[label.strip()] for label in labels] 210 | 211 | return preds, labels 212 | 213 | # Train! 214 | # Only show the progress bar once on each machine. 
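# The run is capped at roughly 1000 training steps (the loop below breaks once step >= 1000) to keep the WMT16 benchmark short.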
215 | train_steps = min(len(train_dataloader) * args.num_epochs, 1000) 216 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process) 217 | start_time = time.time() 218 | 219 | for epoch in range(args.num_epochs): 220 | model.train() 221 | for step, batch in enumerate(train_dataloader): 222 | # We need to skip steps until we reach the resumed step 223 | outputs = model(**batch) 224 | loss = outputs.loss 225 | accelerator.backward(loss) 226 | optimizer.step() 227 | lr_scheduler.step() 228 | optimizer.zero_grad() 229 | progress_bar.update(1) 230 | if step == 0 and epoch == 0: 231 | first_step_time = time.time() - start_time 232 | elif step >= 1000: 233 | break 234 | 235 | total_training_time = time.time() - start_time 236 | avg_iteration_time = (total_training_time - first_step_time) / (train_steps - 1) 237 | print("Training finished.") 238 | print(f"First iteration took: {first_step_time:.2f}s") 239 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 240 | 241 | model.eval() 242 | start_time = time.time() 243 | for step, batch in enumerate(eval_dataloader): 244 | with torch.no_grad(): 245 | generated_tokens = accelerator.unwrap_model(model).generate( 246 | batch["input_ids"], attention_mask=batch["attention_mask"], max_length=args.max_length 247 | ) 248 | generated_tokens = accelerator.pad_across_processes( 249 | generated_tokens, dim=1, pad_index=tokenizer.pad_token_id 250 | ) 251 | labels = batch["labels"] 252 | if args.dynamic_length: 253 | labels = accelerator.pad_across_processes(batch["labels"], dim=1, pad_index=tokenizer.pad_token_id) 254 | 255 | generated_tokens = accelerator.gather(generated_tokens).cpu().numpy() 256 | labels = accelerator.gather(labels).cpu().numpy() 257 | 258 | # Replace -100 in the labels as we can't decode them. 
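# (padding positions in the labels were set to -100 for the loss; map them back to the pad token id so the tokenizer can decode them)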
259 | labels = np.where(labels != -100, labels, tokenizer.pad_token_id) 260 | 261 | decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True) 262 | decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True) 263 | 264 | decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels) 265 | 266 | metric.add_batch(predictions=decoded_preds, references=decoded_labels) 267 | if step == 0: 268 | first_step_time = time.time() - start_time 269 | 270 | total_eval_time = time.time() - start_time 271 | avg_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1) 272 | 273 | print("Evaluation finished.") 274 | print(f"First iteration took: {first_step_time:.2f}s") 275 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms") 276 | 277 | eval_metric = metric.compute() 278 | print(f"Test BLEU score for backend {args.dynamo_backend}: {eval_metric['score']}") 279 | 280 | 281 | if __name__ == "__main__": 282 | main() 283 | -------------------------------------------------------------------------------- /tools/summarize.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import os 3 | from collections import defaultdict 4 | import argparse 5 | 6 | 7 | def generate_and_save_plots(df, output_dir): 8 | for mixed_precision in set(df["mixed_precision"].values): 9 | metrics = defaultdict(list) 10 | filtered_df = df[df["mixed_precision"] == mixed_precision] 11 | columns = list(df.columns)[2:] 12 | 13 | # saving performance plots 14 | metrics_columns = [column for column in columns if "time" not in column] 15 | df_metric = filtered_df[metrics_columns + ["backend"]] 16 | inductor_backend_values = df_metric[df_metric["backend"] == "inductor"].values[0] 17 | pytorch_backend_values = df_metric[df_metric["backend"] == "no"].values[0] 18 | for i, column in enumerate(metrics_columns): 19 | metrics["metric"].append(column) 20 | metrics["inductor"].append(inductor_backend_values[i]) 21 | metrics["no"].append(pytorch_backend_values[i]) 22 | df_metric = pd.DataFrame(metrics) 23 | plot = df_metric.plot.bar(x="metric", rot=0) 24 | fig = plot.get_figure() 25 | fig.savefig(os.path.join(output_dir, f"{mixed_precision=}_metric.png")) 26 | 27 | # saving avg time plots 28 | metrics = defaultdict(list) 29 | time_columns = [column for column in columns if "time" in column] 30 | df_time = filtered_df[time_columns + ["backend"]] 31 | inductor_backend_values = df_time[df_time["backend"] == "inductor"].values[0] 32 | pytorch_backend_values = df_time[df_time["backend"] == "no"].values[0] 33 | for i, column in enumerate(time_columns): 34 | metrics["avg_time"].append(column) 35 | metrics["inductor"].append(inductor_backend_values[i]) 36 | metrics["no"].append(pytorch_backend_values[i]) 37 | df_metric = pd.DataFrame(metrics) 38 | plot = df_metric.plot.bar(x="avg_time", rot=0) 39 | fig = plot.get_figure() 40 | fig.savefig(os.path.join(output_dir, f"{mixed_precision=}_avg_time.png")) 41 | 42 | 43 | def get_diff_percentage(df): 44 | diff_percentage = defaultdict(list) 45 | for mixed_precision in set(df["mixed_precision"].values): 46 | diff_percentage["mixed_precision"].append(mixed_precision) 47 | filtered_df = df[df["mixed_precision"] == mixed_precision] 48 | columns = list(df.columns)[2:] 49 | inductor_backend_values = filtered_df[filtered_df["backend"] == "inductor"].values[0][2:] 50 | pytorch_backend_values = filtered_df[filtered_df["backend"] == "no"].values[0][2:] 51 | 52 | 
for i, column in enumerate(columns): 53 | if "time" in column: 54 | diff_percentage[f"{column}_speedup"].append( 55 | str(round((pytorch_backend_values[i] / inductor_backend_values[i]), 2)) + "x" 56 | ) 57 | else: 58 | diff_percentage[f"{column}_diff%"].append( 59 | str(round((100 * (inductor_backend_values[i] / pytorch_backend_values[i] - 1)), 2)) + "%" 60 | ) 61 | return pd.DataFrame(diff_percentage) 62 | 63 | 64 | def main(): 65 | parser = argparse.ArgumentParser(description="Get plots and summary table") 66 | parser.add_argument("--input_csv_file", type=str, required=True) 67 | parser.add_argument("--output_dir", type=str, required=True) 68 | 69 | args = parser.parse_args() 70 | os.makedirs(args.output_dir, exist_ok=True) 71 | df = pd.read_csv(args.input_csv_file) 72 | group_by_columns = ["backend", "mixed_precision"] 73 | drop_columns = ["num_epochs", "seed"] 74 | df.drop(columns=drop_columns, inplace=True) 75 | df = df.groupby(group_by_columns).agg("mean") 76 | df = df.reset_index() 77 | 78 | generate_and_save_plots(df, args.output_dir) 79 | diff_df = get_diff_percentage(df) 80 | file_prefix = args.input_csv_file.split("/")[-1].split(".")[0] 81 | diff_df.to_csv(os.path.join(args.output_dir, f"{file_prefix}_summary_table.csv"), header=True, index=False) 82 | 83 | 84 | if __name__ == "__main__": 85 | main() 86 | -------------------------------------------------------------------------------- /tools/verify_dynamo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import subprocess 4 | import sys 5 | import traceback 6 | import warnings 7 | 8 | from pkg_resources import packaging 9 | 10 | MIN_CUDA_VERSION = packaging.version.parse("11.6") 11 | MIN_PYTHON_VERSION = (3, 7) 12 | 13 | 14 | class VerifyDynamoError(BaseException): 15 | pass 16 | 17 | 18 | def check_python(): 19 | if sys.version_info < MIN_PYTHON_VERSION: 20 | raise VerifyDynamoError( 21 | f"Python version not supported: {sys.version_info} " 22 | f"- minimum requirement: {MIN_PYTHON_VERSION}" 23 | ) 24 | return sys.version_info 25 | 26 | 27 | def check_torch(): 28 | import torch 29 | 30 | return packaging.version.parse(torch.__version__) 31 | 32 | 33 | # based on torch/utils/cpp_extension.py 34 | def get_cuda_version(): 35 | from torch.utils import cpp_extension 36 | 37 | CUDA_HOME = cpp_extension._find_cuda_home() 38 | if not CUDA_HOME: 39 | raise VerifyDynamoError(cpp_extension.CUDA_NOT_FOUND_MESSAGE) 40 | 41 | nvcc = os.path.join(CUDA_HOME, "bin", "nvcc") 42 | cuda_version_str = ( 43 | subprocess.check_output([nvcc, "--version"]) 44 | .strip() 45 | .decode(*cpp_extension.SUBPROCESS_DECODE_ARGS) 46 | ) 47 | cuda_version = re.search(r"release (\d+[.]\d+)", cuda_version_str) 48 | if cuda_version is None: 49 | raise VerifyDynamoError("CUDA version not found in `nvcc --version` output") 50 | 51 | cuda_str_version = cuda_version.group(1) 52 | return packaging.version.parse(cuda_str_version) 53 | 54 | 55 | def check_cuda(): 56 | import torch 57 | 58 | if not torch.cuda.is_available(): 59 | return None 60 | 61 | torch_cuda_ver = packaging.version.parse(torch.version.cuda) 62 | 63 | # check if torch cuda version matches system cuda version 64 | cuda_ver = get_cuda_version() 65 | if cuda_ver != torch_cuda_ver: 66 | # raise VerifyDynamoError( 67 | warnings.warn( 68 | f"CUDA version mismatch, `torch` version: {torch_cuda_ver}, env version: {cuda_ver}" 69 | ) 70 | 71 | if torch_cuda_ver < MIN_CUDA_VERSION: 72 | # raise VerifyDynamoError( 73 | warnings.warn( 74 | 
f"(`torch`) CUDA version not supported: {torch_cuda_ver} " 75 | f"- minimum requirement: {MIN_CUDA_VERSION}" 76 | ) 77 | if cuda_ver < MIN_CUDA_VERSION: 78 | # raise VerifyDynamoError( 79 | warnings.warn( 80 | f"(env) CUDA version not supported: {cuda_ver} " 81 | f"- minimum requirement: {MIN_CUDA_VERSION}" 82 | ) 83 | 84 | return cuda_ver 85 | 86 | 87 | def check_dynamo(backend, device, err_msg): 88 | import torch 89 | 90 | if device == "cuda" and not torch.cuda.is_available(): 91 | print(f"CUDA not available -- skipping CUDA check on {backend} backend\n") 92 | return 93 | 94 | try: 95 | import torch._dynamo as dynamo 96 | 97 | dynamo.reset() 98 | 99 | @dynamo.optimize(backend, nopython=True) 100 | def fn(x): 101 | return x + x 102 | 103 | class Module(torch.nn.Module): 104 | def __init__(self): 105 | super().__init__() 106 | 107 | def forward(self, x): 108 | return x + x 109 | 110 | mod = Module() 111 | opt_mod = dynamo.optimize(backend, nopython=True)(mod) 112 | 113 | for f in (fn, opt_mod): 114 | x = torch.randn(10, 10).to(device) 115 | x.requires_grad = True 116 | y = f(x) 117 | torch.testing.assert_close(y, x + x) 118 | z = y.sum() 119 | z.backward() 120 | torch.testing.assert_close(x.grad, 2 * torch.ones_like(x)) 121 | except Exception: 122 | sys.stderr.write(traceback.format_exc() + "\n" + err_msg + "\n\n") 123 | sys.exit(1) 124 | 125 | 126 | _SANITY_CHECK_ARGS = ( 127 | ("eager", "cpu", "CPU eager sanity check failed"), 128 | ("eager", "cuda", "CUDA eager sanity check failed"), 129 | ("aot_eager", "cpu", "CPU aot_eager sanity check failed"), 130 | ("aot_eager", "cuda", "CUDA aot_eager sanity check failed"), 131 | ("inductor", "cpu", "CPU inductor sanity check failed"), 132 | ( 133 | "inductor", 134 | "cuda", 135 | "CUDA inductor sanity check failed\n" 136 | + "NOTE: Please check that you installed the correct hash/version of `triton`", 137 | ), 138 | ) 139 | 140 | 141 | def main(): 142 | python_ver = check_python() 143 | torch_ver = check_torch() 144 | cuda_ver = check_cuda() 145 | print( 146 | f"Python version: {python_ver.major}.{python_ver.minor}.{python_ver.micro}\n" 147 | f"`torch` version: {torch_ver}\n" 148 | f"CUDA version: {cuda_ver}\n" 149 | ) 150 | for args in _SANITY_CHECK_ARGS: 151 | check_dynamo(*args) 152 | print("All required checks passed") 153 | 154 | 155 | if __name__ == "__main__": 156 | main() 157 | --------------------------------------------------------------------------------