├── .gitignore
├── INSTALLATION.md
├── README.md
├── configs
│   ├── base_fp16.yaml
│   ├── base_fp32.yaml
│   ├── dynamo_fp16.yaml
│   └── dynamo_fp32.yaml
├── experiments
│   ├── accelerate_script.py
│   ├── base.py
│   ├── base_fp16.py
│   ├── dynamic.py
│   ├── dynamic_fp16.py
│   ├── dynamic_optimized.py
│   ├── dynamic_optimized_fp16.py
│   ├── generate_script.py
│   ├── optimize_forward.py
│   ├── optimize_forward_fp16.py
│   ├── optimize_model.py
│   ├── optimize_model_fp16.py
│   ├── optimize_train_step.py
│   └── optimize_train_step_fp16.py
├── requirements.txt
├── run_experiments.sh
├── scripts
│   ├── cv_classification.py
│   ├── language_modeling.py
│   ├── text_classification.py
│   └── translation.py
└── tools
    ├── summarize.py
    └── verify_dynamo.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | trained-resnet
3 |
--------------------------------------------------------------------------------
/INSTALLATION.md:
--------------------------------------------------------------------------------
1 | # Installation guide on a new instance
2 |
3 | Jump to the last section if you already have CUDA installed.
4 |
5 | ## Install drivers
6 |
7 | ```bash
8 | sudo apt install ubuntu-drivers-common
9 | ```
10 |
11 | Run
12 |
13 | ```bash
14 | ubuntu-drivers devices
15 | ```
16 |
17 | Output
18 |
19 | ```
20 | == /sys/devices/pci0000:00/0000:00:04.0==
21 | modalias : pci:v000010DEd000020B0sv000010DEsd0000134Fbc03sc02i00
22 | vendor : NVIDIA Corporation
23 | driver : nvidia-driver-470-server - distro non-free
24 | driver : nvidia-driver-515-open - distro non-free recommended
25 | driver : nvidia-driver-515 - distro non-free
26 | driver : nvidia-driver-450-server - distro non-free
27 | driver : nvidia-driver-510 - distro non-free
28 | driver : nvidia-driver-510-server - distro non-free
29 | driver : nvidia-driver-515-server - distro non-free
30 | driver : nvidia-driver-470 - distro non-free
31 | driver : xserver-xorg-video-nouveau - distro free builtin
32 | ```
33 |
34 | Pick the recommended version number (here, 515) and install the headless driver and utilities:
35 | 
36 | ```bash
37 | sudo apt install nvidia-headless-515-server nvidia-utils-515-server
38 | ```
39 |
40 | Reboot
41 | ```bash
42 | sudo reboot
43 | ```
44 |
45 | ## Install CUDA
46 |
47 | ```bash
48 | wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
49 | sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
50 | ```
51 |
52 | Add public key/repo
53 | ```bash
54 | sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
55 | sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
56 | ```
57 |
58 | Install the CUDA toolkit.
59 | If you want to use the conda setup, refer to the corresponding section below and skip these steps.
60 |
61 | ```bash
62 | sudo apt update
63 | sudo apt install cuda-toolkit-11-7
64 | ```
65 |
66 | (You can press Tab after `cuda-toolkit-` to list all available versions.)
67 |
68 | Download [cuDNN](https://developer.nvidia.com/cudnn) and `scp` it to the instance.
69 |
70 | Extract
71 |
72 | ```bash
73 | tar -xf cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
74 | sudo cp cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include/cudnn*.h /usr/local/cuda/include
75 | sudo cp cudnn-linux-x86_64-8.6.0.163_cuda11-archive/lib/libcudnn* /usr/local/cuda/lib64
76 | sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
77 | ```
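78 | 
79 | Optionally, confirm the cuDNN version from the copied headers (cuDNN 8.x records it in `cudnn_version.h`):
80 | 
81 | ```bash
82 | grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h
83 | ```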
78 |
79 | Add to .bashrc
80 |
81 | ```bash
82 | export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
83 | export CUDA_HOME=/usr/local/cuda
84 | export PATH="/usr/local/cuda/bin:$PATH"
85 | ```
86 |
87 | then
88 |
89 | ```bash
90 | source ~/.bashrc
91 | ```
92 |
93 | Check that everything works:
94 |
95 | ```bash
96 | nvidia-smi
97 | ```
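98 | 
99 | With `/usr/local/cuda/bin` on your PATH (per the exports above), `nvcc` should resolve as well:
100 | 
101 | ```bash
102 | nvcc --version
103 | ```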
98 |
99 | ## Python
100 |
101 | ```bash
102 | sudo apt install python3-pip
103 | sudo apt install python3.8-venv
104 | python3 -m venv dynamo
105 | ```
106 |
107 | Add to .bashrc
108 | ```bash
109 | source dynamo/bin/activate
110 | ```
111 |
112 | then
113 |
114 | ```bash
115 | source ~/.bashrc
116 | ```
117 |
118 | Install the PyTorch nightlies, which include dynamo:
119 |
120 | ```bash
121 | pip install numpy
122 | pip install --pre torch[dynamo] --extra-index-url https://download.pytorch.org/whl/nightly/cu117/
123 | ```
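124 | 
125 | As a quick sanity check that the nightly build sees the GPU:
126 | 
127 | ```bash
128 | python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
129 | ```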
124 |
125 | ## Conda installation instructions
126 |
127 | 1. Install miniconda
128 | ```bash
129 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
130 | bash Miniconda3-latest-Linux-x86_64.sh
131 | ```
132 |
133 | 2. Create conda python env and then activate it
134 | ```bash
135 | conda create --name dynamo python
136 | conda activate dynamo
137 | ```
138 |
139 | 3. Install the CUDA toolkit 11.7. Refer to [cuda-toolkit](https://anaconda.org/nvidia/cuda-toolkit)
140 | for more information.
141 | ```bash
142 | conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
143 | ```
144 |
145 | 4. Install PyTorch along with Torch Dynamo dependencies
146 | ```bash
147 | pip install numpy
148 | pip install --pre torch[dynamo] --extra-index-url https://download.pytorch.org/whl/nightly/cu117/
149 | ```
150 |
151 | 5. Verify torchdynamo with the command below, assuming you are at the repository root:
152 | ```bash
153 | python tools/verify_dynamo.py
154 | ```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Repo to test torchdynamo
2 |
3 | `pip install -r requirements.txt`
4 |
5 | ## Experiments
6 |
7 | The `experiments` folder contains quick reproducers to observe different behaviors. The base scripts give a baseline without any optimization; the other scripts show what happens when optimizing different things. The tables below report the average iteration time (all runs executed on an A100); each cell comes from running the matching script, e.g. `python experiments/optimize_model_fp16.py`.
8 |
9 | Batch size 16:
10 |
11 | | Script | FP32 | FP16 |
12 | |:--|:-:|:-:|
13 | | base | 54.44ms | 62.24ms |
14 | | optimize_model | 38.20ms | 29.85ms |
15 | | optimize_forward | 38.36ms | 29.49ms |
16 | | train_step | x | x |
17 |
18 | Batch size 8:
19 |
20 | | Script | FP32 | FP16 |
21 | |:--|:-:|:-:|
22 | | base | 53.47ms | 59.68ms |
23 | | optimize_model | 28.34ms | 23.80ms |
24 | | optimize_forward | 28.16ms | 29.34ms |
25 | | train_step | 1754.47ms | 1740.21ms |
26 |
27 | Using torchdynamo to optimize the whole train step does not really work: it produces lots of warnings like the ones below, and the reported times are not right:
28 |
29 | ```
30 | [2022-11-04 15:32:56,201] torch._dynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
31 | [2022-11-04 15:32:56,201] torch._inductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager
32 | ```
33 |
34 | Reproducer: `python experiments/optimize_train_step_fp16.py`
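35 | 
36 | For reference, a minimal sketch of the pattern (illustrative, not the exact code — see `experiments/optimize_train_step.py`) showing why this happens: compiling a function that also runs `optimizer.step()` asks dynamo to trace an in-place parameter mutation, which is exactly what the warning above refuses.
37 | 
38 | ```python
39 | import torch._dynamo as dynamo
40 | from torch.optim import AdamW
41 | from transformers import AutoModelForMaskedLM
42 | 
43 | model = AutoModelForMaskedLM.from_pretrained("bert-base-cased").cuda()
44 | optimizer = AdamW(model.parameters(), lr=1e-4)
45 | 
46 | @dynamo.optimize("inductor")
47 | def train_step(batch):
48 |     loss = model(**batch).loss
49 |     loss.backward()
50 |     optimizer.step()  # in-place parameter mutation: AOT Autograd bails out to eager
51 |     optimizer.zero_grad()
52 |     return loss
53 | ```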
35 |
36 | Dynamic shapes (each batch draws a random sequence length):
37 |
38 | | Script | FP32 | FP16 |
39 | |:--|:-:|:-:|
40 | | dynamic | 59.23ms | 63.53ms |
41 | | dynamic_optimized | OOM? | OOM? |
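42 | 
43 | The `dynamic` scripts draw a new sequence length for every batch (see `DataLoader.__iter__` in `experiments/dynamic.py`), so a compiled model keeps hitting unseen shapes. A minimal sketch of the kind of setup `dynamic_optimized` exercises (the `dynamic_shapes` config flag is an assumption about the nightly's API, not taken from the repo — check your build):
44 | 
45 | ```python
46 | import torch._dynamo as dynamo
47 | from transformers import AutoModelForMaskedLM
48 | 
49 | # Assumed config flag: without it, every new sequence
50 | # length can trigger a fresh compilation.
51 | dynamo.config.dynamic_shapes = True
52 | 
53 | model = AutoModelForMaskedLM.from_pretrained("bert-base-cased").cuda()
54 | model = dynamo.optimize("inductor")(model)
55 | ```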
42 |
43 | ## Scripts
44 |
45 | ### Text classification
46 |
47 | Average iteration time for training/evaluation (excluding the first iteration) when fine-tuning BERT on MRPC. Final results of the models are within the variance of fine-tuning; no particular performance drop is observed, except for FP16 + torchdynamo, which seems to underperform a bit (82%-84% accuracy compared to 86%-87% for the other tests, and 0.86/0.87 F1 instead of 0.89/0.90).
48 |
49 | | Dynamo | FP32 | FP16 |
50 | |:--|:-:|:-:|
51 | | no | 57.9ms/15.65ms | 65.87ms/18.52ms |
52 | | inductor | 36.24ms/10.55ms | 39.43ms/9.09ms |
53 |
54 | To reproduce:
55 |
56 | ```bash
57 | accelerate launch --config_file configs/base_fp32.yaml scripts/text_classification.py --task_name mrpc
58 | ```
59 |
60 | and change the config file to one of the four options in `configs` to fill the four cells of the table, as in the example below.
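61 | 
62 | For instance, the FP16 + inductor cell:
63 | 
64 | ```bash
65 | accelerate launch --config_file configs/dynamo_fp16.yaml scripts/text_classification.py --task_name mrpc
66 | ```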
61 |
62 |
63 | ### Language Modeling
64 |
65 | ```bash
66 | accelerate launch scripts/language_modeling.py \
67 | --dataset_name wikitext \
68 | --dataset_config_name wikitext-2-raw-v1 \
69 | --model_name_or_path gpt2 \
70 | --dynamo_backend inductor \
71 | --mixed_precision fp16
72 | ```
73 |
74 | ### Vision Classification
75 |
76 | ```bash
77 | accelerate launch scripts/cv_classification.py \
78 | --model_name_or_path microsoft/resnet-18 \
79 | --dataset_name beans \
80 | --dynamo_backend inductor \
81 | --mixed_precision no
82 | ```
--------------------------------------------------------------------------------
/configs/base_fp16.yaml:
--------------------------------------------------------------------------------
1 | command_file: null
2 | commands: null
3 | compute_environment: LOCAL_MACHINE
4 | deepspeed_config: {}
5 | distributed_type: 'NO'
6 | downcast_bf16: 'no'
7 | dynamo_backend: 'NO'
8 | fsdp_config: {}
9 | gpu_ids: all
10 | machine_rank: 0
11 | main_process_ip: null
12 | main_process_port: null
13 | main_training_function: main
14 | megatron_lm_config: {}
15 | mixed_precision: fp16
16 | num_machines: 1
17 | num_processes: 1
18 | rdzv_backend: static
19 | same_network: true
20 | tpu_name: null
21 | tpu_zone: null
22 | use_cpu: false
23 |
--------------------------------------------------------------------------------
/configs/base_fp32.yaml:
--------------------------------------------------------------------------------
1 | command_file: null
2 | commands: null
3 | compute_environment: LOCAL_MACHINE
4 | deepspeed_config: {}
5 | distributed_type: 'NO'
6 | downcast_bf16: 'no'
7 | dynamo_backend: 'NO'
8 | fsdp_config: {}
9 | gpu_ids: all
10 | machine_rank: 0
11 | main_process_ip: null
12 | main_process_port: null
13 | main_training_function: main
14 | megatron_lm_config: {}
15 | mixed_precision: 'no'
16 | num_machines: 1
17 | num_processes: 1
18 | rdzv_backend: static
19 | same_network: true
20 | tpu_name: null
21 | tpu_zone: null
22 | use_cpu: false
23 |
--------------------------------------------------------------------------------
/configs/dynamo_fp16.yaml:
--------------------------------------------------------------------------------
1 | command_file: null
2 | commands: null
3 | compute_environment: LOCAL_MACHINE
4 | deepspeed_config: {}
5 | distributed_type: 'NO'
6 | downcast_bf16: 'no'
7 | dynamo_backend: INDUCTOR
8 | fsdp_config: {}
9 | gpu_ids: all
10 | machine_rank: 0
11 | main_process_ip: null
12 | main_process_port: null
13 | main_training_function: main
14 | megatron_lm_config: {}
15 | mixed_precision: fp16
16 | num_machines: 1
17 | num_processes: 1
18 | rdzv_backend: static
19 | same_network: true
20 | tpu_name: null
21 | tpu_zone: null
22 | use_cpu: false
23 |
--------------------------------------------------------------------------------
/configs/dynamo_fp32.yaml:
--------------------------------------------------------------------------------
1 | command_file: null
2 | commands: null
3 | compute_environment: LOCAL_MACHINE
4 | deepspeed_config: {}
5 | distributed_type: 'NO'
6 | downcast_bf16: 'no'
7 | dynamo_backend: INDUCTOR
8 | fsdp_config: {}
9 | gpu_ids: all
10 | machine_rank: 0
11 | main_process_ip: null
12 | main_process_port: null
13 | main_training_function: main
14 | megatron_lm_config: {}
15 | mixed_precision: 'no'
16 | num_machines: 1
17 | num_processes: 1
18 | rdzv_backend: static
19 | same_network: true
20 | tpu_name: null
21 | tpu_zone: null
22 | use_cpu: false
23 |
--------------------------------------------------------------------------------
/experiments/accelerate_script.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | from torch.optim import AdamW
6 | from accelerate import Accelerator
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + self.seq_len]
71 | samples.append(tokens)
72 |
73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]  # randomly replace ~20% of tokens with the mask token
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | accelerator = Accelerator()
86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
89 | optimizer = AdamW(model.parameters(), lr=1e-4)
90 |
91 | model = model.train()
92 | model, optimizer = accelerator.prepare(model, optimizer)
93 |
94 | start_time = time.time()
95 | for step, batch in enumerate(train_dl):
96 | batch = {k: v.to(accelerator.device) for k, v in batch.items()}
97 | output = model(**batch)
98 | loss = output.loss
99 | accelerator.backward(loss)  # use accelerator.backward so gradient scaling is applied under mixed precision
100 | optimizer.step()
101 | optimizer.zero_grad()
102 | if step == 0:
103 | first_step_time = time.time() - start_time
104 |
105 | total_training_time = time.time() - start_time
106 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
107 | print("Training finished.")
108 | print(f"First iteration took: {first_step_time:.2f}s")
109 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
110 |
111 | if __name__ == "__main__":
112 | main()
113 |
--------------------------------------------------------------------------------
/experiments/base.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | from torch.optim import AdamW
6 | from transformers import AutoModelForMaskedLM, AutoTokenizer
7 |
8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
9 |
10 | == History ==
11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
12 |
13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
14 |
15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
16 |
17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
18 |
19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
20 |
21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
22 |
23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
24 |
25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
26 |
27 | == Services and technologies ==
28 | === Transformers Library ===
29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
30 |
31 |
32 | === Hugging Face Hub ===
33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
34 |
35 | == References ==
36 | {{Reflist}}
37 |
38 | {{Portal bar|Companies}}
39 |
40 | {{DEFAULTSORT:Hugging Face}}
41 | [[Category:Machine learning]]
42 | [[Category:Open-source artificial intelligence]]
43 |
44 | """
45 |
46 | torch.backends.cuda.matmul.allow_tf32 = True
47 |
48 | def parse_args():
49 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
50 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
51 | parser.add_argument("--batch_size", type=int, default=16)
52 | parser.add_argument("--num_batches", type=int, default=100)
53 |
54 | args = parser.parse_args()
55 | return args
56 |
57 | class DataLoader():
58 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
59 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
60 | self.batch_size = batch_size
61 | self.num_batches = num_batches
62 | self.seq_len = seq_len
63 | self.mask_token_id = tokenizer.mask_token_id
64 |
65 | def __iter__(self):
66 | for _ in range(self.num_batches):
67 | masked_samples = []
68 | samples = []
69 | for _ in range(self.batch_size):
70 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
71 | tokens = self.tokenized_corpus[start: start + self.seq_len]
72 | samples.append(tokens)
73 |
74 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
75 | masked_samples.append(masked_tokens)
76 |
77 |
78 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
79 |
80 | def __len__(self):
81 | return self.num_batches
82 |
83 |
84 | def main():
85 | args = parse_args()
86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
89 | optimizer = AdamW(model.parameters(), lr=1e-4)
90 |
91 | device = "cuda" if torch.cuda.is_available() else "cpu"
92 | model = model.to(device).train()
93 |
94 | start_time = time.time()
95 | for step, batch in enumerate(train_dl):
96 | batch = {k: v.to(device) for k, v in batch.items()}
97 | output = model(**batch)
98 | loss = output.loss
99 | loss.backward()
100 | optimizer.step()
101 | optimizer.zero_grad()
102 | if step == 0:
103 | first_step_time = time.time() - start_time
104 |
105 | total_training_time = time.time() - start_time
106 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
107 | print("Training finished.")
108 | print(f"First iteration took: {first_step_time:.2f}s")
109 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
110 |
111 | if __name__ == "__main__":
112 | main()
113 |
--------------------------------------------------------------------------------
/experiments/base_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | from torch.optim import AdamW
6 | from transformers import AutoModelForMaskedLM, AutoTokenizer
7 |
8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
9 |
10 | == History ==
11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
12 |
13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
14 |
15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
16 |
17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
18 |
19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
20 |
21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
22 |
23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
24 |
25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
26 |
27 | == Services and technologies ==
28 | === Transformers Library ===
29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
30 |
31 |
32 | === Hugging Face Hub ===
33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
34 |
35 | == References ==
36 | {{Reflist}}
37 |
38 | {{Portal bar|Companies}}
39 |
40 | {{DEFAULTSORT:Hugging Face}}
41 | [[Category:Machine learning]]
42 | [[Category:Open-source artificial intelligence]]
43 |
44 | """
45 |
46 | def parse_args():
47 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
48 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
49 | parser.add_argument("--batch_size", type=int, default=16)
50 | parser.add_argument("--num_batches", type=int, default=100)
51 |
52 | args = parser.parse_args()
53 | return args
54 |
55 | class DataLoader():
56 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
57 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
58 | self.batch_size = batch_size
59 | self.num_batches = num_batches
60 | self.seq_len = seq_len
61 | self.mask_token_id = tokenizer.mask_token_id
62 |
63 | def __iter__(self):
64 | for _ in range(self.num_batches):
65 | masked_samples = []
66 | samples = []
67 | for _ in range(self.batch_size):
68 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
69 | tokens = self.tokenized_corpus[start: start + self.seq_len]
70 | samples.append(tokens)
71 |
72 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
73 | masked_samples.append(masked_tokens)
74 |
75 |
76 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
77 |
78 | def __len__(self):
79 | return self.num_batches
80 |
81 |
82 | def main():
83 | args = parse_args()
84 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
85 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
86 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
87 | optimizer = AdamW(model.parameters(), lr=1e-4)
88 |
89 | device = "cuda" if torch.cuda.is_available() else "cpu"
90 | model = model.to(device).train()
91 |
92 | start_time = time.time()
93 | for step, batch in enumerate(train_dl):
94 | batch = {k: v.to(device) for k, v in batch.items()}
95 | with torch.cuda.amp.autocast():  # autocast without GradScaler: these scripts only benchmark iteration speed
96 | output = model(**batch)
97 | loss = output.loss
98 | loss.backward()
99 | optimizer.step()
100 | optimizer.zero_grad()
101 | if step == 0:
102 | first_step_time = time.time() - start_time
103 |
104 | total_training_time = time.time() - start_time
105 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
106 | print("Training finished.")
107 | print(f"First iteration took: {first_step_time:.2f}s")
108 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
109 |
110 | if __name__ == "__main__":
111 | main()
112 |
--------------------------------------------------------------------------------
/experiments/dynamic.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | from torch.optim import AdamW
6 | from transformers import AutoModelForMaskedLM, AutoTokenizer
7 |
8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
9 |
10 | == History ==
11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
12 |
13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
14 |
15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
16 |
17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
18 |
19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
20 |
21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
22 |
23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
24 |
25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
26 |
27 | == Services and technologies ==
28 | === Transformers Library ===
29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
30 |
31 |
32 | === Hugging Face Hub ===
33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
34 |
35 | == References ==
36 | {{Reflist}}
37 |
38 | {{Portal bar|Companies}}
39 |
40 | {{DEFAULTSORT:Hugging Face}}
41 | [[Category:Machine learning]]
42 | [[Category:Open-source artificial intelligence]]
43 |
44 | """
45 |
46 | torch.backends.cuda.matmul.allow_tf32 = True
47 |
48 | def parse_args():
49 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
50 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
51 | parser.add_argument("--batch_size", type=int, default=16)
52 | parser.add_argument("--num_batches", type=int, default=100)
53 |
54 | args = parser.parse_args()
55 | return args
56 |
57 | class DataLoader():
58 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
59 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
60 | self.batch_size = batch_size
61 | self.num_batches = num_batches
62 | self.seq_len = seq_len
63 | self.mask_token_id = tokenizer.mask_token_id
64 |
65 | def __iter__(self):
66 | for _ in range(self.num_batches):
67 | masked_samples = []
68 | samples = []
69 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8  # random per-batch sequence length (a multiple of 8) to exercise dynamic shapes
70 | for _ in range(self.batch_size):
71 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1)
72 | tokens = self.tokenized_corpus[start: start + seq_len]
73 | samples.append(tokens)
74 |
75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
76 | masked_samples.append(masked_tokens)
77 |
78 |
79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
80 |
81 | def __len__(self):
82 | return self.num_batches
83 |
84 |
85 | def main():
86 | args = parse_args()
87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
90 | optimizer = AdamW(model.parameters(), lr=1e-4)
91 |
92 | device = "cuda" if torch.cuda.is_available() else "cpu"
93 | model = model.to(device).train()
94 |
95 | start_time = time.time()
96 | for step, batch in enumerate(train_dl):
97 | batch = {k: v.to(device) for k, v in batch.items()}
98 | output = model(**batch)
99 | loss = output.loss
100 | loss.backward()
101 | optimizer.step()
102 | optimizer.zero_grad()
103 |
104 | if step == 0:
105 | first_step_time = time.time() - start_time
106 |
107 | total_training_time = time.time() - start_time
108 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
109 | print("Training finished.")
110 | print(f"First iteration took: {first_step_time:.2f}s")
111 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
112 |
113 |
114 |
115 |
116 |
117 |
118 | if __name__ == "__main__":
119 | main()
120 |
--------------------------------------------------------------------------------
/experiments/dynamic_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | from torch.optim import AdamW
6 | from transformers import AutoModelForMaskedLM, AutoTokenizer
7 |
8 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
9 |
10 | == History ==
11 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
12 |
13 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
14 |
15 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
16 |
17 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
18 |
19 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
20 |
21 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
22 |
23 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
24 |
25 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
26 |
27 | == Services and technologies ==
28 | === Transformers Library ===
29 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
30 |
31 |
32 | === Hugging Face Hub ===
33 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
34 |
35 | == References ==
36 | {{Reflist}}
37 |
38 | {{Portal bar|Companies}}
39 |
40 | {{DEFAULTSORT:Hugging Face}}
41 | [[Category:Machine learning]]
42 | [[Category:Open-source artificial intelligence]]
43 |
44 | """
45 |
46 | def parse_args():
47 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
48 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
49 | parser.add_argument("--batch_size", type=int, default=16)
50 | parser.add_argument("--num_batches", type=int, default=100)
51 |
52 | args = parser.parse_args()
53 | return args
54 |
55 | class DataLoader():
56 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
57 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
58 | self.batch_size = batch_size
59 | self.num_batches = num_batches
60 | self.seq_len = seq_len
61 | self.mask_token_id = tokenizer.mask_token_id
62 |
63 | def __iter__(self):
64 | for _ in range(self.num_batches):
65 | masked_samples = []
66 | samples = []
67 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + seq_len]
71 | samples.append(tokens)
72 |
73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
88 | optimizer = AdamW(model.parameters(), lr=1e-4)
89 |
90 | device = "cuda" if torch.cuda.is_available() else "cpu"
91 | model = model.to(device).train()
92 |
93 | start_time = time.time()
94 | for step, batch in enumerate(train_dl):
95 | batch = {k: v.to(device) for k, v in batch.items()}
96 | with torch.cuda.amp.autocast():
97 | output = model(**batch)
98 | loss = output.loss
99 | loss.backward()
100 | optimizer.step()
101 | optimizer.zero_grad()
102 |
103 | if step == 0:
104 | first_step_time = time.time() - start_time
105 |
106 | total_training_time = time.time() - start_time
107 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
108 | print("Training finished.")
109 | print(f"First iteration took: {first_step_time:.2f}s")
110 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
111 |
112 |
113 |
114 |
115 |
116 |
117 | if __name__ == "__main__":
118 | main()
119 |
--------------------------------------------------------------------------------
/experiments/dynamic_optimized.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 tensor cores for fp32 matmuls on Ampere+ GPUs
48 |
49 | def parse_args():
50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
51 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
52 | parser.add_argument("--batch_size", type=int, default=16)
53 | parser.add_argument("--num_batches", type=int, default=100)
54 |
55 | args = parser.parse_args()
56 | return args
57 |
58 | class DataLoader():
59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
61 | self.batch_size = batch_size
62 | self.num_batches = num_batches
63 | self.seq_len = seq_len
64 | self.mask_token_id = tokenizer.mask_token_id
65 |
66 | def __iter__(self):
67 | for _ in range(self.num_batches):
68 | masked_samples = []
69 | samples = []
70 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8  # random multiple of 8, so (almost) every batch gets a different shape
71 | for _ in range(self.batch_size):
72 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1)
73 | tokens = self.tokenized_corpus[start: start + seq_len]
74 | samples.append(tokens)
75 |
76 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]  # replace ~20% of tokens with the mask token
77 | masked_samples.append(masked_tokens)
78 |
79 |
80 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
81 |
82 | def __len__(self):
83 | return self.num_batches
84 |
85 |
86 | def main():
87 | args = parse_args()
88 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
89 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
90 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
91 | optimizer = AdamW(model.parameters(), lr=1e-4)
92 |
93 | device = "cuda" if torch.cuda.is_available() else "cpu"
94 | model = model.to(device).train()
95 |
96 | model = dynamo.optimize("inductor")(model)  # compile the module with TorchDynamo and the TorchInductor backend
97 |
98 | start_time = time.time()
99 | for step, batch in enumerate(train_dl):
100 | batch = {k: v.to(device) for k, v in batch.items()}
101 | output = model(**batch)
102 | loss = output.loss
103 | loss.backward()
104 | optimizer.step()
105 | optimizer.zero_grad()
106 |
107 | if step == 0:
108 | first_step_time = time.time() - start_time  # the first step includes compilation / warm-up
109 |
110 | total_training_time = time.time() - start_time
111 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
112 | print("Training finished.")
113 | print(f"First iteration took: {first_step_time:.2f}s")
114 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
115 |
116 |
117 |
118 |
119 |
120 |
121 | if __name__ == "__main__":
122 | main()
123 |
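124 | # Every batch above samples a fresh sequence length, so a static-shape
125 | # compiler recompiles for each new shape until it hits its cache limit.
126 | # A hypothetical mitigation (sketch only; the exact flag name varies
127 | # across torch versions) is to enable dynamic-shape tracing up front:
128 | #
129 | #     import torch._dynamo.config
130 | #     torch._dynamo.config.dynamic_shapes = True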
--------------------------------------------------------------------------------
/experiments/dynamic_optimized_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | seq_len = random.randint(self.seq_len // 8, self.seq_len // 4 - 1) * 8  # random multiple of 8, so (almost) every batch gets a different shape
69 | for _ in range(self.batch_size):
70 | start = random.randint(0, len(self.tokenized_corpus) - seq_len - 1)
71 | tokens = self.tokenized_corpus[start: start + seq_len]
72 | samples.append(tokens)
73 |
74 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
75 | masked_samples.append(masked_tokens)
76 |
77 |
78 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
79 |
80 | def __len__(self):
81 | return self.num_batches
82 |
83 |
84 | def main():
85 | args = parse_args()
86 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
87 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
88 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
89 | optimizer = AdamW(model.parameters(), lr=1e-4)
90 |
91 | device = "cuda" if torch.cuda.is_available() else "cpu"
92 | model = model.to(device).train()
93 |
94 | model = dynamo.optimize("inductor")(model)
95 |
96 | start_time = time.time()
97 | for step, batch in enumerate(train_dl):
98 | batch = {k: v.to(device) for k, v in batch.items()}
99 | with torch.cuda.amp.autocast():
100 | output = model(**batch)
101 | loss = output.loss
102 | loss.backward()
103 | optimizer.step()
104 | optimizer.zero_grad()
105 |
106 | if step == 0:
107 | first_step_time = time.time() - start_time
108 |
109 | total_training_time = time.time() - start_time
110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
111 | print("Training finished.")
112 | print(f"First iteration took: {first_step_time:.2f}s")
113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
114 |
115 |
116 |
117 |
118 |
119 |
120 | if __name__ == "__main__":
121 | main()
122 |
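123 | # This experiment only measures step time, so it runs fp16 autocast without
124 | # gradient scaling; in real fp16 training, small gradients can underflow to
125 | # zero. A loss-scaled variant of the inner loop (a sketch, not used above;
126 | # scaler would be a torch.cuda.amp.GradScaler) might look like this:
127 | def scaled_train_step(model, optimizer, scaler, batch):
128 |     with torch.cuda.amp.autocast():
129 |         loss = model(**batch).loss
130 |     scaler.scale(loss).backward()  # scale the loss so small grads stay representable
131 |     scaler.step(optimizer)         # unscales grads and skips the step on inf/nan
132 |     scaler.update()                # adapts the scale factor for the next step
133 |     optimizer.zero_grad()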
--------------------------------------------------------------------------------
/experiments/generate_script.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random  # unused in this script
3 | import time
4 | import torch
5 |
6 | from accelerate import Accelerator
7 | from transformers import AutoModelForCausalLM, AutoTokenizer
8 |
9 | torch.backends.cuda.matmul.allow_tf32 = True
10 |
11 |
12 | def parse_args():
13 | parser = argparse.ArgumentParser(description="Make a couple of generations")
14 | parser.add_argument("--model_name", type=str, default="gpt2")
15 | args = parser.parse_args()
16 | return args
17 |
18 |
19 | def main():
20 | args = parse_args()
21 | accelerator = Accelerator()
22 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
23 | inputs = tokenizer(["Once upon a time,"] * 8, return_tensors="pt")
24 | model = AutoModelForCausalLM.from_pretrained(args.model_name)
25 |
26 | model = model.eval()
27 | model = accelerator.prepare(model)
28 |
29 | start_time = time.time()
30 | for step in range(50):
31 | batch = {k: v.to(accelerator.device) for k, v in inputs.items()}
32 | output = model.generate(**batch)
33 | if step == 0:
34 | first_step_time = time.time() - start_time
35 |
36 | total_generation_time = time.time() - start_time
37 | avg_iteration_time = (total_generation_time - first_step_time) / (50 - 1)
38 | print("Generations finished.")
39 | print(f"First iteration took: {first_step_time:.2f}s")
40 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
41 |
42 |
43 | if __name__ == "__main__":
44 | main()
45 |
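46 | # CUDA kernels launch asynchronously, so host-side time.time() readings can
47 | # under-report per-step latency. A stricter harness (a hypothetical helper,
48 | # not used above; assumes a CUDA device) synchronizes around the measured
49 | # region:
50 | def timed_generate(model, batch):
51 |     torch.cuda.synchronize()  # drain pending kernels before starting the clock
52 |     start = time.time()
53 |     output = model.generate(**batch)
54 |     torch.cuda.synchronize()  # wait for generation to actually finish
55 |     return output, time.time() - start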
--------------------------------------------------------------------------------
/experiments/optimize_forward.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | torch.backends.cuda.matmul.allow_tf32 = True
48 |
49 | def parse_args():
50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
51 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
52 | parser.add_argument("--batch_size", type=int, default=16)
53 | parser.add_argument("--num_batches", type=int, default=100)
54 |
55 | args = parser.parse_args()
56 | return args
57 |
58 | class DataLoader():
59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
61 | self.batch_size = batch_size
62 | self.num_batches = num_batches
63 | self.seq_len = seq_len
64 | self.mask_token_id = tokenizer.mask_token_id
65 |
66 | def __iter__(self):
67 | for _ in range(self.num_batches):
68 | masked_samples = []
69 | samples = []
70 | for _ in range(self.batch_size):
71 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
72 | tokens = self.tokenized_corpus[start: start + self.seq_len]
73 | samples.append(tokens)
74 |
75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
76 | masked_samples.append(masked_tokens)
77 |
78 |
79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
80 |
81 | def __len__(self):
82 | return self.num_batches
83 |
84 |
85 | def main():
86 | args = parse_args()
87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
90 | optimizer = AdamW(model.parameters(), lr=1e-4)
91 |
92 | device = "cuda" if torch.cuda.is_available() else "cpu"
93 | model = model.to(device).train()
94 |
95 | model.forward = dynamo.optimize("inductor")(model.forward)  # compile only the forward method
96 |
97 | start_time = time.time()
98 | for step, batch in enumerate(train_dl):
99 | batch = {k: v.to(device) for k, v in batch.items()}
100 | output = model(**batch)
101 | loss = output.loss
102 | loss.backward()
103 | optimizer.step()
104 | optimizer.zero_grad()
105 |
106 | if step == 0:
107 | first_step_time = time.time() - start_time
108 |
109 | total_training_time = time.time() - start_time
110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
111 | print("Training finished.")
112 | print(f"First iteration took: {first_step_time:.2f}s")
113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
114 |
115 |
116 |
117 |
118 |
119 |
120 | if __name__ == "__main__":
121 | main()
122 |
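123 | # This file compiles only model.forward, whereas optimize_model.py wraps the
124 | # whole nn.Module with dynamo.optimize("inductor")(model); together the two
125 | # experiments compare the entry points for compilation overhead and
126 | # steady-state step time.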
--------------------------------------------------------------------------------
/experiments/optimize_forward_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + self.seq_len]
71 | samples.append(tokens)
72 |
73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
88 | optimizer = AdamW(model.parameters(), lr=1e-4)
89 |
90 | device = "cuda" if torch.cuda.is_available() else "cpu"
91 | model = model.to(device).train()
92 |
93 | model.forward = dynamo.optimize("inductor")(model.forward)
94 |
95 | start_time = time.time()
96 | for step, batch in enumerate(train_dl):
97 | batch = {k: v.to(device) for k, v in batch.items()}
98 | with torch.cuda.amp.autocast():
99 | output = model(**batch)
100 | loss = output.loss
101 | loss.backward()
102 | optimizer.step()
103 | optimizer.zero_grad()
104 |
105 | if step == 0:
106 | first_step_time = time.time() - start_time
107 |
108 | total_training_time = time.time() - start_time
109 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
110 | print("Training finished.")
111 | print(f"First iteration took: {first_step_time:.2f}s")
112 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
113 |
114 |
115 |
116 |
117 |
118 |
119 | if __name__ == "__main__":
120 | main()
121 |
--------------------------------------------------------------------------------
/experiments/optimize_model.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | torch.backends.cuda.matmul.allow_tf32 = True
48 |
49 | def parse_args():
50 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
51 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
52 | parser.add_argument("--batch_size", type=int, default=16)
53 | parser.add_argument("--num_batches", type=int, default=100)
54 |
55 | args = parser.parse_args()
56 | return args
57 |
58 | class DataLoader():
59 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
60 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
61 | self.batch_size = batch_size
62 | self.num_batches = num_batches
63 | self.seq_len = seq_len
64 | self.mask_token_id = tokenizer.mask_token_id
65 |
66 | def __iter__(self):
67 | for _ in range(self.num_batches):
68 | masked_samples = []
69 | samples = []
70 | for _ in range(self.batch_size):
71 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
72 | tokens = self.tokenized_corpus[start: start + self.seq_len]
73 | samples.append(tokens)
74 |
75 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
76 | masked_samples.append(masked_tokens)
77 |
78 |
79 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
80 |
81 | def __len__(self):
82 | return self.num_batches
83 |
84 |
85 | def main():
86 | args = parse_args()
87 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
88 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
89 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
90 | optimizer = AdamW(model.parameters(), lr=1e-4)
91 |
92 | device = "cuda" if torch.cuda.is_available() else "cpu"
93 | model = model.to(device).train()
94 |
95 | model = dynamo.optimize("inductor")(model)
96 |
97 | start_time = time.time()
98 | for step, batch in enumerate(train_dl):
99 | batch = {k: v.to(device) for k, v in batch.items()}
100 | output = model(**batch)
101 | loss = output.loss
102 | loss.backward()
103 | optimizer.step()
104 | optimizer.zero_grad()
105 |
106 | if step == 0:
107 | first_step_time = time.time() - start_time
108 |
109 | total_training_time = time.time() - start_time
110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
111 | print("Training finished.")
112 | print(f"First iteration took: {first_step_time:.2f}s")
113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
114 |
115 |
116 |
117 |
118 |
119 |
120 | if __name__ == "__main__":
121 | main()
122 |
--------------------------------------------------------------------------------
/experiments/optimize_model_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + self.seq_len]
71 | samples.append(tokens)
72 |
73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
88 | optimizer = AdamW(model.parameters(), lr=1e-4)
89 |
90 | device = "cuda" if torch.cuda.is_available() else "cpu"
91 | model = model.to(device).train()
92 |
93 | model = dynamo.optimize("inductor")(model)  # compile the whole module, not just forward
94 |
95 | start_time = time.time()
96 | for step, batch in enumerate(train_dl):
97 | batch = {k: v.to(device) for k, v in batch.items()}
98 | with torch.cuda.amp.autocast():
99 | output = model(**batch)
100 | loss = output.loss
101 | loss.backward()
102 | optimizer.step()
103 | optimizer.zero_grad()
104 |
105 | if step == 0:
106 | first_step_time = time.time() - start_time
107 |
108 | total_training_time = time.time() - start_time
109 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
110 | print("Training finished.")
111 | print(f"First iteration took: {first_step_time:.2f}s")
112 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
113 |
114 |
115 |
116 |
117 |
118 |
119 | if __name__ == "__main__":
120 | main()
121 |
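122 | # Under torch.cuda.amp.autocast the parameters stay in fp32; autocast casts
123 | # the inputs of selected ops (matmuls in particular) to fp16 at call time.
124 | # See dynamic_optimized_fp16.py for a sketch of adding gradient scaling for
125 | # numerically safer fp16 training.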
--------------------------------------------------------------------------------
/experiments/optimize_train_step.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained datasets, models, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using the Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + self.seq_len]
71 | samples.append(tokens)
72 |
73 | masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
88 | optimizer = AdamW(model.parameters(), lr=1e-4)
89 |
90 | device = "cuda" if torch.cuda.is_available() else "cpu"
91 | model = model.to(device).train()
92 |
93 | @dynamo.optimize("inductor")  # compile the entire step: forward, backward, and optimizer update
94 | def train_step(batch):
95 | output = model(**batch)
96 | loss = output.loss
97 | loss.backward()
98 | optimizer.step()
99 |
100 | start_time = time.time()
101 | for step, batch in enumerate(train_dl):
102 | batch = {k: v.to(device) for k, v in batch.items()}
103 | train_step(batch)
104 | optimizer.zero_grad()
105 |
106 | if step == 0:
107 | first_step_time = time.time() - start_time
108 |
109 | total_training_time = time.time() - start_time
110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
111 | print("Training finished.")
112 | print(f"First iteration took: {first_step_time:.2f}s")
113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
114 |
115 |
116 |
117 |
118 |
119 |
120 | if __name__ == "__main__":
121 | main()
122 |
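123 | # Usage sketch (assumed invocation; the flags are the ones defined in parse_args above):
124 | #   python experiments/optimize_train_step.py --model_name bert-base-cased --batch_size 16 --num_batches 100
125 | # The first iteration is reported separately because it includes TorchDynamo/Inductor compilation time.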
--------------------------------------------------------------------------------
/experiments/optimize_train_step_fp16.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import random
3 | import time
4 | import torch
5 | import torch._dynamo as dynamo
6 | from torch.optim import AdamW
7 | from transformers import AutoModelForMaskedLM, AutoTokenizer
8 |
9 | CORPUS = """'''Hugging Face, Inc.''' is an American company that develops tools for building applications using [[machine learning]].[{{Cite web |title=Hugging Face – The AI community building the future. |url=https://huggingface.co/ |access-date=2022-08-20 |website=huggingface.co}}] It is most notable for its Transformers library built for [[natural language processing]] applications and its platform that allows users to share machine learning models and datasets.
10 |
11 | == History ==
12 | The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf originally as a company that developed a chatbot app targeted at teenagers.[{{Cite web |title=Hugging Face wants to become your artificial BFF |url=https://social.techcrunch.com/2017/03/09/hugging-face-wants-to-become-your-artificial-bff/ |access-date=2022-08-20 |website=TechCrunch |language=en-US}}] After open-sourcing the model behind the chatbot, the company [[Lean startup|pivoted]] to focus on being a platform for democratizing machine learning.
13 |
14 | In March 2021, Hugging Face raised $40 million in a [[Series B]] funding round.[{{cite web |title=Hugging Face raises $40 million for its natural language processing library |url=https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library}}]
15 |
16 | On April 28, 2021, the company launched the BigScience Research Workshop in collaboration with several other research groups to release an open large language model.[{{cite web |date=10 January 2022 |title=Inside BigScience, the quest to build a powerful open language model |url=https://venturebeat.com/2022/01/10/inside-bigscience-the-quest-to-build-a-powerful-open-language-model/}}] In 2022, the workshop concluded with the announcement of BLOOM, a multilingual large [[language model]] with 176 billion parameters.[{{Cite web |title=BLOOM |url=https://bigscience.huggingface.co/blog/bloom |access-date=2022-08-20 |website=bigscience.huggingface.co}}]
17 |
18 | On December 21, 2021, the company announced its acquisition of Gradio, a software library used to make interactive browser demos of machine learning models.[{{Cite web |title=Gradio is joining Hugging Face! |url=https://huggingface.co/blog/gradio-joins-hf |access-date=2022-08-20 |website=huggingface.co}}]
19 |
20 | On May 5, 2022, the company announced its [[Series C]] funding round led by [[Coatue Management|Coatue]] and [[Sequoia fund|Sequoia]].[{{Cite web |last=Cai |first=Kenrick |title=The $2 Billion Emoji: Hugging Face Wants To Be Launchpad For A Machine Learning Revolution |url=https://www.forbes.com/sites/kenrickcai/2022/05/09/the-2-billion-emoji-hugging-face-wants-to-be-launchpad-for-a-machine-learning-revolution/ |access-date=2022-08-20 |website=Forbes |language=en}}] The company received a $2 billion valuation.
21 |
22 | On May 13, 2022, the company introduced its Student Ambassador Program to help fulfill its mission to teach machine learning to 5 million people by 2023.[{{Cite web |title=Student Ambassador Program’s call for applications is open! |url=https://huggingface.co/blog/ambassadors |access-date=2022-08-20 |website=huggingface.co}}]
23 |
24 | On May 26, 2022, the company announced a partnership with [[Graphcore]] to optimize its Transformers library for the Graphcore IPU.[{{Cite web |title=Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers |url=https://huggingface.co/blog/graphcore-update |access-date=2022-08-19 |website=huggingface.co}}]
25 |
26 | On August 3, 2022, the company announced the Private Hub, an enterprise version of its public Hugging Face Hub that supports [[Software as a service|SaaS]] or [[On-premises software|on-premise]] deployment.[{{Cite web |title=Introducing the Private Hub: A New Way to Build With Machine Learning |url=https://huggingface.co/blog/introducing-private-hub |access-date=2022-08-20 |website=huggingface.co}}]
27 |
28 | == Services and technologies ==
29 | === Transformers Library ===
30 | The Transformers library is a [[Python (programming language)|Python]] package that contains open-source implementations of [[Transformer (machine learning model)|transformer]] models for text, image, and audio tasks. It is compatible with the [[PyTorch]], [[TensorFlow]] and [[Google JAX|JAX]] [[deep learning]] libraries and includes implementations of notable models like [[BERT (language model)|BERT]] and [[GPT-2|GPT]].[{{Cite web |title=🤗 Transformers |url=https://huggingface.co/docs/transformers/index |access-date=2022-08-20 |website=huggingface.co}}]
31 |
32 |
33 | === Hugging Face Hub ===
34 | The Hugging Face Hub is a platform where users can share pretrained models, datasets, and demos of machine learning projects.[{{Cite web |title=Hugging Face Hub documentation |url=https://huggingface.co/docs/hub/index |access-date=2022-08-20 |website=huggingface.co}}] The Hub contains [[GitHub]]-inspired features for code-sharing and collaboration, including discussions and pull requests for projects. It also hosts Hugging Face Spaces, a hosted service that allows users to build web-based demos of machine learning apps using Gradio or Streamlit.
35 |
36 | == References ==
37 | {{Reflist}}
38 |
39 | {{Portal bar|Companies}}
40 |
41 | {{DEFAULTSORT:Hugging Face}}
42 | [[Category:Machine learning]]
43 | [[Category:Open-source artificial intelligence]]
44 |
45 | """
46 |
47 | def parse_args():
48 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a tiny corpus for masked LM")
49 | parser.add_argument("--model_name", type=str, default="bert-base-cased")
50 | parser.add_argument("--batch_size", type=int, default=16)
51 | parser.add_argument("--num_batches", type=int, default=100)
52 |
53 | args = parser.parse_args()
54 | return args
55 |
56 | class DataLoader():
57 | def __init__(self, tokenizer, batch_size=8, num_batches=100, seq_len=128):
58 | self.tokenized_corpus = tokenizer(CORPUS).input_ids
59 | self.batch_size = batch_size
60 | self.num_batches = num_batches
61 | self.seq_len = seq_len
62 | self.mask_token_id = tokenizer.mask_token_id
63 |
64 | def __iter__(self):
65 | for _ in range(self.num_batches):
66 | masked_samples = []
67 | samples = []
68 | for _ in range(self.batch_size):
69 | start = random.randint(0, len(self.tokenized_corpus) - self.seq_len - 1)
70 | tokens = self.tokenized_corpus[start: start + self.seq_len]
71 | samples.append(tokens)
72 |
73 |                 masked_tokens = [(t if random.random() < 0.8 else self.mask_token_id) for t in tokens]  # replace ~20% of tokens with [MASK]
74 | masked_samples.append(masked_tokens)
75 |
76 |
77 | yield {"input_ids": torch.tensor(masked_samples), "labels": torch.tensor(samples)}
78 |
79 | def __len__(self):
80 | return self.num_batches
81 |
82 |
83 | def main():
84 | args = parse_args()
85 | tokenizer = AutoTokenizer.from_pretrained(args.model_name)
86 | model = AutoModelForMaskedLM.from_pretrained(args.model_name)
87 | train_dl = DataLoader(tokenizer, batch_size=args.batch_size, num_batches=args.num_batches)
88 | optimizer = AdamW(model.parameters(), lr=1e-4)
89 |
90 | device = "cuda" if torch.cuda.is_available() else "cpu"
91 | model = model.to(device).train()
92 |     scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")  # fp16 variant (assumed from the filename): autocast + loss scaling
93 |     @dynamo.optimize("inductor")  # compile the training step with the TorchInductor backend
94 |     def train_step(batch):
95 |         with torch.cuda.amp.autocast(enabled=device == "cuda"):
96 |             output = model(**batch)
97 |         scaler.scale(output.loss).backward()
98 |         scaler.step(optimizer)
99 |         scaler.update()
100 | start_time = time.time()
101 | for step, batch in enumerate(train_dl):
102 | batch = {k: v.to(device) for k, v in batch.items()}
103 | train_step(batch)
104 | optimizer.zero_grad()
105 |
106 | if step == 0:
107 | first_step_time = time.time() - start_time
108 |
109 | total_training_time = time.time() - start_time
110 | avg_iteration_time = (total_training_time - first_step_time) / (len(train_dl) - 1)
111 | print("Training finished.")
112 | print(f"First iteration took: {first_step_time:.2f}s")
113 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
114 |
115 |
116 |
117 |
118 |
119 |
120 | if __name__ == "__main__":
121 | main()
122 |
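123 | # Usage sketch (assumed invocation; the flags are the ones defined in parse_args above):
124 | #   python experiments/optimize_train_step_fp16.py --model_name bert-base-cased --batch_size 16 --num_batches 100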
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers
2 | datasets
3 | evaluate
4 | scikit-learn
5 | git+https://github.com/huggingface/accelerate@main
6 | torchvision
--------------------------------------------------------------------------------
/run_experiments.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | if [[ -z $1 ]];
4 | then
5 | echo "model_name_or_path not passed"
6 | exit 1
7 | else
8 | echo "model_name_or_path = $1"
9 | fi
10 |
11 | if [[ -z $2 ]];
12 | then
13 | echo "num_runs not passed"
14 | exit 1
15 | else
16 | echo "num_runs = $2"
17 | fi
18 |
19 | if [[ -z $3 ]];
20 | then
21 | echo "task_name not passed"
22 | exit 1
23 | else
24 |     echo "task_name = $3"
25 | fi
26 |
27 | model_name_or_path=$1
28 | num_runs=$2
29 | task_name=$3
30 |
31 | for ((i = 1; i <= $num_runs; i++));
32 | do
33 | echo "experiment run $i"
34 |
35 | case $task_name in
36 | "text_classification")
37 | echo "Running text_classification"
38 | echo "inductor backend with fp32"
39 | accelerate launch scripts/text_classification.py \
40 | --task_name mrpc \
41 | --seed $i \
42 | --model_name_or_path $model_name_or_path \
43 | --dynamo_backend inductor
44 | echo "no backend with fp32"
45 | accelerate launch scripts/text_classification.py \
46 | --task_name mrpc \
47 | --seed $i \
48 | --model_name_or_path $model_name_or_path
49 | echo "inductor backend with fp16"
50 | accelerate launch scripts/text_classification.py \
51 | --task_name mrpc \
52 | --seed $i \
53 | --model_name_or_path $model_name_or_path \
54 | --dynamo_backend inductor \
55 | --mixed_precision fp16
56 | echo "no backend with fp16"
57 | accelerate launch scripts/text_classification.py \
58 | --task_name mrpc \
59 | --seed $i \
60 | --model_name_or_path $model_name_or_path \
61 |             --mixed_precision fp16
62 | ;;
63 | "language_modeling")
64 | echo "Running language_modeling"
65 | echo "inductor backend with fp32"
66 | accelerate launch scripts/language_modeling.py \
67 | --dataset_name wikitext \
68 | --dataset_config_name wikitext-2-raw-v1 \
69 | --seed $i \
70 | --model_name_or_path $model_name_or_path \
71 | --dynamo_backend inductor
72 | echo "no backend with fp32"
73 | accelerate launch scripts/language_modeling.py \
74 | --dataset_name wikitext \
75 | --dataset_config_name wikitext-2-raw-v1 \
76 | --seed $i \
77 | --model_name_or_path $model_name_or_path
78 | echo "inductor backend with fp16"
79 | accelerate launch scripts/language_modeling.py \
80 | --dataset_name wikitext \
81 | --dataset_config_name wikitext-2-raw-v1 \
82 | --seed $i \
83 | --model_name_or_path $model_name_or_path \
84 | --dynamo_backend inductor \
85 | --mixed_precision fp16
86 | echo "no backend with fp16"
87 | accelerate launch scripts/language_modeling.py \
88 | --dataset_name wikitext \
89 | --dataset_config_name wikitext-2-raw-v1 \
90 | --seed $i \
91 | --model_name_or_path $model_name_or_path \
92 | --mixed_precision fp16
93 | ;;
94 | "cv_classification")
95 | echo "Running cv_classification"
96 | echo "inductor backend with fp32"
97 | accelerate launch scripts/cv_classification.py \
98 | --dataset_name beans \
99 | --seed $i \
100 | --model_name_or_path $model_name_or_path \
101 | --dynamo_backend inductor
102 | echo "no backend with fp32"
103 | accelerate launch scripts/cv_classification.py \
104 | --dataset_name beans \
105 | --seed $i \
106 | --model_name_or_path $model_name_or_path
107 | echo "inductor backend with fp16"
108 | accelerate launch scripts/cv_classification.py \
109 | --dataset_name beans \
110 | --seed $i \
111 | --model_name_or_path $model_name_or_path \
112 | --dynamo_backend inductor \
113 | --mixed_precision fp16
114 | echo "no backend with fp16"
115 | accelerate launch scripts/cv_classification.py \
116 | --dataset_name beans \
117 | --seed $i \
118 | --model_name_or_path $model_name_or_path \
119 | --mixed_precision fp16
120 | ;;
121 | *)
122 | echo "Invalid task_name"
123 | exit 1
124 | ;;
125 | esac
126 | done
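127 |
128 | # Example invocation (positional args per the checks above: model_name_or_path, num_runs, task_name):
129 | #   ./run_experiments.sh bert-base-cased 3 text_classification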
--------------------------------------------------------------------------------
/scripts/cv_classification.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # Copyright 2022 The HuggingFace Inc. team. All rights reserved.
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | """ Finetuning any 🤗 Transformers model for image classification leveraging 🤗 Accelerate."""
16 | import argparse
17 | import json
18 | import logging
19 | import math
20 | import os
21 | from pathlib import Path
22 | import time
23 |
24 | import datasets
25 | import torch
26 | from datasets import load_dataset
27 | from torch.utils.data import DataLoader
28 | from torchvision.transforms import (
29 | CenterCrop,
30 | Compose,
31 | Normalize,
32 | RandomHorizontalFlip,
33 | RandomResizedCrop,
34 | Resize,
35 | ToTensor,
36 | )
37 | from tqdm.auto import tqdm
38 |
39 | import evaluate
40 | import transformers
41 | from accelerate import Accelerator
42 | from accelerate.logging import get_logger
43 | from accelerate.utils import set_seed
44 | from huggingface_hub import Repository
45 | from transformers import (
46 | AutoFeatureExtractor,
47 | AutoModelForImageClassification,
48 | get_scheduler,
49 | )
50 |
51 |
52 | torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 matmul kernels on Ampere+ GPUs
53 | logger = get_logger(__name__)
54 |
55 |
56 | def parse_args():
57 | parser = argparse.ArgumentParser(description="Fine-tune a Transformers model on an image classification dataset")
58 | parser.add_argument(
59 | "--dataset_name",
60 | type=str,
61 | default="cifar10",
62 | help=(
63 | "The name of the Dataset (from the HuggingFace hub) to train on (could be your own, possibly private,"
64 | " dataset)."
65 | ),
66 | )
67 | parser.add_argument(
68 | "--model_name_or_path",
69 | type=str,
70 | help="Path to pretrained model or model identifier from huggingface.co/models.",
71 | default="google/vit-base-patch16-224-in21k",
72 | )
73 | parser.add_argument(
74 | "--batch_size",
75 | type=int,
76 | default=8,
77 | help="Batch size (per device) for the training dataloader.",
78 | )
79 | parser.add_argument(
80 | "--learning_rate",
81 | type=float,
82 | default=5e-5,
83 | help="Initial learning rate (after the potential warmup period) to use.",
84 | )
85 | parser.add_argument("--num_epochs", type=int, default=3, help="Total number of training epochs to perform.")
86 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.")
87 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend")
88 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`")
89 | args = parser.parse_args()
90 | return args
91 |
92 |
93 | def main():
94 | args = parse_args()
95 | set_seed(args.seed)
96 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision)
97 |
98 |
99 | # Make one log on every process with the configuration for debugging.
100 | logging.basicConfig(
101 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
102 | datefmt="%m/%d/%Y %H:%M:%S",
103 | level=logging.INFO,
104 | )
105 | logger.info(accelerator.state, main_process_only=False)
106 | if accelerator.is_local_main_process:
107 | datasets.utils.logging.set_verbosity_warning()
108 | transformers.utils.logging.set_verbosity_info()
109 | else:
110 | datasets.utils.logging.set_verbosity_error()
111 | transformers.utils.logging.set_verbosity_error()
112 |
113 | dataset = load_dataset(args.dataset_name, task="image-classification")
114 | feature_extractor = AutoFeatureExtractor.from_pretrained(args.model_name_or_path)
115 | model = AutoModelForImageClassification.from_pretrained(
116 | args.model_name_or_path,
117 | num_labels=len(dataset["train"].features["labels"].names),
118 | ignore_mismatched_sizes=True,
119 | )
120 |
121 | # Preprocessing the datasets
122 |
123 | # Define torchvision transforms to be applied to each image.
124 | if "shortest_edge" in feature_extractor.size:
125 | size = feature_extractor.size["shortest_edge"]
126 | else:
127 | size = (feature_extractor.size["height"], feature_extractor.size["width"])
128 | normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
129 | train_transforms = Compose(
130 | [
131 | RandomResizedCrop(size),
132 | RandomHorizontalFlip(),
133 | ToTensor(),
134 | normalize,
135 | ]
136 | )
137 | val_transforms = Compose(
138 | [
139 | Resize(size),
140 | CenterCrop(size),
141 | ToTensor(),
142 | normalize,
143 | ]
144 | )
145 |
146 | def preprocess_train(example_batch):
147 | """Apply _train_transforms across a batch."""
148 | example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
149 | return example_batch
150 |
151 | def preprocess_val(example_batch):
152 | """Apply _val_transforms across a batch."""
153 | example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
154 | return example_batch
155 |
156 | with accelerator.main_process_first():
157 | dataset["train"] = dataset["train"].shuffle(seed=args.seed)
158 | # Set the training transforms
159 | train_dataset = dataset["train"].with_transform(preprocess_train)
160 | dataset["validation"] = dataset["validation"].shuffle(seed=args.seed)
161 | # Set the validation transforms
162 | eval_dataset = dataset["validation"].with_transform(preprocess_val)
163 |
164 | # DataLoaders creation:
165 | def collate_fn(examples):
166 | pixel_values = torch.stack([example["pixel_values"] for example in examples])
167 | labels = torch.tensor([example["labels"] for example in examples])
168 | return {"pixel_values": pixel_values, "labels": labels}
169 |
170 | train_dataloader = DataLoader(
171 | train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=args.batch_size, drop_last=True
172 | )
173 | eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=args.batch_size, drop_last=True)
174 |
175 | # Optimizer
176 | optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate)
177 |
178 | # Scheduler.
179 | lr_scheduler = get_scheduler(
180 | name="linear",
181 | optimizer=optimizer,
182 | num_warmup_steps=0,
183 | num_training_steps=len(train_dataloader) * args.num_epochs,
184 | )
185 |
186 | # Prepare everything with our `accelerator`.
187 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
188 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
189 | )
190 |
191 | # Get the metric function
192 | metric = evaluate.load("accuracy")
193 | # Train!
194 | # Only show the progress bar once on each machine.
195 | train_steps = len(train_dataloader) * args.num_epochs
196 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process)
197 |
198 | start_time = time.time()
199 | for epoch in range(args.num_epochs):
200 | model.train()
201 | for step, batch in enumerate(train_dataloader):
202 | outputs = model(**batch)
203 | loss = outputs.loss
204 | predictions, references = accelerator.gather_for_metrics((outputs.logits.argmax(dim=-1), batch["labels"]))
205 | metric.add_batch(predictions=predictions, references=references)
206 | accelerator.backward(loss)
207 | optimizer.step()
208 | lr_scheduler.step()
209 | optimizer.zero_grad()
210 | progress_bar.update(1)
211 | if step == 0 and epoch == 0:
212 | first_step_time = time.time() - start_time
213 |
214 | eval_train_metric = metric.compute()
215 | print(f"Training Accuracy for backend {args.dynamo_backend} at epoch {epoch}: {eval_train_metric}")
216 |
217 | total_training_time = time.time() - start_time
218 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1)
219 | print("Training finished.")
220 | print(f"First iteration took: {first_step_time:.2f}s")
221 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms")
222 |
223 | model.eval()
224 | start_time = time.time()
225 | for step, batch in enumerate(eval_dataloader):
226 | with torch.no_grad():
227 | outputs = model(**batch)
228 | predictions = outputs.logits.argmax(dim=-1)
229 | predictions, references = accelerator.gather_for_metrics((predictions, batch["labels"]))
230 | metric.add_batch(predictions=predictions, references=references)
231 |
232 | if step == 0:
233 | first_step_time = time.time() - start_time
234 | total_eval_time = time.time() - start_time
235 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1)
236 | print("Evaluation finished.")
237 | print(f"First iteration took: {first_step_time:.2f}s")
238 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms")
239 |
240 | eval_test_metric = metric.compute()
241 | print(f"Test Accuracy for backend {args.dynamo_backend}: {eval_test_metric}")
242 |
243 | out_dict = {
244 | "backend": args.dynamo_backend,
245 | "mixed_precision": args.mixed_precision,
246 | "num_epochs": str(args.num_epochs),
247 | "seed": str(args.seed),
248 | "train_acc": str(eval_train_metric["accuracy"]),
249 | "avg_train_time": str(avg_train_iteration_time * 1000),
250 | "test_acc": str(eval_test_metric["accuracy"]),
251 | "avg_test_time": str(avg_test_iteration_time * 1000),
252 | }
253 | prefix = args.model_name_or_path.split("/")[-1]
254 | with open(f"{prefix}_cv_classification_results.csv", "a+") as fd:
255 | fd.seek(0)
256 | if len(fd.read(1)) == 0:
257 | fd.write(",".join(out_dict.keys()) + "\n")
258 | else:
259 | fd.write("\n")
260 | fd.write(",".join(out_dict.values()))
261 |
262 |
263 | if __name__ == "__main__":
264 | main()
265 |
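266 | # Example launch (mirrors run_experiments.sh; the ViT checkpoint is this script's argparse default):
267 | #   accelerate launch scripts/cv_classification.py --dataset_name beans --seed 1 \
268 | #       --model_name_or_path google/vit-base-patch16-224-in21k --dynamo_backend inductor --mixed_precision fp16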
--------------------------------------------------------------------------------
/scripts/language_modeling.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # coding=utf-8
3 | # Copyright 2021 The HuggingFace Inc. team. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """
17 | Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...)
18 | on a text file or a dataset without using HuggingFace Trainer.
19 |
20 | Here is the full list of checkpoints on the hub that can be fine-tuned by this script:
21 | https://huggingface.co/models?filter=text-generation
22 | """
23 | # You can also adapt this script on your own causal language modeling task. Pointers for this are left as comments.
24 |
25 | import argparse
26 | import logging
27 | import math
28 | import os
29 | from itertools import chain
30 | import time
31 |
32 | import datasets
33 | import torch
34 | from datasets import load_dataset
35 | from torch.utils.data import DataLoader
36 | from tqdm.auto import tqdm
37 |
38 | import transformers
39 | from accelerate import Accelerator
40 | from accelerate.logging import get_logger
41 | from accelerate.utils import set_seed
42 | from transformers import (
43 | AutoModelForCausalLM,
44 | AutoTokenizer,
45 | default_data_collator,
46 | get_scheduler,
47 | )
48 |
49 | torch.backends.cuda.matmul.allow_tf32 = True
50 | logger = get_logger(__name__)
51 |
52 |
53 | def parse_args():
54 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a causal language modeling task")
55 | parser.add_argument(
56 | "--dataset_name",
57 | type=str,
58 | default=None,
59 | help="The name of the dataset to use (via the datasets library).",
60 | )
61 |     parser.add_argument(
62 |         "--dataset_config_name", type=str, default=None,
63 |         help="The configuration name of the dataset to use (via the datasets library).",
64 |     )
65 |     parser.add_argument("--validation_split_percentage", type=int, default=5,
66 |         help="Percentage of the train split used as validation when the dataset has no validation split.")
67 | parser.add_argument(
68 | "--model_name_or_path",
69 | type=str,
70 | help="Path to pretrained model or model identifier from huggingface.co/models.",
71 | required=False,
72 | )
73 | parser.add_argument(
74 | "--batch_size",
75 | type=int,
76 | default=8,
77 | help="Batch size (per device) for the training dataloader.",
78 | )
79 | parser.add_argument("--num_epochs", type=int, default=3, help="Total number of training epochs to perform.")
80 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.")
81 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend")
82 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`")
83 | args = parser.parse_args()
84 | return args
85 |
86 |
87 | def main():
88 | args = parse_args()
89 | set_seed(args.seed)
90 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision)
91 |
92 | # Make one log on every process with the configuration for debugging.
93 | logging.basicConfig(
94 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
95 | datefmt="%m/%d/%Y %H:%M:%S",
96 | level=logging.INFO,
97 | )
98 | logger.info(accelerator.state, main_process_only=False)
99 | if accelerator.is_local_main_process:
100 | datasets.utils.logging.set_verbosity_warning()
101 | transformers.utils.logging.set_verbosity_info()
102 | else:
103 | datasets.utils.logging.set_verbosity_error()
104 | transformers.utils.logging.set_verbosity_error()
105 |
106 | if args.dataset_name is not None:
107 | # Downloading and loading a dataset from the hub.
108 | raw_datasets = load_dataset(args.dataset_name, args.dataset_config_name)
109 | if "validation" not in raw_datasets.keys():
110 | raw_datasets["validation"] = load_dataset(
111 | args.dataset_name,
112 | args.dataset_config_name,
113 | split=f"train[:{args.validation_split_percentage}%]",
114 | )
115 | raw_datasets["train"] = load_dataset(
116 | args.dataset_name,
117 | args.dataset_config_name,
118 | split=f"train[{args.validation_split_percentage}%:]",
119 | )
120 |
121 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)
122 | model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)
123 |
124 | # Preprocessing the datasets.
125 | # First we tokenize all the texts.
126 | column_names = raw_datasets["train"].column_names
127 | text_column_name = "text" if "text" in column_names else column_names[0]
128 |
129 | def tokenize_function(examples):
130 | return tokenizer(examples[text_column_name])
131 |
132 | with accelerator.main_process_first():
133 | tokenized_datasets = raw_datasets.map(
134 | tokenize_function,
135 | batched=True,
136 | num_proc=4,
137 | remove_columns=column_names,
138 | load_from_cache_file=False,
139 | desc="Running tokenizer on dataset",
140 | )
141 | block_size = tokenizer.model_max_length
142 |
143 | # Main data processing function that will concatenate all texts from our dataset and generate chunks of block_size.
144 | def group_texts(examples):
145 | # Concatenate all texts.
146 | concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
147 | total_length = len(concatenated_examples[list(examples.keys())[0]])
148 | # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
149 | # customize this part to your needs.
150 | if total_length >= block_size:
151 | total_length = (total_length // block_size) * block_size
152 | # Split by chunks of max_len.
153 | result = {
154 | k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
155 | for k, t in concatenated_examples.items()
156 | }
157 | result["labels"] = result["input_ids"].copy()
158 | return result
159 |
160 | with accelerator.main_process_first():
161 | lm_datasets = tokenized_datasets.map(
162 | group_texts,
163 | batched=True,
164 | num_proc=4,
165 | load_from_cache_file=False,
166 | desc=f"Grouping texts in chunks of {block_size}",
167 | )
168 |
169 | train_dataset = lm_datasets["train"]
170 | eval_dataset = lm_datasets["validation"]
171 |
172 | # DataLoaders creation:
173 | train_dataloader = DataLoader(
174 | train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=args.batch_size, drop_last=True
175 | )
176 | eval_dataloader = DataLoader(
177 | eval_dataset, collate_fn=default_data_collator, batch_size=args.batch_size, drop_last=True
178 | )
179 |
180 | # Optimizer
181 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
182 |
183 | # Scheduler.
184 | lr_scheduler = get_scheduler(
185 | name="linear",
186 | optimizer=optimizer,
187 | num_warmup_steps=0,
188 | num_training_steps=len(train_dataloader) * args.num_epochs,
189 | )
190 |
191 | # Prepare everything with our `accelerator`.
192 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
193 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
194 | )
195 |
196 | # Train!
197 | # Only show the progress bar once on each machine.
198 | train_steps = len(train_dataloader) * args.num_epochs
199 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process)
200 |
201 | start_time = time.time()
202 | for epoch in range(args.num_epochs):
203 | model.train()
204 | total_loss = 0
205 | for step, batch in enumerate(train_dataloader):
206 | outputs = model(**batch)
207 | loss = outputs.loss
208 | total_loss += loss.detach().float()
209 | accelerator.backward(loss)
210 | optimizer.step()
211 | lr_scheduler.step()
212 | optimizer.zero_grad()
213 | progress_bar.update(1)
214 | if step == 0 and epoch == 0:
215 | first_step_time = time.time() - start_time
216 | train_perplexity = torch.exp(total_loss / len(train_dataloader))
217 | print(f"Training Perplexity for backend {args.dynamo_backend} at epoch {epoch}: {train_perplexity}")
218 |
219 | total_training_time = time.time() - start_time
220 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1)
221 | print("Training finished.")
222 | print(f"First iteration took: {first_step_time:.2f}s")
223 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms")
224 | model.eval()
225 | total_loss = 0
226 | start_time = time.time()
227 | for step, batch in enumerate(eval_dataloader):
228 | with torch.no_grad():
229 | outputs = model(**batch)
230 | loss = outputs.loss
231 | total_loss += loss.detach().float()
232 | if step == 0:
233 | first_step_time = time.time() - start_time
234 |
235 |     total_eval_time = time.time() - start_time
236 |
237 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1)
238 | print("Evaluation finished.")
239 | print(f"First iteration took: {first_step_time:.2f}s")
240 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms")
241 | test_perplexity = torch.exp(total_loss / len(eval_dataloader))
242 | print(f"Test Perplexity for backend {args.dynamo_backend}: {test_perplexity}")
243 | out_dict = {
244 | "backend": args.dynamo_backend,
245 | "mixed_precision": args.mixed_precision,
246 | "num_epochs": str(args.num_epochs),
247 | "seed": str(args.seed),
248 | "train_perplexity": str(train_perplexity.item()),
249 | "avg_train_time": str(avg_train_iteration_time * 1000),
250 | "test_perplexity": str(test_perplexity.item()),
251 | "avg_test_time": str(avg_test_iteration_time * 1000),
252 | }
253 | prefix = args.model_name_or_path.split("/")[-1]
254 | with open(f"{prefix}_language_modeling_task_results.csv", "a+") as fd:
255 | fd.seek(0)
256 | if len(fd.read(1)) == 0:
257 | fd.write(",".join(out_dict.keys()) + "\n")
258 | else:
259 | fd.write("\n")
260 | fd.write(",".join(out_dict.values()))
261 |
262 |
263 | if __name__ == "__main__":
264 | main()
265 |
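266 | # Example launch (mirrors run_experiments.sh; gpt2 is an illustrative causal-LM checkpoint):
267 | #   accelerate launch scripts/language_modeling.py --dataset_name wikitext \
268 | #       --dataset_config_name wikitext-2-raw-v1 --seed 1 --model_name_or_path gpt2 --dynamo_backend inductor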
--------------------------------------------------------------------------------
/scripts/text_classification.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # Copyright 2021 The HuggingFace Inc. team. All rights reserved.
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | """ Finetuning a 🤗 Transformers model for sequence classification on GLUE."""
16 | import argparse
17 | import logging
18 | import time
19 |
20 | import datasets
21 | import torch
22 | from datasets import load_dataset
23 | from torch.utils.data import DataLoader
24 | from tqdm.auto import tqdm
25 |
26 | import evaluate
27 | import transformers
28 | from accelerate import Accelerator
29 | from accelerate.logging import get_logger
30 | from transformers import (
31 | AutoModelForSequenceClassification,
32 | AutoTokenizer,
33 | DataCollatorWithPadding,
34 | default_data_collator,
35 | get_scheduler,
36 | )
37 |
38 | torch.backends.cuda.matmul.allow_tf32 = True
39 | logger = get_logger(__name__)
40 |
41 | task_to_keys = {
42 | "cola": ("sentence", None),
43 | "mnli": ("premise", "hypothesis"),
44 | "mrpc": ("sentence1", "sentence2"),
45 | "qnli": ("question", "sentence"),
46 | "qqp": ("question1", "question2"),
47 | "rte": ("sentence1", "sentence2"),
48 | "sst2": ("sentence", None),
49 | "stsb": ("sentence1", "sentence2"),
50 | "wnli": ("sentence1", "sentence2"),
51 | }
52 |
53 |
54 | def parse_args():
55 | parser = argparse.ArgumentParser(description="Finetune a transformers model on a text classification task")
56 | parser.add_argument(
57 | "--task_name",
58 | type=str,
59 | default=None,
60 | help="The name of the glue task to train on.",
61 | choices=list(task_to_keys.keys()),
62 | )
63 | parser.add_argument(
64 | "--max_length",
65 | type=int,
66 | default=128,
67 | help=(
68 | "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated,"
69 |             " sequences shorter will be padded unless `--dynamic_length` is passed."
70 | ),
71 | )
72 | parser.add_argument(
73 | "--dynamic_length",
74 | action="store_true",
75 |         help="If passed, dynamic padding is used. Otherwise, all samples are padded to `max_length`.",
76 | )
77 | parser.add_argument(
78 | "--model_name_or_path",
79 | type=str,
80 | help="Path to pretrained model or model identifier from huggingface.co/models.",
81 | default="bert-base-cased",
82 | )
83 | parser.add_argument(
84 | "--batch_size",
85 | type=int,
86 | default=16,
87 | help="Batch size (per device) for the dataloaders.",
88 | )
89 | parser.add_argument(
90 | "--num_epochs",
91 | type=int,
92 | default=3,
93 | help="Number of training epochs.",
94 | )
95 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend")
96 | parser.add_argument("--seed", type=int, default=0, help="random seed for torch")
97 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`")
98 | args = parser.parse_args()
99 | return args
100 |
101 |
102 | def main():
103 | args = parse_args()
104 | torch.manual_seed(args.seed)
105 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision)
106 |
107 | # Make one log on every process with the configuration for debugging.
108 | logging.basicConfig(
109 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
110 | datefmt="%m/%d/%Y %H:%M:%S",
111 | level=logging.INFO,
112 | )
113 | logger.info(accelerator.state, main_process_only=False)
114 | if accelerator.is_local_main_process:
115 | datasets.utils.logging.set_verbosity_warning()
116 | transformers.utils.logging.set_verbosity_info()
117 | else:
118 | datasets.utils.logging.set_verbosity_error()
119 | transformers.utils.logging.set_verbosity_error()
120 |
121 | # Load data
122 | raw_datasets = load_dataset("glue", args.task_name)
123 |
124 | is_regression = args.task_name == "stsb"
125 | if not is_regression:
126 | label_list = raw_datasets["train"].features["label"].names
127 | num_labels = len(label_list)
128 | else:
129 | num_labels = 1
130 |
131 | # Load pretrained model and tokenizer
132 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)
133 | model = AutoModelForSequenceClassification.from_pretrained(args.model_name_or_path, num_labels=num_labels)
134 |
135 | # Preprocessing the datasets
136 | sentence1_key, sentence2_key = task_to_keys[args.task_name]
137 | padding = False if args.dynamic_length else "max_length"
138 |
139 | def preprocess_function(examples):
140 | # Tokenize the texts
141 | texts = (
142 | (examples[sentence1_key],) if sentence2_key is None else (examples[sentence1_key], examples[sentence2_key])
143 | )
144 | result = tokenizer(*texts, padding=padding, max_length=args.max_length, truncation=True)
145 | result["labels"] = examples["label"]
146 | return result
147 |
148 | with accelerator.main_process_first():
149 | processed_datasets = raw_datasets.map(
150 | preprocess_function,
151 | batched=True,
152 | remove_columns=raw_datasets["train"].column_names,
153 | desc="Running tokenizer on dataset",
154 | )
155 |
156 | train_dataset = processed_datasets["train"]
157 | eval_dataset = processed_datasets["validation_matched" if args.task_name == "mnli" else "validation"]
158 |
159 | # DataLoaders creation:
160 | if not args.dynamic_length:
161 | data_collator = default_data_collator
162 | else:
163 | data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8)
164 |
165 | train_dataloader = DataLoader(
166 | train_dataset, shuffle=True, collate_fn=data_collator, batch_size=args.batch_size, drop_last=True
167 | )
168 | eval_dataloader = DataLoader(
169 | eval_dataset, collate_fn=data_collator, batch_size=args.batch_size, drop_last=not args.dynamic_length
170 | )
171 |
172 | # Optimizer
173 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
174 |
175 | # Scheduler.
176 | lr_scheduler = get_scheduler(
177 | name="linear",
178 | optimizer=optimizer,
179 | num_warmup_steps=0,
180 | num_training_steps=len(train_dataloader) * args.num_epochs,
181 | )
182 |
183 | # Prepare everything with our `accelerator`.
184 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
185 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
186 | )
187 |
188 | # Get the metric function
189 | metric = evaluate.load("glue", args.task_name)
190 | # Train!
191 | # Only show the progress bar once on each machine.
192 | train_steps = len(train_dataloader) * args.num_epochs
193 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process)
194 | start_time = time.time()
195 | for epoch in range(args.num_epochs):
196 | model.train()
197 | for step, batch in enumerate(train_dataloader):
198 |             # standard step: forward pass, metric accumulation, backward pass, optimizer update
199 | outputs = model(**batch)
200 | loss = outputs.loss
201 | predictions, references = accelerator.gather_for_metrics((outputs.logits.argmax(dim=-1), batch["labels"]))
202 | metric.add_batch(predictions=predictions, references=references)
203 | accelerator.backward(loss)
204 | optimizer.step()
205 | lr_scheduler.step()
206 | optimizer.zero_grad()
207 | progress_bar.update(1)
208 | if step == 0 and epoch == 0:
209 | first_step_time = time.time() - start_time
210 |
211 | eval_train_metric = metric.compute()
212 | print(f"Training Accuracy for backend {args.dynamo_backend} at epoch {epoch}: {eval_train_metric}")
213 |
214 | total_training_time = time.time() - start_time
215 | avg_train_iteration_time = (total_training_time - first_step_time) / (train_steps - 1)
216 | print("Training finished.")
217 | print(f"First iteration took: {first_step_time:.2f}s")
218 | print(f"Average time after the first iteration: {avg_train_iteration_time * 1000:.2f}ms")
219 |
220 | model.eval()
221 | start_time = time.time()
222 | for step, batch in enumerate(eval_dataloader):
223 | with torch.no_grad():
224 | outputs = model(**batch)
225 | predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze()
226 | predictions, references = accelerator.gather_for_metrics((predictions, batch["labels"]))
227 | metric.add_batch(predictions=predictions, references=references)
228 |
229 | if step == 0:
230 | first_step_time = time.time() - start_time
231 | total_eval_time = time.time() - start_time
232 | avg_test_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1)
233 | print("Evaluation finished.")
234 | print(f"First iteration took: {first_step_time:.2f}s")
235 | print(f"Average time after the first iteration: {avg_test_iteration_time * 1000:.2f}ms")
236 |
237 | eval_test_metric = metric.compute()
238 | print(f"Test Accuracy for backend {args.dynamo_backend}: {eval_test_metric}")
239 |
240 | out_dict = {
241 | "backend": args.dynamo_backend,
242 | "mixed_precision": args.mixed_precision,
243 | "num_epochs": str(args.num_epochs),
244 | "seed": str(args.seed),
245 | "train_acc": str(eval_train_metric["accuracy"]),
246 | "train_f1": str(eval_train_metric["f1"]),
247 | "avg_train_time": str(avg_train_iteration_time * 1000),
248 | "test_acc": str(eval_test_metric["accuracy"]),
249 | "test_f1": str(eval_test_metric["f1"]),
250 | "avg_test_time": str(avg_test_iteration_time * 1000),
251 | }
252 | prefix = args.model_name_or_path.split("/")[-1]
253 | with open(f"{prefix}_text_classification_results.csv", "a+") as fd:
254 | fd.seek(0)
255 | if len(fd.read(1)) == 0:
256 | fd.write(",".join(out_dict.keys()) + "\n")
257 | else:
258 | fd.write("\n")
259 | fd.write(",".join(out_dict.values()))
260 |
261 |
262 | if __name__ == "__main__":
263 | main()
264 |
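265 | # Example launch (taken from run_experiments.sh):
266 | #   accelerate launch scripts/text_classification.py --task_name mrpc --seed 1 \
267 | #       --model_name_or_path bert-base-cased --dynamo_backend inductor --mixed_precision fp16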
--------------------------------------------------------------------------------
/scripts/translation.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # coding=utf-8
3 | # Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """
17 | Fine-tuning a 🤗 Transformers model on text translation.
18 | """
19 | # You can also adapt this script on your own text translation task. Pointers for this are left as comments.
20 |
21 | import argparse
22 | import logging
23 | import random
24 | import time
25 |
26 | import datasets
27 | import numpy as np
28 | import torch
29 | from datasets import load_dataset
30 | from torch.utils.data import DataLoader
31 | from tqdm.auto import tqdm
32 |
33 | import evaluate
34 | import transformers
35 | from accelerate import Accelerator
36 | from accelerate.logging import get_logger
37 | from transformers import (
38 | AutoModelForSeq2SeqLM,
39 | AutoTokenizer,
40 | DataCollatorForSeq2Seq,
41 | MBartTokenizer,
42 | MBartTokenizerFast,
43 | default_data_collator,
44 | get_scheduler,
45 | )
46 |
47 | torch.backends.cuda.matmul.allow_tf32 = True
48 | logger = get_logger(__name__)
49 |
50 |
51 | # Parsing input arguments
52 | def parse_args():
53 |
54 |     parser = argparse.ArgumentParser(description="Finetune a transformers model on a text translation task")
55 | parser.add_argument(
56 | "--model_name_or_path",
57 | type=str,
58 | help="Path to pretrained model or model identifier from huggingface.co/models.",
59 | default="t5-small",
60 | )
61 | parser.add_argument(
62 | "--max_length",
63 | type=int,
64 | default=128,
65 | help=(
66 | "The maximum total input sequence length after tokenization. Sequences longer than this will be truncated,"
67 |             " sequences shorter will be padded unless `--dynamic_length` is passed."
68 | ),
69 | )
70 | parser.add_argument(
71 | "--dynamic_length",
72 | action="store_true",
73 |         help="If passed, dynamic padding is used. Otherwise, all samples are padded to `max_length`.",
74 | )
75 | parser.add_argument(
76 | "--batch_size",
77 | type=int,
78 | default=16,
79 | help="Batch size (per device) for the dataloaders.",
80 | )
81 | parser.add_argument(
82 | "--num_epochs",
83 | type=int,
84 | default=1,
85 | help="Number of training epochs.",
86 | )
87 | parser.add_argument("--seed", type=int, default=0, help="A seed for reproducible training.")
88 | parser.add_argument("--dynamo_backend", type=str, default="no", help="Dynamo backend")
89 | parser.add_argument("--mixed_precision", type=str, default="no", help="`no` or `fp16`")
90 | return parser.parse_args()
91 |
92 |
93 | def main():
94 | args = parse_args()
95 | torch.manual_seed(args.seed)
96 | accelerator = Accelerator(dynamo_backend=args.dynamo_backend, mixed_precision=args.mixed_precision)
97 |
98 | # Make one log on every process with the configuration for debugging.
99 | logging.basicConfig(
100 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
101 | datefmt="%m/%d/%Y %H:%M:%S",
102 | level=logging.INFO,
103 | )
104 | logger.info(accelerator.state, main_process_only=False)
105 | if accelerator.is_local_main_process:
106 | datasets.utils.logging.set_verbosity_warning()
107 | transformers.utils.logging.set_verbosity_info()
108 | else:
109 | datasets.utils.logging.set_verbosity_error()
110 | transformers.utils.logging.set_verbosity_error()
111 |
112 | # Load data
113 | raw_datasets = load_dataset("wmt16", "ro-en")
114 |
115 | # Load pretrained model and tokenizer
116 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)
117 | model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name_or_path)
118 |
119 | # MBART requires some language codes
120 | if isinstance(tokenizer, (MBartTokenizer, MBartTokenizerFast)):
121 | tokenizer.src_lang = "en_XX"
122 | tokenizer.tgt_lang = "ro_RO"
123 | if model.config.decoder_start_token_id is None:
124 | if isinstance(tokenizer, MBartTokenizer):
125 | model.config.decoder_start_token_id = tokenizer.lang_code_to_id["ro_RO"]
126 | else:
127 | model.config.decoder_start_token_id = tokenizer.convert_tokens_to_ids("ro_RO")
128 |
129 | # T5 requires a prefix
130 | if args.model_name_or_path in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
131 | prefix = "translate English to Romanian: "
132 | else:
133 | prefix = ""
134 |
135 | # Preprocessing the datasets.
136 | padding = False if args.dynamic_length else "max_length"
137 |
138 | def preprocess_function(examples):
139 | inputs = [ex["en"] for ex in examples["translation"]]
140 | targets = [ex["ro"] for ex in examples["translation"]]
141 | inputs = [prefix + inp for inp in inputs]
142 | model_inputs = tokenizer(
143 | inputs, text_target=targets, max_length=args.max_length, padding=padding, truncation=True
144 | )
145 |
146 | # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
147 | # padding in the loss.
148 | if padding == "max_length":
149 | model_inputs["labels"] = [
150 | [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in model_inputs["labels"]
151 | ]
152 |
153 | return model_inputs
154 |
155 | with accelerator.main_process_first():
156 | processed_datasets = raw_datasets.map(
157 | preprocess_function,
158 | batched=True,
159 | remove_columns=raw_datasets["train"].column_names,
160 | desc="Running tokenizer on dataset",
161 | )
162 |
163 | train_dataset = processed_datasets["train"]
164 | eval_dataset = processed_datasets["validation"]
165 |
166 | # Log a few random samples from the training set:
167 | for index in random.sample(range(len(train_dataset)), 3):
168 | logger.info(f"Sample {index} of the training set: {train_dataset[index]}.")
169 |
170 | # DataLoaders creation:
171 | if not args.dynamic_length:
172 | data_collator = default_data_collator
173 | else:
174 | data_collator = DataCollatorForSeq2Seq(
175 | tokenizer,
176 | model=model,
177 | label_pad_token_id=-100,
178 | pad_to_multiple_of=8,
179 | )
180 |
181 | train_dataloader = DataLoader(
182 | train_dataset, shuffle=True, collate_fn=data_collator, batch_size=args.batch_size, drop_last=True
183 | )
184 | eval_dataloader = DataLoader(
185 | eval_dataset, collate_fn=data_collator, batch_size=args.batch_size, drop_last=not args.dynamic_length
186 | )
187 |
188 | # Optimizer
189 | optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
190 |
191 | # Scheduler.
192 | lr_scheduler = get_scheduler(
193 | name="linear",
194 | optimizer=optimizer,
195 | num_warmup_steps=0,
196 | num_training_steps=len(train_dataloader) * args.num_epochs,
197 | )
198 |
199 | # Prepare everything with our `accelerator`.
200 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
201 | model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
202 | )
203 |
204 | # Metric
205 | metric = evaluate.load("sacrebleu")
206 |
207 | def postprocess_text(preds, labels):
208 | preds = [pred.strip() for pred in preds]
209 | labels = [[label.strip()] for label in labels]
210 |
211 | return preds, labels
212 |
213 | # Train!
214 | # Only show the progress bar once on each machine.
215 | train_steps = min(len(train_dataloader) * args.num_epochs, 1000)
216 | progress_bar = tqdm(range(train_steps), disable=not accelerator.is_local_main_process)
217 | start_time = time.time()
218 |
219 | for epoch in range(args.num_epochs):
220 | model.train()
221 | for step, batch in enumerate(train_dataloader):
222 |             # standard step: forward pass, backward pass, optimizer update
223 | outputs = model(**batch)
224 | loss = outputs.loss
225 | accelerator.backward(loss)
226 | optimizer.step()
227 | lr_scheduler.step()
228 | optimizer.zero_grad()
229 | progress_bar.update(1)
230 | if step == 0 and epoch == 0:
231 | first_step_time = time.time() - start_time
232 | elif step >= 1000:
233 | break
234 |
235 | total_training_time = time.time() - start_time
236 | avg_iteration_time = (total_training_time - first_step_time) / (train_steps - 1)
237 | print("Training finished.")
238 | print(f"First iteration took: {first_step_time:.2f}s")
239 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
240 |
241 | model.eval()
242 | start_time = time.time()
243 | for step, batch in enumerate(eval_dataloader):
244 | with torch.no_grad():
245 | generated_tokens = accelerator.unwrap_model(model).generate(
246 | batch["input_ids"], attention_mask=batch["attention_mask"], max_length=args.max_length
247 | )
248 | generated_tokens = accelerator.pad_across_processes(
249 | generated_tokens, dim=1, pad_index=tokenizer.pad_token_id
250 | )
251 | labels = batch["labels"]
252 | if args.dynamic_length:
253 | labels = accelerator.pad_across_processes(batch["labels"], dim=1, pad_index=tokenizer.pad_token_id)
254 |
255 | generated_tokens = accelerator.gather(generated_tokens).cpu().numpy()
256 | labels = accelerator.gather(labels).cpu().numpy()
257 |
258 | # Replace -100 in the labels as we can't decode them.
259 | labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
260 |
261 | decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
262 | decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
263 |
264 | decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
265 |
266 | metric.add_batch(predictions=decoded_preds, references=decoded_labels)
267 | if step == 0:
268 | first_step_time = time.time() - start_time
269 |
270 | total_eval_time = time.time() - start_time
271 | avg_iteration_time = (total_eval_time - first_step_time) / (len(eval_dataloader) - 1)
272 |
273 | print("Evaluation finished.")
274 | print(f"First iteration took: {first_step_time:.2f}s")
275 | print(f"Average time after the first iteration: {avg_iteration_time * 1000:.2f}ms")
276 |
277 | eval_metric = metric.compute()
278 | print(f"Test BLEU score for backend {args.dynamo_backend}: {eval_metric['score']}")
279 |
280 |
281 | if __name__ == "__main__":
282 | main()
283 |
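284 | # Example launch (assumed; flags as defined in parse_args above):
285 | #   accelerate launch scripts/translation.py --model_name_or_path t5-small --dynamic_length --dynamo_backend inductor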
--------------------------------------------------------------------------------
/tools/summarize.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import os
3 | from collections import defaultdict
4 | import argparse
5 |
6 |
7 | def generate_and_save_plots(df, output_dir):
8 | for mixed_precision in set(df["mixed_precision"].values):
9 | metrics = defaultdict(list)
10 | filtered_df = df[df["mixed_precision"] == mixed_precision]
11 | columns = list(df.columns)[2:]
12 |
13 | # saving performance plots
14 | metrics_columns = [column for column in columns if "time" not in column]
15 | df_metric = filtered_df[metrics_columns + ["backend"]]
16 | inductor_backend_values = df_metric[df_metric["backend"] == "inductor"].values[0]
17 | pytorch_backend_values = df_metric[df_metric["backend"] == "no"].values[0]
18 | for i, column in enumerate(metrics_columns):
19 | metrics["metric"].append(column)
20 | metrics["inductor"].append(inductor_backend_values[i])
21 | metrics["no"].append(pytorch_backend_values[i])
22 | df_metric = pd.DataFrame(metrics)
23 | plot = df_metric.plot.bar(x="metric", rot=0)
24 | fig = plot.get_figure()
25 |         fig.savefig(os.path.join(output_dir, f"mixed_precision={mixed_precision}_metric.png"))
26 |
27 | # saving avg time plots
28 | metrics = defaultdict(list)
29 | time_columns = [column for column in columns if "time" in column]
30 | df_time = filtered_df[time_columns + ["backend"]]
31 | inductor_backend_values = df_time[df_time["backend"] == "inductor"].values[0]
32 | pytorch_backend_values = df_time[df_time["backend"] == "no"].values[0]
33 | for i, column in enumerate(time_columns):
34 | metrics["avg_time"].append(column)
35 | metrics["inductor"].append(inductor_backend_values[i])
36 | metrics["no"].append(pytorch_backend_values[i])
37 | df_metric = pd.DataFrame(metrics)
38 | plot = df_metric.plot.bar(x="avg_time", rot=0)
39 | fig = plot.get_figure()
40 |         fig.savefig(os.path.join(output_dir, f"mixed_precision={mixed_precision}_avg_time.png"))
41 |
42 |
43 | def get_diff_percentage(df):
44 | diff_percentage = defaultdict(list)
45 | for mixed_precision in set(df["mixed_precision"].values):
46 | diff_percentage["mixed_precision"].append(mixed_precision)
47 | filtered_df = df[df["mixed_precision"] == mixed_precision]
48 | columns = list(df.columns)[2:]
49 | inductor_backend_values = filtered_df[filtered_df["backend"] == "inductor"].values[0][2:]
50 | pytorch_backend_values = filtered_df[filtered_df["backend"] == "no"].values[0][2:]
51 |
52 | for i, column in enumerate(columns):
53 | if "time" in column:
54 | diff_percentage[f"{column}_speedup"].append(
55 | str(round((pytorch_backend_values[i] / inductor_backend_values[i]), 2)) + "x"
56 | )
57 | else:
58 | diff_percentage[f"{column}_diff%"].append(
59 | str(round((100 * (inductor_backend_values[i] / pytorch_backend_values[i] - 1)), 2)) + "%"
60 | )
61 | return pd.DataFrame(diff_percentage)
62 |
63 |
64 | def main():
65 | parser = argparse.ArgumentParser(description="Get plots and summary table")
66 | parser.add_argument("--input_csv_file", type=str, required=True)
67 | parser.add_argument("--output_dir", type=str, required=True)
68 |
69 | args = parser.parse_args()
70 | os.makedirs(args.output_dir, exist_ok=True)
71 | df = pd.read_csv(args.input_csv_file)
72 | group_by_columns = ["backend", "mixed_precision"]
73 | drop_columns = ["num_epochs", "seed"]
74 | df.drop(columns=drop_columns, inplace=True)
75 |     df = df.groupby(group_by_columns).agg("mean")  # average repeated runs for each backend/precision pair
76 | df = df.reset_index()
77 |
78 | generate_and_save_plots(df, args.output_dir)
79 | diff_df = get_diff_percentage(df)
80 |     file_prefix = os.path.splitext(os.path.basename(args.input_csv_file))[0]  # handles Windows separators and dotted filenames, unlike split("/")
81 | diff_df.to_csv(os.path.join(args.output_dir, f"{file_prefix}_summary_table.csv"), header=True, index=False)
82 |
83 |
84 | if __name__ == "__main__":
85 | main()
86 |
--------------------------------------------------------------------------------
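`tools/summarize.py` is invoked as `python tools/summarize.py --input_csv_file <results.csv> --output_dir <dir>` and reduces each backend/precision pair to two kinds of numbers: a speedup factor for the timing columns and a relative difference for the quality metrics. A small sketch of those two formulas, using made-up values (the variable names and numbers are illustrative only):

```python
# Hedged sketch of the two formulas in get_diff_percentage, with made-up numbers.
eager_time, inductor_time = 0.042, 0.028  # hypothetical avg seconds per iteration
eager_bleu, inductor_bleu = 27.30, 27.10  # hypothetical quality-metric values

speedup = round(eager_time / inductor_time, 2)               # "time" columns -> speedup factor
diff_pct = round(100 * (inductor_bleu / eager_bleu - 1), 2)  # metric columns -> relative change

print(f"{speedup}x")   # 1.5x
print(f"{diff_pct}%")  # -0.73%
```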
/tools/verify_dynamo.py:
--------------------------------------------------------------------------------
1 | import os
2 | import re
3 | import subprocess
4 | import sys
5 | import traceback
6 | import warnings
7 |
8 | from pkg_resources import packaging  # note: pkg_resources is deprecated; "from packaging import version" is the modern equivalent
9 |
10 | MIN_CUDA_VERSION = packaging.version.parse("11.6")
11 | MIN_PYTHON_VERSION = (3, 7)
12 |
13 |
14 | class VerifyDynamoError(BaseException):  # BaseException on purpose, so broad "except Exception" handlers cannot swallow it
15 | pass
16 |
17 |
18 | def check_python():
19 | if sys.version_info < MIN_PYTHON_VERSION:
20 | raise VerifyDynamoError(
21 | f"Python version not supported: {sys.version_info} "
22 | f"- minimum requirement: {MIN_PYTHON_VERSION}"
23 | )
24 | return sys.version_info
25 |
26 |
27 | def check_torch():
28 | import torch
29 |
30 | return packaging.version.parse(torch.__version__)
31 |
32 |
33 | # based on torch/utils/cpp_extension.py
34 | def get_cuda_version():
35 | from torch.utils import cpp_extension
36 |
37 | CUDA_HOME = cpp_extension._find_cuda_home()
38 | if not CUDA_HOME:
39 | raise VerifyDynamoError(cpp_extension.CUDA_NOT_FOUND_MESSAGE)
40 |
41 | nvcc = os.path.join(CUDA_HOME, "bin", "nvcc")
42 | cuda_version_str = (
43 | subprocess.check_output([nvcc, "--version"])
44 | .strip()
45 | .decode(*cpp_extension.SUBPROCESS_DECODE_ARGS)
46 | )
47 | cuda_version = re.search(r"release (\d+[.]\d+)", cuda_version_str)
48 | if cuda_version is None:
49 | raise VerifyDynamoError("CUDA version not found in `nvcc --version` output")
50 |
51 | cuda_str_version = cuda_version.group(1)
52 | return packaging.version.parse(cuda_str_version)
53 |
54 |
55 | def check_cuda():
56 | import torch
57 |
58 | if not torch.cuda.is_available():
59 | return None
60 |
61 | torch_cuda_ver = packaging.version.parse(torch.version.cuda)
62 |
63 | # check if torch cuda version matches system cuda version
64 | cuda_ver = get_cuda_version()
65 | if cuda_ver != torch_cuda_ver:
66 |         # a version mismatch is reported as a warning instead of a hard error
67 | warnings.warn(
68 | f"CUDA version mismatch, `torch` version: {torch_cuda_ver}, env version: {cuda_ver}"
69 | )
70 |
71 | if torch_cuda_ver < MIN_CUDA_VERSION:
72 |         # likewise only a warning for a too-old torch CUDA version
73 | warnings.warn(
74 | f"(`torch`) CUDA version not supported: {torch_cuda_ver} "
75 | f"- minimum requirement: {MIN_CUDA_VERSION}"
76 | )
77 | if cuda_ver < MIN_CUDA_VERSION:
78 |         # likewise only a warning for a too-old system CUDA version
79 | warnings.warn(
80 | f"(env) CUDA version not supported: {cuda_ver} "
81 | f"- minimum requirement: {MIN_CUDA_VERSION}"
82 | )
83 |
84 | return cuda_ver
85 |
86 |
87 | def check_dynamo(backend, device, err_msg):
88 | import torch
89 |
90 | if device == "cuda" and not torch.cuda.is_available():
91 | print(f"CUDA not available -- skipping CUDA check on {backend} backend\n")
92 | return
93 |
94 | try:
95 | import torch._dynamo as dynamo
96 |
97 | dynamo.reset()
98 |
99 | @dynamo.optimize(backend, nopython=True)
100 | def fn(x):
101 | return x + x
102 |
103 | class Module(torch.nn.Module):
104 | def __init__(self):
105 | super().__init__()
106 |
107 | def forward(self, x):
108 | return x + x
109 |
110 | mod = Module()
111 | opt_mod = dynamo.optimize(backend, nopython=True)(mod)
112 |
113 | for f in (fn, opt_mod):
114 | x = torch.randn(10, 10).to(device)
115 | x.requires_grad = True
116 | y = f(x)
117 | torch.testing.assert_close(y, x + x)
118 | z = y.sum()
119 | z.backward()
120 | torch.testing.assert_close(x.grad, 2 * torch.ones_like(x))
121 | except Exception:
122 | sys.stderr.write(traceback.format_exc() + "\n" + err_msg + "\n\n")
123 | sys.exit(1)
124 |
125 |
126 | _SANITY_CHECK_ARGS = (
127 | ("eager", "cpu", "CPU eager sanity check failed"),
128 | ("eager", "cuda", "CUDA eager sanity check failed"),
129 | ("aot_eager", "cpu", "CPU aot_eager sanity check failed"),
130 | ("aot_eager", "cuda", "CUDA aot_eager sanity check failed"),
131 | ("inductor", "cpu", "CPU inductor sanity check failed"),
132 | (
133 | "inductor",
134 | "cuda",
135 | "CUDA inductor sanity check failed\n"
136 | + "NOTE: Please check that you installed the correct hash/version of `triton`",
137 | ),
138 | )
139 |
140 |
141 | def main():
142 | python_ver = check_python()
143 | torch_ver = check_torch()
144 | cuda_ver = check_cuda()
145 | print(
146 | f"Python version: {python_ver.major}.{python_ver.minor}.{python_ver.micro}\n"
147 | f"`torch` version: {torch_ver}\n"
148 | f"CUDA version: {cuda_ver}\n"
149 | )
150 | for args in _SANITY_CHECK_ARGS:
151 | check_dynamo(*args)
152 | print("All required checks passed")
153 |
154 |
155 | if __name__ == "__main__":
156 | main()
157 |
--------------------------------------------------------------------------------
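For reference, the core of what `check_dynamo` asserts can be restated as a minimal stand-alone sketch against the public `torch.compile` entry point (the script itself uses the older `torch._dynamo.optimize` API, but the check is the same): a compiled function must agree with eager mode on both the forward output and the gradients.

```python
# Minimal sketch of the sanity check performed by check_dynamo, using
# torch.compile; backend and shapes are illustrative choices.
import torch


def fn(x):
    return x + x


compiled_fn = torch.compile(fn, backend="eager", fullgraph=True)

x = torch.randn(10, 10, requires_grad=True)
y = compiled_fn(x)
torch.testing.assert_close(y, x + x)  # forward pass matches eager

y.sum().backward()
torch.testing.assert_close(x.grad, 2 * torch.ones_like(x))  # gradients match too
print("sanity check passed")
```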