├── .gitignore
├── DESIGN_NOTES.md
├── LICENSE
├── MODELS.md
├── README.md
├── setup.py
├── src
│   └── exporters
│       ├── __init__.py
│       ├── coreml
│       │   ├── __init__.py
│       │   ├── __main__.py
│       │   ├── config.py
│       │   ├── convert.py
│       │   ├── features.py
│       │   ├── models.py
│       │   └── validate.py
│       └── utils
│           ├── __init__.py
│           └── logging.py
└── tests
    ├── __init__.py
    ├── test_coreml.py
    └── testing_utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
131 | # .lock
132 | *.lock
133 |
134 | # DS_Store (MacOS)
135 | .DS_Store
136 |
--------------------------------------------------------------------------------
/DESIGN_NOTES.md:
--------------------------------------------------------------------------------
1 | # Design notes for Core ML exporters
2 |
3 | The design of the Core ML exporter for 🤗 Transformers is based on that of the ONNX exporter. Both are used in the same manner and in some places the code is very similar. However, there are also differences due to the way Core ML works. This file documents the decisions that went into building the Core ML exporter.
4 |
5 | ## Philosophy
6 |
7 | An important goal of Core ML is to make using models completely hands-off. For example, if a model requires an image as input, you can simply give it an image object without having to preprocess the image first. And if the model is a classifier, the output is the winning class label instead of a logits tensor. The Core ML exporter will add extra operations to the beginning and end of the model where possible, so that users of these models do not have to do their own pre- and postprocessing if Core ML can already handle this for them.
8 |
9 | The Core ML exporter is built on top of `coremltools`. This library first converts the PyTorch or TensorFlow model into an intermediate representation known as MIL, then performs optimizations on the MIL graph, and finally serializes the result into a `.mlmodel` or `.mlpackage` file (the latter being the preferred format).
10 |
11 | Design of the exporter:
12 |
13 | - The Core ML conversion process is described by a `CoreMLConfig` object, analogous to `OnnxConfig`.
14 |
15 | - In order to distinguish between the `default` task for text models and vision models, the config object must have a `modality` property. Unfortunately, there is no way to determine the modality from the `AutoModel` object, so this property must be set in the `CoreMLConfig` subclass.
16 |
17 | - The standard `CoreMLConfig` object already chooses appropriate input and output descriptions for most models. Only models that do something different, for example use BGR input images instead of RGB, need to have their own config object.
18 |
19 | - If a user wants to change properties of the inputs or outputs (name, description, sequence length, other settings), they have to subclass the `XYZCoreMLConfig` object and override these methods. Not very convenient, but it's also not something people will need to do a lot — and if they do, it means we made the wrong default choice.
20 |
21 | - Where possible, the behavior of the converted model is described by the tokenizer or feature extractor. For example, to use a different input image size, the user would need to create the feature extractor with those settings and use that during the conversion instead of the default feature extractor.
22 |
23 | - The `FeaturesManager` code is copied from `transformers.onnx.features` with minimal changes to the logic, only the table with supported models is different (and using `CoreMLConfig` instead of `OnnxConfig`).
24 |
25 | Extra stuff the Core ML exporter does:
26 |
27 | - For image inputs, mean/std normalization is performed by the Core ML model. Resizing and cropping the image still needs to be done by the user, but this is usually left to other Apple frameworks such as Vision.
28 |
29 | - Tensor inputs may have a different datatype. Specifically, `bool` and `int64` are converted to `int32` inputs, as that is the only integer datatype Core ML can handle.
30 |
31 | - Classifier models that output a single prediction for each input example are treated as special by Core ML. These models have two outputs: one with the class label of the best prediction, and another with a dictionary giving the probabilities for all the classes.
32 |
33 | - Models that perform classification but do not fit into Core ML's definition of a classifier, for example a semantic segmentation model, have the list of class names added to the model's metadata. Core ML ignores these class names but they can be retrieved by writing a few lines of Swift code.
34 |
35 | - Because the goal is to make the converted models as convenient as possible for users, any model that predicts `logits` has the option of applying a softmax, to output probabilities instead of logits. This option is enabled by default for such models. For image segmentation models, there can be two operations inserted: upsampling to the image's original spatial dimensions, followed by an argmax to select the class index for each pixel.
36 |
37 | - The exporter may add extra metadata to allow making predictions from Xcode's model previewer.
38 |
39 | - Quantization and other optimizations can automatically be applied by `coremltools`, and therefore are part of the Core ML exporting workflow. The user can always make additional changes to the Core ML model afterwards using `coremltools`, such as renaming the inputs and outputs, applying quantization, etc.
40 |
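As a rough sketch of what such post-hoc `coremltools` edits can look like (the `input_ids` input name and the quantization API shown here are assumptions that depend on your model and your coremltools version):

```python
import coremltools as ct

mlmodel = ct.models.MLModel("exported/Model.mlpackage")

# Rename an input or output of the saved model
# (assumes the exported model has an input called "input_ids").
spec = mlmodel.get_spec()
ct.utils.rename_feature(spec, "input_ids", "tokens")
mlmodel = ct.models.MLModel(spec, weights_dir=mlmodel.weights_dir)

# 8-bit weight quantization for ML Program models (coremltools 7+ API;
# coremltools 6 has similar utilities in coremltools.compression_utils).
import coremltools.optimize.coreml as cto

config = cto.OptimizationConfig(global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric"))
mlmodel = cto.linear_quantize_weights(mlmodel, config=config)

mlmodel.save("exported/Model-quantized.mlpackage")
```
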
41 | Note: Tokenizers are not a built-in feature of Core ML. For a model that requires tokenized input, the tokenization must be handled by the user. This is outside the scope of the Core ML exporter.
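
A minimal sketch of what this looks like in practice, running an exported text model from Python with the tokenization done outside of Core ML (the checkpoint, input names, and sequence length below are assumptions; `predict()` requires macOS):

```python
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
mlmodel = ct.models.MLModel("exported/Model.mlpackage")

# Tokenize outside the model; pad to the sequence length the model was exported with.
encoded = tokenizer(
    "Core ML does not tokenize for you.",
    padding="max_length",
    max_length=128,
    truncation=True,
    return_tensors="np",
)

# Core ML expects int32 for integer inputs.
outputs = mlmodel.predict({
    "input_ids": encoded["input_ids"].astype(np.int32),
    "attention_mask": encoded["attention_mask"].astype(np.int32),
})
print(outputs.keys())
```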
42 |
43 | ## Supported tasks
44 |
45 | The Core ML exporter supports most of the tasks that the ONNX exporter supports, except for:
46 |
47 | - `image-segmentation` / `AutoModelForImageSegmentation`
48 |
49 | Tasks that the Core ML exporter supports but the ONNX exporter currently doesn't:
50 |
51 | - `next-sentence-prediction`
52 | - `semantic-segmentation`
53 |
54 | Tasks that neither of them support right now:
55 |
56 | - `AutoModelForAudioClassification`
57 | - `AutoModelForAudioFrameClassification`
58 | - `AutoModelForAudioXVector`
59 | - `AutoModelForCTC`
60 | - `AutoModelForInstanceSegmentation`
61 | - `AutoModelForPreTraining`
62 | - `AutoModelForSpeechSeq2Seq`
63 | - `AutoModelForTableQuestionAnswering`
64 | - `AutoModelForVideoClassification`
65 | - `AutoModelForVision2Seq`
66 | - `AutoModelForVisualQuestionAnswering`
67 | - `...DoubleHeadsModel`
68 | - `...ForImageClassificationWithTeacher`
69 |
70 | Tasks that could be improved:
71 |
72 | - `object-detection`. If a Core ML model outputs the predicted bounding boxes in a certain manner, the user does not have to do any decoding and can directly use these outputs in their app (through the Vision framework). Currently, the Core ML exporter does not add this extra functionality.
73 |
74 | ## Missing features
75 |
76 | The following are not supported yet but would be useful to add:
77 |
78 | - Flexible input sizes. Core ML models typically work with fixed input dimensions, but Core ML also supports flexible image sizes and tensor shapes. The exporter currently supports flexible sequence lengths, but not flexible image sizes.
79 |
80 | - Note: Certain models, notably BERT, currently give conversion errors with a flexible sequence length. This appears to be an issue with coremltools.
81 |
82 | - More quantization options. coremltools 6 adds new quantization options for ML Program models, plus options for sparsifying weights.
83 |
84 | - `validate_model_outputs`: If the model supports a flexible input sequence length, run the test three times: once with the maximum length (that's what happens now), once with the minimum length, and once with a length in between (possibly randomly chosen).
85 |
86 | There are certain models that cannot be converted because of the way they are structured, or due to limitations and bugs in coremltools. Sometimes these can be fixed by making changes to the Transformers code, by implementing missing ops, or by filing bugs against coremltools. Getting as many Transformers models as possible to export without issues is a work in progress.
87 |
88 | ### `-with-past` versions for seq2seq models
89 |
90 | The encoder portion of the model is easy: it does not have a `past_key_values` option, so it is always converted with `use_past=False`.
91 |
92 | When the decoder is used with `use_cache=True`, it needs to accept a `past_key_values` tensor that consists of a 4-tuple for each layer with the key/value for the decoder but also the key/value for the encoder. The decoder and encoder tensors have different shapes because they have different sequence lengths.
93 |
94 | The encoder past key/values only need to be computed once, on the first iteration, and then they're simply re-used by the model on subsequent iterations. The decoder past key/values tensors grow in size with each iteration.
95 |
96 | Handling the decoder past key/values tensors in Core ML is not a problem. On the first iteration, you can pass in a tensor with a shape of `(batch, num_layers, 0, num_heads)` or just leave out this tensor completely as it is marked optional. The model returns a new past key/values tensor and you simply pass that in on the next iteration.
97 |
98 | This does not work for the encoder key/values. Core ML cannot perform branching logic in the model (not entirely true: it does have a branching operation, but it involves running a submodel and is rather complicated), and so the JIT trace must always choose one of the paths.
99 |
100 | What this means is: If we specify dummy encoder key/value inputs during the JIT trace, then the cross-attention layer will not perform the `k_proj` and `v_proj` operations on the encoder's hidden state outputs.
101 |
102 | In `BartAttention`, these are the relevant lines:
103 |
104 | ```python
105 | if is_cross_attention and past_key_value is not None:
106 | # reuse k,v, cross_attentions
107 | key_states = past_key_value[0]
108 | value_states = past_key_value[1]
109 | elif is_cross_attention:
110 | # cross_attentions
111 | key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
112 | value_states = self._shape(self.v_proj(key_value_states), -1, bsz)
113 | elif past_key_value is not None:
114 | ...
115 | ```
116 |
117 | Here, `past_key_value` is the encoder key/values tensors and `key_value_states` is the encoder's last hidden state. The Core ML model can only include one of these branches, not both.
118 |
119 | If during the JIT trace we pass in dummy tensors for the encoder key/value tensors, then the first branch is taken and `k_proj` and `v_proj` are never executed. The problem is that we need those projection operations to happen on the very first iteration.
120 |
121 | In theory, we could solve this by never using the encoder key/values tensors, so that the second branch is always taken. This is less efficient, since it involves performing the same linear layers over and over, but at least it will work.
122 |
123 | However, this workaround fails when an encoder attention mask is provided. In `BartDecoderLayer` the following happens:
124 |
125 | ```python
126 | cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
127 | ```
128 |
129 | Since `past_key_value` is now a 2-tuple instead of a 4-tuple (because we're no longer providing the encoder key/values), the expression `past_key_value[-2:]` will attempt to use the decoder key/values tensors for the cross attention. It should use the tensors at indices 2 and 3, but because the tuple only has two tensors in it now, this will use indices 0 and 1 — which are not the correct tensors!
130 |
131 | Since the key/values from indices 0,1 have the target sequence length from the decoder, the encoder's `attention_mask` cannot be applied.
132 |
133 | And even if we don't use this attention mask, what happens is incorrect anyway. The second branch will still never be taken (as `cross_attn_past_key_value` is not None) and `k_proj` and `v_proj` are never executed.
134 |
135 | I currently don't see a solution to this except perhaps rewriting the decoder layer to do the following instead, but that requires changing a lot of source files in `transformers` and is a suboptimal solution anyway.
136 |
137 | ```python
138 | cross_attn_past_key_value = past_key_value[-2:] if (past_key_value is not None and len(past_key_value) > 2) else None
139 | ```
140 |
141 | We could also export two versions of the decoder model: one for the first iteration and one for the remaining iterations, but that's not great either.
142 |
143 | ## Assumptions made by the exporter
144 |
145 | The Core ML exporter needs to make certain assumptions about the Transformers models. These are:
146 |
147 | - A vision `AutoModel` is expected to output hidden states. If there is a second output, this is assumed to be from the pooling layer.
148 |
149 | - The input size for a vision model is given by the feature extractor's `crop_size` property if it exists and `do_center_crop` is true, or otherwise by its `size` property.
150 |
151 | - The image normalization for a vision model is given by the feature extractor's `image_std` and `image_mean` if it has those, otherwise assume `std = 1/255` and `mean = 0`.
152 |
153 | - The `masked-im` task expects a `bool_masked_pos` tensor as the second input. If `bool_masked_pos` is provided, some of these models return the loss value and others don't. If more than one tensor is returned, we assume the first one is the loss and ignore it.
154 |
155 | - If text models have two inputs, the second one is the `attention_mask`. If they have three inputs, the third is `token_type_ids`.
156 |
157 | - The `object-detection` task outputs logits and boxes (only tested with YOLOS so far).
158 |
159 | - If bicubic resizing is used, it gets replaced by bilinear since Core ML doesn't support bicubic. This has a noticeable effect on the predictions, but usually the model is still usable.
160 |
161 | ## Other remarks
162 |
163 | - Just as in the ONNX exporter, the `validate_model_outputs()` function takes an `atol` argument for the absolute tolerance. It might be more appropriate to do this test as `max(abs(coreml - reference)) / max(abs(reference))`, i.e. an error measurement that is relative to the magnitude of the values in the output tensors (a sketch of this check appears at the end of this section).
164 |
165 | - Image classifier models have the usual `classLabel` and `probabilities` outputs, but also a "hidden" `var_xxx` output with the softmax results. This appears to be a minor bug in the converter; it doesn't hurt anything to keep this extra output.
166 |
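A minimal sketch of such a relative-error check (assuming `coreml_output` and `reference_output` are NumPy arrays from the converted and the original model; the threshold is illustrative):

```python
import numpy as np

def relative_error(coreml_output: np.ndarray, reference_output: np.ndarray) -> float:
    """Maximum absolute difference, scaled by the magnitude of the reference output."""
    denom = max(float(np.max(np.abs(reference_output))), 1e-12)  # avoid division by zero
    return float(np.max(np.abs(coreml_output - reference_output))) / denom

# Example: tolerate a 0.1% relative deviation.
assert relative_error(np.array([1.001, 2.0]), np.array([1.0, 2.0])) < 1e-3
```
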
167 | ## Running the tests
168 |
169 | The unit tests attempt to convert all supported models, and verify that their output is close to that of the original models. This can be very slow! These tests require a Mac.
170 |
171 | ```
172 | $ cd exporters
173 | $ RUN_SLOW=1 pytest tests/test_coreml.py --capture=sys -W ignore
174 | ```
175 |
176 | The `--capture=sys` and `-W ignore` arguments are used to suppress the coremltools progress bars and other messages.
177 |
178 | Tip: After running the tests, go into `/private/var/folders/...` and remove all the `.mlpackage` and `.mlmodel` files, as well as the `com.apple.MetalPerformanceShadersGraph` directory. coremltools leaves a lot of junk here that can quickly eat up your local storage space.
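
A sketch of locating that leftover data from Python before cleaning it up by hand (the temporary location varies per machine; this only prints candidate paths and does not delete anything):

```python
import glob
import os
import tempfile

# On macOS, tempfile.gettempdir() usually points somewhere under /var/folders/.
tmp = tempfile.gettempdir()
patterns = ["**/*.mlpackage", "**/*.mlmodel", "**/com.apple.MetalPerformanceShadersGraph"]

for pattern in patterns:
    for path in glob.glob(os.path.join(tmp, pattern), recursive=True):
        print(path)  # inspect first, then remove manually (e.g. with shutil.rmtree)
```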
179 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/MODELS.md:
--------------------------------------------------------------------------------
1 |
16 |
17 | # Models that are / aren't supported by 🤗 Exporters
18 |
19 | Only models that have a `ModelNameCoreMLConfig` object are currently supported.
20 |
21 | If a model is not supported, this is either because there is some problem with the actual conversion process, or because we simply did not get around to writing a `CoreMLConfig` object for it.
22 |
23 | ## Supported models
24 |
25 | Legend:
26 |
27 | - ✅ = fully supported
28 | - 😓 = works but with hacks
29 | - ⚠️ = partially supported (for example no "with past" version)
30 | - ❌ = errors during conversion
31 | - ➖ = not supported
32 | - ? = unknown
33 |
34 | ### Text Models
35 |
36 | **BART**
37 |
38 | - ⚠️ BartModel (currently supports only `use_past=False`)
39 | - ✅ BartForCausalLM
40 | - ⚠️ BartForConditionalGeneration (currently supports only `use_past=False`)
41 | - ? BartForQuestionAnswering
42 | - ? BartForSequenceClassification
43 |
44 | **BERT**
45 |
46 | - ✅ BertModel
47 | - ➖ BertForPreTraining
48 | - ✅ BertForMaskedLM
49 | - ✅ BertForMultipleChoice
50 | - ✅ BertForNextSentencePrediction
51 | - ✅ BertForQuestionAnswering
52 | - ✅ BertForSequenceClassification
53 | - ✅ BertForTokenClassification
54 | - ⚠️ BertLMHeadModel: works OK with coremltools commit 50c5569, breaks with later versions
55 |
56 | **BigBird**
57 |
58 | - ? BigBirdModel
59 | - ➖ BigBirdForPreTraining
60 | - ⚠️ BigBirdForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions
61 | - ? BigBirdForMaskedLM
62 | - ? BigBirdForMultipleChoice
63 | - ? BigBirdForQuestionAnswering
64 | - ? BigBirdForSequenceClassification
65 | - ? BigBirdForTokenClassification
66 |
67 | **BigBirdPegasus**
68 |
69 | - ⚠️ BigBirdPegasusModel (currently supports only `use_past=False`)
70 | - ✅ BigBirdPegasusForCausalLM
71 | - ⚠️ BigBirdPegasusForConditionalGeneration (currently supports only `use_past=False`)
72 | - ? BigBirdPegasusForQuestionAnswering
73 | - ? BigBirdPegasusForSequenceClassification
74 |
75 | **Blenderbot**
76 |
77 | - ⚠️ BlenderbotModel (currently supports only `use_past=False`)
78 | - ? BlenderbotForCausalLM
79 | - ⚠️ BlenderbotForConditionalGeneration (currently supports only `use_past=False`)
80 |
81 | **Blenderbot Small**
82 |
83 | - ⚠️ BlenderbotSmallModel (currently supports only `use_past=False`)
84 | - ? BlenderbotSmallForCausalLM
85 | - ⚠️ BlenderbotSmallForConditionalGeneration (currently supports only `use_past=False`)
86 |
87 | **CTRL**
88 |
89 | - ✅ CTRLModel
90 | - ✅ CTRLLMHeadModel
91 | - ✅ CTRLForSequenceClassification
92 |
93 | **DistilBERT**
94 |
95 | - ✅ DistilBertModel
96 | - ✅ DistilBertForMaskedLM
97 | - ✅ DistilBertForMultipleChoice
98 | - ✅ DistilBertForQuestionAnswering
99 | - ✅ DistilBertForSequenceClassification
100 | - ✅ DistilBertForTokenClassification
101 |
102 | **ERNIE**
103 |
104 | - ? ErnieModel
105 | - ➖ ErnieForPreTraining
106 | - ⚠️ ErnieForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions
107 | - ? ErnieForMaskedLM
108 | - ? ErnieForMultipleChoice
109 | - ? ErnieForNextSentencePrediction
110 | - ? ErnieForQuestionAnswering
111 | - ? ErnieForSequenceClassification
112 | - ? ErnieForTokenClassification
113 |
114 | **GPT2 / DistilGPT2**
115 |
116 | Does not work with flexible sequence length and therefore does not support `use_past`.
117 |
118 | - ✅ GPT2Model
119 | - ➖ GPT2DoubleHeadsModel
120 | - ✅ GPT2ForSequenceClassification
121 | - ✅ GPT2ForTokenClassification
122 | - ⚠️ GPT2LMHeadModel (no `use_past`)
123 |
124 | **Llama**
125 |
126 | - ✅ LlamaForCausalLM
127 |
128 | **M2M100**
129 |
130 | - ⚠️ M2M100Model (currently supports only `use_past=False`)
131 | - ⚠️ M2M100ForConditionalGeneration (currently supports only `use_past=False`)
132 |
133 | **MarianMT**
134 |
135 | - ⚠️ MarianModel (currently supports only `use_past=False`)
136 | - ? MarianForCausalLM
137 | - ⚠️ MarianMTModel (currently supports only `use_past=False`)
138 |
139 | **Mistral**
140 |
141 | - ✅ MistralForCausalLM
142 |
143 | **MobileBERT**
144 |
145 | - ✅ MobileBertModel
146 | - ➖ MobileBertForPreTraining
147 | - ✅ MobileBertForMaskedLM
148 | - ✅ MobileBertForMultipleChoice
149 | - ✅ MobileBertForNextSentencePrediction
150 | - ✅ MobileBertForQuestionAnswering
151 | - ✅ MobileBertForSequenceClassification
152 | - ✅ MobileBertForTokenClassification
153 |
154 | **MVP**
155 |
156 | - ⚠️ MvpModel (currently supports only `use_past=False`)
157 | - ? MvpForCausalLM
158 | - ⚠️ MvpForConditionalGeneration (currently supports only `use_past=False`)
159 | - ? MvpForSequenceClassification
160 | - ? MvpForQuestionAnswering
161 |
162 | **Pegasus**
163 |
164 | - ⚠️ PegasusModel (currently supports only `use_past=False`)
165 | - ? PegasusForCausalLM
166 | - ⚠️ PegasusForConditionalGeneration (currently supports only `use_past=False`)
167 |
168 | **PLBart**
169 |
170 | - ⚠️ PLBartModel (currently supports only `use_past=False`)
171 | - ? PLBartForCausalLM
172 | - ⚠️ PLBartForConditionalGeneration (currently supports only `use_past=False`)
173 | - ? PLBartForSequenceClassification
174 |
175 | **RoBERTa**
176 |
177 | - ? RobertaModel
178 | - ⚠️ RobertaForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions
179 | - ? RobertaForMaskedLM
180 | - ? RobertaForMultipleChoice
181 | - ? RobertaForQuestionAnswering
182 | - ? RobertaForSequenceClassification
183 | - ? RobertaForTokenClassification
184 |
185 | **RoFormer**
186 |
187 | - ? RoFormerModel
188 | - ❌ RoFormerForCausalLM: Conversion may appear to work but the model does not actually run. Core ML takes forever to load the model, allocates 100+ GB of RAM and eventually crashes.
189 | - ? RoFormerForMaskedLM
190 | - ? RoFormerForSequenceClassification
191 | - ? RoFormerForMultipleChoice
192 | - ? RoFormerForTokenClassification
193 | - ? RoFormerForQuestionAnswering
194 |
195 | **Splinter**
196 |
197 | - ❌ SplinterModel: Conversion may appear to work but the model does not actually run. Core ML takes forever to load the model, allocates 100+ GB of RAM and eventually crashes.
198 | - ➖ SplinterForPreTraining
199 | - SplinterForQuestionAnswering
200 |
201 | **SqueezeBERT**
202 |
203 | - ✅ SqueezeBertModel
204 | - ✅ SqueezeBertForMaskedLM
205 | - ✅ SqueezeBertForMultipleChoice
206 | - ✅ SqueezeBertForQuestionAnswering
207 | - ✅ SqueezeBertForSequenceClassification
208 | - ✅ SqueezeBertForTokenClassification
209 |
210 | **T5**
211 |
212 | - ⚠️ T5Model (currently supports only `use_past=False`)
213 | - ✅ T5EncoderModel
214 | - ⚠️ T5ForConditionalGeneration (currently supports only `use_past=False`)
215 |
216 | ### Vision Models
217 |
218 | **BEiT**
219 |
220 | - ✅ BeitModel
221 | - ✅ BeitForImageClassification
222 | - ✅ BeitForSemanticSegmentation
223 | - ✅ BeitForMaskedImageModeling. Note: this model does not work with AutoModelForMaskedImageModeling and therefore the conversion script cannot load it, but converting from Python is supported.
224 |
225 | **ConvNeXT**
226 |
227 | - ✅ ConvNextModel
228 | - ✅ ConvNextForImageClassification
229 |
230 | **CvT**
231 |
232 | - ✅ CvtModel
233 | - ✅ CvtForImageClassification
234 |
235 | **LeViT**
236 |
237 | - ✅ LevitModel
238 | - ✅ LevitForImageClassification
239 | - ➖ LevitForImageClassificationWithTeacher
240 |
241 | **MobileViT**
242 |
243 | - ✅ MobileViTModel
244 | - ✅ MobileViTForImageClassification
245 | - ✅ MobileViTForSemanticSegmentation
246 |
247 | **MobileViTv2**
248 |
249 | - ✅ MobileViTV2Model
250 | - ✅ MobileViTV2ForImageClassification
251 | - ✅ MobileViTV2ForSemanticSegmentation
252 |
253 | **SegFormer**
254 |
255 | - ✅ SegformerModel
256 | - ✅ SegformerForImageClassification
257 | - ✅ SegformerForSemanticSegmentation
258 |
259 | **Vision Transformer (ViT)**
260 |
261 | - ✅ ViTModel
262 | - ✅ ViTForMaskedImageModeling
263 | - ✅ ViTForImageClassification
264 |
265 | **YOLOS**
266 |
267 | - ✅ YolosModel
268 | - ✅ YolosForObjectDetection
269 |
270 | ### Audio Models
271 |
272 | None
273 |
274 | ### Multimodal Models
275 |
276 | **Data2Vec Audio**
277 |
278 | - ? Data2VecAudioModel: [TODO verify] The conversion completes without errors but the Core ML compiler cannot load the model.
279 | - ? Data2VecAudioForAudioFrameClassification
280 | - ? Data2VecAudioForCTC
281 | - ? Data2VecAudioForSequenceClassification
282 | - ? Data2VecAudioForXVector
283 |
284 | **Data2Vec Text**
285 |
286 | - ? Data2VecTextModel
287 | - ⚠️ Data2VecTextForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions
288 | - ? Data2VecTextForMaskedLM
289 | - ? Data2VecTextForMultipleChoice
290 | - ? Data2VecTextForQuestionAnswering
291 | - ? Data2VecTextForSequenceClassification
292 | - ? Data2VecTextForTokenClassification
293 |
294 | **Data2Vec Vision**
295 |
296 | - ? Data2VecVisionModel
297 | - ? Data2VecVisionForImageClassification
298 | - ? Data2VecVisionForSemanticSegmentation
299 |
300 | ## Models that currently don't work
301 |
302 | The following models are known to give errors when attempting conversion to Core ML format, or simply have not been tried yet.
303 |
304 | ### Text Models
305 |
306 | ALBERT
307 |
308 | BARThez
309 |
310 | BARTpho
311 |
312 | BertGeneration
313 |
314 | BertJapanese
315 |
316 | Bertweet
317 |
318 | **BLOOM** [TODO verify] Conversion error on a slicing operation.
319 |
320 | BORT
321 |
322 | ByT5
323 |
324 | CamemBERT
325 |
326 | CANINE
327 |
328 | **CodeGen** [TODO verify] Conversion error on einsum.
329 |
330 | ConvBERT
331 |
332 | CPM
333 |
334 | DeBERTa
335 |
336 | DeBERTa-v2
337 |
338 | DialoGPT
339 |
340 | DPR
341 |
342 | **ELECTRA**
343 |
344 | - ❌ ElectraForCausalLM: "AttributeError: 'list' object has no attribute 'val'" in `repeat` op. Also, `coreml_config.values_override` doesn't work to set `use_cache` to True for this model.
345 |
346 | Encoder Decoder Models
347 |
348 | ESM
349 |
350 | FlauBERT
351 |
352 | FNet
353 |
354 | **FSMT**
355 |
356 | - ❌ FSMTForConditionalGeneration. Encoder converts OK. For decoder, `Wrapper` outputs wrong size logits tensor; goes wrong somewhere in hidden states output from decoder when `return_dict=False`?
357 |
358 | Funnel Transformer
359 |
360 | GPT
361 |
362 | **GPT Neo**. [TODO verify] Gives no errors during conversion but predicts wrong results, or NaN when `use_legacy_format=True`.
363 |
364 | - GPTNeoModel
365 | - GPTNeoForCausalLM
366 | - GPTNeoForSequenceClassification
367 |
368 | GPT NeoX
369 |
370 | GPT NeoX Japanese
371 |
372 | GPT-J
373 |
374 | HerBERT
375 |
376 | I-BERT
377 |
378 | LayoutLM
379 |
380 | **LED**
381 |
382 | - ❌ LEDForConditionalGeneration: JIT trace fails with the error:
383 |
384 | ```python
385 | RuntimeError: 0INTERNAL ASSERT FAILED at "/Users/distiller/project/pytorch/torch/csrc/jit/ir/alias_analysis.cpp":607, please report a bug to PyTorch. We don't have an op for aten::constant_pad_nd but it isn't a special case. Argument types: Tensor, int[], bool,
386 | ```
387 |
388 | LiLT
389 |
390 | Longformer
391 |
392 | **LongT5**
393 |
394 | - ❌ LongT5ForConditionalGeneration: Conversion error:
395 |
396 | ```python
397 | ValueError: In op, of type not_equal, named 133, the named input `y` must have the same data type as the named input `x`. However, y has dtype fp32 whereas x has dtype int32.
398 | ```
399 |
400 | LUKE
401 |
402 | MarkupLM
403 |
404 | MBart and MBart-50
405 |
406 | MegatronBERT
407 |
408 | MegatronGPT2
409 |
410 | mLUKE
411 |
412 | MPNet
413 |
414 | **MT5**
415 |
416 | - ❌ MT5ForConditionalGeneration: Converter error "User defined pattern has more than one final operation"
417 |
418 | **NEZHA** [TODO verify] Conversion error on a slicing operation.
419 |
420 | NLLB
421 |
422 | Nyströmformer
423 |
424 | **OPT** [TODO verify] Conversion error on a slicing operation.
425 |
426 | **PEGASUS-X**
427 |
428 | - ❌ PegasusXForConditionalGeneration: "AttributeError: 'list' object has no attribute 'val'" in `pad` op. Maybe: needs `remainder` op (added recently in coremltools dev version).
429 |
430 | PhoBERT
431 |
432 | **ProphetNet**
433 |
434 | - ❌ ProphetNetForConditionalGeneration. Conversion error:
435 |
436 | ```python
437 | ValueError: Op "input.3" (op_type: clip) Input x="position_ids" expects tensor or scalar of dtype from type domain ['fp16', 'fp32'] but got tensor[1,is4273,int32]
438 | ```
439 |
440 | QDQBert
441 |
442 | RAG
443 |
444 | REALM
445 |
446 | **Reformer**
447 |
448 | - ❌ ReformerModelWithLMHead: does not have `past_key_values` but `past_buckets_states`
449 |
450 | **RemBERT**
451 |
452 | - ❌ RemBertForCausalLM. Conversion to MIL succeeds after a long time but running the model gives "Error in declaring network." When using legacy mode, the model is too large to fit into protobuf.
453 |
454 | RetriBERT
455 |
456 | T5v1.1
457 |
458 | TAPAS
459 |
460 | TAPEX
461 |
462 | Transformer XL
463 |
464 | UL2
465 |
466 | **XGLM** [TODO verify] Conversion error on a slicing operation.
467 |
468 | XLM
469 |
470 | **XLM-ProphetNet**
471 |
472 | - XLMProphetNetForConditionalGeneration: Conversion error:
473 |
474 | ```python
475 | ValueError: Op "input.3" (op_type: clip) Input x="position_ids" expects tensor or scalar of dtype from type domain ['fp16', 'fp32'] but got tensor[1,is4506,int32]
476 | ```
477 |
478 | XLM-RoBERTa
479 |
480 | XLM-RoBERTa-XL
481 |
482 | **XLNet** [TODO verify] Conversion error.
483 |
484 | YOSO
485 |
486 | ### Vision Models
487 |
488 | Conditional DETR
489 |
490 | Deformable DETR
491 |
492 | DeiT
493 |
494 | **DETR** [TODO verify] The conversion completes without errors but the Core ML compiler cannot load the model. "Invalid operation output name: got 'tensor' when expecting token of type 'ID'"
495 |
496 | DiT
497 |
498 | DPT
499 |
500 | GLPN
501 |
502 | ImageGPT
503 |
504 | MaskFormer
505 |
506 | PoolFormer
507 |
508 | RegNet
509 |
510 | ResNet
511 |
512 | **Swin Transformer** [TODO verify] The PyTorch graph contains unsupported operations: remainder, roll, adaptive_avg_pool1d. (Some of these may be supported in latest dev version.)
513 |
514 | Swin Transformer V2
515 |
516 | VAN
517 |
518 | VideoMAE
519 |
520 | ViTMAE
521 |
522 | ViTMSN
523 |
524 | ### Audio Models
525 |
526 | **Hubert** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues.
527 |
528 | MCTCT
529 |
530 | **SEW** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues.
531 |
532 | SEW-D
533 |
534 | **Speech2Text** [TODO verify] The "glu" op is not supported by coremltools. Should be possible to solve by defining a `@register_torch_op` function. (Update: should be supported in dev version now.)
535 |
536 | Speech2Text2
537 |
538 | **UniSpeech** [TODO verify] Missing op for `_weight_norm` (possible to work around), also same Core ML compiler error as DETR.
539 |
540 | UniSpeech-SAT
541 |
542 | **Wav2Vec2** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues.
543 |
544 | Wav2Vec2-Conformer
545 |
546 | Wav2Vec2Phoneme
547 |
548 | **WavLM** [TODO verify] Missing ops for `_weight_norm`, `add_`, `full_like`.
549 |
550 | Whisper
551 |
552 | XLS-R
553 |
554 | XLSR-Wav2Vec2
555 |
556 | ### Multimodal Models
557 |
558 | CLIP
559 |
560 | Donut
561 |
562 | FLAVA
563 |
564 | **GroupViT** [TODO verify] Conversion issue with `scatter_along_axis` operation.
565 |
566 | LayoutLMV2
567 |
568 | LayoutLMV3
569 |
570 | LayoutXLM
571 |
572 | LXMERT
573 |
574 | OWL-ViT
575 |
576 | Perceiver
577 |
578 | Speech Encoder Decoder Models
579 |
580 | TrOCR
581 |
582 | ViLT
583 |
584 | Vision Encoder Decoder Models
585 |
586 | Vision Text Dual Encoder
587 |
588 | VisualBERT
589 |
590 | X-CLIP
591 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
16 |
17 | # 🤗 Exporters
18 |
19 | 👷 **WORK IN PROGRESS** 👷
20 |
21 | This package lets you export 🤗 Transformers models to Core ML.
22 |
23 | > For converting models to TFLite, we recommend using [Optimum](https://huggingface.co/docs/optimum/exporters/tflite/usage_guides/export_a_model).
24 |
25 | ## When to use 🤗 Exporters
26 |
27 | 🤗 Transformers models are implemented in PyTorch, TensorFlow, or JAX. However, for deployment you might want to use a different framework such as Core ML. This library makes it easy to convert Transformers models to this format.
28 |
29 | The aim of the Exporters package is to be more convenient than writing your own conversion script with *coremltools* and to be tightly integrated with the 🤗 Transformers library and the Hugging Face Hub.
30 |
31 | For an even more convenient approach, `Exporters` powers a [no-code transformers to Core ML conversion Space](https://huggingface.co/spaces/huggingface-projects/transformers-to-coreml). You can try it out without installing anything to check whether the model you are interested in can be converted. If conversion succeeds, the converted Core ML weights will be pushed to the Hub. For additional flexibility and details about the conversion process, please read on.
32 |
33 | Note: Keep in mind that Transformer models are usually quite large and are not always suitable for use on mobile devices. It might be a good idea to [optimize the model for inference](https://github.com/huggingface/optimum) first using 🤗 Optimum.
34 |
35 | ## Installation
36 |
37 | Clone this repo:
38 |
39 | ```bash
40 | $ git clone https://github.com/huggingface/exporters.git
41 | ```
42 |
43 | Install it as a Python package:
44 |
45 | ```bash
46 | $ cd exporters
47 | $ pip install -e .
48 | ```
49 |
50 | All done!
51 |
52 | Note: The Core ML exporter can be used from Linux but macOS is recommended.
53 |
54 | ## Core ML
55 |
56 | [Core ML](https://developer.apple.com/machine-learning/core-ml/) is Apple's software library for fast on-device model inference with neural networks and other types of machine learning models. It can be used on macOS, iOS, tvOS, and watchOS, and is optimized for using the CPU, GPU, and Apple Neural Engine. Although the Core ML framework is proprietary, the Core ML file format is an open format.
57 |
58 | The Core ML exporter uses [coremltools](https://coremltools.readme.io/docs) to perform the conversion from PyTorch or TensorFlow to Core ML.
59 |
60 | The `exporters.coreml` package enables you to convert model checkpoints to a Core ML model by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.
61 |
62 | Ready-made configurations include the following architectures:
63 |
64 | - BEiT
65 | - BERT
66 | - ConvNeXT
67 | - CTRL
68 | - CvT
69 | - DistilBERT
70 | - DistilGPT2
71 | - GPT2
72 | - LeViT
73 | - MobileBERT
74 | - MobileViT
75 | - SegFormer
76 | - SqueezeBERT
77 | - Vision Transformer (ViT)
78 | - YOLOS
79 |
80 |
81 |
82 | [See here](MODELS.md) for a complete list of supported models.
83 |
84 | ### Exporting a model to Core ML
85 |
86 |
95 |
96 | The `exporters.coreml` package can be used as a Python module from the command line. To export a checkpoint using a ready-made configuration, do the following:
97 |
98 | ```bash
99 | python -m exporters.coreml --model=distilbert-base-uncased exported/
100 | ```
101 |
102 | This exports a Core ML version of the checkpoint defined by the `--model` argument. In this example it is `distilbert-base-uncased`, but it can be any checkpoint on the Hugging Face Hub or one that's stored locally.
103 |
104 | The resulting Core ML file will be saved to the `exported` directory as `Model.mlpackage`. Instead of a directory you can specify a filename, such as `DistilBERT.mlpackage`.
105 |
106 | It's normal for the conversion process to output many warning messages and other logging information. You can safely ignore these. If all went well, the export should conclude with the following logs:
107 |
108 | ```bash
109 | Validating Core ML model...
110 | -[✓] Core ML model output names match reference model ({'last_hidden_state'})
111 | - Validating Core ML model output "last_hidden_state":
112 | -[✓] (1, 128, 768) matches (1, 128, 768)
113 | -[✓] all values close (atol: 0.0001)
114 | All good, model saved at: exported/Model.mlpackage
115 | ```
116 |
117 | Note: While it is possible to export models to Core ML on Linux, the validation step will only be performed on a Mac, as it requires the Core ML framework to run the model.
118 |
119 | The resulting file is `Model.mlpackage`. This file can be added to an Xcode project and be loaded into a macOS or iOS app.
120 |
121 | The exported Core ML models use the **mlpackage** format with the **ML Program** model type. This format was introduced in 2021 and requires at least iOS 15, macOS 12.0, and Xcode 13. We prefer to use this format as it is the future of Core ML. The Core ML exporter can also make models in the older `.mlmodel` format, but this is not recommended.
122 |
123 | The process is identical for TensorFlow checkpoints on the Hub. For example, you can export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io) as follows:
124 |
125 | ```bash
126 | python -m exporters.coreml --model=keras-io/transformers-qa exported/
127 | ```
128 |
129 | To export a model that's stored locally, you'll need to have the model's weights and tokenizer files stored in a directory. For example, we can load and save a checkpoint as follows:
130 |
131 | ```python
132 | >>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
133 |
134 | >>> # Load tokenizer and PyTorch weights from the Hub
135 | >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
136 | >>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
137 | >>> # Save to disk
138 | >>> tokenizer.save_pretrained("local-pt-checkpoint")
139 | >>> pt_model.save_pretrained("local-pt-checkpoint")
140 | ```
141 |
142 | Once the checkpoint is saved, you can export it to Core ML by pointing the `--model` argument to the directory holding the checkpoint files:
143 |
144 | ```bash
145 | python -m exporters.coreml --model=local-pt-checkpoint exported/
146 | ```
147 |
148 |
151 |
152 | ### Selecting features for different model topologies
153 |
154 | Each ready-made configuration comes with a set of _features_ that enable you to export models for different types of topologies or tasks. As shown in the table below, each feature is associated with a different auto class:
155 |
156 | | Feature | Auto Class |
157 | | -------------------------------------------- | ------------------------------------ |
158 | | `default`, `default-with-past` | `AutoModel` |
159 | | `causal-lm`, `causal-lm-with-past` | `AutoModelForCausalLM` |
160 | | `ctc` | `AutoModelForCTC` |
161 | | `image-classification` | `AutoModelForImageClassification` |
162 | | `masked-im` | `AutoModelForMaskedImageModeling` |
163 | | `masked-lm` | `AutoModelForMaskedLM` |
164 | | `multiple-choice` | `AutoModelForMultipleChoice` |
165 | | `next-sentence-prediction` | `AutoModelForNextSentencePrediction` |
166 | | `object-detection` | `AutoModelForObjectDetection` |
167 | | `question-answering` | `AutoModelForQuestionAnswering` |
168 | | `semantic-segmentation` | `AutoModelForSemanticSegmentation` |
169 | | `seq2seq-lm`, `seq2seq-lm-with-past` | `AutoModelForSeq2SeqLM` |
170 | | `sequence-classification` | `AutoModelForSequenceClassification` |
171 | | `speech-seq2seq`, `speech-seq2seq-with-past` | `AutoModelForSpeechSeq2Seq` |
172 | | `token-classification` | `AutoModelForTokenClassification` |
173 |
174 | For each configuration, you can find the list of supported features via the `FeaturesManager`. For example, for DistilBERT we have:
175 |
176 | ```python
177 | >>> from exporters.coreml.features import FeaturesManager
178 |
179 | >>> distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys())
180 | >>> print(distilbert_features)
181 | ['default', 'masked-lm', 'multiple-choice', 'question-answering', 'sequence-classification', 'token-classification']
182 | ```
183 |
184 | You can then pass one of these features to the `--feature` argument in the `exporters.coreml` package. For example, to export a text-classification model we can pick a fine-tuned model from the Hub and run:
185 |
186 | ```bash
187 | python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english \
188 | --feature=sequence-classification exported/
189 | ```
190 |
191 | which will display the following logs:
192 |
193 | ```bash
194 | Validating Core ML model...
195 | - Core ML model is classifier, validating output
196 | -[✓] predicted class NEGATIVE matches NEGATIVE
197 | -[✓] number of classes 2 matches 2
198 | -[✓] all values close (atol: 0.0001)
199 | All good, model saved at: exported/Model.mlpackage
200 | ```
201 |
202 | Notice that in this case, the exported model is a Core ML classifier, which predicts the highest scoring class name in addition to a dictionary of probabilities, instead of the `last_hidden_state` we saw with the `distilbert-base-uncased` checkpoint earlier. This is expected since the fine-tuned model has a sequence classification head.
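
If you want to check which inputs and outputs a given exported model exposes (and whether it was exported as a Core ML classifier), you can inspect the `.mlpackage` with coremltools from Python; a small sketch:

```python
import coremltools as ct

mlmodel = ct.models.MLModel("exported/Model.mlpackage")

# Human-readable summaries of the model interface.
print(mlmodel.input_description)
print(mlmodel.output_description)

# The full protobuf description also includes the predicted-class feature (for classifiers) and metadata.
print(mlmodel.get_spec().description)
```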
203 |
204 |
205 |
206 | The features that have a `with-past` suffix (e.g. `causal-lm-with-past`) correspond to model topologies with precomputed hidden states (key and values in the attention blocks) that can be used for fast autoregressive decoding.
207 |
208 |
209 |
210 | ### Configuring the export options
211 |
212 | To see the full list of possible options, run the following from the command line:
213 |
214 | ```bash
215 | python -m exporters.coreml --help
216 | ```
217 |
218 | Exporting a model requires at least these arguments:
219 |
220 | - `-m <model>`: The model ID from the Hugging Face Hub, or a local path to load the model from.
221 | - `--feature <task>`: The task the model should perform, for example `"image-classification"`. See the table above for possible task names.
222 | - `