├── .gitignore ├── DESIGN_NOTES.md ├── LICENSE ├── MODELS.md ├── README.md ├── setup.py ├── src └── exporters │ ├── __init__.py │ ├── coreml │ ├── __init__.py │ ├── __main__.py │ ├── config.py │ ├── convert.py │ ├── features.py │ ├── models.py │ └── validate.py │ └── utils │ ├── __init__.py │ └── logging.py └── tests ├── __init__.py ├── test_coreml.py └── testing_utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | # .lock 132 | *.lock 133 | 134 | # DS_Store (MacOS) 135 | .DS_Store 136 | -------------------------------------------------------------------------------- /DESIGN_NOTES.md: -------------------------------------------------------------------------------- 1 | # Design notes for Core ML exporters 2 | 3 | The design of the Core ML exporter for 🤗 Transformers is based on that of the ONNX exporter. Both are used in the same manner and in some places the code is very similar. However, there are also differences due to the way Core ML works. This file documents the decisions that went into building the Core ML exporter. 
4 | 5 | ## Philosophy 6 | 7 | An important goal of Core ML is to make using models completely hands-off. For example, if a model requires an image as input, you can simply give it an image object without having to preprocess the image first. And if the model is a classifier, the output is the winning class label instead of a logits tensor. The Core ML exporter will add extra operations to the beginning and end of the model where possible, so that users of these models do not have to do their own pre- and postprocessing if Core ML can already handle this for them. 8 | 9 | The Core ML exporter is built on top of `coremltools`. This library first converts the PyTorch or TensorFlow model into an intermediate representation known as MIL, then performs optimizations on the MIL graph, and finally serializes the result into a `.mlmodel` or `.mlpackage` file (the latter being the preferred format). 10 | 11 | Design of the exporter: 12 | 13 | - The Core ML conversion process is described by a `CoreMLConfig` object, analogous to `OnnxConfig`. 14 | 15 | - In order to distinguish between the `default` task for text models and vision models, the config object must have a `modality` property. Unfortunately, there is no way to determine the modality from the `AutoModel` object, so this property must be set in the `CoreMLConfig` subclass. 16 | 17 | - The standard `CoreMLConfig` object already chooses appropriate input and output descriptions for most models. Only models that do something different, for example use BGR input images instead of RGB, need to have their own config object. 18 | 19 | - If a user wants to change properties of the inputs or outputs (name, description, sequence length, other settings), they have to subclass the `XYZCoreMLConfig` object and override these methods. Not very convenient, but it's also not something people will need to do a lot — and if they do, it means we made the wrong default choice. 20 | 21 | - Where possible, the behavior of the converted model is described by the tokenizer or feature extractor. For example, to use a different input image size, the user would need to create the feature extractor with those settings and use that during the conversion instead of the default feature extractor. 22 | 23 | - The `FeaturesManager` code is copied from `transformers.onnx.features` with minimal changes to the logic; only the table with supported models is different (and it uses `CoreMLConfig` instead of `OnnxConfig`). 24 | 25 | Extra stuff the Core ML exporter does: 26 | 27 | - For image inputs, mean/std normalization is performed by the Core ML model. Resizing and cropping the image still needs to be done by the user but is usually left to other Apple frameworks such as Vision. 28 | 29 | - Tensor inputs may have a different datatype. Specifically, `bool` and `int64` are converted to `int32` inputs, as that is the only integer datatype Core ML can handle. 30 | 31 | - Classifier models that output a single prediction for each input example are treated as special by Core ML. These models have two outputs: one with the class label of the best prediction, and another with a dictionary giving the probabilities for all the classes. 32 | 33 | - Models that perform classification but do not fit into Core ML's definition of a classifier, for example a semantic segmentation model, have the list of class names added to the model's metadata. Core ML ignores these class names but they can be retrieved by writing a few lines of Swift code.
34 | 35 | - Because the goal is to make the converted models as convenient as possible for users, any model that predicts `logits` has the option of applying a softmax, to output probabilities instead of logits. This option is enabled by default for such models. For image segmentation models, there can be two operations inserted: upsampling to the image's original spatial dimensions, followed by an argmax to select the class index for each pixel. 36 | 37 | - The exporter may add extra metadata to allow making predictions from Xcode's model previewer. 38 | 39 | - Quantization and other optimizations can automatically be applied by `coremltools`, and therefore are part of the Core ML exporting workflow. The user can always make additional changes to the Core ML model afterwards using `coremltools`, such as renaming the inputs and outputs, applying quantization, etc. 40 | 41 | Note: Tokenizers are not a built-in feature of Core ML. If a model requires tokenized input, the user must perform the tokenization themselves. This is outside the scope of the Core ML exporter. 42 | 43 | ## Supported tasks 44 | 45 | The Core ML exporter supports most of the tasks that the ONNX exporter supports, except for: 46 | 47 | - `image-segmentation` / `AutoModelForImageSegmentation` 48 | 49 | Tasks that the Core ML exporter supports but the ONNX exporter currently doesn't: 50 | 51 | - `next-sentence-prediction` 52 | - `semantic-segmentation` 53 | 54 | Tasks that neither of them support right now: 55 | 56 | - `AutoModelForAudioClassification` 57 | - `AutoModelForAudioFrameClassification` 58 | - `AutoModelForAudioXVector` 59 | - `AutoModelForCTC` 60 | - `AutoModelForInstanceSegmentation` 61 | - `AutoModelForPreTraining` 62 | - `AutoModelForSpeechSeq2Seq` 63 | - `AutoModelForTableQuestionAnswering` 64 | - `AutoModelForVideoClassification` 65 | - `AutoModelForVision2Seq` 66 | - `AutoModelForVisualQuestionAnswering` 67 | - `...DoubleHeadsModel` 68 | - `...ForImageClassificationWithTeacher` 69 | 70 | Tasks that could be improved: 71 | 72 | - `object-detection`. If a Core ML model outputs the predicted bounding boxes in a certain manner, the user does not have to do any decoding and can directly use these outputs in their app (through the Vision framework). Currently, the Core ML exporter does not add this extra functionality. 73 | 74 | ## Missing features 75 | 76 | The following are not supported yet but would be useful to add: 77 | 78 | - Flexible input sizes. Core ML models typically work with fixed input dimensions, but Core ML also supports flexible image sizes and tensor shapes. The exporter currently supports flexible sequence lengths, but not image sizes. 79 | 80 | - Note: Certain models, notably BERT, currently give conversion errors with a flexible sequence length. This appears to be an issue with coremltools. 81 | 82 | - More quantization options. coremltools 6 adds new quantization options for ML Program models, plus options for sparsifying weights. 83 | 84 | - `validate_model_outputs`: If the model supports a flexible input sequence length, run the test three times: once with the maximum length (that's what happens now), once with the minimum length, and once with a length in between (possibly randomly chosen). 85 | 86 | There are certain models that cannot be converted because of the way they are structured, or due to limitations and bugs in coremltools. Sometimes these can be fixed by making changes to the Transformers code, by implementing missing ops, or by filing bugs against coremltools.
Trying to get as many Transformers models to export without issues is a work in progress. 87 | 88 | ### `-with-past` versions for seq2seq models 89 | 90 | The encoder portion of the model is easy: this does not have a `past_key_values` option, so this is always converted with `use_past=False`. 91 | 92 | When the decoder is used with `use_cache=True`, it needs to accept a `past_key_values` tensor that consists of a 4-tuple for each layer with the key/value for the decoder but also the key/value for the encoder. The decoder and encoder tensors have different shapes because they have different sequence lengths. 93 | 94 | The encoder past key/values only need to be computed once, on the first iteration, and then they're simply re-used by the model on subsequent iterations. The decoder past key/values tensors grow in size with each iteration. 95 | 96 | Handling the decoder past key/values tensors in Core ML is not a problem. On the first iteration, you can pass in a tensor with a shape of `(batch, num_layers, 0, num_heads)` or just leave out this tensor completely as it is marked optional. The model returns a new past key/values tensor and you simply pass that in on the next iteration. 97 | 98 | This does not work for the encoder key/values. Core ML cannot perform branching logic in the model (not entirely true but its branching operation involves running a submodel and is rather complicated) and so the JIT trace must always choose one of the paths. 99 | 100 | What this means is: If we specify dummy encoder key/value inputs during the JIT trace, then the cross-attention layer will not perform the `k_proj` and `v_proj` operations on the encoder's hidden state outputs. 101 | 102 | In `BartAttention` that is these lines: 103 | 104 | ```python 105 | if is_cross_attention and past_key_value is not None: 106 | # reuse k,v, cross_attentions 107 | key_states = past_key_value[0] 108 | value_states = past_key_value[1] 109 | elif is_cross_attention: 110 | # cross_attentions 111 | key_states = self._shape(self.k_proj(key_value_states), -1, bsz) 112 | value_states = self._shape(self.v_proj(key_value_states), -1, bsz) 113 | elif past_key_value is not None: 114 | ... 115 | ``` 116 | 117 | Here, `past_key_value` is the encoder key/values tensors and `key_value_states` is the encoder's last hidden state. The Core ML model can only include one of these branches, not both. 118 | 119 | If during the JIT trace we pass in dummy tensors for the encoder key/value tensors, then the first branch is taken and `k_proj` and `v_proj` are never executed. The problem is that we need those projection operations to happen on the very first iteration. 120 | 121 | In theory, we could solve this by never using the encoder key/values tensors, so that the second branch is always taken. This is less efficient, since it involves performing the same linear layers over and over, but at least it will work. 122 | 123 | However, this workaround fails when an encoder attention mask is provided. In `BartDecoderLayer` the following happens: 124 | 125 | ```python 126 | cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None 127 | ``` 128 | 129 | Since the `past_key_value` tensor is now a 2-tuple instead of a 4-tuple (since we're no longer providing the encoder key/values), the expression `past_key_value[-2:]` will attempt to use the decoder key/values tensors for the cross attention. 
It should use the tensors at indices 2 and 3, but because the tuple only has two tensors in it now, this will use indices 0 and 1 — which are not the correct tensors! 130 | 131 | Since the key/values from indices 0,1 have the target sequence length from the decoder, the encoder's `attention_mask` cannot be applied. 132 | 133 | And even if we don't use this attention mask, what happens is incorrect anyway. The second branch will still never be taken (as `cross_attn_past_key_value` is not None) and `k_proj` and `v_proj` are never executed. 134 | 135 | I currently don't see a solution to this except perhaps rewriting the decoder layer to do the following instead, but that requires changing a lot of source files in `transformers` and is a suboptimal solution anyway. 136 | 137 | ```python 138 | cross_attn_past_key_value = past_key_value[-2:] if (past_key_value is not None and len(past_key_value) > 2) else None 139 | ``` 140 | 141 | We could also export two versions of the decoder model: one for the first iteration and one for the remaining iterations but that's not great either. 142 | 143 | ## Assumptions made by the exporter 144 | 145 | The Core ML exporter needs to make certain assumptions about the Transformers models. These are: 146 | 147 | - A vision `AutoModel` is expected to output hidden states. If there is a second output, this is assumed to be from the pooling layer. 148 | 149 | - The input size for a vision model is given by the feature extractor's `crop_size` property if it exists and `do_center_crop` is true, or otherwise by its `size` property. 150 | 151 | - The image normalization for a vision model is given by the feature extractor's `image_std` and `image_mean` if it has those, otherwise assume `std = 1/255` and `mean = 0`. 152 | 153 | - The `masked-im` task expects a `bool_masked_pos` tensor as the second input. If `bool_masked_pos` is provided, some of these models return the loss value and others don't. If more than one tensor is returned, we assume the first one is the loss and ignore it. 154 | 155 | - If text models have two inputs, the second one is the `attention_mask`. If they have three inputs, the third is `token_type_ids`. 156 | 157 | - The `object-detection` task outputs logits and boxes (only tested with YOLOS so far). 158 | 159 | - If bicubic resizing is used, it gets replaced by bilinear since Core ML doesn't support bicubic. This has a noticeable effect on the predictions, but usually the model is still usable. 160 | 161 | ## Other remarks 162 | 163 | - Just as in the ONNX exporter, the `validate_model_outputs()` function takes an `atol` argument for the absolute tolerance. It might be more appropriate to do this test as `max(abs(coreml - reference)) / max(abs(reference))` to get an error measurement that's relative to the magnitude of the values in the output tensors. 164 | 165 | - Image classifier models have the usual `classLabel` and `probabilities` outputs, but also a "hidden" `var_xxx` output with the softmax results. This appears to be a minor bug in the converter; it doesn't hurt anything to keep this extra output. 166 | 167 | ## Running the tests 168 | 169 | The unit tests attempt to convert all supported models, and verify that their output is close to that of the original models. This can be very slow! These tests require a Mac. 
170 | 171 | ``` 172 | $ cd exporters 173 | $ RUN_SLOW=1 pytest tests/test_coreml.py --capture=sys -W ignore 174 | ``` 175 | 176 | The `--capture=sys` and `-W ignore` arguments are used to suppress the coremltools progress bars and other messages. 177 | 178 | Tip: After running the tests, go into `/private/var/folders/...` and remove all the `.mlpackage` and `.mlmodel` files, as well as the `com.apple.MetalPerformanceShadersGraph` directory. coremtools leaves a lot of junk here that can quickly eat up your local storage space. 179 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /MODELS.md: -------------------------------------------------------------------------------- 1 | 16 | 17 | # Models that are / aren't supported by 🤗 Exporters 18 | 19 | Only models that have a `ModelNameCoreMLConfig` object are currently supported. 20 | 21 | If a model is not supported, this is either because there is some problem with the actual conversion process, or because we simply did not get around to writing a `CoreMLConfig` object for it. 
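To check from Python which tasks, if any, already have a ready-made `CoreMLConfig` for a given architecture, you can query the `FeaturesManager` described in the README. A brief sketch (the model type string is just an example):

```python
from exporters.coreml.features import FeaturesManager

# List the tasks ("features") that have a ready-made CoreMLConfig for this model type.
supported_features = FeaturesManager.get_supported_features_for_model_type("distilbert")
print(list(supported_features.keys()))
```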
22 | 23 | ## Supported models 24 | 25 | Legend: 26 | 27 | - ✅ = fully supported 28 | - 😓 = works but with hacks 29 | - ⚠️ = partially supported (for example no "with past" version) 30 | - ❌ = errors during conversion 31 | - ➖ = not supported 32 | - ? = unknown 33 | 34 | ### Text Models 35 | 36 | **BART** 37 | 38 | - ⚠️ BartModel (currently supports only `use_past=False`) 39 | - ✅ BartForCausalLM 40 | - ⚠️ BartForConditionalGeneration (currently supports only `use_past=False`) 41 | - ? BartForQuestionAnswering 42 | - ? BartForSequenceClassification 43 | 44 | **BERT** 45 | 46 | - ✅ BertModel 47 | - ➖ BertForPreTraining 48 | - ✅ BertForMaskedLM 49 | - ✅ BertForMultipleChoice 50 | - ✅ BertForNextSentencePrediction 51 | - ✅ BertForQuestionAnswering 52 | - ✅ BertForSequenceClassification 53 | - ✅ BertForTokenClassification 54 | - ⚠️ BertLMHeadModel: works OK with coremltools commit 50c5569, breaks with later versions 55 | 56 | **BigBird** 57 | 58 | - ? BigBirdModel 59 | - ➖ BigBirdForPreTraining 60 | - ⚠️ BigBirdForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions 61 | - ? BigBirdForMaskedLM 62 | - ? BigBirdForMultipleChoice 63 | - ? BigBirdForQuestionAnswering 64 | - ? BigBirdForSequenceClassification 65 | - ? BigBirdForTokenClassification 66 | 67 | **BigBirdPegasus** 68 | 69 | - ⚠️ BigBirdPegasusModel (currently supports only `use_past=False`) 70 | - ✅ BigBirdPegasusForCausalLM 71 | - ⚠️ BigBirdPegasusForConditionalGeneration (currently supports only `use_past=False`) 72 | - ? BigBirdPegasusForQuestionAnswering 73 | - ? BigBirdPegasusForSequenceClassification 74 | 75 | **Blenderbot** 76 | 77 | - ⚠️ BlenderbotModel (currently supports only `use_past=False`) 78 | - ? BlenderbotForCausalLM 79 | - ⚠️ BlenderbotForConditionalGeneration (currently supports only `use_past=False`) 80 | 81 | **Blenderbot Small** 82 | 83 | - ⚠️ BlenderbotSmallModel (currently supports only `use_past=False`) 84 | - ? BlenderbotSmallForCausalLM 85 | - ⚠️ BlenderbotSmallForConditionalGeneration (currently supports only `use_past=False`) 86 | 87 | **CTRL** 88 | 89 | - ✅ CTRLModel 90 | - ✅ CTRLLMHeadModel 91 | - ✅ CTRLForSequenceClassification 92 | 93 | **DistilBERT** 94 | 95 | - ✅ DistilBertModel 96 | - ✅ DistilBertForMaskedLM 97 | - ✅ DistilBertForMultipleChoice 98 | - ✅ DistilBertForQuestionAnswering 99 | - ✅ DistilBertForSequenceClassification 100 | - ✅ DistilBertForTokenClassification 101 | 102 | **ERNIE** 103 | 104 | - ? ErnieModel 105 | - ➖ ErnieForPreTraining 106 | - ⚠️ ErnieForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions 107 | - ? ErnieForMaskedLM 108 | - ? ErnieForMultipleChoice 109 | - ? ErnieForNextSentencePrediction 110 | - ? ErnieForQuestionAnswering 111 | - ? ErnieForSequenceClassification 112 | - ? ErnieForTokenClassification 113 | 114 | **GPT2 / DistilGPT2** 115 | 116 | Does not work with flexible sequence length and therefore does not support `use_past`. 117 | 118 | - ✅ GPT2Model 119 | - ➖ GPT2DoubleHeadsModel 120 | - ✅ GPT2ForSequenceClassification 121 | - ✅ GPT2ForTokenClassification 122 | - ⚠️ GPT2LMHeadModel (no `use_past`) 123 | 124 | **Llama** 125 | 126 | - ✅ LlamaForCausalLM 127 | 128 | **M2M100** 129 | 130 | - ⚠️ M2M100Model (currently supports only `use_past=False`) 131 | - ⚠️ M2M100ForConditionalGeneration (currently supports only `use_past=False`) 132 | 133 | **MarianMT** 134 | 135 | - ⚠️ MarianModel (currently supports only `use_past=False`) 136 | - ? 
MarianForCausalLM 137 | - ⚠️ MarianMTModel (currently supports only `use_past=False`) 138 | 139 | **Mistral** 140 | 141 | - ✅ MistralForCausalLM 142 | 143 | **MobileBERT** 144 | 145 | - ✅ MobileBertModel 146 | - ➖ MobileBertForPreTraining 147 | - ✅ MobileBertForMaskedLM 148 | - ✅ MobileBertForMultipleChoice 149 | - ✅ MobileBertForNextSentencePrediction 150 | - ✅ MobileBertForQuestionAnswering 151 | - ✅ MobileBertForSequenceClassification 152 | - ✅ MobileBertForTokenClassification 153 | 154 | **MVP** 155 | 156 | - ⚠️ MvpModel (currently supports only `use_past=False`) 157 | - ? MvpForCausalLM 158 | - ⚠️ MvpForConditionalGeneration (currently supports only `use_past=False`) 159 | - ? MvpForSequenceClassification 160 | - ? MvpForQuestionAnswering 161 | 162 | **Pegasus** 163 | 164 | - ⚠️ PegasusModel (currently supports only `use_past=False`) 165 | - ? PegasusForCausalLM 166 | - ⚠️ PegasusForConditionalGeneration (currently supports only `use_past=False`) 167 | 168 | **PLBart** 169 | 170 | - ⚠️ PLBartModel (currently supports only `use_past=False`) 171 | - ? PLBartForCausalLM 172 | - ⚠️ PLBartForConditionalGeneration (currently supports only `use_past=False`) 173 | - ? PLBartForSequenceClassification 174 | 175 | **RoBERTa** 176 | 177 | - ? RobertaModel 178 | - ⚠️ RobertaForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions 179 | - ? RobertaForMaskedLM 180 | - ? RobertaForMultipleChoice 181 | - ? RobertaForQuestionAnswering 182 | - ? RobertaForSequenceClassification 183 | - ? RobertaForTokenClassification 184 | 185 | **RoFormer** 186 | 187 | - ? RoFormerModel 188 | - ❌ RoFormerForCausalLM: Conversion may appear to work but the model does not actually run. Core ML takes forever to load the model, allocates 100+ GB of RAM and eventually crashes. 189 | - ? RoFormerForMaskedLM 190 | - ? RoFormerForSequenceClassification 191 | - ? RoFormerForMultipleChoice 192 | - ? RoFormerForTokenClassification 193 | - ? RoFormerForQuestionAnswering 194 | 195 | **Splinter** 196 | 197 | - ❌ SplinterModel: Conversion may appear to work but the model does not actually run. Core ML takes forever to load the model, allocates 100+ GB of RAM and eventually crashes. 198 | - ➖ SplinterForPreTraining 199 | - SplinterForQuestionAnswering 200 | 201 | **SqueezeBERT** 202 | 203 | - ✅ SqueezeBertModel 204 | - ✅ SqueezeBertForMaskedLM 205 | - ✅ SqueezeBertForMultipleChoice 206 | - ✅ SqueezeBertForQuestionAnswering 207 | - ✅ SqueezeBertForSequenceClassification 208 | - ✅ SqueezeBertForTokenClassification 209 | 210 | **T5** 211 | 212 | - ⚠️ T5Model (currently supports only `use_past=False`) 213 | - ✅ T5EncoderModel 214 | - ⚠️ T5ForConditionalGeneration (currently supports only `use_past=False`) 215 | 216 | ### Vision Models 217 | 218 | **BEiT** 219 | 220 | - ✅ BeitModel 221 | - ✅ BeitForImageClassification 222 | - ✅ BeitForSemanticSegmentation 223 | - ✅ BeitForMaskedImageModeling. Note: this model does not work with AutoModelForMaskedImageModeling and therefore the conversion script cannot load it, but converting from Python is supported. 
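As a reference for the "converting from Python" route mentioned above, a minimal sketch might look like the following. The `BeitCoreMLConfig` class name follows the repo's usual naming convention and the checkpoint is only an example; both are assumptions that may need to be adapted:

```python
from transformers import AutoFeatureExtractor, BeitForMaskedImageModeling
from exporters.coreml import export
from exporters.coreml.models import BeitCoreMLConfig  # assumed class name

# Example checkpoint; substitute the model you actually want to convert.
model_ckpt = "microsoft/beit-base-patch16-224-pt22k"
model = BeitForMaskedImageModeling.from_pretrained(model_ckpt, torchscript=True)
preprocessor = AutoFeatureExtractor.from_pretrained(model_ckpt)

coreml_config = BeitCoreMLConfig(model.config, task="masked-im")
mlmodel = export(preprocessor, model, coreml_config)
mlmodel.save("BeitMaskedImageModeling.mlpackage")
```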
224 | 225 | **ConvNeXT** 226 | 227 | - ✅ ConvNextModel 228 | - ✅ ConvNextForImageClassification 229 | 230 | **CvT** 231 | 232 | - ✅ CvtModel 233 | - ✅ CvtForImageClassification 234 | 235 | **LeViT** 236 | 237 | - ✅ LevitModel 238 | - ✅ LevitForImageClassification 239 | - ➖ LevitForImageClassificationWithTeacher 240 | 241 | **MobileViT** 242 | 243 | - ✅ MobileViTModel 244 | - ✅ MobileViTForImageClassification 245 | - ✅ MobileViTForSemanticSegmentation 246 | 247 | **MobileViTv2** 248 | 249 | - ✅ MobileViTV2Model 250 | - ✅ MobileViTV2ForImageClassification 251 | - ✅ MobileViTV2ForSemanticSegmentation 252 | 253 | **SegFormer** 254 | 255 | - ✅ SegformerModel 256 | - ✅ SegformerForImageClassification 257 | - ✅ SegformerForSemanticSegmentation 258 | 259 | **Vision Transformer (ViT)** 260 | 261 | - ✅ ViTModel 262 | - ✅ ViTForMaskedImageModeling 263 | - ✅ ViTForImageClassification 264 | 265 | **YOLOS** 266 | 267 | - ✅ YolosModel 268 | - ✅ YolosForObjectDetection 269 | 270 | ### Audio Models 271 | 272 | None 273 | 274 | ### Multimodal Models 275 | 276 | **Data2Vec Audio** 277 | 278 | - ? Data2VecAudioModel: [TODO verify] The conversion completes without errors but the Core ML compiler cannot load the model. 279 | - ? Data2VecAudioForAudioFrameClassification 280 | - ? Data2VecAudioForCTC 281 | - ? Data2VecAudioForSequenceClassification 282 | - ? Data2VecAudioForXVector 283 | 284 | **Data2Vec Text** 285 | 286 | - ? Data2VecTextModel 287 | - ⚠️ Data2VecTextForCausalLM: works OK with coremltools commit 50c5569, breaks with later versions 288 | - ? Data2VecTextForMaskedLM 289 | - ? Data2VecTextForMultipleChoice 290 | - ? Data2VecTextForQuestionAnswering 291 | - ? Data2VecTextForSequenceClassification 292 | - ? Data2VecTextForTokenClassification 293 | 294 | **Data2Vec Vision** 295 | 296 | - ? Data2VecVisionModel 297 | - ? Data2VecVisionForImageClassification 298 | - ? Data2VecVisionForSemanticSegmentation 299 | 300 | ## Models that currently don't work 301 | 302 | The following models are known to give errors when attempting conversion to Core ML format, or simply have not been tried yet. 303 | 304 | ### Text Models 305 | 306 | ALBERT 307 | 308 | BARThez 309 | 310 | BARTpho 311 | 312 | BertGeneration 313 | 314 | BertJapanese 315 | 316 | Bertweet 317 | 318 | **BLOOM** [TODO verify] Conversion error on a slicing operation. 319 | 320 | BORT 321 | 322 | ByT5 323 | 324 | CamemBERT 325 | 326 | CANINE 327 | 328 | **CodeGen** [TODO verify] Conversion error on einsum. 329 | 330 | ConvBERT 331 | 332 | CPM 333 | 334 | DeBERTa 335 | 336 | DeBERTa-v2 337 | 338 | DialoGPT 339 | 340 | DPR 341 | 342 | **ELECTRA** 343 | 344 | - ❌ ElectraForCausalLM: "AttributeError: 'list' object has no attribute 'val'" in `repeat` op. Also, `coreml_config.values_override` doesn't work to set `use_cache` to True for this model. 345 | 346 | Encoder Decoder Models 347 | 348 | ESM 349 | 350 | FlauBERT 351 | 352 | FNet 353 | 354 | **FSMT** 355 | 356 | - ❌ FSMTForConditionalGeneration. Encoder converts OK. For decoder, `Wrapper` outputs wrong size logits tensor; goes wrong somewhere in hidden states output from decoder when `return_dict=False`? 357 | 358 | Funnel Transformer 359 | 360 | GPT 361 | 362 | **GPT Neo**. [TODO verify] Gives no errors during conversion but predicts wrong results, or NaN when `use_legacy_format=True`. 
363 | 364 | - GPTNeoModel 365 | - GPTNeoForCausalLM 366 | - GPTNeoForSequenceClassification 367 | 368 | GPT NeoX 369 | 370 | GPT NeoX Japanese 371 | 372 | GPT-J 373 | 374 | HerBERT 375 | 376 | I-BERT 377 | 378 | LayoutLM 379 | 380 | **LED** 381 | 382 | - ❌ LEDForConditionalGeneration: JIT trace fails with the error: 383 | 384 | ```python 385 | RuntimeError: 0INTERNAL ASSERT FAILED at "/Users/distiller/project/pytorch/torch/csrc/jit/ir/alias_analysis.cpp":607, please report a bug to PyTorch. We don't have an op for aten::constant_pad_nd but it isn't a special case. Argument types: Tensor, int[], bool, 386 | ``` 387 | 388 | LiLT 389 | 390 | Longformer 391 | 392 | **LongT5** 393 | 394 | - ❌ LongT5ForConditionalGeneration: Conversion error: 395 | 396 | ```python 397 | ValueError: In op, of type not_equal, named 133, the named input `y` must have the same data type as the named input `x`. However, y has dtype fp32 whereas x has dtype int32. 398 | ``` 399 | 400 | LUKE 401 | 402 | MarkupLM 403 | 404 | MBart and MBart-50 405 | 406 | MegatronBERT 407 | 408 | MegatronGPT2 409 | 410 | mLUKE 411 | 412 | MPNet 413 | 414 | **MT5** 415 | 416 | - ❌ MT5ForConditionalGeneration: Converter error "User defined pattern has more than one final operation" 417 | 418 | **NEZHA** [TODO verify] Conversion error on a slicing operation. 419 | 420 | NLLB 421 | 422 | Nyströmformer 423 | 424 | **OPT** [TODO verify] Conversion error on a slicing operation. 425 | 426 | **PEGASUS-X** 427 | 428 | - ❌ PegasusXForConditionalGeneration: "AttributeError: 'list' object has no attribute 'val'" in `pad` op. Maybe: needs `remainder` op (added recently in coremltools dev version). 429 | 430 | PhoBERT 431 | 432 | **ProphetNet** 433 | 434 | - ❌ ProphetNetForConditionalGeneration. Conversion error: 435 | 436 | ```python 437 | ValueError: Op "input.3" (op_type: clip) Input x="position_ids" expects tensor or scalar of dtype from type domain ['fp16', 'fp32'] but got tensor[1,is4273,int32] 438 | ``` 439 | 440 | QDQBert 441 | 442 | RAG 443 | 444 | REALM 445 | 446 | **Reformer** 447 | 448 | - ❌ ReformerModelWithLMHead: does not have `past_key_values` but `past_buckets_states` 449 | 450 | **RemBERT** 451 | 452 | - ❌ RemBertForCausalLM. Conversion to MIL succeeds after a long time but running the model gives "Error in declaring network." When using legacy mode, the model is too large to fit into protobuf. 453 | 454 | RetriBERT 455 | 456 | T5v1.1 457 | 458 | TAPAS 459 | 460 | TAPEX 461 | 462 | Transformer XL 463 | 464 | UL2 465 | 466 | **XGLM** [TODO verify] Conversion error on a slicing operation. 467 | 468 | XLM 469 | 470 | **XLM-ProphetNet** 471 | 472 | - XLMProphetNetForConditionalGeneration: Conversion error: 473 | 474 | ```python 475 | ValueError: Op "input.3" (op_type: clip) Input x="position_ids" expects tensor or scalar of dtype from type domain ['fp16', 'fp32'] but got tensor[1,is4506,int32] 476 | ``` 477 | 478 | XLM-RoBERTa 479 | 480 | XLM-RoBERTa-XL 481 | 482 | **XLNet** [TODO verify] Conversion error. 483 | 484 | YOSO 485 | 486 | ### Vision Models 487 | 488 | Conditional DETR 489 | 490 | Deformable DETR 491 | 492 | DeiT 493 | 494 | **DETR** [TODO verify] The conversion completes without errors but the Core ML compiler cannot load the model. 
"Invalid operation output name: got 'tensor' when expecting token of type 'ID'" 495 | 496 | DiT 497 | 498 | DPT 499 | 500 | GLPN 501 | 502 | ImageGPT 503 | 504 | MaskFormer 505 | 506 | PoolFormer 507 | 508 | RegNet 509 | 510 | ResNet 511 | 512 | **Swin Transformer** [TODO verify] The PyTorch graph contains unsupported operations: remainder, roll, adaptive_avg_pool1d. (Some of these may be supported in latest dev version.) 513 | 514 | Swin Transformer V2 515 | 516 | VAN 517 | 518 | VideoMAE 519 | 520 | ViTMAE 521 | 522 | ViTMSN 523 | 524 | ### Audio Models 525 | 526 | **Hubert** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues. 527 | 528 | MCTCT 529 | 530 | **SEW** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues. 531 | 532 | SEW-D 533 | 534 | **Speech2Text** [TODO verify] The "glu" op is not supported by coremltools. Should be possible to solve by defining a `@register_torch_op` function. (Update: should be supported in dev version now.) 535 | 536 | Speech2Text2 537 | 538 | **UniSpeech** [TODO verify] Missing op for `_weight_norm` (possible to work around), also same Core ML compiler error as DETR. 539 | 540 | UniSpeech-SAT 541 | 542 | **Wav2Vec2** [TODO verify] Unsupported op for `nn.GroupNorm` (should be possible to solve), invalid broadcasting operations (will be harder to solve), and most likely additional issues. 543 | 544 | Wav2Vec2-Conformer 545 | 546 | Wav2Vec2Phoneme 547 | 548 | **WavLM** [TODO verify] Missing ops for `_weight_norm`, `add_`, `full_like`. 549 | 550 | Whisper 551 | 552 | XLS-R 553 | 554 | XLSR-Wav2Vec2 555 | 556 | ### Multimodal Models 557 | 558 | CLIP 559 | 560 | Donut 561 | 562 | FLAVA 563 | 564 | **GroupViT** [TODO verify] Conversion issue with `scatter_along_axis` operation. 565 | 566 | LayoutLMV2 567 | 568 | LayoutLMV3 569 | 570 | LayoutXLM 571 | 572 | LXMERT 573 | 574 | OWL-ViT 575 | 576 | Perceiver 577 | 578 | Speech Encoder Decoder Models 579 | 580 | TrOCR 581 | 582 | ViLT 583 | 584 | Vision Encoder Decoder Models 585 | 586 | Vision Text Dual Encoder 587 | 588 | VisualBERT 589 | 590 | X-CLIP 591 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 16 | 17 | # 🤗 Exporters 18 | 19 | 👷 **WORK IN PROGRESS** 👷 20 | 21 | This package lets you export 🤗 Transformers models to Core ML. 22 | 23 | > For converting models to TFLite, we recommend using [Optimum](https://huggingface.co/docs/optimum/exporters/tflite/usage_guides/export_a_model). 24 | 25 | ## When to use 🤗 Exporters 26 | 27 | 🤗 Transformers models are implemented in PyTorch, TensorFlow, or JAX. However, for deployment you might want to use a different framework such as Core ML. This library makes it easy to convert Transformers models to this format. 28 | 29 | The aim of the Exporters package is to be more convenient than writing your own conversion script with *coremltools* and to be tightly integrated with the 🤗 Transformers library and the Hugging Face Hub. 30 | 31 | For an even more convenient approach, `Exporters` powers a [no-code transformers to Core ML conversion Space](https://huggingface.co/spaces/huggingface-projects/transformers-to-coreml). 
You can try it out without installing anything to check whether the model you are interested in can be converted. If conversion succeeds, the converted Core ML weights will be pushed to the Hub. For additional flexibility and details about the conversion process, please read on. 32 | 33 | Note: Keep in mind that Transformer models are usually quite large and are not always suitable for use on mobile devices. It might be a good idea to [optimize the model for inference](https://github.com/huggingface/optimum) first using 🤗 Optimum. 34 | 35 | ## Installation 36 | 37 | Clone this repo: 38 | 39 | ```bash 40 | $ git clone https://github.com/huggingface/exporters.git 41 | ``` 42 | 43 | Install it as a Python package: 44 | 45 | ```bash 46 | $ cd exporters 47 | $ pip install -e . 48 | ``` 49 | 50 | All done! 51 | 52 | Note: The Core ML exporter can be used from Linux but macOS is recommended. 53 | 54 | ## Core ML 55 | 56 | [Core ML](https://developer.apple.com/machine-learning/core-ml/) is Apple's software library for fast on-device model inference with neural networks and other types of machine learning models. It can be used on macOS, iOS, tvOS, and watchOS, and is optimized for using the CPU, GPU, and Apple Neural Engine. Although the Core ML framework is proprietary, the Core ML file format is an open format. 57 | 58 | The Core ML exporter uses [coremltools](https://coremltools.readme.io/docs) to perform the conversion from PyTorch or TensorFlow to Core ML. 59 | 60 | The `exporters.coreml` package enables you to convert model checkpoints to a Core ML model by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures. 61 | 62 | Ready-made configurations include the following architectures: 63 | 64 | - BEiT 65 | - BERT 66 | - ConvNeXT 67 | - CTRL 68 | - CvT 69 | - DistilBERT 70 | - DistilGPT2 71 | - GPT2 72 | - LeViT 73 | - MobileBERT 74 | - MobileViT 75 | - SegFormer 76 | - SqueezeBERT 77 | - Vision Transformer (ViT) 78 | - YOLOS 79 | 80 | 81 | 82 | [See here](MODELS.md) for a complete list of supported models. 83 | 84 | ### Exporting a model to Core ML 85 | 86 | 95 | 96 | The `exporters.coreml` package can be used as a Python module from the command line. To export a checkpoint using a ready-made configuration, do the following: 97 | 98 | ```bash 99 | python -m exporters.coreml --model=distilbert-base-uncased exported/ 100 | ``` 101 | 102 | This exports a Core ML version of the checkpoint defined by the `--model` argument. In this example it is `distilbert-base-uncased`, but it can be any checkpoint on the Hugging Face Hub or one that's stored locally. 103 | 104 | The resulting Core ML file will be saved to the `exported` directory as `Model.mlpackage`. Instead of a directory you can specify a filename, such as `DistilBERT.mlpackage`. 105 | 106 | It's normal for the conversion process to output many warning messages and other logging information. You can safely ignore these. If all went well, the export should conclude with the following logs: 107 | 108 | ```bash 109 | Validating Core ML model... 
110 | -[✓] Core ML model output names match reference model ({'last_hidden_state'}) 111 | - Validating Core ML model output "last_hidden_state": 112 | -[✓] (1, 128, 768) matches (1, 128, 768) 113 | -[✓] all values close (atol: 0.0001) 114 | All good, model saved at: exported/Model.mlpackage 115 | ``` 116 | 117 | Note: While it is possible to export models to Core ML on Linux, the validation step will only be performed on Mac, as it requires the Core ML framework to run the model. 118 | 119 | The resulting file is `Model.mlpackage`. This file can be added to an Xcode project and be loaded into a macOS or iOS app. 120 | 121 | The exported Core ML models use the **mlpackage** format with the **ML Program** model type. This format was introduced in 2021 and requires at least iOS 15, macOS 12.0, and Xcode 13. We prefer to use this format as it is the future of Core ML. The Core ML exporter can also make models in the older `.mlmodel` format, but this is not recommended. 122 | 123 | The process is identical for TensorFlow checkpoints on the Hub. For example, you can export a pure TensorFlow checkpoint from the [Keras organization](https://huggingface.co/keras-io) as follows: 124 | 125 | ```bash 126 | python -m exporters.coreml --model=keras-io/transformers-qa exported/ 127 | ``` 128 | 129 | To export a model that's stored locally, you'll need to have the model's weights and tokenizer files stored in a directory. For example, we can load and save a checkpoint as follows: 130 | 131 | ```python 132 | >>> from transformers import AutoTokenizer, AutoModelForSequenceClassification 133 | 134 | >>> # Load tokenizer and PyTorch weights from the Hub 135 | >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") 136 | >>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased") 137 | >>> # Save to disk 138 | >>> tokenizer.save_pretrained("local-pt-checkpoint") 139 | >>> pt_model.save_pretrained("local-pt-checkpoint") 140 | ``` 141 | 142 | Once the checkpoint is saved, you can export it to Core ML by pointing the `--model` argument to the directory holding the checkpoint files: 143 | 144 | ```bash 145 | python -m exporters.coreml --model=local-pt-checkpoint exported/ 146 | ``` 147 | 148 | 151 | 152 | ### Selecting features for different model topologies 153 | 154 | Each ready-made configuration comes with a set of _features_ that enable you to export models for different types of topologies or tasks.
As shown in the table below, each feature is associated with a different auto class: 155 | 156 | | Feature | Auto Class | 157 | | -------------------------------------------- | ------------------------------------ | 158 | | `default`, `default-with-past` | `AutoModel` | 159 | | `causal-lm`, `causal-lm-with-past` | `AutoModelForCausalLM` | 160 | | `ctc` | `AutoModelForCTC` | 161 | | `image-classification` | `AutoModelForImageClassification` | 162 | | `masked-im` | `AutoModelForMaskedImageModeling` | 163 | | `masked-lm` | `AutoModelForMaskedLM` | 164 | | `multiple-choice` | `AutoModelForMultipleChoice` | 165 | | `next-sentence-prediction` | `AutoModelForNextSentencePrediction` | 166 | | `object-detection` | `AutoModelForObjectDetection` | 167 | | `question-answering` | `AutoModelForQuestionAnswering` | 168 | | `semantic-segmentation` | `AutoModelForSemanticSegmentation` | 169 | | `seq2seq-lm`, `seq2seq-lm-with-past` | `AutoModelForSeq2SeqLM` | 170 | | `sequence-classification` | `AutoModelForSequenceClassification` | 171 | | `speech-seq2seq`, `speech-seq2seq-with-past` | `AutoModelForSpeechSeq2Seq` | 172 | | `token-classification` | `AutoModelForTokenClassification` | 173 | 174 | For each configuration, you can find the list of supported features via the `FeaturesManager`. For example, for DistilBERT we have: 175 | 176 | ```python 177 | >>> from exporters.coreml.features import FeaturesManager 178 | 179 | >>> distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys()) 180 | >>> print(distilbert_features) 181 | ['default', 'masked-lm', 'multiple-choice', 'question-answering', 'sequence-classification', 'token-classification'] 182 | ``` 183 | 184 | You can then pass one of these features to the `--feature` argument in the `exporters.coreml` package. For example, to export a text-classification model we can pick a fine-tuned model from the Hub and run: 185 | 186 | ```bash 187 | python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english \ 188 | --feature=sequence-classification exported/ 189 | ``` 190 | 191 | which will display the following logs: 192 | 193 | ```bash 194 | Validating Core ML model... 195 | - Core ML model is classifier, validating output 196 | -[✓] predicted class NEGATIVE matches NEGATIVE 197 | -[✓] number of classes 2 matches 2 198 | -[✓] all values close (atol: 0.0001) 199 | All good, model saved at: exported/Model.mlpackage 200 | ``` 201 | 202 | Notice that in this case, the exported model is a Core ML classifier, which predicts the highest scoring class name in addition to a dictionary of probabilities, instead of the `last_hidden_state` we saw with the `distilbert-base-uncased` checkpoint earlier. This is expected since the fine-tuned model has a sequence classification head. 203 | 204 | 205 | 206 | The features that have a `with-past` suffix (e.g. `causal-lm-with-past`) correspond to model topologies with precomputed hidden states (key and values in the attention blocks) that can be used for fast autoregressive decoding. 207 | 208 | 209 | 210 | ### Configuring the export options 211 | 212 | To see the full list of possible options, run the following from the command line: 213 | 214 | ```bash 215 | python -m exporters.coreml --help 216 | ``` 217 | 218 | Exporting a model requires at least these arguments: 219 | 220 | - `-m `: The model ID from the Hugging Face Hub, or a local path to load the model from. 221 | - `--feature `: The task the model should perform, for example `"image-classification"`. 
See the table above for possible task names. 222 | - `<output path>`: The path where the generated Core ML model will be stored. 223 | 224 | The output path can be a folder, in which case the file will be named `Model.mlpackage`, or you can also specify the filename directly. 225 | 226 | Additional arguments that can be provided: 227 | 228 | - `--preprocessor <value>`: Which type of preprocessor to use. `auto` tries to automatically detect it. Possible values are: `auto` (the default), `tokenizer`, `feature_extractor`, `processor`. 229 | - `--atol <number>`: The absolute difference tolerance used when validating the model. The default value is 1e-4. 230 | - `--quantize <value>`: Whether to quantize the model weights. The possible quantization options are: `float32` for no quantization (the default) or `float16` for 16-bit floating point. 231 | - `--compute_units <value>`: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Possible values are: `all` (the default), `cpu_and_gpu`, `cpu_only`, `cpu_and_ne`. 232 | 233 | ### Using the exported model 234 | 235 | Using the exported model in an app is just like using any other Core ML model. After adding the model to Xcode, Xcode will auto-generate a Swift class that lets you make predictions from within the app. 236 | 237 | Depending on the chosen export options, you may still need to preprocess or postprocess the input and output tensors. 238 | 239 | For image inputs, there is no need to perform any preprocessing as the Core ML model will already normalize the pixels. For classifier models, the Core ML model will output the predictions as a dictionary of probabilities. For other models, you might need to do more work. 240 | 241 | Core ML does not have the concept of a tokenizer and so text models will still require manual tokenization of the input data. [Here is an example](https://github.com/huggingface/swift-coreml-transformers) of how to perform tokenization in Swift. 242 | 243 | ### Overriding default choices in the configuration object 244 | 245 | An important goal of Core ML is to make it easy to use the models inside apps. Where possible, the Core ML exporter will add extra operations to the model, so that you do not have to do your own pre- and postprocessing. 246 | 247 | In particular, 248 | 249 | - Image models will automatically perform pixel normalization as part of the model. You do not need to preprocess the image yourself, except potentially resizing or cropping it. 250 | 251 | - For classification models, a softmax layer is added and the labels are included in the model file. Core ML makes a distinction between classifier models and other types of neural networks. For a model that outputs a single classification prediction per input example, Core ML makes it so that the model predicts the winning class label and a dictionary of probabilities instead of a raw logits tensor. Where possible, the exporter uses this special classifier model type. 252 | 253 | - Other models predict logits but do not fit into Core ML's definition of a classifier, such as the `token-classification` task that outputs a prediction for each token in the sequence. Here, the exporter also adds a softmax to convert the logits into probabilities. The label names are added to the model's metadata. Core ML ignores these label names but they can be retrieved by writing a few lines of Swift code. 254 | 255 | - A `semantic-segmentation` model will upsample the output image to the original spatial dimensions and apply an argmax to obtain the predicted class label indices.
It does not automatically apply a softmax.
256 | 
257 | The Core ML exporter makes these choices because they are the settings you're most likely to need. To override any of the above defaults, you must create a subclass of the configuration object, and then export the model to Core ML by writing a short Python program.
258 | 
259 | Example: To prevent the MobileViT semantic segmentation model from upsampling the output image, you would create a subclass of `MobileViTCoreMLConfig` and override the `outputs` property to set `do_upsample` to False. Other options you can set for this output are `do_argmax` and `do_softmax`.
260 | 
261 | ```python
262 | from collections import OrderedDict
263 | from exporters.coreml.models import MobileViTCoreMLConfig
264 | from exporters.coreml.config import OutputDescription
265 | 
266 | class MyCoreMLConfig(MobileViTCoreMLConfig):
267 |     @property
268 |     def outputs(self) -> OrderedDict[str, OutputDescription]:
269 |         return OrderedDict(
270 |             [
271 |                 (
272 |                     "logits",
273 |                     OutputDescription(
274 |                         "classLabels",
275 |                         "Classification scores for each pixel",
276 |                         do_softmax=True,
277 |                         do_upsample=False,
278 |                         do_argmax=False,
279 |                     )
280 |                 ),
281 |             ]
282 |         )
283 | 
284 | config = MyCoreMLConfig(model.config, "semantic-segmentation")
285 | ```
286 | 
287 | Here you can also change the name of the output from `classLabels` to something else, or fill in the output description ("Classification scores for each pixel").
288 | 
289 | It is also possible to change the properties of the model inputs. For example, for text models the default sequence length is between 1 and 128 tokens. To set the input sequence length on a DistilBERT model to a fixed length of 32 tokens, you could override the config object as follows:
290 | 
291 | ```python
292 | from collections import OrderedDict
293 | from exporters.coreml.models import DistilBertCoreMLConfig
294 | from exporters.coreml.config import InputDescription
295 | 
296 | class MyCoreMLConfig(DistilBertCoreMLConfig):
297 |     @property
298 |     def inputs(self) -> OrderedDict[str, InputDescription]:
299 |         input_descs = super().inputs
300 |         input_descs["input_ids"].sequence_length = 32
301 |         return input_descs
302 | 
303 | config = MyCoreMLConfig(model.config, "text-classification")
304 | ```
305 | 
306 | Using a fixed sequence length generally produces a simpler, and possibly faster, Core ML model. However, for many models the input needs to have a flexible length. In that case, specify a tuple for `sequence_length` to set the (min, max) lengths. Use (1, -1) to have no upper limit on the sequence length. (Note: if `sequence_length` is set to a fixed value, then the batch size is fixed to 1.)
307 | 
308 | To find out what input and output options are available for the model you're interested in, create its `CoreMLConfig` object and examine the `config.inputs` and `config.outputs` properties.
309 | 
310 | Not all inputs or outputs are always required: for text models, you may remove the `attention_mask` input. Without this input, the attention mask is always assumed to be filled with ones (no padding). However, if the task requires a `token_type_ids` input, there must also be an `attention_mask` input.
311 | 
312 | Removing inputs and/or outputs is accomplished by making a subclass of `CoreMLConfig` and overriding the `inputs` and `outputs` properties.
313 | 
314 | By default, a model is generated in the ML Program format. By overriding the `use_legacy_format` property to return `True`, the older NeuralNetwork format will be used. This is not recommended and only exists as a workaround for models that fail to convert to the ML Program format.
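For example, a minimal sketch of such an override, reusing the DistilBERT config from the previous example (the class name `MyLegacyConfig` is only for illustration):

```python
from exporters.coreml.models import DistilBertCoreMLConfig

class MyLegacyConfig(DistilBertCoreMLConfig):
    @property
    def use_legacy_format(self) -> bool:
        # Produce the older NeuralNetwork format instead of an ML Program.
        return True

config = MyLegacyConfig(model.config, "text-classification")
```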
315 | 
316 | Once you have the modified `config` instance, you can use it to export the model following the instructions from the section "Exporting the model" below.
317 | 
318 | Not everything is described by the configuration objects. The behavior of the converted model is also determined by the model's tokenizer or feature extractor. For example, to use a different input image size, you'd create the feature extractor with different resizing or cropping settings and use that during the conversion instead of the default feature extractor.
319 | 
320 | ### Exporting a model for an unsupported architecture
321 | 
322 | If you wish to export a model whose architecture is not natively supported by the library, there are three main steps to follow:
323 | 
324 | 1. Implement a custom Core ML configuration.
325 | 2. Export the model to Core ML.
326 | 3. Validate the outputs of the PyTorch and exported models.
327 | 
328 | In this section, we'll look at how DistilBERT was implemented to show what's involved with each step.
329 | 
330 | #### Implementing a custom Core ML configuration
331 | 
332 | TODO: didn't write this section yet because the implementation is not done yet
333 | 
334 | Let’s start with the configuration object. We provide an abstract class that you should inherit from, `CoreMLConfig`.
335 | 
336 | ```python
337 | from exporters.coreml import CoreMLConfig
338 | ```
339 | 
340 | TODO: stuff to cover here:
341 | 
342 | - `modality` property
343 | - how to implement custom ops + link to coremltools documentation on this topic
344 | - decoder models (`use_past`) and encoder-decoder models (`seq2seq`)
345 | 
346 | #### Exporting the model
347 | 
348 | Once you have implemented the Core ML configuration, the next step is to export the model. Here we can use the `export()` function provided by the `exporters.coreml` package. This function expects the Core ML configuration, along with the base model and tokenizer (for text models) or feature extractor (for vision models):
349 | 
350 | ```python
351 | from transformers import AutoModelForSequenceClassification, AutoTokenizer
352 | from exporters.coreml import export
353 | from exporters.coreml.models import DistilBertCoreMLConfig
354 | 
355 | model_ckpt = "distilbert-base-uncased"
356 | base_model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, torchscript=True)
357 | preprocessor = AutoTokenizer.from_pretrained(model_ckpt)
358 | 
359 | coreml_config = DistilBertCoreMLConfig(base_model.config, task="text-classification")
360 | mlmodel = export(preprocessor, base_model, coreml_config)
361 | ```
362 | 
363 | Note: For the best results, pass the argument `torchscript=True` to `from_pretrained` when loading the model. This allows the model to configure itself for PyTorch tracing, which is needed for the Core ML conversion.
364 | 
365 | Additional options that can be passed into `export()`:
366 | 
367 | - `quantize`: Use `"float32"` for no quantization (the default), `"float16"` to quantize the weights to 16-bit floats.
368 | - `compute_units`: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Defaults to `coremltools.ComputeUnit.ALL`.
369 | 
370 | To export the model with precomputed hidden states (key and values in the attention blocks) for fast autoregressive decoding, pass the argument `use_past=True` when creating the `CoreMLConfig` object.
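As an illustration, here is a sketch of an `export()` call that sets the `quantize` and `compute_units` options, reusing the `preprocessor`, `base_model`, and `coreml_config` objects from the example above:

```python
import coremltools as ct

mlmodel = export(
    preprocessor,
    base_model,
    coreml_config,
    quantize="float16",                       # store the weights as 16-bit floats
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # restrict execution to CPU and Neural Engine
)
```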
371 | 
372 | It is normal for the Core ML exporter to print out a lot of warning and information messages. In particular, you might see messages such as these:
373 | 
374 | > TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
375 | 
376 | Those messages are to be expected and are a normal part of the conversion process. If there is a real problem, the converter will throw an error.
377 | 
378 | If the export succeeded, the return value from `export()` is a `coremltools.models.MLModel` object. Write `print(mlmodel)` to examine the Core ML model's inputs, outputs, and metadata.
379 | 
380 | Optionally fill in the model's metadata:
381 | 
382 | ```python
383 | mlmodel.short_description = "Your awesome model"
384 | mlmodel.author = "Your name"
385 | mlmodel.license = "Fill in the copyright information here"
386 | mlmodel.version = "1.0"
387 | ```
388 | 
389 | Finally, save the model. You can open the resulting **mlpackage** file in Xcode and examine it there.
390 | 
391 | ```python
392 | mlmodel.save("DistilBert.mlpackage")
393 | ```
394 | 
395 | Note: If the configuration object you used returns `True` from `use_legacy_format`, the model can be saved as `ModelName.mlmodel` instead of `.mlpackage`.
396 | 
397 | #### Exporting a decoder model
398 | 
399 | Decoder-based models can use a `past_key_values` input that contains pre-computed hidden states (key and values in the self-attention blocks), which allows for much faster sequential decoding. This feature is enabled by passing `use_cache=True` to the Transformers model.
400 | 
401 | To enable this feature with the Core ML exporter, set the `use_past=True` argument when creating the `CoreMLConfig` object:
402 | 
403 | ```python
404 | coreml_config = CTRLCoreMLConfig(base_model.config, task="text-generation", use_past=True)
405 | 
406 | # or:
407 | coreml_config = CTRLCoreMLConfig.with_past(base_model.config, task="text-generation")
408 | ```
409 | 
410 | This adds multiple new inputs and outputs to the model with names such as `past_key_values_0_key`, `past_key_values_0_value`, ... (inputs) and `present_key_values_0_key`, `present_key_values_0_value`, ... (outputs).
411 | 
412 | Enabling this option makes the model less convenient to use, since you will have to keep track of many additional tensors, but it does make inference much faster on long sequences.
413 | 
414 | The Transformers model must be loaded with `is_decoder=True`, for example:
415 | 
416 | ```python
417 | base_model = BigBirdForCausalLM.from_pretrained("google/bigbird-roberta-base", torchscript=True, is_decoder=True)
418 | ```
419 | 
420 | TODO: Example of how to use this in Core ML. The `past_key_values` tensors will grow larger over time. The `attention_mask` tensor must have the size of `past_key_values` plus new `input_ids`.
421 | 
422 | #### Exporting an encoder-decoder model
423 | 
424 | TODO: properly write this section
425 | 
426 | You'll need to export the model as two separate Core ML models: the encoder and the decoder.
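Both Core ML models are exported from the same base model and use the same preprocessor. As a rough sketch (the T5 checkpoint name is only an illustration), loading them could look like this:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_ckpt = "t5-small"
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt, torchscript=True)
preprocessor = AutoTokenizer.from_pretrained(model_ckpt)
```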
427 | 428 | Export the model like so: 429 | 430 | ```python 431 | coreml_config = TODOCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="encoder") 432 | encoder_mlmodel = export(preprocessor, base_model.get_encoder(), coreml_config) 433 | 434 | coreml_config = TODOCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder") 435 | decoder_mlmodel = export(preprocessor, base_model, coreml_config) 436 | ``` 437 | 438 | When the `seq2seq` option is used, the sequence length in the Core ML model is always unbounded. The `sequence_length` specified in the configuration object is ignored. 439 | 440 | This can also be combined with `use_past=True`. TODO: explain how to use this. 441 | 442 | #### Validating the model outputs 443 | 444 | The final step is to validate that the outputs from the base and exported model agree within some absolute tolerance. You can use the `validate_model_outputs()` function provided by the `exporters.coreml` package as follows. 445 | 446 | First enable logging: 447 | 448 | ```python 449 | from exporters.utils import logging 450 | logger = logging.get_logger("exporters.coreml") 451 | logger.setLevel(logging.INFO) 452 | ``` 453 | 454 | Then validate the model: 455 | 456 | ```python 457 | from exporters.coreml import validate_model_outputs 458 | 459 | validate_model_outputs( 460 | coreml_config, preprocessor, base_model, mlmodel, coreml_config.atol_for_validation 461 | ) 462 | ``` 463 | 464 | Note: `validate_model_outputs` only works on Mac computers, as it depends on the Core ML framework to make predictions with the model. 465 | 466 | This function uses the `CoreMLConfig.generate_dummy_inputs()` method to generate inputs for the base and exported model, and the absolute tolerance can be defined in the configuration. We generally find numerical agreement in the 1e-6 to 1e-4 range, although anything smaller than 1e-3 is likely to be OK. 467 | 468 | If validation fails with an error such as the following, it doesn't necessarily mean the model is broken: 469 | 470 | > ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.12345 471 | 472 | The comparison is done using an absolute difference value, which in this example is 0.12345. That is much larger than the default tolerance value of 1e-4, hence the reported error. However, the magnitude of the activations also matters. For a model whose activations are on the order of 1e+3, a maximum absolute difference of 0.12345 would usually be acceptable. 473 | 474 | If validation fails with this error and you're not entirely sure if this is a true problem, call `mlmodel.predict()` on a dummy input tensor and look at the largest absolute magnitude in the output tensor. 475 | 476 | ### Contributing a new configuration to 🤗 Transformers 477 | 478 | We are looking to expand the set of ready-made configurations and welcome contributions from the community! If you would like to contribute your addition to the library, you will need to: 479 | 480 | * Implement the Core ML configuration in the `models.py` file 481 | * Include the model architecture and corresponding features in [`~coreml.features.FeatureManager`] 482 | * Add your model architecture to the tests in `test_coreml.py` 483 | 484 | ### Troubleshooting: What if Core ML Exporters doesn't work for your model? 485 | 486 | It's possible that the model you wish to export fails to convert using Core ML Exporters or even when you try to use `coremltools` directly. 
When running these automated conversion tools, it's quite possible the conversion bails out with an inscrutable error message. Or, the conversion may appear to succeed but the model does not work or produces incorrect outputs. 487 | 488 | The most common reasons for conversion errors are: 489 | 490 | - You provided incorrect arguments to the converter. The `task` argument should match the chosen model architecture. For example, the `"feature-extraction"` task should only be used with models of type `AutoModel`, not `AutoModelForXYZ`. Additionally, the `seq2seq` argument is required to tell apart encoder-decoder type models from encoder-only or decoder-only models. Passing invalid choices for these arguments may give an error during the conversion process or it may create a model that works but does the wrong thing. 491 | 492 | - The model performs an operation that is not supported by Core ML or coremltools. It's also possible coremltools has a bug or can't handle particularly complex models. 493 | 494 | If the Core ML export fails due to the latter, you have a couple of options: 495 | 496 | 1. Implement the missing operator in the `CoreMLConfig`'s `patch_pytorch_ops()` function. 497 | 498 | 2. Fix the original model. This requires a deep understanding of how the model works and is not trivial. However, sometimes the fix is to hardcode certain values rather than letting PyTorch or TensorFlow calculate them from the shapes of tensors. 499 | 500 | 3. Fix coremltools. It is sometimes possible to hack coremltools so that it ignores the issue. 501 | 502 | 4. Forget about automated conversion and [build the model from scratch using MIL](https://coremltools.readme.io/docs/model-intermediate-language). This is the intermediate language that coremltools uses internally to represent models. It's similar in many ways to PyTorch. 503 | 504 | 5. Submit an issue and we'll see what we can do. 😀 505 | 506 | ### Known issues 507 | 508 | The Core ML exporter writes models in the **mlpackage** format. Unfortunately, for some models the generated ML Program is incorrect, in which case it's recommended to convert the model to the older NeuralNetwork format by setting the configuration object's `use_legacy_format` property to `True`. On certain hardware, the older format may also run more efficiently. If you're not sure which one to use, export the model twice and compare the two versions. 509 | 510 | Known models that need to be exported with `use_legacy_format=True` are: GPT2, DistilGPT2. 511 | 512 | Using flexible input sequence length with GPT2 or GPT-Neo causes the converter to be extremely slow and allocate over 200 GB of RAM. This is clearly a bug in coremltools or the Core ML framework, as the allocated memory is never used (the computer won't start swapping). After many minutes, the conversion does succeed, but the model may not be 100% correct. Loading the model afterwards takes a very long time and makes similar memory allocations. Likewise for making predictions. While theoretically the conversion succeeds (if you have enough patience), the model is not really usable like this. 513 | 514 | ## Pushing the model to the Hugging Face Hub 515 | 516 | The [Hugging Face Hub](https://huggingface.co) can also host your Core ML models. You can use the [`huggingface_hub` package](https://huggingface.co/docs/huggingface_hub/main/en/index) to upload the converted model to the Hub from Python. 
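If the package is not already present in your environment (it is installed as a dependency of 🤗 Transformers, so it usually is), you can install it with pip:

```bash
pip install huggingface_hub
```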
517 | 518 | First log in to your Hugging Face account account with the following command: 519 | 520 | ```bash 521 | huggingface-cli login 522 | ``` 523 | 524 | Once you are logged in, save the **mlpackage** to the Hub as follows: 525 | 526 | ```python 527 | from huggingface_hub import Repository 528 | 529 | with Repository( 530 | "", clone_from="https://huggingface.co//", 531 | use_auth_token=True).commit(commit_message="add Core ML model"): 532 | mlmodel.save(".mlpackage") 533 | ``` 534 | 535 | Make sure to replace `` with the name of the model and `` with your Hugging Face username. 536 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name="exporters", 5 | version="0.0.1", 6 | description="Core ML exporter for Hugging Face Transformers", 7 | long_description="", 8 | author="The HuggingFace team", 9 | author_email="matthijs@huggingface.co", 10 | url="https://github.com/huggingface/exporters", 11 | package_dir={"": "src"}, 12 | packages=find_packages("src"), 13 | include_package_data=True, 14 | python_requires=">=3.8.0", 15 | install_requires=[ 16 | "transformers >= 4.30.0", 17 | "coremltools >= 7", 18 | ], 19 | classifiers=[ 20 | ], 21 | license="Apache", 22 | ) 23 | -------------------------------------------------------------------------------- /src/exporters/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = "0.0.1" 2 | -------------------------------------------------------------------------------- /src/exporters/coreml/__init__.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """Core ML conversion for Hugging Face Transformers models.""" 16 | 17 | from .config import CoreMLConfig 18 | from .convert import export 19 | from .validate import validate_model_outputs 20 | -------------------------------------------------------------------------------- /src/exporters/coreml/__main__.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2021-2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
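"""Command-line interface for exporting 🤗 Transformers models to Core ML."""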
15 | 16 | import warnings 17 | 18 | from argparse import ArgumentParser 19 | from pathlib import Path 20 | 21 | from coremltools import ComputeUnit 22 | from coremltools.models import MLModel 23 | from coremltools.models.utils import _is_macos, _macos_version 24 | 25 | from transformers.models.auto import AutoFeatureExtractor, AutoProcessor, AutoTokenizer 26 | from transformers.onnx.utils import get_preprocessor 27 | 28 | from .convert import export 29 | from .features import FeaturesManager 30 | from .validate import validate_model_outputs 31 | from ..utils import logging 32 | 33 | 34 | def convert_model(preprocessor, model, model_coreml_config, args, use_past=False, seq2seq=None): 35 | coreml_config = model_coreml_config(model.config, use_past=use_past, seq2seq=seq2seq) 36 | 37 | compute_units = ComputeUnit.ALL 38 | if args.compute_units == "cpu_and_gpu": 39 | compute_units = ComputeUnit.CPU_AND_GPU 40 | elif args.compute_units == "cpu_only": 41 | compute_units = ComputeUnit.CPU_ONLY 42 | elif args.compute_units == "cpu_and_ne": 43 | compute_units = ComputeUnit.CPU_AND_NE 44 | 45 | mlmodel = export( 46 | preprocessor, 47 | model, 48 | coreml_config, 49 | quantize=args.quantize, 50 | compute_units=compute_units, 51 | ) 52 | 53 | filename = args.output 54 | if seq2seq == "encoder": 55 | filename = filename.parent / ("encoder_" + filename.name) 56 | elif seq2seq == "decoder": 57 | filename = filename.parent / ("decoder_" + filename.name) 58 | filename = filename.as_posix() 59 | 60 | mlmodel.save(filename) 61 | 62 | if args.atol is None: 63 | args.atol = coreml_config.atol_for_validation 64 | 65 | if not _is_macos() or _macos_version() < (12, 0): 66 | logger.info("Skipping model validation, requires macOS 12.0 or later") 67 | else: 68 | # Run validation on CPU 69 | mlmodel = MLModel(filename, compute_units=ComputeUnit.CPU_ONLY) 70 | validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol) 71 | 72 | logger.info(f"All good, model saved at: {filename}") 73 | 74 | 75 | def main(): 76 | parser = ArgumentParser("Hugging Face Transformers Core ML exporter") 77 | parser.add_argument( 78 | "-m", "--model", type=str, required=True, help="Model ID on huggingface.co or path on disk to load model from." 79 | ) 80 | parser.add_argument( 81 | "--feature", 82 | choices=list(FeaturesManager.AVAILABLE_FEATURES_INCLUDING_LEGACY), 83 | default="feature-extraction", 84 | help="The type of features to export the model with.", 85 | ) 86 | parser.add_argument( 87 | "--atol", type=float, default=None, help="Absolute difference tolerence when validating the model." 88 | ) 89 | parser.add_argument( 90 | "--use_past", action="store_true", help="Export the model with precomputed hidden states (key and values in the attention blocks) for fast autoregressive decoding." 91 | ) 92 | parser.add_argument( 93 | "--framework", type=str, choices=["pt", "tf"], default="pt", help="The framework to use for the Core ML export." 94 | ) 95 | parser.add_argument( 96 | "--quantize", type=str, choices=["float32", "float16"], default="float16", help="Quantization option for the model weights." 97 | ) 98 | parser.add_argument( 99 | "--compute_units", type=str, choices=["all", "cpu_and_gpu", "cpu_only", "cpu_and_ne"], default="all", help="Optimize the model for CPU, GPU, and/or Neural Engine." 
100 | ) 101 | # parser.add_argument("--cache_dir", type=str, default=None, help="Path indicating where to store cache.") 102 | parser.add_argument( 103 | "--preprocessor", 104 | type=str, 105 | choices=["auto", "tokenizer", "feature_extractor", "processor"], 106 | default="auto", 107 | help="Which type of preprocessor to use. 'auto' tries to automatically detect it.", 108 | ) 109 | parser.add_argument("output", type=Path, help="Path indicating where to store generated Core ML model.") 110 | 111 | args = parser.parse_args() 112 | 113 | if (not args.output.is_file()) and (args.output.suffix not in [".mlpackage", ".mlmodel"]): 114 | args.output = args.output.joinpath("Model.mlpackage") 115 | if not args.output.parent.exists(): 116 | args.output.parent.mkdir(parents=True) 117 | 118 | # Instantiate the appropriate preprocessor 119 | if args.preprocessor == "auto": 120 | preprocessor = get_preprocessor(args.model) 121 | elif args.preprocessor == "tokenizer": 122 | preprocessor = AutoTokenizer.from_pretrained(args.model) 123 | elif args.preprocessor == "feature_extractor": 124 | preprocessor = AutoFeatureExtractor.from_pretrained(args.model) 125 | elif args.preprocessor == "processor": 126 | preprocessor = AutoProcessor.from_pretrained(args.model) 127 | else: 128 | raise ValueError(f"Unknown preprocessor type '{args.preprocessor}'") 129 | 130 | # Support legacy task names in CLI only 131 | feature = args.feature 132 | args.feature = FeaturesManager.map_from_synonym(args.feature) 133 | if feature != args.feature: 134 | deprecation_message = f"Feature '{feature}' is deprecated, please use '{args.feature}' instead." 135 | warnings.warn(deprecation_message, FutureWarning) 136 | 137 | # Allocate the model 138 | model = FeaturesManager.get_model_from_feature( 139 | args.feature, args.model, framework=args.framework, #cache_dir=args.cache_dir 140 | ) 141 | model_kind, model_coreml_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature) 142 | 143 | if args.feature in ["text2text-generation", "speech-seq2seq"]: 144 | logger.info(f"Converting encoder model...") 145 | 146 | convert_model( 147 | preprocessor, 148 | model, 149 | model_coreml_config, 150 | args, 151 | use_past=False, 152 | seq2seq="encoder" 153 | ) 154 | 155 | logger.info(f"Converting decoder model...") 156 | 157 | convert_model( 158 | preprocessor, 159 | model, 160 | model_coreml_config, 161 | args, 162 | use_past=args.use_past, 163 | seq2seq="decoder" 164 | ) 165 | else: 166 | convert_model( 167 | preprocessor, 168 | model, 169 | model_coreml_config, 170 | args, 171 | use_past=args.use_past, 172 | ) 173 | 174 | 175 | if __name__ == "__main__": 176 | logger = logging.get_logger("exporters.coreml") # pylint: disable=invalid-name 177 | logger.setLevel(logging.INFO) 178 | main() 179 | -------------------------------------------------------------------------------- /src/exporters/coreml/convert.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | import json 17 | from typing import TYPE_CHECKING, List, Union, Mapping 18 | 19 | import coremltools as ct 20 | from coremltools.converters.mil.frontend.torch.torch_op_registry import _TORCH_OPS_REGISTRY 21 | 22 | import numpy as np 23 | 24 | from transformers.utils import ( 25 | TensorType, 26 | is_torch_available, 27 | is_tf_available, 28 | ) 29 | from .config import CoreMLConfig 30 | from ..utils import logging 31 | 32 | 33 | if is_torch_available(): 34 | from transformers.modeling_utils import PreTrainedModel 35 | 36 | if is_tf_available(): 37 | from transformers.modeling_tf_utils import TFPreTrainedModel 38 | 39 | if TYPE_CHECKING: 40 | from transformers.feature_extraction_utils import FeatureExtractionMixin 41 | from transformers.processing_utils import ProcessorMixin 42 | from transformers.tokenization_utils import PreTrainedTokenizer 43 | 44 | 45 | logger = logging.get_logger(__name__) # pylint: disable=invalid-name 46 | 47 | 48 | def get_output_names(spec): 49 | """Return a list of all output names in the Core ML model.""" 50 | outputs = [] 51 | for out in spec.description.output: 52 | outputs.append(out.name) 53 | return outputs 54 | 55 | 56 | def get_output_named(spec, name): 57 | """Return the output node with the given name in the Core ML model.""" 58 | for out in spec.description.output: 59 | if out.name == name: 60 | return out 61 | return None 62 | 63 | 64 | def set_multiarray_shape(node, shape): 65 | """Change the shape of the specified input or output in the Core ML model.""" 66 | del node.type.multiArrayType.shape[:] 67 | for x in shape: 68 | node.type.multiArrayType.shape.append(x) 69 | 70 | 71 | def get_labels_as_list(model): 72 | """Return the labels of a classifier model as a sorted list.""" 73 | labels = [] 74 | for i in range(len(model.config.id2label)): 75 | if i in model.config.id2label.keys(): 76 | labels.append(model.config.id2label[i]) 77 | return labels 78 | 79 | 80 | def is_image_std_same(preprocessor: "FeatureExtractionMixin") -> bool: 81 | """Is the image_std normalization the same for all color channels?""" 82 | return preprocessor.image_std[0] == preprocessor.image_std[1] == preprocessor.image_std[2] 83 | 84 | 85 | def get_shape(config, input_desc, dummy_input, axis=-1): 86 | """ 87 | Returns the ct.Shape object for the given input. 88 | """ 89 | default_shape = dummy_input[0].shape 90 | shape = list(default_shape) 91 | 92 | # Does the input shape need to be flexible? 
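    # When exporting with `use_past`, or when `sequence_length` is a (min, max) tuple,
    # the given axis becomes a ct.RangeDim so the converted model accepts variable-length
    # inputs; otherwise the fixed shape of the dummy input is used.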
93 | if config.use_past: 94 | #shape[0] = ct.RangeDim() # batch size #TODO 95 | shape[axis] = ct.RangeDim() 96 | default_shape = None 97 | elif isinstance(input_desc.sequence_length, tuple): 98 | min_length, max_length = input_desc.sequence_length 99 | #shape[0] = ct.RangeDim() # batch size #TODO 100 | shape[axis] = ct.RangeDim(min_length, max_length) 101 | default_shape = None 102 | 103 | return ct.Shape(shape, default=default_shape) 104 | 105 | 106 | def get_input_types( 107 | preprocessor: Union["PreTrainedTokenizer", "FeatureExtractionMixin", "ProcessorMixin"], 108 | config: CoreMLConfig, 109 | dummy_inputs: Mapping[str, np.ndarray], 110 | ) -> List[Union[ct.ImageType, ct.TensorType]]: 111 | """ 112 | Create the ct.InputType objects that describe the inputs to the Core ML model. 113 | 114 | Args: 115 | preprocessor ([`PreTrainedTokenizer`], [`FeatureExtractionMixin`] or [`ProcessorMixin`]): 116 | The preprocessor used for encoding the data. 117 | config ([`~coreml.config.CoreMLConfig`]): 118 | The Core ML configuration associated with the exported model. 119 | dummy_inputs (`Mapping[str, np.ndarray]`): 120 | The dummy input tensors that describe the expected shapes of the inputs. 121 | 122 | Returns: 123 | `List[Union[ct.ImageType, ct.TensorType]]`: ordered list of input types 124 | """ 125 | input_descs = config.inputs 126 | input_types = [] 127 | 128 | if config.modality == "text" or config.seq2seq == "decoder": 129 | if config.seq2seq == "decoder": 130 | input_ids_name = "decoder_input_ids" 131 | attention_mask_name = "decoder_attention_mask" 132 | else: 133 | input_ids_name = "input_ids" 134 | attention_mask_name = "attention_mask" 135 | 136 | input_desc = input_descs[input_ids_name] 137 | dummy_input = dummy_inputs[input_ids_name] 138 | shape = get_shape(config, input_desc, dummy_input) 139 | input_types.append( 140 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 141 | ) 142 | 143 | if attention_mask_name in input_descs: 144 | input_desc = input_descs[attention_mask_name] 145 | input_types.append( 146 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 147 | ) 148 | else: 149 | logger.info(f"Skipping {attention_mask_name} input") 150 | 151 | if "token_type_ids" in input_descs: 152 | input_desc = input_descs["token_type_ids"] 153 | input_types.append( 154 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 155 | ) 156 | else: 157 | logger.info("Skipping token_type_ids input") 158 | 159 | if "encoder_outputs" in input_descs: 160 | input_desc = input_descs["encoder_outputs"] 161 | shape = list(dummy_inputs["encoder_outputs"][0].shape) 162 | #shape[0] = ct.RangeDim() # batch size #TODO 163 | # TODO: only disable if we are using fixed shapes (which could be part of the configuration) 164 | # shape[1] = ct.RangeDim() 165 | input_types.append( 166 | ct.TensorType(name=input_desc.name, shape=ct.Shape(shape), dtype=np.float32) 167 | ) 168 | 169 | if config.seq2seq == "decoder" and "attention_mask" in input_descs: 170 | input_desc = input_descs["attention_mask"] 171 | shape = get_shape(config, input_desc, dummy_input) 172 | input_types.append( 173 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 174 | ) 175 | 176 | if config.task == "feature-extraction" and "decoder_input_ids" in input_descs: 177 | # Special case for T5 178 | input_desc = input_descs["decoder_input_ids"] 179 | shape = get_shape(config, input_desc, dummy_inputs["decoder_input_ids"]) 180 | input_types.append( 181 | ct.TensorType(name=input_desc.name, 
shape=shape, dtype=np.int32) 182 | ) 183 | input_desc = input_descs["decoder_attention_mask"] 184 | shape = get_shape(config, input_desc, dummy_inputs["decoder_attention_mask"]) 185 | input_types.append( 186 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 187 | ) 188 | 189 | if config.use_past: 190 | # TODO: Temporarily disabled until we can solve the issue with encoder past key/values 191 | # name = "decoder_past_key_values" if config.seq2seq == "decoder" else "past_key_values" 192 | name = "past_key_values" 193 | shape = list(dummy_inputs[f"{name}_0_key"][1].shape) 194 | #shape[0] = ct.RangeDim() # batch size #TODO 195 | shape[2] = ct.RangeDim(0, -1) 196 | shape = ct.Shape(shape) 197 | 198 | for i in range(config.num_layers): 199 | input_types.append(ct.TensorType(name=f"{name}_{i}_key", shape=shape)) 200 | input_types.append(ct.TensorType(name=f"{name}_{i}_value", shape=shape)) 201 | 202 | # TODO: Temporarily disabled until we can solve the issue with encoder past key/values 203 | # if config.seq2seq == "decoder": 204 | # name = "encoder_past_key_values" 205 | # shape = list(dummy_inputs[f"{name}_0_key"][1].shape) 206 | # #shape[0] = ct.RangeDim() # batch size #TODO 207 | # shape[2] = ct.RangeDim(0, -1) 208 | # shape = ct.Shape(shape) 209 | 210 | # for i in range(config.num_encoder_layers): 211 | # input_types.append(ct.TensorType(name=f"{name}_{i}_key", shape=shape)) 212 | # input_types.append(ct.TensorType(name=f"{name}_{i}_value", shape=shape)) 213 | 214 | 215 | elif config.modality == "vision": 216 | if hasattr(preprocessor, "image_mean"): 217 | bias = [ 218 | -preprocessor.image_mean[0], 219 | -preprocessor.image_mean[1], 220 | -preprocessor.image_mean[2], 221 | ] 222 | else: 223 | bias = [0.0, 0.0, 0.0] 224 | 225 | # If the stddev values are all equal, they can be folded into `bias` and 226 | # `scale`. If not, Wrapper will insert an additional division operation. 
227 | if hasattr(preprocessor, "image_std") and is_image_std_same(preprocessor): 228 | bias[0] /= preprocessor.image_std[0] 229 | bias[1] /= preprocessor.image_std[1] 230 | bias[2] /= preprocessor.image_std[2] 231 | scale = 1.0 / (preprocessor.image_std[0] * 255.0) 232 | else: 233 | scale = 1.0 / 255 234 | 235 | input_desc = input_descs["pixel_values"] 236 | input_types.append( 237 | ct.ImageType( 238 | name=input_desc.name, 239 | shape=dummy_inputs["pixel_values"][0].shape, 240 | scale=scale, 241 | bias=bias, 242 | color_layout=input_desc.color_layout or "RGB", 243 | channel_first=True, 244 | ) 245 | ) 246 | 247 | if config.task == "masked-im": 248 | input_desc = input_descs["bool_masked_pos"] 249 | input_types.append( 250 | ct.TensorType( 251 | name=input_desc.name, 252 | shape=dummy_inputs["bool_masked_pos"][0].shape, 253 | dtype=np.int32 254 | ) 255 | ) 256 | 257 | elif config.modality == "audio": 258 | if "input_features" in input_descs: 259 | input_desc = input_descs["input_features"] 260 | dummy_input = dummy_inputs["input_features"] 261 | shape = get_shape(config, input_desc, dummy_input, axis=1) 262 | input_types.append( 263 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.int32) 264 | ) 265 | else: 266 | input_desc = input_descs["input_values"] 267 | dummy_input = dummy_inputs["input_values"] 268 | shape = get_shape(config, input_desc, dummy_input, axis=1) 269 | input_types.append( 270 | ct.TensorType(name=input_desc.name, shape=shape, dtype=np.float32) 271 | ) 272 | 273 | if "attention_mask" in input_descs: 274 | input_desc = input_descs["attention_mask"] 275 | attn_shape = list(dummy_inputs["attention_mask"][0].shape) 276 | if isinstance(shape.shape[1], ct.RangeDim): 277 | #attn_shape[0] = shape.shape[0] # batch size #TODO 278 | attn_shape[-1] = shape.shape[1] 279 | 280 | input_types.append( 281 | ct.TensorType(name=input_desc.name, shape=ct.Shape(attn_shape), dtype=np.int32) 282 | ) 283 | else: 284 | logger.info("Skipping attention_mask input") 285 | 286 | return input_types 287 | 288 | 289 | if is_torch_available(): 290 | import torch 291 | 292 | class Wrapper(torch.nn.Module): 293 | def __init__(self, preprocessor, model, config): 294 | super().__init__() 295 | self.preprocessor = preprocessor 296 | self.model = model.eval() 297 | self.config = config 298 | 299 | def forward(self, *all_inputs): 300 | remaining = len(all_inputs) 301 | inputs = all_inputs[0] 302 | 303 | # Core ML's image preprocessing does not allow a different scaling 304 | # factor for each color channel, so do this manually. 305 | if hasattr(self.preprocessor, "image_std") and not is_image_std_same(self.preprocessor): 306 | image_std = torch.tensor(self.preprocessor.image_std).reshape(1, -1, 1, 1) 307 | inputs = inputs / image_std 308 | 309 | model_kwargs = { 310 | "return_dict": False, 311 | 312 | # CoreMLConfig's values_override is supposed to do this, but not all 313 | # models look at self.config.use_cache (e.g. ElectraForCausalLM) 314 | # Can't do it here either because it doesn't work with all models! 315 | #"use_cache": self.config.use_past or self.config.seq2seq, 316 | } 317 | 318 | # Convert the past_key_values_x_key and _value inputs back into tuples, 319 | # as that is what the original model expects. 320 | # Assumes past_key_values are always the last inputs to the Wrapper. 321 | # An encoder-decoder model first gets all the decoder past_key_values 322 | # tensors, followed by the encoder ones, but they get combined into the 323 | # same 4-tuples. 
324 | if self.config.use_past: 325 | # TODO: Temporarily disabled until we can solve the issue with encoder past key/values 326 | if False and self.config.seq2seq == "decoder": 327 | num_decoder_layers = self.config.num_layers 328 | num_encoder_layers = self.config.num_encoder_layers 329 | remaining -= (num_decoder_layers + num_encoder_layers) * 2 330 | past_key_values = [] 331 | for i in range(min(num_decoder_layers, num_encoder_layers)): 332 | past_key_values.append(( 333 | all_inputs[remaining + i*2], 334 | all_inputs[remaining + i*2 + 1], 335 | all_inputs[remaining + num_decoder_layers*2 + i*2], 336 | all_inputs[remaining + num_decoder_layers*2 + i*2 + 1], 337 | )) 338 | model_kwargs["past_key_values"] = past_key_values 339 | else: 340 | remaining -= self.config.num_layers * 2 341 | past_key_values = [] 342 | for i in range(self.config.num_layers): 343 | past_key_values.append(( 344 | all_inputs[remaining + i*2], 345 | all_inputs[remaining + i*2 + 1], 346 | )) 347 | model_kwargs["past_key_values"] = past_key_values 348 | 349 | if self.config.seq2seq == "decoder": 350 | model_kwargs["decoder_input_ids"] = all_inputs[0] 351 | model_kwargs["decoder_attention_mask"] = all_inputs[1] 352 | model_kwargs["encoder_outputs"] = (all_inputs[2],) 353 | if remaining >= 4: 354 | model_kwargs["attention_mask"] = all_inputs[3] 355 | elif self.config.modality == "text": 356 | if remaining >= 2: 357 | model_kwargs["attention_mask"] = all_inputs[1] 358 | if remaining >= 4: 359 | # Special case for T5 360 | model_kwargs["decoder_input_ids"] = all_inputs[2] 361 | model_kwargs["decoder_attention_mask"] = all_inputs[3] 362 | elif remaining == 3: 363 | model_kwargs["token_type_ids"] = all_inputs[2] 364 | elif self.config.modality == "vision": 365 | if self.config.task == "masked-im": 366 | model_kwargs["bool_masked_pos"] = all_inputs[1] 367 | 368 | # Run the model with the provided inputs. 369 | if self.config.seq2seq == "encoder": 370 | outputs = self.model.get_encoder()(inputs, **model_kwargs) 371 | elif self.config.seq2seq == "decoder": 372 | outputs = self.model(**model_kwargs) 373 | else: 374 | outputs = self.model(inputs, **model_kwargs) 375 | 376 | # Unpack the output `past_key_values` into a single tuple. 377 | presents = () 378 | if self.config.use_past: 379 | if len(outputs) < 2: 380 | raise ValueError("expected at least two output tensors, got one") 381 | 382 | past_key_values_index = -2 if self.config.seq2seq == "decoder" else -1 383 | past_key_values = outputs[past_key_values_index] 384 | 385 | # TODO: Temporarily disabled until we can solve the issue with encoder past key/values 386 | if False and self.config.seq2seq == "decoder": 387 | decoder_presents = () 388 | encoder_presents = () 389 | for i in range(len(past_key_values)): 390 | for j in range(2): 391 | decoder_presents = decoder_presents + (past_key_values[i][j],) 392 | encoder_presents = encoder_presents + (past_key_values[i][j + 2],) 393 | 394 | presents = decoder_presents + encoder_presents 395 | else: 396 | for i in range(len(past_key_values)): 397 | for j in range(2): 398 | presents = presents + (past_key_values[i][j],) 399 | 400 | output_descs = self.config.outputs 401 | 402 | if self.config.task == "image-classification": 403 | output_desc = output_descs["logits"] 404 | if output_desc.do_softmax: 405 | return torch.nn.functional.softmax(outputs[0], dim=1) 406 | else: 407 | return outputs[0] # logits 408 | 409 | if self.config.task == "masked-im": 410 | # Some models also return loss even if no labels provided (e.g. 
ViT) 411 | # so skip that output if it's present. 412 | return outputs[1] if len(outputs) >= 2 else outputs[0] # logits 413 | 414 | if self.config.seq2seq != "encoder" and self.config.task in [ 415 | "text-generation", 416 | "automatic-speech-recognition", 417 | "fill-mask", 418 | "multiple-choice", 419 | "next-sentence-prediction", 420 | "text2text-generation", 421 | "text-classification", 422 | "speech-seq2seq", 423 | "token-classification", 424 | ]: 425 | output_desc = output_descs["logits"] 426 | if output_desc.do_softmax: 427 | prediction = torch.nn.functional.softmax(outputs[0], dim=-1) 428 | else: 429 | prediction = outputs[0] # logits 430 | 431 | return (prediction,) + presents 432 | 433 | if self.config.task == "object-detection": 434 | return outputs[0], outputs[1] # logits, pred_boxes 435 | 436 | if self.config.task == "question-answering": 437 | output_desc = output_descs["start_logits"] 438 | if output_desc.do_softmax: 439 | start_scores = torch.nn.functional.softmax(outputs[0], dim=-1) 440 | end_scores = torch.nn.functional.softmax(outputs[1], dim=-1) 441 | return start_scores, end_scores 442 | else: 443 | return outputs[0], outputs[1] # start_logits, end_logits 444 | 445 | if self.config.task == "semantic-segmentation": 446 | x = outputs[0] # logits 447 | output_desc = output_descs["logits"] 448 | if output_desc.do_upsample: 449 | x = torch.nn.functional.interpolate(x, size=inputs.shape[-2:], mode="bilinear", align_corners=False) 450 | if output_desc.do_softmax: 451 | x = torch.nn.functional.softmax(x, dim=1) 452 | if output_desc.do_argmax: 453 | x = x.argmax(1) 454 | return x 455 | 456 | if self.config.seq2seq == "encoder" and self.config.task in ["text2text-generation", "speech-seq2seq"]: 457 | return outputs[0] # last_hidden_state 458 | 459 | if self.config.task == "feature-extraction": 460 | if self.config.use_past: 461 | return (outputs[0],) + presents 462 | elif len(output_descs) > 1 and len(outputs) > 1: 463 | return outputs[0], outputs[1] # last_hidden_state, pooler_output 464 | else: 465 | return outputs[0] # last_hidden_state 466 | 467 | raise AssertionError(f"Cannot compute outputs for unknown task '{self.config.task}'") 468 | 469 | 470 | def export_pytorch( 471 | preprocessor: Union["PreTrainedTokenizer", "FeatureExtractionMixin", "ProcessorMixin"], 472 | model: "PreTrainedModel", 473 | config: CoreMLConfig, 474 | quantize: str = "float32", 475 | compute_units: ct.ComputeUnit = ct.ComputeUnit.ALL, 476 | ) -> ct.models.MLModel: 477 | """ 478 | Export a PyTorch model to Core ML format. 479 | 480 | Args: 481 | preprocessor ([`PreTrainedTokenizer`], [`FeatureExtractionMixin`] or [`ProcessorMixin`]): 482 | The preprocessor used for encoding the data. 483 | model ([`PreTrainedModel`]): 484 | The model to export. 485 | config ([`~coreml.config.CoreMLConfig`]): 486 | The Core ML configuration associated with the exported model. 487 | quantize (`str`, *optional*, defaults to `"float32"`): 488 | Quantization options. Possible values: `"float32"`, `"float16"`. 489 | compute_units (`ct.ComputeUnit`, *optional*, defaults to `ct.ComputeUnit.ALL`): 490 | Whether to optimize the model for CPU, GPU, and/or Neural Engine. 
491 | 492 | Returns: 493 | `ct.models.MLModel`: the Core ML model object 494 | """ 495 | if not issubclass(type(model), PreTrainedModel): 496 | raise ValueError(f"Cannot convert unknown model type: {type(model)}") 497 | 498 | logger.info(f"Using framework PyTorch: {torch.__version__}") 499 | 500 | # Check if we need to override certain configuration items 501 | if config.values_override is not None: 502 | logger.info(f"Overriding {len(config.values_override)} configuration item(s)") 503 | for override_config_key, override_config_value in config.values_override.items(): 504 | logger.info(f"\t- {override_config_key} -> {override_config_value}") 505 | setattr(model.config, override_config_key, override_config_value) 506 | 507 | # Create dummy input data for doing the JIT trace. 508 | dummy_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH) 509 | 510 | # Put the inputs in the order from the config. 511 | example_input = [dummy_inputs[key][0] for key in list(config.inputs.keys())] 512 | 513 | wrapper = Wrapper(preprocessor, model, config).eval() 514 | 515 | # Running the model once with gradients disabled prevents an error during JIT tracing 516 | # that happens with certain models such as LeViT. The error message is: "Cannot insert 517 | # a Tensor that requires grad as a constant." 518 | with torch.no_grad(): 519 | dummy_output = wrapper(*example_input) 520 | 521 | traced_model = torch.jit.trace(wrapper, example_input, strict=True) 522 | 523 | # Run the traced PyTorch model to get the shapes of the output tensors. 524 | with torch.no_grad(): 525 | example_output = traced_model(*example_input) 526 | 527 | if isinstance(example_output, (tuple, list)): 528 | example_output = [x.numpy() for x in example_output] 529 | else: 530 | example_output = [example_output.numpy()] 531 | 532 | convert_kwargs = {} 533 | if not config.use_legacy_format: 534 | convert_kwargs["compute_precision"] = ct.precision.FLOAT16 if quantize == "float16" else ct.precision.FLOAT32 535 | 536 | # For classification models, add the labels into the Core ML model and 537 | # designate it as the special "classifier" model type. 
538 | if config.is_classifier: 539 | convert_kwargs['classifier_config'] = ct.ClassifierConfig(config.get_class_labels()) 540 | 541 | input_tensors = get_input_types(preprocessor, config, dummy_inputs) 542 | 543 | patched_ops = config.patch_pytorch_ops() 544 | restore_ops = {} 545 | if patched_ops is not None: 546 | for name, func in patched_ops.items(): 547 | logger.info(f"Patching PyTorch conversion '{name}' with {func}") 548 | if name in _TORCH_OPS_REGISTRY: 549 | restore_ops[name] = _TORCH_OPS_REGISTRY[name] 550 | del _TORCH_OPS_REGISTRY[name] 551 | _TORCH_OPS_REGISTRY[name] = func 552 | 553 | mlmodel = ct.convert( 554 | traced_model, 555 | inputs=input_tensors, 556 | convert_to="neuralnetwork" if config.use_legacy_format else "mlprogram", 557 | compute_units=compute_units, 558 | **convert_kwargs, 559 | ) 560 | 561 | if restore_ops is not None: 562 | for name, func in restore_ops.items(): 563 | if func is not None: 564 | logger.info(f"Restoring PyTorch conversion op '{name}' to {func}") 565 | _TORCH_OPS_REGISTRY[name] = func 566 | 567 | spec = mlmodel._spec 568 | 569 | input_descs = config.inputs 570 | output_descs = config.outputs 571 | 572 | for (i, input_desc) in enumerate(input_descs.values()): 573 | mlmodel.input_description[input_desc.name] = input_desc.description 574 | 575 | if input_desc.is_optional: 576 | spec.description.input[i].type.isOptional = True 577 | 578 | user_defined_metadata = { 579 | "co.huggingface.exporters.name": model.name_or_path, 580 | "co.huggingface.exporters.task": config.task, 581 | "co.huggingface.exporters.architecture": next(iter(model.config.architectures), ""), 582 | "co.huggingface.exporters.framework": "pytorch", 583 | "co.huggingface.exporters.precision": quantize, 584 | } 585 | if model.config.transformers_version: 586 | user_defined_metadata["transformers_version"] = model.config.transformers_version 587 | 588 | if config.is_classifier: 589 | output_desc = output_descs["logits"] 590 | ct.utils.rename_feature(spec, spec.description.predictedProbabilitiesName, output_desc.name) 591 | spec.description.predictedProbabilitiesName = output_desc.name 592 | mlmodel.output_description[output_desc.name] = output_desc.description 593 | 594 | output_desc = output_descs["class_labels"] 595 | ct.utils.rename_feature(spec, spec.description.predictedFeatureName, output_desc.name) 596 | spec.description.predictedFeatureName = output_desc.name 597 | mlmodel.output_description[output_desc.name] = output_desc.description 598 | else: 599 | for i, (key, output_desc) in enumerate(output_descs.items()): 600 | if i < len(example_output): 601 | output = spec.description.output[i] 602 | ct.utils.rename_feature(spec, output.name, output_desc.name, rename_inputs=False) 603 | mlmodel.output_description[output_desc.name] = output_desc.description 604 | 605 | if config.task in ["object-detection", "semantic-segmentation", "token-classification"]: 606 | labels = get_labels_as_list(model) 607 | user_defined_metadata["classes"] = ",".join(labels) 608 | 609 | if config.task == "semantic-segmentation": 610 | # Make the model available in Xcode's previewer. 
611 | mlmodel.user_defined_metadata["com.apple.coreml.model.preview.type"] = "imageSegmenter" 612 | mlmodel.user_defined_metadata["com.apple.coreml.model.preview.params"] = json.dumps({"labels": labels}) 613 | 614 | if len(user_defined_metadata) > 0: 615 | spec.description.metadata.userDefined.update(user_defined_metadata) 616 | 617 | spec.description.metadata.shortDescription = config.short_description 618 | 619 | # Reload the model in case any input / output names were changed. 620 | mlmodel = ct.models.MLModel(mlmodel._spec, weights_dir=mlmodel.weights_dir) 621 | 622 | if config.use_legacy_format and quantize == "float16": 623 | mlmodel = ct.models.neural_network.quantization_utils.quantize_weights(mlmodel, nbits=16) 624 | 625 | return mlmodel 626 | 627 | 628 | def export( 629 | preprocessor: Union["PreTrainedTokenizer", "FeatureExtractionMixin", "ProcessorMixin"], 630 | model: Union["PreTrainedModel", "TFPreTrainedModel"], 631 | config: CoreMLConfig, 632 | quantize: str = "float32", 633 | compute_units: ct.ComputeUnit = ct.ComputeUnit.ALL, 634 | ) -> ct.models.MLModel: 635 | """ 636 | Export a Pytorch or TensorFlow model to Core ML format. 637 | 638 | Args: 639 | preprocessor ([`PreTrainedTokenizer`], [`FeatureExtractionMixin`] or [`ProcessorMixin`]): 640 | The preprocessor used for encoding the data. 641 | model ([`PreTrainedModel`] or [`TFPreTrainedModel`]): 642 | The model to export. 643 | config ([`~coreml.config.CoreMLConfig`]): 644 | The Core ML configuration associated with the exported model. 645 | quantize (`str`, *optional*, defaults to `"float32"`): 646 | Quantization options. Possible values: `"float32"`, `"float16"`. 647 | compute_units (`ct.ComputeUnit`, *optional*, defaults to `ct.ComputeUnit.ALL`): 648 | Whether to optimize the model for CPU, GPU, and/or Neural Engine. 649 | 650 | Returns: 651 | `ct.models.MLModel`: the Core ML model object 652 | """ 653 | if not (is_torch_available() or is_tf_available()): 654 | raise ImportError( 655 | "Cannot convert because neither PyTorch nor TensorFlow are installed. " 656 | "Please install torch or tensorflow first." 657 | ) 658 | 659 | if is_torch_available() and issubclass(type(model), PreTrainedModel): 660 | return export_pytorch(preprocessor, model, config, quantize, compute_units) 661 | else: 662 | raise ValueError(f"Cannot convert unknown model type: {type(model)}") 663 | -------------------------------------------------------------------------------- /src/exporters/coreml/features.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | from functools import partial, reduce 17 | from typing import TYPE_CHECKING, Callable, Dict, Optional, Tuple, Type, Union 18 | 19 | from transformers import PretrainedConfig, is_tf_available, is_torch_available 20 | from .config import CoreMLConfig 21 | from ..utils import logging 22 | 23 | 24 | if TYPE_CHECKING: 25 | from transformers import PreTrainedModel, TFPreTrainedModel 26 | 27 | 28 | logger = logging.get_logger(__name__) # pylint: disable=invalid-name 29 | 30 | if is_torch_available(): 31 | from transformers.models.auto import ( 32 | AutoModel, 33 | AutoModelForCausalLM, 34 | AutoModelForCTC, 35 | AutoModelForImageClassification, 36 | # AutoModelForImageSegmentation, 37 | AutoModelForMaskedImageModeling, 38 | AutoModelForMaskedLM, 39 | AutoModelForMultipleChoice, 40 | AutoModelForNextSentencePrediction, 41 | AutoModelForObjectDetection, 42 | AutoModelForQuestionAnswering, 43 | AutoModelForSeq2SeqLM, 44 | AutoModelForSemanticSegmentation, 45 | AutoModelForSequenceClassification, 46 | AutoModelForSpeechSeq2Seq, 47 | AutoModelForTokenClassification, 48 | ) 49 | if is_tf_available(): 50 | from transformers.models.auto import ( 51 | TFAutoModel, 52 | # TFAutoModelForCausalLM, 53 | # TFAutoModelForMaskedLM, 54 | # TFAutoModelForMultipleChoice, 55 | # TFAutoModelForQuestionAnswering, 56 | # TFAutoModelForSeq2SeqLM, 57 | # TFAutoModelForSequenceClassification, 58 | # TFAutoModelForTokenClassification, 59 | ) 60 | if not is_torch_available() and not is_tf_available(): 61 | logger.warning( 62 | "The Core ML export features are only supported for PyTorch or TensorFlow. You will not be able to export models" 63 | " without one of these libraries installed." 64 | ) 65 | 66 | 67 | def supported_features_mapping( 68 | *supported_features: str, coreml_config_cls: str = None 69 | ) -> Dict[str, Callable[[PretrainedConfig], CoreMLConfig]]: 70 | """ 71 | Generate the mapping between supported the features and their corresponding CoreMLConfig for a given model. 72 | 73 | Args: 74 | *supported_features: The names of the supported features. 75 | coreml_config_cls: The CoreMLConfig full name corresponding to the model. 76 | 77 | Returns: 78 | The dictionary mapping a feature to an CoreMLConfig constructor. 79 | """ 80 | if coreml_config_cls is None: 81 | raise ValueError("A CoreMLConfig class must be provided") 82 | 83 | import exporters.coreml.models 84 | config_cls = exporters.coreml 85 | for attr_name in coreml_config_cls.split("."): 86 | if not hasattr(config_cls, attr_name): continue # hack! 
87 | config_cls = getattr(config_cls, attr_name) 88 | mapping = {} 89 | for feature in supported_features: 90 | if "-with-past" in feature: 91 | task = feature.replace("-with-past", "") 92 | mapping[feature] = partial(config_cls.with_past, task=task) 93 | else: 94 | mapping[feature] = partial(config_cls.from_model_config, task=feature) 95 | 96 | return mapping 97 | 98 | 99 | class FeaturesManager: 100 | _TASKS_TO_AUTOMODELS = {} 101 | _TASKS_TO_TF_AUTOMODELS = {} 102 | if is_torch_available(): 103 | _TASKS_TO_AUTOMODELS = { 104 | "feature-extraction": AutoModel, 105 | "text-generation": AutoModelForCausalLM, 106 | "automatic-speech-recognition": AutoModelForCTC, 107 | "image-classification": AutoModelForImageClassification, 108 | # "image-segmentation": AutoModelForImageSegmentation, 109 | "masked-im": AutoModelForMaskedImageModeling, 110 | "fill-mask": AutoModelForMaskedLM, 111 | "multiple-choice": AutoModelForMultipleChoice, 112 | "next-sentence-prediction": AutoModelForNextSentencePrediction, 113 | "object-detection": AutoModelForObjectDetection, 114 | "question-answering": AutoModelForQuestionAnswering, 115 | "semantic-segmentation": AutoModelForSemanticSegmentation, 116 | "text2text-generation": AutoModelForSeq2SeqLM, 117 | "text-classification": AutoModelForSequenceClassification, 118 | "speech-seq2seq": AutoModelForSpeechSeq2Seq, 119 | "token-classification": AutoModelForTokenClassification, 120 | } 121 | if is_tf_available(): 122 | _TASKS_TO_TF_AUTOMODELS = { 123 | # "feature-extraction": TFAutoModel, 124 | # "text-generation": TFAutoModelForCausalLM, 125 | # "fill-mask": TFAutoModelForMaskedLM, 126 | # "multiple-choice": TFAutoModelForMultipleChoice, 127 | # "question-answering": TFAutoModelForQuestionAnswering, 128 | # "text2text-generation": TFAutoModelForSeq2SeqLM, 129 | # "text-classification": TFAutoModelForSequenceClassification, 130 | # "token-classification": TFAutoModelForTokenClassification, 131 | } 132 | 133 | _SYNONYM_TASK_MAP = { 134 | "sequence-classification": "text-classification", 135 | "causal-lm": "text-generation", 136 | "causal-lm-with-past": "text-generation-with-past", 137 | "seq2seq-lm": "text2text-generation", 138 | "seq2seq-lm-with-past": "text2text-generation-with-past", 139 | "speech2seq-lm": "automatic-speech-recognition", 140 | "speech2seq-lm-with-past": "automatic-speech-recognition-with-past", 141 | "masked-lm": "fill-mask", 142 | "vision2seq-lm": "image-to-text", 143 | "default": "feature-extraction", 144 | "default-with-past": "feature-extraction-with-past", 145 | "automatic-speech-recognition": "automatic-speech-recognition", 146 | "ctc": "automatic-speech-recognition", 147 | } 148 | 149 | _SUPPORTED_MODEL_TYPE = { 150 | "bart": supported_features_mapping( 151 | "feature-extraction", 152 | "text-generation", 153 | "text2text-generation", 154 | coreml_config_cls="models.bart.BartCoreMLConfig", 155 | ), 156 | # BEiT cannot be used with the masked image modeling autoclass, so this feature is excluded here 157 | "beit": supported_features_mapping( 158 | "feature-extraction", 159 | "image-classification", 160 | "semantic-segmentation", 161 | coreml_config_cls="models.beit.BeitCoreMLConfig" 162 | ), 163 | "bert": supported_features_mapping( 164 | "feature-extraction", 165 | "fill-mask", 166 | "text-generation", 167 | "text-generation-with-past", 168 | "multiple-choice", 169 | "next-sentence-prediction", 170 | "question-answering", 171 | "text-classification", 172 | "token-classification", 173 | coreml_config_cls="models.bert.BertCoreMLConfig", 174 | ), 
175 | "big_bird": supported_features_mapping( 176 | "text-generation", 177 | "text-generation-with-past", 178 | coreml_config_cls="models.big_bird.BigBirdCoreMLConfig", 179 | ), 180 | "bigbird_pegasus": supported_features_mapping( 181 | "feature-extraction", 182 | "text-generation", 183 | "text-generation-with-past", 184 | "text2text-generation", 185 | coreml_config_cls="models.bigbird_pegasus.BigBirdPegasusCoreMLConfig", 186 | ), 187 | "blenderbot": supported_features_mapping( 188 | "feature-extraction", 189 | "text2text-generation", 190 | coreml_config_cls="models.blenderbot.BlenderbotCoreMLConfig", 191 | ), 192 | "blenderbot_small": supported_features_mapping( 193 | "feature-extraction", 194 | "text2text-generation", 195 | coreml_config_cls="models.blenderbot_small.BlenderbotSmallCoreMLConfig", 196 | ), 197 | "bloom": supported_features_mapping( 198 | "feature-extraction", 199 | "text-generation", 200 | coreml_config_cls="models.bloom.BloomCoreMLConfig", 201 | ), 202 | "convnext": supported_features_mapping( 203 | "feature-extraction", 204 | "image-classification", 205 | coreml_config_cls="models.convnext.ConvNextCoreMLConfig", 206 | ), 207 | "ctrl": supported_features_mapping( 208 | "feature-extraction", 209 | "feature-extraction-with-past", 210 | "text-generation", 211 | "text-generation-with-past", 212 | "text-classification", 213 | coreml_config_cls="models.ctrl.CTRLCoreMLConfig", 214 | ), 215 | "cvt": supported_features_mapping( 216 | "feature-extraction", 217 | "image-classification", 218 | coreml_config_cls="models.cvt.CvtCoreMLConfig", 219 | ), 220 | "data2vec": supported_features_mapping( 221 | "text-generation", 222 | "text-generation-with-past", 223 | coreml_config_cls="models.data2vec.Data2VecTextCoreMLConfig", 224 | ), 225 | "distilbert": supported_features_mapping( 226 | "feature-extraction", 227 | "fill-mask", 228 | "multiple-choice", 229 | "question-answering", 230 | "text-classification", 231 | "token-classification", 232 | coreml_config_cls="models.distilbert.DistilBertCoreMLConfig", 233 | ), 234 | "ernie": supported_features_mapping( 235 | "text-generation", 236 | "text-generation-with-past", 237 | coreml_config_cls="models.ernie.ErnieCoreMLConfig", 238 | ), 239 | "falcon": supported_features_mapping( 240 | "feature-extraction", 241 | "text-generation", 242 | "text-classification", 243 | coreml_config_cls="models.falcon.FalconCoreMLConfig", 244 | ), 245 | "gpt2": supported_features_mapping( 246 | "feature-extraction", 247 | #"feature-extraction-with-past", 248 | "text-generation", 249 | #"text-generation-with-past", 250 | "text-classification", 251 | "token-classification", 252 | coreml_config_cls="models.gpt2.GPT2CoreMLConfig", 253 | ), 254 | "gpt_bigcode": supported_features_mapping( 255 | "feature-extraction", 256 | "text-generation", 257 | "text-classification", 258 | coreml_config_cls="models.gpt_bigcode.GPTBigcodeCoreMLConfig", 259 | ), 260 | "gptj": supported_features_mapping( 261 | "feature-extraction", 262 | "text-generation", 263 | coreml_config_cls="models.gpt2.GPTJCoreMLConfig", 264 | ), 265 | "gpt_neo": supported_features_mapping( 266 | "feature-extraction", 267 | #"feature-extraction-with-past", 268 | "text-generation", 269 | #"text-generation-with-past", 270 | "text-classification", 271 | coreml_config_cls="models.gpt_neo.GPTNeoCoreMLConfig", 272 | ), 273 | "gpt_neox": supported_features_mapping( 274 | "feature-extraction", 275 | #"feature-extraction-with-past", 276 | "text-generation", 277 | #"text-generation-with-past", 278 | "text-classification", 
279 | coreml_config_cls="models.gpt_neox.GPTNeoXCoreMLConfig", 280 | ), 281 | "levit": supported_features_mapping( 282 | "feature-extraction", "image-classification", coreml_config_cls="models.levit.LevitCoreMLConfig" 283 | ), 284 | "llama": supported_features_mapping( 285 | "feature-extraction", 286 | "text-generation", 287 | "text-classification", 288 | coreml_config_cls="models.llama.LlamaCoreMLConfig", 289 | ), 290 | "m2m_100": supported_features_mapping( 291 | "feature-extraction", 292 | "text2text-generation", 293 | coreml_config_cls="models.m2m_100.M2M100CoreMLConfig", 294 | ), 295 | "marian": supported_features_mapping( 296 | "feature-extraction", 297 | "text2text-generation", 298 | coreml_config_cls="models.marian.MarianMTCoreMLConfig", 299 | ), 300 | "mistral": supported_features_mapping( 301 | "feature-extraction", 302 | "text-generation", 303 | "text-classification", 304 | coreml_config_cls="models.mistral.MistralCoreMLConfig", 305 | ), 306 | "mobilebert": supported_features_mapping( 307 | "feature-extraction", 308 | "fill-mask", 309 | "multiple-choice", 310 | "next-sentence-prediction", 311 | "question-answering", 312 | "text-classification", 313 | "token-classification", 314 | coreml_config_cls="models.mobilebert.MobileBertCoreMLConfig", 315 | ), 316 | "mobilevit": supported_features_mapping( 317 | "feature-extraction", 318 | "image-classification", 319 | "semantic-segmentation", 320 | coreml_config_cls="models.mobilevit.MobileViTCoreMLConfig", 321 | ), 322 | "mobilevitv2": supported_features_mapping( 323 | "feature-extraction", 324 | "image-classification", 325 | "semantic-segmentation", 326 | coreml_config_cls="models.mobilevit.MobileViTCoreMLConfig", 327 | ), 328 | "mvp": supported_features_mapping( 329 | "feature-extraction", 330 | "text2text-generation", 331 | coreml_config_cls="models.mvp.MvpCoreMLConfig", 332 | ), 333 | "pegasus": supported_features_mapping( 334 | "feature-extraction", 335 | "text2text-generation", 336 | coreml_config_cls="models.pegasus.PegasusCoreMLConfig", 337 | ), 338 | "plbart": supported_features_mapping( 339 | "feature-extraction", 340 | "text2text-generation", 341 | coreml_config_cls="models.plbart.PLBartCoreMLConfig", 342 | ), 343 | "roberta": supported_features_mapping( 344 | "text-generation", 345 | "text-generation-with-past", 346 | coreml_config_cls="models.roberta.RobertaCoreMLConfig", 347 | ), 348 | "roformer": supported_features_mapping( 349 | "text-generation", 350 | "text-generation-with-past", 351 | coreml_config_cls="models.roformer.RoFormerCoreMLConfig", 352 | ), 353 | "segformer": supported_features_mapping( 354 | "feature-extraction", 355 | "image-classification", 356 | "semantic-segmentation", 357 | coreml_config_cls="models.segformer.SegformerCoreMLConfig", 358 | ), 359 | "splinter": supported_features_mapping( 360 | "feature-extraction", 361 | "text-generation-with-past", 362 | coreml_config_cls="models.splinter.SplinterCoreMLConfig", 363 | ), 364 | "squeezebert": supported_features_mapping( 365 | "feature-extraction", 366 | "fill-mask", 367 | "multiple-choice", 368 | "question-answering", 369 | "text-classification", 370 | "token-classification", 371 | coreml_config_cls="models.squeezebert.SqueezeBertCoreMLConfig", 372 | ), 373 | "t5": supported_features_mapping( 374 | "feature-extraction", 375 | "text2text-generation", 376 | coreml_config_cls="models.t5.T5CoreMLConfig", 377 | ), 378 | "vit": supported_features_mapping( 379 | "feature-extraction", "image-classification", "masked-im", 
coreml_config_cls="models.vit.ViTCoreMLConfig" 380 | ), 381 | "yolos": supported_features_mapping( 382 | "feature-extraction", 383 | "object-detection", 384 | coreml_config_cls="models.yolos.YolosCoreMLConfig", 385 | ), 386 | } 387 | 388 | AVAILABLE_FEATURES = sorted(reduce(lambda s1, s2: s1 | s2, (v.keys() for v in _SUPPORTED_MODEL_TYPE.values()))) 389 | AVAILABLE_FEATURES_INCLUDING_LEGACY = AVAILABLE_FEATURES + list(_SYNONYM_TASK_MAP.keys()) 390 | 391 | @staticmethod 392 | def get_supported_features_for_model_type( 393 | model_type: str, model_name: Optional[str] = None 394 | ) -> Dict[str, Callable[[PretrainedConfig], CoreMLConfig]]: 395 | """ 396 | Tries to retrieve the feature -> CoreMLConfig constructor map from the model type. 397 | 398 | Args: 399 | model_type (`str`): 400 | The model type to retrieve the supported features for. 401 | model_name (`str`, *optional*): 402 | The name attribute of the model object, only used for the exception message. 403 | 404 | Returns: 405 | The dictionary mapping each feature to a corresponding CoreMLConfig constructor. 406 | """ 407 | model_type = model_type.lower() 408 | if model_type not in FeaturesManager._SUPPORTED_MODEL_TYPE: 409 | model_type_and_model_name = f"{model_type} ({model_name})" if model_name else model_type 410 | raise KeyError( 411 | f"{model_type_and_model_name} is not supported yet. " 412 | f"Only {list(FeaturesManager._SUPPORTED_MODEL_TYPE.keys())} are supported. " 413 | f"If you want to support {model_type} please propose a PR or open up an issue." 414 | ) 415 | return FeaturesManager._SUPPORTED_MODEL_TYPE[model_type] 416 | 417 | @staticmethod 418 | def feature_to_task(feature: str) -> str: 419 | return feature.replace("-with-past", "") 420 | 421 | @staticmethod 422 | def map_from_synonym(feature: str) -> str: 423 | if feature in FeaturesManager._SYNONYM_TASK_MAP: 424 | feature = FeaturesManager._SYNONYM_TASK_MAP[feature] 425 | return feature 426 | 427 | @staticmethod 428 | def _validate_framework_choice(framework: str): 429 | """ 430 | Validates if the framework requested for the export is both correct and available, otherwise throws an 431 | exception. 432 | """ 433 | if framework not in ["pt", "tf"]: 434 | raise ValueError( 435 | f"Only two frameworks are supported for Core ML export: pt or tf, but {framework} was provided." 436 | ) 437 | elif framework == "pt" and not is_torch_available(): 438 | raise RuntimeError("Cannot export model to Core ML using PyTorch because no PyTorch package was found.") 439 | elif framework == "tf" and not is_tf_available(): 440 | raise RuntimeError("Cannot export model to Core ML using TensorFlow because no TensorFlow package was found.") 441 | 442 | @staticmethod 443 | def get_model_class_for_feature(feature: str, framework: str = "pt") -> Type: 444 | """ 445 | Attempts to retrieve an AutoModel class from a feature name. 446 | 447 | Args: 448 | feature (`str`): 449 | The feature required. 450 | framework (`str`, *optional*, defaults to `"pt"`): 451 | The framework to use for the export. 452 | 453 | Returns: 454 | The AutoModel class corresponding to the feature. 455 | """ 456 | task = FeaturesManager.feature_to_task(feature) 457 | FeaturesManager._validate_framework_choice(framework) 458 | if framework == "pt": 459 | task_to_automodel = FeaturesManager._TASKS_TO_AUTOMODELS 460 | else: 461 | task_to_automodel = FeaturesManager._TASKS_TO_TF_AUTOMODELS 462 | if task not in task_to_automodel: 463 | raise KeyError( 464 | f"Unknown task: {feature}. 
Possible values are {list(FeaturesManager._TASKS_TO_AUTOMODELS.values())}" 465 | ) 466 | return task_to_automodel[task] 467 | 468 | @staticmethod 469 | def get_model_from_feature( 470 | feature: str, model: str, framework: str = "pt", cache_dir: str = None 471 | ) -> Union["PreTrainedModel", "TFPreTrainedModel"]: 472 | """ 473 | Attempts to retrieve a model from a model's name and the feature to be enabled. 474 | 475 | Args: 476 | feature (`str`): 477 | The feature required. 478 | model (`str`): 479 | The name of the model to export. 480 | framework (`str`, *optional*, defaults to `"pt"`): 481 | The framework to use for the export. 482 | 483 | Returns: 484 | The instance of the model. 485 | 486 | """ 487 | model_class = FeaturesManager.get_model_class_for_feature(feature, framework) 488 | try: 489 | model = model_class.from_pretrained(model, cache_dir=cache_dir, torchscript=True) 490 | except OSError: 491 | if framework == "pt": 492 | model = model_class.from_pretrained(model, from_tf=True, cache_dir=cache_dir) 493 | else: 494 | model = model_class.from_pretrained(model, from_pt=True, cache_dir=cache_dir, torchscript=True) 495 | return model 496 | 497 | @staticmethod 498 | def check_supported_model_or_raise( 499 | model: Union["PreTrainedModel", "TFPreTrainedModel"], feature: str = "feature-extraction" 500 | ) -> Tuple[str, Callable]: 501 | """ 502 | Check whether or not the model has the requested features. 503 | 504 | Args: 505 | model: The model to export. 506 | feature: The name of the feature to check if it is available. 507 | 508 | Returns: 509 | (str) The type of the model (CoreMLConfig) The CoreMLConfig instance holding the model export properties. 510 | 511 | """ 512 | model_type = model.config.model_type.replace("-", "_") 513 | model_name = getattr(model, "name", "") 514 | model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name) 515 | if feature not in model_features: 516 | raise ValueError( 517 | f"{model.config.model_type} doesn't support feature {feature}. Supported values are: {model_features}" 518 | ) 519 | 520 | return model.config.model_type, FeaturesManager._SUPPORTED_MODEL_TYPE[model_type][feature] 521 | 522 | @staticmethod 523 | def get_config(model_type: str, feature: str) -> CoreMLConfig: 524 | """ 525 | Gets the `CoreMLConfig` for a model_type and feature combination. 526 | 527 | Args: 528 | model_type (`str`): 529 | The model type to retrieve the config for. 530 | feature (`str`): 531 | The feature to retrieve the config for. 532 | 533 | Returns: 534 | `CoreMLConfig`: config for the combination 535 | """ 536 | return FeaturesManager._SUPPORTED_MODEL_TYPE[model_type][feature] 537 | -------------------------------------------------------------------------------- /src/exporters/coreml/models.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | from collections import OrderedDict 17 | 18 | from .config import ( 19 | CoreMLConfig, 20 | InputDescription, 21 | OutputDescription, 22 | ) 23 | 24 | 25 | def patch_common_pytorch_ops(): 26 | """ 27 | Workarounds for issues that haven't been fixed yet in coremltools that 28 | affect many of our models. 29 | """ 30 | # from coremltools.converters.mil import Builder as mb 31 | return {} 32 | 33 | 34 | class BartCoreMLConfig(CoreMLConfig): 35 | modality = "text" 36 | 37 | 38 | class BeitCoreMLConfig(CoreMLConfig): 39 | modality = "vision" 40 | 41 | @property 42 | def outputs(self) -> OrderedDict[str, OutputDescription]: 43 | output_descs = super().outputs 44 | self._add_pooler_output(output_descs) 45 | return output_descs 46 | 47 | @property 48 | def atol_for_validation(self) -> float: 49 | return 0.01 50 | 51 | 52 | class BertCoreMLConfig(CoreMLConfig): 53 | modality = "text" 54 | 55 | @property 56 | def outputs(self) -> OrderedDict[str, OutputDescription]: 57 | output_descs = super().outputs 58 | self._add_pooler_output(output_descs) 59 | return output_descs 60 | 61 | 62 | class BigBirdCoreMLConfig(CoreMLConfig): 63 | modality = "text" 64 | 65 | 66 | class BigBirdPegasusCoreMLConfig(CoreMLConfig): 67 | modality = "text" 68 | 69 | 70 | class BlenderbotCoreMLConfig(CoreMLConfig): 71 | modality = "text" 72 | 73 | 74 | class BlenderbotSmallCoreMLConfig(CoreMLConfig): 75 | modality = "text" 76 | 77 | 78 | class BloomCoreMLConfig(CoreMLConfig): 79 | modality = "text" 80 | 81 | 82 | class ConvNextCoreMLConfig(CoreMLConfig): 83 | modality = "vision" 84 | 85 | @property 86 | def outputs(self) -> OrderedDict[str, OutputDescription]: 87 | output_descs = super().outputs 88 | self._add_pooler_output(output_descs) 89 | return output_descs 90 | 91 | @property 92 | def atol_for_validation(self) -> float: 93 | return 1e-3 94 | 95 | 96 | class CTRLCoreMLConfig(CoreMLConfig): 97 | modality = "text" 98 | 99 | def patch_pytorch_ops(self): 100 | """Implement lift_fresh as a noop, unless it's already available in a future update.""" 101 | import coremltools.converters.mil.frontend.torch.ops as ops 102 | if hasattr(ops, "lift_fresh"): 103 | return {} 104 | 105 | def lift_fresh(context, node): 106 | a = context[node.inputs[0]] 107 | context.add(a, node.name) 108 | 109 | return {"lift_fresh": lift_fresh} 110 | 111 | 112 | class CvtCoreMLConfig(CoreMLConfig): 113 | modality = "vision" 114 | 115 | @property 116 | def outputs(self) -> OrderedDict[str, OutputDescription]: 117 | if self.task == "feature-extraction": 118 | return OrderedDict( 119 | [ 120 | ( 121 | "last_hidden_state", 122 | OutputDescription( 123 | "last_hidden_state", 124 | "Sequence of hidden-states at the output of the last layer of the model", 125 | ) 126 | ), 127 | ( 128 | "cls_token_value", 129 | OutputDescription( 130 | "cls_token_value", 131 | "Classification token at the output of the last layer of the model", 132 | ) 133 | ), 134 | ] 135 | ) 136 | else: 137 | return super().outputs 138 | 139 | def patch_pytorch_ops(self): 140 | # coremltools does support einsum but not the equation "bhlt,bhtv->bhlv" 141 | # so override the implementation of this operation 142 | def einsum(context, node): 143 | from coremltools.converters.mil import Builder as mb 144 | from coremltools.converters.mil.frontend._utils import build_einsum_mil 145 | 146 | a = context[node.inputs[1]][0] 147 | b = context[node.inputs[1]][1] 148 | equation = 
context[node.inputs[0]].val 149 | 150 | if equation == "bhlt,bhtv->bhlv": 151 | x = mb.matmul(x=a, y=b, transpose_x=False, transpose_y=False, name=node.name) 152 | else: 153 | x = build_einsum_mil(a, b, equation, node.name) 154 | 155 | context.add(x) 156 | 157 | return {"einsum": einsum} 158 | 159 | @property 160 | def atol_for_validation(self) -> float: 161 | return 0.01 162 | 163 | 164 | class Data2VecTextCoreMLConfig(CoreMLConfig): 165 | modality = "text" 166 | 167 | 168 | class DistilBertCoreMLConfig(CoreMLConfig): 169 | modality = "text" 170 | 171 | @property 172 | def inputs(self) -> OrderedDict[str, InputDescription]: 173 | if self.task == "multiple-choice": 174 | return OrderedDict( 175 | [ 176 | ( 177 | "input_ids", 178 | InputDescription( 179 | "input_ids", 180 | "Indices of input sequence tokens in the vocabulary", 181 | sequence_length=self.input_ids_sequence_length, 182 | ) 183 | ), 184 | ( 185 | "attention_mask", 186 | InputDescription( 187 | "attention_mask", 188 | "Mask to avoid performing attention on padding token indices (1 = not masked, 0 = masked)", 189 | ) 190 | ), 191 | ] 192 | ) 193 | else: 194 | return super().inputs 195 | 196 | 197 | class ErnieCoreMLConfig(CoreMLConfig): 198 | modality = "text" 199 | 200 | 201 | class FalconCoreMLConfig(CoreMLConfig): 202 | modality = "text" 203 | 204 | def patch_pytorch_ops(self): 205 | # Copied from https://github.com/apple/coremltools/blob/b2f719075dc5bc19280a3045c1762d7d32bd3fdc/coremltools/converters/mil/frontend/torch/ops.py#L4326 206 | # with fallback of `bfloat16` to `float32`. 207 | def to(context, node): 208 | from coremltools.converters.mil import Builder as mb 209 | from coremltools.converters.mil.mil import types 210 | from coremltools.converters.mil.frontend.torch.ops import ( 211 | _get_inputs, 212 | NUMPY_DTYPE_TO_TORCH_NUM, 213 | NUM_TO_TORCH_DTYPE, 214 | NUM_TO_DTYPE_STRING, 215 | NUM_TO_NUMPY_DTYPE, 216 | TORCH_DTYPE_TO_NUM, 217 | ) 218 | from coremltools.converters.mil.mil.types import nptype_from_builtin 219 | from coremltools.converters.mil.mil.var import Var 220 | import numpy as _np 221 | import torch 222 | 223 | inputs = _get_inputs(context, node) 224 | 225 | # There are a lot of variants of `to` op. 226 | # - When len(inputs) is 7 or 8, we only care about the first two params (input and dtype). 227 | # - When len(inputs) == 6, the parameter is (input, _, dtype, non_blocking, copy, memory_format) 228 | # - When len(inputs) == 5, the parameter is (input, dtype, non_blocking, copy, memory_format) 229 | # - When len(inputs) == 4, the parameter is (input, dtype, non_blocking, copy) 230 | # - When len(inputs) == 3, the parameter is (input, non_blocking, copy) 231 | # We only use `input` and `dtype`, and `non_blocking` and `copy` are unused. 232 | _input = inputs[0] 233 | 234 | inputs_len = len(inputs) 235 | if inputs_len in (4, 5, 7, 8): 236 | target_dtype = inputs[1] 237 | elif inputs_len == 6: 238 | target_dtype = inputs[2] 239 | elif inputs_len <= 3: 240 | target_dtype = None 241 | else: 242 | raise ValueError( 243 | "Received invalid arguments for PyTorch conversion of op {}".format(node) 244 | ) 245 | 246 | if target_dtype is None: 247 | # When target_dtype is None, it means the input's dtype is already the target dtype. 248 | context.add(_input, torch_name=node.name) 249 | return 250 | elif types.is_scalar(target_dtype.sym_type) and target_dtype.val is not None: 251 | dtype = target_dtype.val 252 | else: 253 | # When the val of dtype is not available, bridge from the np dtype. 
254 | np_type = nptype_from_builtin(target_dtype.dtype) 255 | dtype = NUMPY_DTYPE_TO_TORCH_NUM[np_type] 256 | 257 | if dtype in NUM_TO_TORCH_DTYPE: 258 | torch_dtype = NUM_TO_TORCH_DTYPE[dtype] 259 | else: 260 | # Fallback `bfloat32` to `fp32` for now. 261 | torch_dtype = torch.float32 262 | 263 | if isinstance(_input, Var) and _input.can_be_folded_to_const(): 264 | # numpy -> torch -> torch cast -> numpy 265 | # This path is needed to use the mapping of passed in dtypes to torch dtypes. 266 | casted_input = torch.tensor(_input.val).type(torch_dtype).cpu().numpy() 267 | res = mb.const(val=casted_input, name=node.name) 268 | else: 269 | if dtype in NUM_TO_DTYPE_STRING: 270 | res = mb.cast(x=_input, dtype=NUM_TO_DTYPE_STRING[dtype], name=node.name) 271 | else: 272 | # For dtype that is not supported by mb.cast, we do it in best-effort to cast it to int 273 | # or float based on the dtype. 274 | np_dtype = NUM_TO_NUMPY_DTYPE[dtype] 275 | if _np.issubdtype(np_dtype, _np.integer): 276 | res = mb.cast(x=_input, dtype="int32", name=node.name) 277 | elif _np.issubdtype(np_dtype, _np.floating): 278 | res = mb.cast(x=_input, dtype="fp32", name=node.name) 279 | else: 280 | raise ValueError(f"Unsupported op {node} with target dtype {np_dtype}") 281 | context.add(res) 282 | 283 | # Workaround until https://github.com/apple/coremltools/pull/2046 is released 284 | def numpy_t(context, node): 285 | from coremltools.converters.mil import Builder as mb 286 | 287 | assert len(node.outputs) == 1 288 | assert len(node.inputs) == 1 289 | 290 | x = context[node.inputs[0]] 291 | assert len(x.shape) == 2 292 | 293 | res = mb.transpose(x=x, perm=[1, 0], name=node.name) 294 | context.add(res) 295 | 296 | return {"to": to, "numpy_t": numpy_t} 297 | 298 | @property 299 | def atol_for_validation(self) -> float: 300 | # Possibly required because of internal `bfloat16` conversions in the model 301 | # float32 conversion requires ~0.03, whereas `float16` requires ~0.1 302 | return 0.1 303 | 304 | 305 | class GPT2CoreMLConfig(CoreMLConfig): 306 | modality = "text" 307 | 308 | 309 | class GPTBigcodeCoreMLConfig(CoreMLConfig): 310 | modality = "text" 311 | 312 | 313 | class GPTJCoreMLConfig(CoreMLConfig): 314 | modality = "text" 315 | 316 | def patch_pytorch_ops(self): 317 | # https://github.com/apple/coremltools/issues/1852 318 | def einsum(context, node): 319 | from coremltools.converters.mil import Builder as mb 320 | from coremltools.converters.mil.frontend._utils import build_einsum_mil 321 | from coremltools.converters.mil.mil import types 322 | 323 | a = context[node.inputs[1]][0] 324 | b = context[node.inputs[1]][1] 325 | equation = context[node.inputs[0]].val 326 | equation = "".join(equation.split(" ")) 327 | if equation == "i,j->ij" and types.is_int(a.dtype): 328 | a = mb.cast(x=a, dtype="fp32") 329 | x = build_einsum_mil(a, b, equation, node.name) 330 | 331 | context.add(x) 332 | 333 | return {"einsum": einsum} 334 | 335 | 336 | class GPTNeoCoreMLConfig(CoreMLConfig): 337 | modality = "text" 338 | 339 | 340 | class GPTNeoXCoreMLConfig(CoreMLConfig): 341 | modality = "text" 342 | 343 | @property 344 | def inputs(self) -> OrderedDict[str, InputDescription]: 345 | input_descs = super().inputs 346 | # Flexible shapes are incompatible with gather (https://github.com/huggingface/exporters/issues/43) 347 | input_descs["input_ids"].sequence_length = 128 348 | return input_descs 349 | 350 | 351 | class LevitCoreMLConfig(CoreMLConfig): 352 | modality = "vision" 353 | 354 | @property 355 | def outputs(self) -> 
OrderedDict[str, OutputDescription]: 356 | output_descs = super().outputs 357 | self._add_pooler_output(output_descs) 358 | return output_descs 359 | 360 | def patch_pytorch_ops(self): 361 | def reshape_as(context, node): 362 | from coremltools.converters.mil import Builder as mb 363 | 364 | a = context[node.inputs[0]] 365 | b = context[node.inputs[1]] 366 | y = mb.shape(x=b) 367 | x = mb.reshape(x=a, shape=y, name=node.name) 368 | context.add(x) 369 | 370 | return {"reshape_as": reshape_as} 371 | 372 | @property 373 | def atol_for_validation(self) -> float: 374 | return 0.01 375 | 376 | 377 | class LlamaCoreMLConfig(CoreMLConfig): 378 | modality = "text" 379 | 380 | 381 | class M2M100CoreMLConfig(CoreMLConfig): 382 | modality = "text" 383 | 384 | 385 | class MarianMTCoreMLConfig(CoreMLConfig): 386 | modality = "text" 387 | 388 | 389 | class MistralCoreMLConfig(CoreMLConfig): 390 | modality = "text" 391 | 392 | def patch_pytorch_ops(self): 393 | # Workaround for https://github.com/apple/coremltools/pull/2017 394 | def log(context, node): 395 | from coremltools.converters.mil import Builder as mb 396 | from coremltools.converters.mil.mil import types 397 | 398 | a = context[node.inputs[0]] 399 | if types.is_int(a.dtype): 400 | a = mb.cast(x=a, dtype="fp32") 401 | x = mb.log(x=a, name=node.name) 402 | context.add(x) 403 | 404 | return {"log": log} 405 | 406 | 407 | class MobileBertCoreMLConfig(CoreMLConfig): 408 | modality = "text" 409 | 410 | @property 411 | def outputs(self) -> OrderedDict[str, OutputDescription]: 412 | output_descs = super().outputs 413 | self._add_pooler_output(output_descs) 414 | return output_descs 415 | 416 | 417 | class MobileViTCoreMLConfig(CoreMLConfig): 418 | modality = "vision" 419 | 420 | @property 421 | def inputs(self) -> OrderedDict[str, InputDescription]: 422 | input_descs = super().inputs 423 | input_descs["pixel_values"].color_layout = "BGR" 424 | return input_descs 425 | 426 | @property 427 | def outputs(self) -> OrderedDict[str, OutputDescription]: 428 | output_descs = super().outputs 429 | self._add_pooler_output(output_descs) 430 | return output_descs 431 | 432 | 433 | class MvpCoreMLConfig(CoreMLConfig): 434 | modality = "text" 435 | 436 | 437 | class PegasusCoreMLConfig(CoreMLConfig): 438 | modality = "text" 439 | 440 | 441 | class PLBartCoreMLConfig(CoreMLConfig): 442 | modality = "text" 443 | 444 | 445 | class RobertaCoreMLConfig(CoreMLConfig): 446 | modality = "text" 447 | 448 | 449 | class RoFormerCoreMLConfig(CoreMLConfig): 450 | modality = "text" 451 | 452 | 453 | class SegformerCoreMLConfig(CoreMLConfig): 454 | modality = "vision" 455 | 456 | 457 | class SplinterCoreMLConfig(CoreMLConfig): 458 | modality = "text" 459 | 460 | 461 | class SqueezeBertCoreMLConfig(CoreMLConfig): 462 | modality = "text" 463 | 464 | @property 465 | def outputs(self) -> OrderedDict[str, OutputDescription]: 466 | output_descs = super().outputs 467 | self._add_pooler_output(output_descs) 468 | return output_descs 469 | 470 | 471 | class T5CoreMLConfig(CoreMLConfig): 472 | modality = "text" 473 | 474 | @property 475 | def _input_descriptions(self) -> OrderedDict[str, InputDescription]: 476 | if self.task == "feature-extraction": 477 | return OrderedDict( 478 | [ 479 | ( 480 | "input_ids", 481 | InputDescription( 482 | "input_ids", 483 | "Indices of input sequence tokens in the vocabulary", 484 | sequence_length=self.input_ids_sequence_length, 485 | ) 486 | ), 487 | ( 488 | "attention_mask", 489 | InputDescription( 490 | "attention_mask", 491 | "Mask to avoid performing 
attention on padding token indices (1 = not masked, 0 = masked)", 492 | ) 493 | ), 494 | ( 495 | "decoder_input_ids", 496 | InputDescription( 497 | "decoder_input_ids", 498 | "Indices of decoder input sequence tokens in the vocabulary", 499 | ) 500 | ), 501 | ( 502 | "decoder_attention_mask", 503 | InputDescription( 504 | "decoder_attention_mask", 505 | "Mask to avoid performing attention on padding token indices (1 = not masked, 0 = masked)", 506 | ) 507 | ), 508 | ] 509 | ) 510 | return super()._input_descriptions 511 | 512 | 513 | class ViTCoreMLConfig(CoreMLConfig): 514 | modality = "vision" 515 | 516 | @property 517 | def outputs(self) -> OrderedDict[str, OutputDescription]: 518 | output_descs = super().outputs 519 | self._add_pooler_output(output_descs) 520 | return output_descs 521 | 522 | 523 | class YolosCoreMLConfig(CoreMLConfig): 524 | modality = "vision" 525 | 526 | def patch_pytorch_ops(self): 527 | # There is no bicubic upsampling in Core ML, so we'll have to use bilinear. 528 | # Still seems to work well enough. Note: the bilinear resize is applied to 529 | # constant tensors, so we could actually remove this op completely! 530 | def upsample_bicubic2d(context, node): 531 | from coremltools.converters.mil import Builder as mb 532 | 533 | a = context[node.inputs[0]] 534 | b = context[node.inputs[1]] 535 | x = mb.resize_bilinear(x=a, target_size_height=b.val[0], target_size_width=b.val[1], name=node.name) 536 | context.add(x) 537 | 538 | return {"upsample_bicubic2d": upsample_bicubic2d} 539 | 540 | @property 541 | def atol_for_validation(self) -> float: 542 | # because of bilinear instead of bicubic, atol must be very large here 543 | return 10 544 | -------------------------------------------------------------------------------- /src/exporters/coreml/validate.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
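As an aside before the validation code: the classes in `models.py` above follow a small number of patterns (set `modality`, optionally override `inputs`/`outputs`, `patch_pytorch_ops`, or `atol_for_validation`). Below is a hedged sketch of a user-defined config written in the same style; the class name and the pinned sequence length are assumptions for illustration, not part of the library.

```python
# Hypothetical user-defined config, mirroring the patterns in models.py above.
# "MyDistilBertCoreMLConfig" and the pinned sequence length are illustrative only.
from collections import OrderedDict

from exporters.coreml.config import CoreMLConfig, InputDescription


class MyDistilBertCoreMLConfig(CoreMLConfig):
    modality = "text"

    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        # Start from the default input descriptions and pin the sequence length,
        # the same trick GPTNeoXCoreMLConfig uses above.
        input_descs = super().inputs
        input_descs["input_ids"].sequence_length = 64
        return input_descs

    @property
    def atol_for_validation(self) -> float:
        # Loosen the tolerance used by validate_model_outputs below.
        return 1e-3


# It can then be passed to export() like any built-in config, e.g.:
#   coreml_config = MyDistilBertCoreMLConfig(model.config, task="text-classification")
#   mlmodel = export(tokenizer, model, coreml_config)
```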
15 | 16 | from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Tuple, Union 17 | 18 | import coremltools as ct 19 | import numpy as np 20 | 21 | from transformers.utils import TensorType, is_torch_available 22 | from transformers.modeling_utils import PreTrainedModel 23 | 24 | from .config import CoreMLConfig 25 | from ..utils import logging 26 | 27 | 28 | logger = logging.get_logger(__name__) # pylint: disable=invalid-name 29 | 30 | 31 | def softmax(x, axis=-1): 32 | maxes = np.max(x, axis=axis, keepdims=True) 33 | shifted_exp = np.exp(x - maxes) 34 | return shifted_exp / shifted_exp.sum(axis=axis, keepdims=True) 35 | 36 | 37 | def validate_model_outputs( 38 | config: CoreMLConfig, 39 | preprocessor: Union["PreTrainedTokenizer", "FeatureExtractionMixin", "ProcessorMixin"], 40 | reference_model: Union["PreTrainedModel", "TFPreTrainedModel"], 41 | mlmodel: ct.models.MLModel, 42 | atol: float, 43 | ): 44 | """ 45 | Validate that the outputs from the base and exported model agree within some absolute tolerance. 46 | 47 | Args: 48 | config ([`~coreml.config.CoreMLConfig`]): 49 | The Core ML configuration associated with the exported model. 50 | preprocessor ([`PreTrainedTokenizer`], [`FeatureExtractionMixin`] or [`ProcessorMixin`]): 51 | The preprocessor used for encoding the data. 52 | reference_model ([`PreTrainedModel`] or [`TFPreTrainedModel`]): 53 | The model to export. 54 | mlmodel (`ct.models.MLModel`): 55 | The exported Core ML model. 56 | atol (`float`): 57 | Absolute tolerance. Differences larger than this value are considered problematic. 58 | """ 59 | logger.info("Validating Core ML model...") 60 | 61 | input_descs = config.inputs 62 | output_descs = config.outputs 63 | 64 | if is_torch_available() and issubclass(type(reference_model), PreTrainedModel): 65 | framework = TensorType.PYTORCH 66 | else: 67 | framework = TensorType.TENSORFLOW 68 | 69 | dummy_inputs = config.generate_dummy_inputs(preprocessor, framework) 70 | 71 | reference_model_inputs = {} 72 | past_key_values = [] 73 | coreml_inputs = {} 74 | 75 | # Put the dummy inputs into Core ML and reference model input dictionaries. 76 | # The separate past_key_values inputs are combined into a tuple of tuples. 77 | for name in input_descs.keys(): 78 | ref_value, coreml_value = dummy_inputs[name] 79 | if name.startswith("past_key_values_"): 80 | if name.endswith("_key"): 81 | past_key_values.append((ref_value,)) 82 | else: 83 | past_key_values[-1] += (ref_value,) 84 | elif name == "encoder_outputs": 85 | reference_model_inputs[name] = (ref_value,) 86 | else: 87 | reference_model_inputs[name] = ref_value 88 | coreml_inputs[input_descs[name].name] = coreml_value 89 | 90 | if len(past_key_values) > 0: 91 | reference_model_inputs["past_key_values"] = past_key_values 92 | 93 | # Compute outputs from the reference model 94 | if is_torch_available() and issubclass(type(reference_model), PreTrainedModel): 95 | reference_model.to("cpu").eval() 96 | if config.seq2seq == "encoder": 97 | reference_model = reference_model.get_encoder() 98 | ref_outputs_dict = reference_model(**reference_model_inputs, return_dict=True) 99 | 100 | # Unpack the past_key_values output into separate outputs, as that is also 101 | # how the Core ML mdel does it. 
102 | if "past_key_values" in ref_outputs_dict: 103 | for i in range(len(ref_outputs_dict["past_key_values"])): 104 | ref_outputs_dict[f"present_{i}_key"] = ref_outputs_dict["past_key_values"][i][0] 105 | ref_outputs_dict[f"present_{i}_value"] = ref_outputs_dict["past_key_values"][i][1] 106 | 107 | # Compute outputs from the Core ML model 108 | coreml_outputs = mlmodel.predict(coreml_inputs) 109 | 110 | # Map the Core ML output names back to the names used by the reference model 111 | coreml_output_names = list(coreml_outputs.keys()) 112 | coreml_output_internal_names = [] 113 | for name, desc in output_descs.items(): 114 | if desc.name in coreml_output_names: 115 | coreml_output_internal_names.append(name) 116 | 117 | spec = mlmodel._spec 118 | 119 | # Classifier models are special in Core ML 120 | if config.is_classifier: 121 | logger.info("\t- Core ML model is classifier, validating output") 122 | 123 | if is_torch_available() and issubclass(type(reference_model), PreTrainedModel): 124 | ref_logits = ref_outputs_dict["logits"].detach().numpy() 125 | else: 126 | ref_logits = ref_outputs_dict["logits"].numpy() 127 | 128 | labels_name = spec.description.predictedFeatureName 129 | coreml_value = coreml_outputs[labels_name] 130 | 131 | ref_value = reference_model.config.id2label[np.argmax(ref_logits, axis=-1)[0]] 132 | if coreml_value != ref_value: 133 | logger.info(f"\t\t-[x] predicted class '{coreml_value}' doesn't match '{ref_value}'") 134 | raise ValueError( 135 | "Predicted class doesn't match between reference model and Core ML exported model: " 136 | f"Got {ref_value} (reference) and {coreml_value} (Core ML)" 137 | ) 138 | else: 139 | logger.info(f"\t\t-[✓] predicted class '{coreml_value}' matches '{ref_value}'") 140 | 141 | probs_name = spec.description.predictedProbabilitiesName 142 | coreml_value = coreml_outputs[probs_name] 143 | ref_value = softmax(ref_logits, axis=-1)[0] 144 | 145 | # Shape 146 | if len(coreml_value) != len(ref_value): 147 | logger.info(f"\t\t-[x] number of classes {len(coreml_value)} doesn't match {len(ref_value)}") 148 | raise ValueError( 149 | "Output shape doesn't match between reference model and Core ML exported model: " 150 | f"Got {len(ref_value)} (reference) and {len(coreml_value)} (Core ML)" 151 | ) 152 | else: 153 | logger.info(f"\t\t-[✓] number of classes {len(coreml_value)} matches {len(ref_value)}") 154 | 155 | # Core ML probabilities are in a dict, put in sorted list for comparing 156 | class_labels = config.get_class_labels() 157 | coreml_probs = np.zeros_like(ref_value) 158 | for i in range(len(ref_value)): 159 | coreml_probs[i] = coreml_value[class_labels[i]] 160 | 161 | # Values 162 | if not np.allclose(ref_value, coreml_probs, atol=atol): 163 | logger.info(f"\t\t-[x] values not close enough (atol: {atol})") 164 | raise ValueError( 165 | "Output values do not match between reference model and Core ML exported model: " 166 | f"Got max absolute difference of: {np.amax(np.abs(ref_value - coreml_probs))}" 167 | ) 168 | else: 169 | logger.info(f"\t\t-[✓] all values close (atol: {atol})") 170 | 171 | return 172 | 173 | # Check that keys in coreml_output_internal are a subset of keys from ref_outputs 174 | ref_outputs_set = set(ref_outputs_dict.keys()) 175 | coreml_outputs_set = set(coreml_output_internal_names) 176 | if not coreml_outputs_set.issubset(ref_outputs_set): 177 | logger.info( 178 | f"\t-[x] Core ML model output names {coreml_outputs_set} do not match reference model {ref_outputs_set}" 179 | ) 180 | raise ValueError( 181 | "Output names 
do not match between reference model and Core ML exported model: " 182 | f"{coreml_outputs_set.difference(ref_outputs_set)}" 183 | ) 184 | else: 185 | logger.info(f"\t-[✓] Core ML model output names match reference model ({coreml_outputs_set})") 186 | 187 | # Check the shape and values match 188 | for name in coreml_output_internal_names: 189 | coreml_name = output_descs[name].name 190 | coreml_value = coreml_outputs[coreml_name] 191 | 192 | if is_torch_available() and issubclass(type(reference_model), PreTrainedModel): 193 | ref_value = ref_outputs_dict[name].detach().numpy() 194 | else: 195 | ref_value = ref_outputs_dict[name].numpy() 196 | 197 | if output_descs[name].do_softmax: 198 | axis = 1 if config.task == "semantic-segmentation" else -1 199 | ref_value = softmax(ref_value, axis=axis) 200 | 201 | logger.info(f'\t- Validating Core ML model output "{name}":') 202 | 203 | # Shape 204 | if not coreml_value.shape == ref_value.shape: 205 | if config.task == "semantic-segmentation" and (output_descs[name].do_upsample or output_descs[name].do_argmax): 206 | logger.info("\t\t-[ ] cannot compare outputs because of do_upsample or do_argmax options") 207 | continue 208 | else: 209 | logger.info(f"\t\t-[x] shape {coreml_value.shape} doesn't match {ref_value.shape}") 210 | raise ValueError( 211 | "Output shape doesn't match between reference model and Core ML exported model: " 212 | f"Got {ref_value.shape} (reference) and {coreml_value.shape} (Core ML)" 213 | ) 214 | else: 215 | logger.info(f"\t\t-[✓] {coreml_value.shape} matches {ref_value.shape}") 216 | 217 | # Values 218 | if not np.allclose(ref_value, coreml_value, atol=atol): 219 | logger.info(f"\t\t-[x] values not close enough (atol: {atol})") 220 | raise ValueError( 221 | "Output values do not match between reference model and Core ML exported model: " 222 | f"Got max absolute difference of: {np.amax(np.abs(ref_value - coreml_value))}" 223 | ) 224 | else: 225 | logger.info(f"\t\t-[✓] all values close (atol: {atol})") 226 | -------------------------------------------------------------------------------- /src/exporters/utils/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2022 The HuggingFace Team. All rights reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | """Hugging Face Exporters utils.""" 15 | -------------------------------------------------------------------------------- /src/exporters/utils/logging.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # This file was directly copied from transformers. 3 | # Copyright 2020 Optuna, Hugging Face 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ Logging utilities. """ 17 | 18 | import logging 19 | import os 20 | import sys 21 | import threading 22 | from logging import CRITICAL # NOQA 23 | from logging import DEBUG # NOQA 24 | from logging import ERROR # NOQA 25 | from logging import FATAL # NOQA 26 | from logging import INFO # NOQA 27 | from logging import NOTSET # NOQA 28 | from logging import WARN # NOQA 29 | from logging import WARNING # NOQA 30 | from typing import Optional 31 | 32 | 33 | _lock = threading.Lock() 34 | _default_handler: Optional[logging.Handler] = None 35 | 36 | log_levels = { 37 | "debug": logging.DEBUG, 38 | "info": logging.INFO, 39 | "warning": logging.WARNING, 40 | "error": logging.ERROR, 41 | "critical": logging.CRITICAL, 42 | } 43 | 44 | _default_log_level = logging.WARNING 45 | 46 | 47 | def _get_default_logging_level(): 48 | """ 49 | If TRANSFORMERS_VERBOSITY env var is set to one of the valid choices return that as the new default level. If it is 50 | not - fall back to ``_default_log_level`` 51 | """ 52 | env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None) 53 | if env_level_str: 54 | if env_level_str in log_levels: 55 | return log_levels[env_level_str] 56 | else: 57 | logging.getLogger().warning( 58 | f"Unknown option TRANSFORMERS_VERBOSITY={env_level_str}, " 59 | f"has to be one of: { ', '.join(log_levels.keys()) }" 60 | ) 61 | return _default_log_level 62 | 63 | 64 | def _get_library_name() -> str: 65 | 66 | return __name__.split(".")[0] 67 | 68 | 69 | def _get_library_root_logger() -> logging.Logger: 70 | 71 | return logging.getLogger(_get_library_name()) 72 | 73 | 74 | def _configure_library_root_logger() -> None: 75 | 76 | global _default_handler 77 | 78 | with _lock: 79 | if _default_handler: 80 | # This library has already configured the library root logger. 81 | return 82 | _default_handler = logging.StreamHandler() # Set sys.stderr as stream. 83 | _default_handler.flush = sys.stderr.flush 84 | 85 | # Apply our default configuration to the library root logger. 86 | library_root_logger = _get_library_root_logger() 87 | library_root_logger.addHandler(_default_handler) 88 | library_root_logger.setLevel(_get_default_logging_level()) 89 | library_root_logger.propagate = False 90 | 91 | 92 | def _reset_library_root_logger() -> None: 93 | 94 | global _default_handler 95 | 96 | with _lock: 97 | if not _default_handler: 98 | return 99 | 100 | library_root_logger = _get_library_root_logger() 101 | library_root_logger.removeHandler(_default_handler) 102 | library_root_logger.setLevel(logging.NOTSET) 103 | _default_handler = None 104 | 105 | 106 | def get_log_levels_dict(): 107 | return log_levels 108 | 109 | 110 | def get_logger(name: Optional[str] = None) -> logging.Logger: 111 | """ 112 | Return a logger with the specified name. 113 | 114 | This function is not supposed to be directly accessed unless you are writing a custom transformers module. 
115 | """ 116 | 117 | if name is None: 118 | name = _get_library_name() 119 | 120 | _configure_library_root_logger() 121 | return logging.getLogger(name) 122 | 123 | 124 | def get_verbosity() -> int: 125 | """ 126 | Return the current level for the 🤗 Transformers's root logger as an int. 127 | 128 | Returns: 129 | :obj:`int`: The logging level. 130 | 131 | .. note:: 132 | 133 | 🤗 Transformers has following logging levels: 134 | 135 | - 50: ``transformers.logging.CRITICAL`` or ``transformers.logging.FATAL`` 136 | - 40: ``transformers.logging.ERROR`` 137 | - 30: ``transformers.logging.WARNING`` or ``transformers.logging.WARN`` 138 | - 20: ``transformers.logging.INFO`` 139 | - 10: ``transformers.logging.DEBUG`` 140 | """ 141 | 142 | _configure_library_root_logger() 143 | return _get_library_root_logger().getEffectiveLevel() 144 | 145 | 146 | def set_verbosity(verbosity: int) -> None: 147 | """ 148 | Set the verbosity level for the 🤗 Transformers's root logger. 149 | 150 | Args: 151 | verbosity (:obj:`int`): 152 | Logging level, e.g., one of: 153 | 154 | - ``transformers.logging.CRITICAL`` or ``transformers.logging.FATAL`` 155 | - ``transformers.logging.ERROR`` 156 | - ``transformers.logging.WARNING`` or ``transformers.logging.WARN`` 157 | - ``transformers.logging.INFO`` 158 | - ``transformers.logging.DEBUG`` 159 | """ 160 | 161 | _configure_library_root_logger() 162 | _get_library_root_logger().setLevel(verbosity) 163 | 164 | 165 | def set_verbosity_info(): 166 | """Set the verbosity to the :obj:`INFO` level.""" 167 | return set_verbosity(INFO) 168 | 169 | 170 | def set_verbosity_warning(): 171 | """Set the verbosity to the :obj:`WARNING` level.""" 172 | return set_verbosity(WARNING) 173 | 174 | 175 | def set_verbosity_debug(): 176 | """Set the verbosity to the :obj:`DEBUG` level.""" 177 | return set_verbosity(DEBUG) 178 | 179 | 180 | def set_verbosity_error(): 181 | """Set the verbosity to the :obj:`ERROR` level.""" 182 | return set_verbosity(ERROR) 183 | 184 | 185 | def disable_default_handler() -> None: 186 | """Disable the default handler of the HuggingFace Transformers's root logger.""" 187 | 188 | _configure_library_root_logger() 189 | 190 | assert _default_handler is not None 191 | _get_library_root_logger().removeHandler(_default_handler) 192 | 193 | 194 | def enable_default_handler() -> None: 195 | """Enable the default handler of the HuggingFace Transformers's root logger.""" 196 | 197 | _configure_library_root_logger() 198 | 199 | assert _default_handler is not None 200 | _get_library_root_logger().addHandler(_default_handler) 201 | 202 | 203 | def add_handler(handler: logging.Handler) -> None: 204 | """adds a handler to the HuggingFace Transformers's root logger.""" 205 | 206 | _configure_library_root_logger() 207 | 208 | assert handler is not None 209 | _get_library_root_logger().addHandler(handler) 210 | 211 | 212 | def remove_handler(handler: logging.Handler) -> None: 213 | """removes given handler from the HuggingFace Transformers's root logger.""" 214 | 215 | _configure_library_root_logger() 216 | 217 | assert handler is not None and handler not in _get_library_root_logger().handlers 218 | _get_library_root_logger().removeHandler(handler) 219 | 220 | 221 | def disable_propagation() -> None: 222 | """ 223 | Disable propagation of the library log outputs. Note that log propagation is disabled by default. 
224 | """ 225 | 226 | _configure_library_root_logger() 227 | _get_library_root_logger().propagate = False 228 | 229 | 230 | def enable_propagation() -> None: 231 | """ 232 | Enable propagation of the library log outputs. Please disable the HuggingFace Transformers's default handler to 233 | prevent double logging if the root logger has been configured. 234 | """ 235 | 236 | _configure_library_root_logger() 237 | _get_library_root_logger().propagate = True 238 | 239 | 240 | def enable_explicit_format() -> None: 241 | """ 242 | Enable explicit formatting for every HuggingFace Transformers's logger. The explicit formatter is as follows: 243 | 244 | :: 245 | 246 | [LEVELNAME|FILENAME|LINE NUMBER] TIME >> MESSAGE 247 | 248 | All handlers currently bound to the root logger are affected by this method. 249 | """ 250 | handlers = _get_library_root_logger().handlers 251 | 252 | for handler in handlers: 253 | formatter = logging.Formatter("[%(levelname)s|%(filename)s:%(lineno)s] %(asctime)s >> %(message)s") 254 | handler.setFormatter(formatter) 255 | 256 | 257 | def reset_format() -> None: 258 | """ 259 | Resets the formatting for HuggingFace Transformers's loggers. 260 | 261 | All handlers currently bound to the root logger are affected by this method. 262 | """ 263 | handlers = _get_library_root_logger().handlers 264 | 265 | for handler in handlers: 266 | handler.setFormatter(None) 267 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/huggingface/exporters/7a545974275c7af167a2fa4e16c4574359f2acec/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_coreml.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2021-2022 The HuggingFace Team. All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
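Before the tests: the logging helpers above are the same ones the exporter modules use internally via `get_logger(__name__)`. A short sketch of turning on more verbose output during a conversion, assuming the package is importable as `exporters`:

```python
# Sketch: raising the exporter's log verbosity using src/exporters/utils/logging.py above.
from exporters.utils import logging

logging.set_verbosity_info()      # same as logging.set_verbosity(logging.INFO)
logging.enable_explicit_format()  # "[LEVELNAME|FILENAME:LINENO] TIME >> MESSAGE"

logger = logging.get_logger("exporters.coreml")
logger.info("Verbose Core ML export logging enabled")
```

The default level can also be controlled through the `TRANSFORMERS_VERBOSITY` environment variable, as `_get_default_logging_level` above shows.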
15 | 16 | import pytest 17 | 18 | from unittest import TestCase 19 | from parameterized import parameterized 20 | from transformers import AutoConfig, is_tf_available, is_torch_available 21 | 22 | from exporters.coreml import ( 23 | CoreMLConfig, 24 | export, 25 | validate_model_outputs, 26 | ) 27 | from transformers.onnx.utils import get_preprocessor 28 | from transformers.testing_utils import require_tf, require_torch, require_vision, slow 29 | from .testing_utils import require_coreml, require_macos 30 | 31 | 32 | if is_torch_available() or is_tf_available(): 33 | from exporters.coreml.features import FeaturesManager 34 | 35 | 36 | class TextCoreMLConfig(CoreMLConfig): 37 | modality = "text" 38 | 39 | 40 | class CoreMLConfigTestCase(TestCase): 41 | def test_unknown_modality(self): 42 | with pytest.raises(ValueError): 43 | config = CoreMLConfig(None, task="feature-extraction") 44 | 45 | def test_unknown_task(self): 46 | with pytest.raises(AssertionError): 47 | config = TextCoreMLConfig(None, task="unknown-task") 48 | _ = config.inputs 49 | 50 | def test_sequence_length(self): 51 | config = TextCoreMLConfig(None, task="feature-extraction") 52 | flexible_outputs = config.get_flexible_outputs() 53 | self.assertEqual(len(flexible_outputs), 1) 54 | self.assertIn("last_hidden_state", flexible_outputs) 55 | 56 | flexible_output = flexible_outputs["last_hidden_state"] 57 | self.assertEqual(len(flexible_output), 1) 58 | self.assertEqual(flexible_output[0]["axis"], 1) 59 | self.assertEqual(flexible_output[0]["min"], 1) 60 | self.assertEqual(flexible_output[0]["max"], config.max_sequence_length) 61 | 62 | config = TextCoreMLConfig(None, task="text-classification") 63 | flexible_outputs = config.get_flexible_outputs() 64 | self.assertTrue(len(flexible_outputs) == 0) 65 | 66 | 67 | PYTORCH_EXPORT_MODELS = { 68 | ("beit", "microsoft/beit-base-patch16-224"), 69 | ("bert", "bert-base-cased"), 70 | ("convnext", "facebook/convnext-tiny-224"), 71 | ("cvt", "microsoft/cvt-21-384-22k"), 72 | ("distilbert", "distilbert-base-cased"), 73 | ("gpt2", "distilgpt2"), 74 | ("levit", "facebook/levit-128S"), 75 | ("mobilebert", "google/mobilebert-uncased"), 76 | ("mobilevit", "apple/mobilevit-small"), 77 | ("mobilevitv2", "apple/mobilevitv2-1.0-imagenet1k-256"), 78 | ("segformer", "nvidia/mit-b0"), 79 | ("squeezebert", "squeezebert/squeezebert-uncased"), 80 | ("t5", "t5-small"), 81 | ("vit", "google/vit-base-patch16-224"), 82 | ("yolos", "hustvl/yolos-tiny"), 83 | } 84 | 85 | PYTORCH_EXPORT_WITH_PAST_MODELS = { 86 | ("ctrl", "sshleifer/tiny-ctrl"), 87 | #TODO ("gpt2", "distilgpt2"), 88 | } 89 | 90 | PYTORCH_EXPORT_SEQ2SEQ_WITH_PAST_MODELS = {} 91 | 92 | TENSORFLOW_EXPORT_DEFAULT_MODELS = {} 93 | 94 | TENSORFLOW_EXPORT_WITH_PAST_MODELS = {} 95 | 96 | TENSORFLOW_EXPORT_SEQ2SEQ_WITH_PAST_MODELS = {} 97 | 98 | 99 | # Copied from tests.onnx.test_onnx_v2._get_models_to_test 100 | def _get_models_to_test(export_models_list): 101 | models_to_test = [] 102 | if is_torch_available() or is_tf_available(): 103 | for name, model, *features in export_models_list: 104 | if features: 105 | feature_config_mapping = { 106 | feature: FeaturesManager.get_config(name, feature) for _ in features for feature in _ 107 | } 108 | else: 109 | feature_config_mapping = FeaturesManager.get_supported_features_for_model_type(name) 110 | 111 | for feature, coreml_config_class_constructor in feature_config_mapping.items(): 112 | models_to_test.append((f"{name}_{feature}", name, model, feature, coreml_config_class_constructor)) 113 | return 
114 |     else:
115 |         # Return a dummy test that should never be called because of the @require_torch / @require_tf
116 |         # decorators.
117 |         # The reason for not returning an empty list is that parameterized.expand complains when it is empty.
118 |         return [("dummy", "dummy", "dummy", "dummy", CoreMLConfig.from_model_config)]
119 | 
120 | 
121 | @require_coreml
122 | @require_macos
123 | class CoreMLExportTestCase(TestCase):
124 |     """
125 |     Integration tests ensuring supported models are correctly exported
126 |     """
127 | 
128 |     def _coreml_export(self, test_name, name, model_name, feature, coreml_config_class_constructor):
129 |         model_class = FeaturesManager.get_model_class_for_feature(feature)
130 |         config = AutoConfig.from_pretrained(model_name)
131 |         model = model_class.from_config(config)
132 |         coreml_config = coreml_config_class_constructor(model.config)
133 |         preprocessor = get_preprocessor(model_name)
134 | 
135 |         try:
136 |             if feature in ["text2text-generation", "speech-seq2seq"]:
137 |                 coreml_config.seq2seq = "encoder"
138 |                 mlmodel = export(
139 |                     preprocessor,
140 |                     model,
141 |                     coreml_config,
142 |                     quantize="float32",
143 |                 )
144 |                 validate_model_outputs(
145 |                     coreml_config,
146 |                     preprocessor,
147 |                     model,
148 |                     mlmodel,
149 |                     coreml_config.atol_for_validation,
150 |                 )
151 | 
152 |                 coreml_config.seq2seq = "decoder"
153 |                 mlmodel = export(
154 |                     preprocessor,
155 |                     model,
156 |                     coreml_config,
157 |                     quantize="float32",
158 |                 )
159 |                 validate_model_outputs(
160 |                     coreml_config,
161 |                     preprocessor,
162 |                     model,
163 |                     mlmodel,
164 |                     coreml_config.atol_for_validation,
165 |                 )
166 |             else:
167 |                 mlmodel = export(
168 |                     preprocessor,
169 |                     model,
170 |                     coreml_config,
171 |                     quantize="float32",
172 |                 )
173 | 
174 |                 validate_model_outputs(
175 |                     coreml_config,
176 |                     preprocessor,
177 |                     model,
178 |                     mlmodel,
179 |                     coreml_config.atol_for_validation,
180 |                 )
181 |         except (RuntimeError, ValueError) as e:
182 |             self.fail(f"{name}, {feature} -> {e}")
183 | 
184 |     @parameterized.expand(_get_models_to_test(PYTORCH_EXPORT_MODELS))
185 |     @slow
186 |     @require_torch
187 |     @require_vision
188 |     def test_pytorch_export(self, test_name, name, model_name, feature, coreml_config_class_constructor):
189 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
190 | 
191 |     @parameterized.expand(_get_models_to_test(PYTORCH_EXPORT_WITH_PAST_MODELS), skip_on_empty=True)
192 |     @slow
193 |     @require_torch
194 |     def test_pytorch_export_with_past(self, test_name, name, model_name, feature, coreml_config_class_constructor):
195 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
196 | 
197 |     @parameterized.expand(_get_models_to_test(PYTORCH_EXPORT_SEQ2SEQ_WITH_PAST_MODELS), skip_on_empty=True)
198 |     @slow
199 |     @require_torch
200 |     def test_pytorch_export_seq2seq_with_past(
201 |         self, test_name, name, model_name, feature, coreml_config_class_constructor
202 |     ):
203 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
204 | 
205 |     @parameterized.expand(_get_models_to_test(TENSORFLOW_EXPORT_DEFAULT_MODELS), skip_on_empty=True)
206 |     @slow
207 |     @require_tf
208 |     @require_vision
209 |     def test_tensorflow_export(self, test_name, name, model_name, feature, coreml_config_class_constructor):
210 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
211 | 
212 |     @parameterized.expand(_get_models_to_test(TENSORFLOW_EXPORT_WITH_PAST_MODELS), skip_on_empty=True)
213 |     @slow
214 |     @require_tf
215 |     def test_tensorflow_export_with_past(self, test_name, name, model_name, feature, coreml_config_class_constructor):
216 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
217 | 
218 |     @parameterized.expand(_get_models_to_test(TENSORFLOW_EXPORT_SEQ2SEQ_WITH_PAST_MODELS), skip_on_empty=True)
219 |     @slow
220 |     @require_tf
221 |     def test_tensorflow_export_seq2seq_with_past(
222 |         self, test_name, name, model_name, feature, coreml_config_class_constructor
223 |     ):
224 |         self._coreml_export(test_name, name, model_name, feature, coreml_config_class_constructor)
225 | 
--------------------------------------------------------------------------------
/tests/testing_utils.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # Copyright 2022 The HuggingFace Team. All rights reserved.
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | #     http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | 
16 | import unittest
17 | import importlib.util
18 | from transformers.utils.versions import importlib_metadata
19 | 
20 | 
21 | _coreml_available = importlib.util.find_spec("coremltools") is not None
22 | try:
23 |     _coreml_version = importlib_metadata.version("coremltools")
24 | 
25 |     from coremltools.models.utils import _is_macos, _macos_version
26 |     _macos_available = _is_macos() and _macos_version() >= (12, 0)
27 | 
28 | except importlib_metadata.PackageNotFoundError:
29 |     _coreml_available = False
30 |     _macos_available = False
31 | 
32 | def is_coreml_available():
33 |     return _coreml_available
34 | 
35 | def is_macos_available():
36 |     return _macos_available
37 | 
38 | def require_coreml(test_case):
39 |     return unittest.skipUnless(is_coreml_available(), "test requires Core ML")(test_case)
40 | 
41 | def require_macos(test_case):
42 |     return unittest.skipUnless(is_macos_available(), "test requires macOS")(test_case)
43 | 
--------------------------------------------------------------------------------
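For reference, the export-and-validate flow that `_coreml_export` exercises in `tests/test_coreml.py` can also be run outside the test harness. A minimal sketch, assuming PyTorch and `coremltools` are installed; the `distilbert` checkpoint, the `feature-extraction` feature, and the output filename are illustrative choices, and `MLModel.save` is the standard `coremltools` call for writing the converted model to disk:

    from transformers.onnx.utils import get_preprocessor

    from exporters.coreml import export, validate_model_outputs
    from exporters.coreml.features import FeaturesManager

    model_type, checkpoint, feature = "distilbert", "distilbert-base-cased", "feature-extraction"

    # Resolve the AutoModel class and the CoreMLConfig constructor registered for this feature.
    model_class = FeaturesManager.get_model_class_for_feature(feature)
    model = model_class.from_pretrained(checkpoint)
    coreml_config = FeaturesManager.get_config(model_type, feature)(model.config)
    preprocessor = get_preprocessor(checkpoint)

    # Convert to Core ML and compare the converted model's outputs against the original,
    # mirroring what _coreml_export does for every entry in PYTORCH_EXPORT_MODELS.
    mlmodel = export(preprocessor, model, coreml_config, quantize="float32")
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, coreml_config.atol_for_validation)

    mlmodel.save("DistilBERT.mlpackage")  # output path is illustrative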