├── .github └── dependabot.yml ├── .nojekyll ├── .readthedocs.yaml ├── License ├── README.md └── docs ├── _static └── llm-table.css ├── _templates └── flavors │ └── local │ ├── footer.jinja │ ├── header.jinja │ └── left-side-menu.jinja ├── ai_analyzer.rst ├── app_development.rst ├── conf.py ├── examples.rst ├── getstartex.rst ├── gpu └── ryzenai_gpu.rst ├── hybrid_oga.rst ├── icons.txt ├── images └── rai-sw.png ├── index.rst ├── inst.rst ├── licenses.rst ├── llm ├── high_level_python.rst ├── overview.rst └── server_interface.rst ├── model_quantization.rst ├── modelcompat.rst ├── modelrun.rst ├── npu_oga.rst ├── oga_model_prepare.rst ├── rai_linux.rst ├── relnotes.rst ├── ryzen_ai_libraries.rst ├── sphinx ├── requirements.in └── requirements.txt └── xrt_smi.rst /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | # To get started with Dependabot version updates, you'll need to specify which 2 | # package ecosystems to update and where the package manifests are located. 3 | # Please see the documentation for all configuration options: 4 | # https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates 5 | 6 | version: 2 7 | updates: 8 | - package-ecosystem: "pip" # See documentation for possible values 9 | directory: "/docs/sphinx" # Location of package manifests 10 | open-pull-requests-limit: 10 11 | schedule: 12 | interval: "daily" 13 | target-branch: "develop" 14 | labels: 15 | - "dependencies" 16 | reviewers: 17 | - "samjwu" 18 | -------------------------------------------------------------------------------- /.nojekyll: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/.nojekyll -------------------------------------------------------------------------------- /.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | # Read the Docs configuration file 2 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 3 | 4 | version: 2 5 | 6 | sphinx: 7 | configuration: docs/conf.py 8 | 9 | formats: [htmlzip, pdf, epub] 10 | 11 | python: 12 | install: 13 | - requirements: docs/sphinx/requirements.txt 14 | 15 | build: 16 | os: ubuntu-22.04 17 | tools: 18 | python: "3.8" 19 | -------------------------------------------------------------------------------- /License: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 5 |

6 | 7 | # Ryzen AI Software 8 | 9 | Version 1.4 10 | 11 | # License 12 | Ryzen AI is licensed under [MIT License](https://github.com/amd/ryzen-ai-documentation/blob/main/License). Refer to the [LICENSE File](https://github.com/amd/ryzen-ai-documentation/blob/main/License) for the full license text and copyright notice. 13 | 14 | # Please Read: Important Legal Notices 15 | The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or 16 | otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY 17 | DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY 18 | PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF 19 | AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 20 | 21 | ## AUTOMOTIVE APPLICATIONS DISCLAIMER 22 | AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS 23 | THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262 AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY. 24 | 25 | ## Copyright 26 | 27 | © Copyright 2024 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Ryzen, Vitis AI, and combinations thereof are trademarks of Advanced Micro Devices, 28 | Inc. AMBA, AMBA Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm Limited in the US and/or elsewhere. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. 
29 | -------------------------------------------------------------------------------- /docs/_static/llm-table.css: -------------------------------------------------------------------------------- 1 | /* Software Stack Table */ 2 | 3 | .center-table { 4 | margin-left: auto; 5 | margin-right: auto; 6 | text-align: center; 7 | } 8 | 9 | .center-table th, 10 | .center-table td { 11 | border: 1px solid #ffffff; /* Adds vertical and horizontal lines */ 12 | } 13 | 14 | /* Supported Lemonade LLMs Table */ 15 | 16 | /* Vertical lines for the first and second position in the first row */ 17 | .llm-table thead tr:nth-of-type(1) th:nth-of-type(1), 18 | .llm-table thead tr:nth-of-type(1) th:nth-of-type(2) { 19 | border-right: 1px solid #ffffff; /* White color for the border */ 20 | } 21 | 22 | /* Vertical lines for the first and second position in the second row */ 23 | .llm-table thead tr:nth-of-type(2) th:nth-of-type(1), 24 | .llm-table thead tr:nth-of-type(2) th:nth-of-type(3) { 25 | border-right: 1px solid #ffffff; /* White color for the border */ 26 | } 27 | 28 | /* Vertical lines for the first and third position in all other rows */ 29 | .llm-table tbody tr td:nth-of-type(1), 30 | .llm-table tbody tr td:nth-of-type(3) { 31 | border-right: 1px solid #ffffff; /* White color for the border */ 32 | } 33 | 34 | /* Remove horizontal line between the two heading rows */ 35 | .llm-table thead tr:nth-of-type(1) th { 36 | border-bottom: none; 37 | } 38 | 39 | /* Supported DeepSeek LLMs Table */ 40 | 41 | /* Add vertical border to the right of the models column */ 42 | .deepseek-table td:nth-child(1) { 43 | border-right: 1px solid white; 44 | } 45 | 46 | /* Vertical lines for the first and second position in the first row */ 47 | .deepseek-table thead tr:nth-of-type(1) th:nth-of-type(1) { 48 | border-right: 1px solid #ffffff; /* White color for the border */ 49 | } 50 | 51 | /* Vertical lines for the first and second position in the second row */ 52 | .deepseek-table thead tr:nth-of-type(2) th:nth-of-type(1){ 53 | border-right: 1px solid #ffffff; /* White color for the border */ 54 | } 55 | 56 | /* Vertical lines for the first and third position in all other rows */ 57 | .deepseek-table tbody tr td:nth-of-type(1){ 58 | border-right: 1px solid #ffffff; /* White color for the border */ 59 | } 60 | 61 | /* Add vertical lines around the hybrid_oga cell */ 62 | .deepseek-table td:nth-child(2) { 63 | border-right: 1px solid white; 64 | } 65 | 66 | /* Remove right border between TTFT and TPS columns in the bottom two rows */ 67 | .deepseek-table tbody tr:nth-child(2) td:nth-child(2), 68 | .deepseek-table tbody tr:nth-child(3) td:nth-child(2) { 69 | border-right: none; 70 | } 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | -------------------------------------------------------------------------------- /docs/_templates/flavors/local/footer.jinja: -------------------------------------------------------------------------------- 1 | {% 2 | set license_link = ("Ryzen AI Licenses and Disclaimers", "") 3 | %} 4 | -------------------------------------------------------------------------------- /docs/_templates/flavors/local/header.jinja: -------------------------------------------------------------------------------- 1 | {% macro top_level_header(branch, latest_version, release_candidate_version) -%} 2 | Ryzen AI 3 | {%- endmacro -%} 4 | 5 | {% 6 | set nav_secondary_items = { 7 | "GitHub": theme_repository_url|replace("-docs", ""), 8 | "Community": "https://community.amd.com/t5/ai/ct-p/amd_ai", 9 | "Products": 
"https://www.amd.com/en/products/ryzen-ai" 10 | } 11 | %} 12 | -------------------------------------------------------------------------------- /docs/_templates/flavors/local/left-side-menu.jinja: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/docs/_templates/flavors/local/left-side-menu.jinja -------------------------------------------------------------------------------- /docs/ai_analyzer.rst: -------------------------------------------------------------------------------- 1 | ########### 2 | AI Analyzer 3 | ########### 4 | 5 | AMD AI Analyzer is a tool that supports analysis and visualization of model compilation and inference on Ryzen AI. The primary goal of the tool is to help users better understand how the models are processed by the hardware, and to identify performance bottlenecks that may be present during model inference. Using AI Analyzer, users can visualize graph and operator partitions between the NPU and CPU. 6 | 7 | Installation 8 | ~~~~~~~~~~~~ 9 | 10 | If you installed the Ryzen AI software using automatic installer, AI Analyzer is already installed in the conda environment. 11 | 12 | If you manually installed the software, you will need to install the AI Analyzer wheel file in your environment. 13 | 14 | 15 | .. code-block:: 16 | 17 | python -m pip install path\to\RyzenAI\installation\files\aianalyzer-.whl 18 | 19 | 20 | Enabling Profiling and Visualization 21 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 22 | 23 | Profiling and Visualization can be enabled by passing additional provider options to the ONNXRuntime Inference Session. An example is shown below: 24 | 25 | .. code-block:: 26 | 27 | provider_options = [{ 28 | 'config_file': 'vaip_config.json', 29 | 'cacheDir': str(cache_dir), 30 | 'cacheKey': 'modelcachekey', 31 | 'ai_analyzer_visualization': True, 32 | 'ai_analyzer_profiling': True, 33 | }] 34 | session = ort.InferenceSession(model.SerializeToString(), providers=providers, 35 | provider_options=provider_options) 36 | 37 | 38 | The ``ai_analyzer_profiling`` flag enables generation of artifacts related to the inference profile. The ``ai_analyzer_visualization`` flag enables generation of artifacts related to graph partitions and operator fusion. These artifacts are generated as .json files in the current run directory. 39 | 40 | AI Analyzer also supports native ONNX Runtime profiling, which can be used to analyze the parts of the session running on the CPU. Users can enable ONNX Runtime profiling through session options and pass it alongside the provider options as shown below: 41 | 42 | .. code-block:: 43 | 44 | # Configure session options for profiling 45 | sess_options = rt.SessionOptions() 46 | sess_options.enable_profiling = True 47 | 48 | provider_options = [{ 49 | 'config_file': 'vaip_config.json', 50 | 'cacheDir': str(cache_dir), 51 | 'cacheKey': 'modelcachekey', 52 | 'ai_analyzer_visualization': True, 53 | 'ai_analyzer_profiling': True, 54 | }] 55 | 56 | session = ort.InferenceSession(model.SerializeToString(), sess_options, providers=providers, 57 | provider_options=provider_options) 58 | 59 | 60 | Launching AI Analyzer 61 | ~~~~~~~~~~~~~~~~~~~~~ 62 | 63 | Once the artifacts are generated, `aianalyzer` can be invoked through the command line as follows: 64 | 65 | 66 | .. 
code-block:: 67 | 68 | aianalyzer 69 | 70 | 71 | **Positional Arguments** 72 | 73 | ``logdir``: Path to the folder containing generated artifacts 74 | 75 | Additional Options 76 | 77 | ``-v``, ``--version``: Show the version info and exit. 78 | 79 | ``-b ADDR``, ``--bind ADDR``: Hostname or IP address on which to listen, default is 'localhost'. 80 | 81 | ``-p PORT``, ``--port PORT``: TCP port on which to listen, default is '8000'. 82 | 83 | ``-n``, ``--no-browser``: Prevent the opening of the default url in the browser. 84 | 85 | ``-t TOKEN``, ``--token TOKEN``: Token used for authenticating first-time connections to the server. The default is to generate a new, random token. Setting to an empty string disables authentication altogether, which is NOT RECOMMENDED. 86 | 87 | 88 | 89 | Features 90 | ~~~~~~~~ 91 | 92 | AI Analyzer provides visibility into how your AI model is compiled and executed on Ryzen AI hardware. Its two main use cases are: 93 | 94 | 1. Analyzing how the model was partitioned and mapped onto Ryzen AI's CPU and NPU accelerator 95 | 2. Profiling model performance as it executes inferencing workloads 96 | 97 | When launched, the AI Analyzer server will scan the folder specified with the logdir argument and detect and load all files relevant to compilation and/or inferencing per the ai_analyzer_visualization and ai_anlayzer_profiling flags. 98 | 99 | You can instruct the AI Analyzer server to either start a browser on the same host or else return to you a URL that you can then load into a browser on any host. 100 | 101 | 102 | User Interface 103 | ~~~~~~~~~~~~~~ 104 | 105 | AI Analyzer has the following three sections as seen in the left-panel navigator 106 | 107 | 1. PARTITIONING - A breakdown of your model was assigned to execute inference across CPU and NPU 108 | 2. NPU INSIGHTS - A detailed look at the how your model was optimized for inference execution on NPU 109 | 3. PERFORMANCE - A breakdown of inference execution through the model 110 | 111 | 112 | These sections are described in more detail below 113 | 114 | 115 | 116 | PARTITIONING 117 | @@@@@@@@@@@@ 118 | 119 | This section is comprised of two pages: Summary and Graph 120 | 121 | **Summary** 122 | 123 | The Summary page gives an overview of how the models operators have been assigned to Ryzen's CPU and NPU along with charts capturing GigaOp (GOP) offloading by operator type . 124 | 125 | There is also table titled "CPU Because" that shows the reasons why certain operators were not offloaded to the NPU. 126 | 127 | **Graph** 128 | 129 | The graph page shows an interactive diagram of the partitioned ONNX model, showing graphically how the layers are assigned to the Ryzen hardware. 130 | 131 | 132 | 133 | Toolbar 134 | 135 | - You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button 136 | - A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button 137 | - The model table can be shown and hidden via the "Show Table" toggle button. 138 | - Settings 139 | 140 | - Show Processor will separate operators that run on CPU and NPU respectively 141 | - Show Partition will separate operators running on the NPU by their respective NPU partition, if any 142 | - Show Instance Name will display the full hierarchical name for the operators in the ONNX model 143 | 144 | All objects in the graph have properties which can be viewed to the right of the graph. 
145 | 146 | 147 | 148 | *Model Table* 149 | 150 | This table below the graph lists all objects in the partitioned ONNX model: 151 | 152 | - Processor (NPU or CPU) 153 | - Function (Layer) 154 | - Operator 155 | - Ports 156 | - NPU Partitions 157 | 158 | 159 | NPU INSIGHTS 160 | @@@@@@@@@@@@ 161 | 162 | This section is comprised of three pages: Summary, Original Graph, and Optimized Graph. 163 | 164 | 165 | 166 | **Summary** 167 | 168 | The Summary page gives an overview of how your model was mapped to the AMD Ryzen NPU. Charts are displayed showing statistics on the number of operators and total GMACs that have been mapped to the NPU (and if necessary, back to CPU via the "Failsafe CPU" mechanism). The statistics are shown per operator type and per NPU partition. 169 | 170 | 171 | 172 | **Original Graph** 173 | 174 | This is an interactive graph representing your model lowered to supported NPU primitive operators, and broken up into partitions if necessary. As with the PARTITIONING graph, there is a companion table containing all of the model elements that will cross-probe to the graph view. The objects in the graph and table will also cross-probe to the PARTITIONING graph. 175 | 176 | Toolbar 177 | 178 | You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button 179 | A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button 180 | A code viewer showing the MLIR source code with cross-probing can be shown/hidden via the "Show Code View" button 181 | The table below can be shown and hidden via the "Show Table" toggle button. 182 | Display options for the graph can be accessed with the "Settings" button 183 | 184 | 185 | 186 | 187 | **Optimized Graph** 188 | 189 | This page shows the final model that will be mapped to the NPU after all transformations and optimizations such as fusion and chaining. It will also report the operators that had to be moved back to the CPU via the "Failsafe CPU" mechanism. As usual, there is a companion table below that contains all of the graph's elements, and cross-selection is supported to and from the PARTITIONING graph and the Original Graph. 190 | 191 | Toolbar 192 | 193 | You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button 194 | A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button 195 | The table below can be shown and hidden via the "Show Table" toggle button. 196 | Display options for the graph can be accessed with the "Settings" button 197 | 198 | 199 | PERFORMANCE 200 | @@@@@@@@@@@ 201 | 202 | This section is used to view the performance of your model on RyzenAI when running one or more inferences. It is comprised of two pages: Summary and Timeline. 203 | 204 | 205 | 206 | **Summary** 207 | 208 | The performance summary page shows several overall statistics on the inference(s) as well as charts breaking down operator runtime by operator. If you run with ONNX runtime profiler enabled, you will see overall inference time including layers that run on the CPU. If you have NPU profiling enabled via the ai_analyzer_profiling flag, you will see numerous NPU-based statistics, including GOP and MAC efficiency and a chart of runtime per NPU operator type. 209 | 210 | The clock frequency field shows the assumed NPU clock frequency, but it can be edited. 
If you change the frequency, all timestamp data that is collected as clock cycles but displayed in time units will be adjusted accordingly. 211 | 212 | 213 | **Timeline** 214 | 215 | The Performance timeline shows a layer-by-layer breakdown of your model's execution. The upper section is a graphical depiction of layer execution across a timeline, while the lower section shows the same information in tabular format. It is important to note that the Timeline page shows one inference at a time, so if you have captured profiling data for two or more inferences, you can choose which one to display with the "Inferences" chooser. 216 | 217 | 218 | 219 | Within each inference, you can examine the overall model execution or the detailed NPU execution data by using the "Partition" chooser. 220 | 221 | 222 | 223 | Toolbar 224 | 225 | A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button 226 | The table below can be shown and hidden via the "Show Table" toggle button. 227 | The graphical timeline can be downloaded to SVG via the "Export to SVG" button 228 | 229 | 230 | .. 231 | ------------ 232 | 233 | ##################################### 234 | License 235 | ##################################### 236 | 237 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 238 | 239 | -------------------------------------------------------------------------------- /docs/app_development.rst: -------------------------------------------------------------------------------- 1 | .. include:: /icons.txt 2 | 3 | ####################### 4 | Application Development 5 | ####################### 6 | 7 | This page captures requirements and recommendations for developers looking to create, package and distribute applications targeting NPU-enabled AMD processors. 8 | 9 | 10 | 11 | .. _driver-compatibility: 12 | 13 | ************************************* 14 | VitisAI EP / NPU Driver Compatibility 15 | ************************************* 16 | 17 | The VitisAI EP requires a compatible version of the NPU drivers. For each version of the VitisAI EP, compatible drivers are bounded by a minimum version and a maximum release date. NPU drivers are backward compatible with VitisAI EP released up to 3 years before. The maximum driver release date is therefore set to 3 years after the release date of the corresponding VitisAI EP. 18 | 19 | The table below summarizes the driver requirements for the different versions of the VitisAI EP. 20 | 21 | .. list-table:: 22 | :header-rows: 1 23 | 24 | * - VitisAI EP version 25 | - Minimum NPU Driver version 26 | - Maximum NPU Driver release date 27 | * - 1.4 28 | - 32.0.203.257 29 | - March 25th, 2028 30 | * - 1.3.1 31 | - 32.0.203.242 32 | - January 17th, 2028 33 | * - 1.3 34 | - 32.0.203.237 35 | - November 26th, 2027 36 | * - 1.2 37 | - 32.0.201.204 38 | - July 30th, 2027 39 | 40 | The application must check that NPU drivers compatible with the version of the Vitis AI EP being used are installed. 41 | 42 | .. _apu-types: 43 | 44 | ***************** 45 | APU Types 46 | ***************** 47 | 48 | The Ryzen AI Software supports different types of NPU-enabled APUs. These APU types are referred to as PHX, HPT, STX and KRK. 49 | 50 | To programmatically determine the type of the local APU, it is possible to enumerate the PCI devices and check for an instance with a matching Hardware ID. 51 | 52 | .. 
list-table:: 53 | :header-rows: 1 54 | 55 | * - Vendor 56 | - Device 57 | - Revision 58 | - APU Type 59 | * - 0x1022 60 | - 0x1502 61 | - 0x00 62 | - PHX or HPT 63 | * - 0x1022 64 | - 0x17F0 65 | - 0x00 66 | - STX 67 | * - 0x1022 68 | - 0x17F0 69 | - 0x10 70 | - STX 71 | * - 0x1022 72 | - 0x17F0 73 | - 0x11 74 | - STX 75 | * - 0x1022 76 | - 0x17F0 77 | - 0x20 78 | - KRK 79 | 80 | The application must check that it is running on an AMD processor with an NPU, and that the NPU type is supported by the version of the Vitis AI EP being used. 81 | 82 | 83 | 84 | ************************************ 85 | Application Development Requirements 86 | ************************************ 87 | 88 | ONNX-RT Session 89 | =============== 90 | 91 | The application should only use the Vitis AI Execution Provider if the following conditions are met: 92 | 93 | - The application is running on an AMD processor with an NPU type supported by the version of the Vitis AI EP being used. See :ref:`list ` above in this page. 94 | - NPU drivers compatible with the version of the Vitis AI EP being used are installed. See :ref:`compatibility table ` above in this page. 95 | 96 | |memo| **NOTE**: Sample C++ code implementing the compatibility checks to be performed before using the VitisAI EP is provided here: https://github.com/amd/RyzenAI-SW/tree/main/utilities/npu_check 97 | 98 | 99 | VitisAI EP Provider Options 100 | =========================== 101 | 102 | For INT8 models, the application should detect which type of APU is present (PHX/HPT/STX/KRK) and set the ``xclbin`` provider option accordingly. Refer to the section about :ref:`compilation of INT8 models ` for details about this. 103 | 104 | For BF16 models, the application should set the ``config_file`` provider option to use the same file as the one which was used to precompile the BF16 model. Refer to the section about :ref:`compilation of BF16 models ` for details about this. 105 | 106 | 107 | Cache Management 108 | ================ 109 | 110 | Cache directories generated by the Vitis AI Execution Provider should not be reused across different versions of the Vitis AI EP or across different version of the NPU drivers. 111 | 112 | The application should check the version of the Vitis AI EP and of the NPU drivers. If the application detects a version change, it should delete the cache, or create a new cache directory with a different name. 113 | 114 | 115 | Pre-Compiled Models 116 | =================== 117 | 118 | The deployment version of the VitisAI Execution Provider (EP) does not support the on-the-fly compilation of BF16 models. Applications utilizing BF16 models must include pre-compiled versions of these models. The VitisAI EP can then load the pre-compiled models and deploy them efficiently on the NPU. 119 | 120 | Although including pre-compiled versions of INT8 models is not mandatory, it is beneficial as it reduces session creation time and enhances the end-user experience. 121 | 122 | | 123 | 124 | ********************************** 125 | Application Packaging Requirements 126 | ********************************** 127 | 128 | |excl| **IMPORTANT**: A patched version of the ``%RYZEN_AI_INSTALLATION_PATH%\deployment`` folder is available for download at the following link: `Download Here `_. This patched ``deployment`` folder is designed to replace the one included in the official installation of Ryzen AI 1.4. The following instructions assume that the original ``deployment`` folder has been replaced with the updated version. 
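For reference, a minimal Python sketch of the APU-type check described in the :ref:`APU Types <apu-types>` section is shown below. This is an illustrative sketch only and is not part of the packaging requirements: it assumes the third-party ``wmi`` package is installed, and the ``detect_apu_type`` helper name is hypothetical. The C++ sample linked earlier on this page (``npu_check``) remains the reference implementation for these checks.

.. code-block:: python

    # Illustrative sketch only. Assumes the third-party 'wmi' package (pip install wmi).
    # Maps PCI hardware IDs to APU types per the table in the APU Types section above.
    import wmi

    # (vendor, device, revision) -> APU type
    APU_TABLE = {
        ("1022", "1502", "00"): "PHX or HPT",
        ("1022", "17F0", "00"): "STX",
        ("1022", "17F0", "10"): "STX",
        ("1022", "17F0", "11"): "STX",
        ("1022", "17F0", "20"): "KRK",
    }

    def detect_apu_type():
        """Return the APU type string, or None if no matching NPU device is found."""
        for dev in wmi.WMI().Win32_PnPEntity():
            for hw_id in (dev.HardwareID or []):
                # PCI hardware IDs look like "PCI\VEN_1022&DEV_17F0&SUBSYS_...&REV_10"
                hw_id = hw_id.upper()
                for (ven, device, rev), apu in APU_TABLE.items():
                    if (f"VEN_{ven}" in hw_id and f"DEV_{device}" in hw_id
                            and f"REV_{rev}" in hw_id):
                        return apu
        return None

    if __name__ == "__main__":
        print("Detected APU type:", detect_apu_type())

In a production application, this check should be combined with the NPU driver compatibility check described at the top of this page before enabling the Vitis AI Execution Provider.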
129 | 130 | A C++ application built on the Ryzen AI ONNX Runtime requires the following components to be included in its distribution package. 131 | 132 | .. rubric:: For INT8 models 133 | 134 | - DLLs: 135 | 136 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll 137 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll 138 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll 139 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll 140 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll 141 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll 142 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll 143 | 144 | - NPU Binary files (.xclbin) from the ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins`` folder 145 | 146 | - Recommended but not mandatory: pre-compiled models in the form of :ref:`Vitis AI EP cache folders ` or :ref:`Onnx Runtime EP context models ` 147 | 148 | .. rubric:: For BF16 models 149 | 150 | - DLLs: 151 | 152 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll 153 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll 154 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll 155 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll 156 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll 157 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll 158 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll 159 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\flexmlrt\\flexmlrt.dll 160 | 161 | - Pre-compiled models in the form of :ref:`Vitis AI EP cache folders ` 162 | 163 | .. rubric:: For Hybrid LLMs 164 | 165 | - DLLs: 166 | 167 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\onnx_custom_ops.dll 168 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\onnxruntime-genai.dll 169 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\ryzen_mm.dll 170 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\ryzenai_onnx_utils.dll 171 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\DirectML.dll 172 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll 173 | 174 | .. rubric:: For NPU-only LLMs 175 | 176 | - DLLs: 177 | 178 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\npu-llm\\onnxruntime-genai.dll 179 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitis_ai_custom_ops.dll 180 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll 181 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll 182 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll 183 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll 184 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll 185 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll 186 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll 187 | 188 | - VAIP LLM configuration file: %RYZEN_AI_INSTALLATION_PATH%\\deployment\\npu-llm\\vaip_llm.json 189 | 190 | 191 | .. 192 | ------------ 193 | 194 | ##################################### 195 | License 196 | ##################################### 197 | 198 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 
199 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Configuration file for the Sphinx documentation builder. 5 | # 6 | # This file does only contain a selection of the most common options. For a 7 | # full list see the documentation: 8 | # http://www.sphinx-doc.org/en/master/config 9 | 10 | # -- Path setup -------------------------------------------------------------- 11 | 12 | # If extensions (or modules to document with autodoc) are in another directory, 13 | # add these directories to sys.path here. If the directory is relative to the 14 | # documentation root, use os.path.abspath to make it absolute, like shown here. 15 | # 16 | import os 17 | import sys 18 | import urllib.parse 19 | # import recommonmark 20 | # from recommonmark.transform import AutoStructify 21 | # from recommonmark.parser import CommonMarkParser 22 | 23 | # sys.path.insert(0, os.path.abspath('.')) 24 | sys.path.insert(0, os.path.abspath('_ext')) 25 | sys.path.insert(0, os.path.abspath('docs')) 26 | 27 | # -- Project information ----------------------------------------------------- 28 | 29 | project = 'Ryzen AI Software' 30 | copyright = '2023-2024, Advanced Micro Devices, Inc' 31 | author = 'Advanced Micro Devices, Inc' 32 | 33 | # The short X.Y version 34 | version = '1.4' 35 | # The full version, including alpha/beta/rc tags 36 | release = '1.4' 37 | html_last_updated_fmt = 'March 24, 2025' 38 | 39 | 40 | # -- General configuration --------------------------------------------------- 41 | 42 | html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "ryzenai.docs.amd.com") 43 | html_context = {} 44 | if os.environ.get("READTHEDOCS", "") == "True": 45 | html_context["READTHEDOCS"] = True 46 | 47 | # If your documentation needs a minimal Sphinx version, state it here. 48 | # 49 | # needs_sphinx = '1.0' 50 | 51 | # Add any Sphinx extension module names here, as strings. They can be 52 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 53 | # ones. 54 | extensions = [ 55 | 'sphinx.ext.graphviz', 56 | 'breathe', 57 | 'sphinx.ext.autodoc', 58 | 'sphinx.ext.doctest', 59 | 'sphinx.ext.intersphinx', 60 | 'sphinx.ext.todo', 61 | 'sphinx.ext.coverage', 62 | 'sphinx.ext.mathjax', 63 | 'sphinx.ext.ifconfig', 64 | 'sphinx.ext.viewcode', 65 | 'sphinx.ext.githubpages', 66 | 'linuxdoc.rstFlatTable', 67 | "notfound.extension", 68 | #'recommonmark', 69 | #'sphinx_markdown_tables', 70 | #'edit_on_github', 71 | # Auto-generate section labels. 
72 | #'sphinx.ext.autosectionlabel', 73 | #'rst2pdf.pdfbuilder' 74 | ] 75 | 76 | graphviz_output_format = 'svg' 77 | 78 | # Prefix document path to section labels, otherwise autogenerated labels would look like 'heading' 79 | # rather than 'path/to/file:heading' 80 | autosectionlabel_prefix_document = True 81 | 82 | 83 | 84 | # Breathe Configuration 85 | breathe_projects = { 86 | "XRT":"../xml", 87 | } 88 | 89 | 90 | 91 | # Configuration for rst2pdf 92 | pdf_documents = [('index', u'', u'', u'AMD, Inc.'),] 93 | # index - master document 94 | # rst2pdf - name of the file that will be created 95 | # Sample rst2pdf doc - title of the pdf 96 | # Your Name - author name in the pdf 97 | 98 | 99 | # Configure 'Edit on GitHub' extension 100 | edit_on_github_project = '/amd/ryzen-ai-documentation' 101 | edit_on_github_branch = 'main/docs' 102 | 103 | # Add any paths that contain templates here, relative to this directory. 104 | templates_path = ['_templates'] 105 | 106 | # Expand/Collapse functionality 107 | def setup(app): 108 | app.add_css_file('custom.css') 109 | app.add_css_file("llm-table.css") 110 | 111 | 112 | # The suffix(es) of source filenames. 113 | # You can specify multiple suffix as a list of string: 114 | # 115 | # source_suffix = ['.rst', '.md'] 116 | source_suffix = { 117 | '.rst': 'restructuredtext', 118 | #'.txt': 'restructuredtext', 119 | '.md': 'markdown', 120 | } 121 | 122 | # For MD support 123 | source_parsers = { 124 | #'.md': CommonMarkParser, 125 | # myst_parser testing 126 | #'.md': 127 | } 128 | 129 | # The master toctree document. 130 | master_doc = 'index' 131 | 132 | # The language for content autogenerated by Sphinx. Refer to documentation 133 | # for a list of supported languages. 134 | # 135 | # This is also used if you do content translation via gettext catalogs. 136 | # Usually you set "language" from the command line for these cases. 137 | language = 'en' 138 | 139 | # List of patterns, relative to source directory, that match files and 140 | # directories to ignore when looking for source files. 141 | # This patterns also effect to html_static_path and html_extra_path 142 | exclude_patterns = ['include', 'api_rst', '_build', 'Thumbs.db', '.DS_Store'] 143 | 144 | # The name of the Pygments (syntax highlighting) style to use. 145 | pygments_style = 'sphinx' 146 | 147 | # If true, `todo` and `todoList` produce output, else they produce nothing. 148 | todo_include_todos = False 149 | 150 | primary_domain = 'c' 151 | highlight_language = 'none' 152 | 153 | 154 | # -- Options for HTML output ------------------------------------------------- 155 | 156 | # The theme to use for HTML and HTML Help pages. See the documentation for 157 | # a list of builtin themes. 158 | # 159 | ##html_theme = 'karma_sphinx_theme' 160 | html_theme = 'rocm_docs_theme' 161 | ##html_theme_path = ["./_themes"] 162 | 163 | 164 | # Theme options are theme-specific and customize the look and feel of a theme 165 | # further. For a list of options available for each theme, see the 166 | # documentation. 167 | # 168 | ##html_logo = '_static/xilinx-header-logo.svg' 169 | html_theme_options = { 170 | "link_main_doc": False, 171 | "flavor": "local" 172 | } 173 | 174 | # Add any paths that contain custom static files (such as style sheets) here, 175 | # relative to this directory. They are copied after the builtin static files, 176 | # so a file named "default.css" will overwrite the builtin "default.css". 
177 | html_static_path = ["_static"] 178 | html_css_files = ["_static/llm-table.css"] 179 | 180 | # Custom sidebar templates, must be a dictionary that maps document names 181 | # to template names. 182 | # 183 | # The default sidebars (for documents that don't match any pattern) are 184 | # defined by theme itself. Builtin themes are using these templates by 185 | # default: ``['localtoc.html', 'relations.html', 'sourcelink.html', 186 | # 'searchbox.html']``. 187 | # 188 | #html_sidebars = { 189 | # '**': [ 190 | # 'about.html', 191 | # 'navigation.html', 192 | # 'relations.html', 193 | # 'searchbox.html', 194 | # 'donate.html', 195 | # ]} 196 | 197 | 198 | # -- Options for HTMLHelp output --------------------------------------------- 199 | 200 | # Output file base name for HTML help builder. 201 | htmlhelp_basename = 'ProjectName' 202 | 203 | 204 | # -- Options for LaTeX output ------------------------------------------------ 205 | latex_engine = 'pdflatex' 206 | latex_elements = { 207 | # The paper size ('letterpaper' or 'a4paper'). 208 | # 209 | 'papersize': 'letterpaper', 210 | 211 | # The font size ('10pt', '11pt' or '12pt'). 212 | # 213 | 'pointsize': '12pt', 214 | 215 | # Additional stuff for the LaTeX preamble. 216 | # 217 | # 'preamble': '', 218 | 219 | # Latex figure (float) alignment 220 | # 221 | # 'figure_align': 'htbp', 222 | } 223 | 224 | # Grouping the document tree into LaTeX files. List of tuples 225 | # (source start file, target name, title, 226 | # author, documentclass [howto, manual, or own class]). 227 | latex_documents = [ 228 | (master_doc, 'ryzenai.tex', 'Ryzen AI', 229 | 'AMD', 'manual'), 230 | ] 231 | 232 | 233 | # -- Options for manual page output ------------------------------------------ 234 | 235 | # One entry per manual page. List of tuples 236 | # (source start file, name, description, authors, manual section). 237 | man_pages = [ 238 | (master_doc, 'ryzenai.tex', 'Ryzen AI', 239 | [author], 1) 240 | ] 241 | 242 | 243 | # -- Options for Texinfo output ---------------------------------------------- 244 | 245 | # Grouping the document tree into Texinfo files. List of tuples 246 | # (source start file, target name, title, author, 247 | # dir menu entry, description, category) 248 | texinfo_documents = [ 249 | (master_doc, 'ENTER YOUR LIBRARY ID HERE. FOR EXAMPLE: xfopencv', 'ENTER YOUR LIBRARY PROJECT NAME HERE', 250 | author, 'AMD', 'One line description of project.', 251 | 'Miscellaneous'), 252 | ] 253 | 254 | 255 | # -- Options for Epub output ------------------------------------------------- 256 | 257 | # Bibliographic Dublin Core info. 258 | epub_title = project 259 | 260 | # The unique identifier of the text. This can be a ISBN number 261 | # or the project homepage. 262 | # 263 | # epub_identifier = '' 264 | 265 | # A unique identification for the text. 266 | # 267 | # epub_uid = '' 268 | 269 | # A list of files that should not be packed into the epub file. 
270 | epub_exclude_files = ['search.html'] 271 | 272 | 273 | 274 | 275 | # -- Options for rinoh ------------------------------------------ 276 | 277 | 278 | rinoh_documents = [dict(doc='index', # top-level file (index.rst) 279 | target='manual')] # output file (manual.pdf) 280 | 281 | 282 | 283 | # -- Notfound (404) extension settings 284 | 285 | if "READTHEDOCS" in os.environ: 286 | components = urllib.parse.urlparse(os.environ["READTHEDOCS_CANONICAL_URL"]) 287 | notfound_urls_prefix = components.path 288 | 289 | 290 | # -- Extension configuration ------------------------------------------------- 291 | # At the bottom of conf.py 292 | #def setup(app): 293 | # app.add_config_value('recommonmark_config', { 294 | # 'url_resolver': lambda url: github_doc_root + url, 295 | # 'auto_toc_tree_section': 'Contents', 296 | # }, True) 297 | # app.add_transform(AutoStructify) 298 | 299 | 300 | ################################################################################# 301 | #License 302 | #Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 303 | -------------------------------------------------------------------------------- /docs/examples.rst: -------------------------------------------------------------------------------- 1 | ########################## 2 | Examples, Demos, Tutorials 3 | ########################## 4 | 5 | This page introduces various demos, examples, and tutorials currently available with the Ryzen™ AI Software. 6 | 7 | ************************* 8 | Getting Started Tutorials 9 | ************************* 10 | 11 | NPU 12 | ~~~ 13 | 14 | - The :doc:`Getting Started Tutorial ` deploys a custom ResNet model demonstrating: 15 | 16 | - Pretrained model conversion to ONNX 17 | - Quantization using AMD Quark quantizer 18 | - Deployment using ONNX Runtime C++ and Python code 19 | 20 | - `Hello World Jupyter Notebook Tutorial `_ 21 | 22 | - New BF16 Model examples: 23 | 24 | - `Image Classification `_ 25 | - `Finetuned DistilBERT for Text Classification `_ 26 | - `Text Embedding Model Alibaba-NLP/gte-large-en-v1.5 `_ 27 | 28 | iGPU 29 | ~~~~ 30 | 31 | - `ResNet50 on iGPU `_ 32 | 33 | 34 | ************************************ 35 | Other examples, demos, and tutorials 36 | ************************************ 37 | 38 | - Refer to `RyzenAI-SW repo `_ 39 | 40 | 41 | 42 | .. 43 | ------------ 44 | 45 | ##################################### 46 | License 47 | ##################################### 48 | 49 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /docs/getstartex.rst: -------------------------------------------------------------------------------- 1 | :orphan: 2 | 3 | ######################## 4 | Getting Started Tutorial 5 | ######################## 6 | 7 | This tutorial uses a fine-tuned version of the ResNet model (using the CIFAR-10 dataset) to demonstrate the process of preparing, quantizing, and deploying a model using Ryzen AI Software. The tutorial features deployment using both Python and C++ ONNX runtime code. 8 | 9 | .. note:: 10 | In this documentation, "NPU" is used in descriptions, while "IPU" is retained in some of the tool's language, code, screenshots, and commands. This intentional 11 | distinction aligns with existing tool references and does not affect functionality. Avoid making replacements in the code. 
12 | 13 | - The source code files can be downloaded from `this link `_. Alternatively, you can clone the RyzenAI-SW repo and change the directory into "tutorial". 14 | 15 | .. code-block:: 16 | 17 | git clone https://github.com/amd/RyzenAI-SW.git 18 | cd tutorial/getting_started_resnet 19 | 20 | | 21 | 22 | The following are the steps and the required files to run the example: 23 | 24 | .. list-table:: 25 | :widths: 20 25 25 26 | :header-rows: 1 27 | 28 | * - Steps 29 | - Files Used 30 | - Description 31 | * - Installation 32 | - ``requirements.txt`` 33 | - Install the necessary package for this example. 34 | * - Preparation 35 | - ``prepare_model_data.py``, 36 | ``resnet_utils.py`` 37 | - The script ``prepare_model_data.py`` prepares the model and the data for the rest of the tutorial. 38 | 39 | 1. To prepare the model the script converts pre-trained PyTorch model to ONNX format. 40 | 2. To prepare the necessary data the script downloads and extracts CIFAR-10 dataset. 41 | 42 | * - Pretrained model 43 | - ``models/resnet_trained_for_cifar10.pt`` 44 | - The ResNet model trained using CIFAR-10 is provided in .pt format. 45 | * - Quantization 46 | - ``resnet_quantize.py`` 47 | - Convert the model to the NPU-deployable model by performing Post-Training Quantization flow using AMD Quark Quantization. 48 | * - Deployment - Python 49 | - ``predict.py`` 50 | - Run the Quantized model using the ONNX Runtime code. We demonstrate running the model on both CPU and NPU. 51 | * - Deployment - C++ 52 | - ``cpp/resnet_cifar/.`` 53 | - This folder contains the source code ``resnet_cifar.cpp`` that demonstrates running inference using C++ APIs. We additionally provide the infrastructure (required libraries, CMake files and header files) required by the example. 54 | 55 | 56 | | 57 | | 58 | 59 | ************************ 60 | Step 1: Install Packages 61 | ************************ 62 | 63 | * Ensure that the Ryzen AI Software is correctly installed. For more details, see the :doc:`installation instructions `. 64 | 65 | * Use the conda environment created during the installation for the rest of the steps. This example requires a couple of additional packages. Run the following command to install them: 66 | 67 | 68 | .. code-block:: 69 | 70 | python -m pip install -r requirements.txt 71 | 72 | | 73 | | 74 | 75 | 76 | ************************************** 77 | Step 2: Prepare dataset and ONNX model 78 | ************************************** 79 | 80 | In this example, we utilize a custom ResNet model finetuned using the CIFAR-10 dataset 81 | 82 | The ``prepare_model_data.py`` script downloads the CIFAR-10 dataset in pickle format (for python) and binary format (for C++). This dataset will be used in the subsequent steps for quantization and inference. The script also exports the provided PyTorch model into ONNX format. The following snippet from the script shows how the ONNX model is exported: 83 | 84 | .. 
code-block:: 85 | 86 | dummy_inputs = torch.randn(1, 3, 32, 32) 87 | input_names = ['input'] 88 | output_names = ['output'] 89 | dynamic_axes = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}} 90 | tmp_model_path = str(models_dir / "resnet_trained_for_cifar10.onnx") 91 | torch.onnx.export( 92 | model, 93 | dummy_inputs, 94 | tmp_model_path, 95 | export_params=True, 96 | opset_version=13, 97 | input_names=input_names, 98 | output_names=output_names, 99 | dynamic_axes=dynamic_axes, 100 | ) 101 | 102 | Note the following settings for the onnx conversion: 103 | 104 | - Ryzen AI supports a batch size=1, so dummy input is fixed to a batch_size =1 during model conversion 105 | - Recommended ``opset_version`` setting 13 is used. 106 | 107 | Run the following command to prepare the dataset and export the ONNX model: 108 | 109 | .. code-block:: 110 | 111 | python prepare_model_data.py 112 | 113 | * The downloaded CIFAR-10 dataset is saved in the current directory at the following location: ``data/*``. 114 | * The ONNX model is generated at models/resnet_trained_for_cifar10.onnx 115 | 116 | | 117 | | 118 | 119 | ************************** 120 | Step 3: Quantize the Model 121 | ************************** 122 | 123 | Quantizing AI models from floating-point to 8-bit integers reduces computational power and the memory footprint required for inference. This example utilizes Quark for ONNX quantizer workflow. Quark takes the pre-trained float32 model from the previous step (``resnet_trained_for_cifar10.onnx``) and provides a quantized model. 124 | 125 | .. code-block:: 126 | 127 | python resnet_quantize.py 128 | 129 | This generates a quantized model using QDQ quant format and generate Quantized model with default configuration. After the completion of the run, the quantized ONNX model ``resnet_quantized.onnx`` is saved to models/resnet_quantized.onnx 130 | 131 | The :file:`resnet_quantize.py` file has ``ModelQuantizer::quantize_model`` function that applies quantization to the model. 132 | 133 | .. code-block:: 134 | 135 | from quark.onnx.quantization.config import (Config, get_default_config) 136 | from quark.onnx import ModelQuantizer 137 | 138 | # Get quantization configuration 139 | quant_config = get_default_config("XINT8") 140 | config = Config(global_quant_config=quant_config) 141 | 142 | # Create an ONNX quantizer 143 | quantizer = ModelQuantizer(config) 144 | 145 | # Quantize the ONNX model 146 | quantizer.quantize_model(input_model_path, output_model_path, dr) 147 | 148 | The parameters of this function are: 149 | 150 | * **input_model_path**: (String) The file path of the model to be quantized. 151 | * **output_model_path**: (String) The file path where the quantized model is saved. 152 | * **dr**: (Object or None) Calibration data reader that enumerates the calibration data and producing inputs for the original model. In this example, CIFAR10 dataset is used for calibration during the quantization process. 153 | 154 | 155 | | 156 | | 157 | 158 | ************************ 159 | Step 4: Deploy the Model 160 | ************************ 161 | 162 | We demonstrate deploying the quantized model using both Python and C++ APIs. 163 | 164 | * :ref:`Deployment - Python ` 165 | * :ref:`Deployment - C++ ` 166 | 167 | .. note:: 168 | During the Python and C++ deployment, the compiled model artifacts are saved in the cache folder named ``/modelcachekey``. 
Ryzen-AI does not support the complied model artifacts across the versions, so if the model artifacts exist from the previous software version, ensure to delete the folder ``modelcachekey`` before the deployment steps. 169 | 170 | 171 | .. _dep-python: 172 | 173 | Deployment - Python 174 | =========================== 175 | 176 | The ``predict.py`` script is used to deploy the model. It extracts the first ten images from the CIFAR-10 test dataset and converts them to the .png format. The script then reads all those ten images and classifies them by running the quantized custom ResNet model on CPU or NPU. 177 | 178 | Deploy the Model on the CPU 179 | ---------------------------- 180 | 181 | By default, ``predict.py`` runs the model on CPU. 182 | 183 | .. code-block:: 184 | 185 | python predict.py 186 | 187 | Typical output 188 | 189 | .. code-block:: 190 | 191 | Image 0: Actual Label cat, Predicted Label cat 192 | Image 1: Actual Label ship, Predicted Label ship 193 | Image 2: Actual Label ship, Predicted Label airplane 194 | Image 3: Actual Label airplane, Predicted Label airplane 195 | Image 4: Actual Label frog, Predicted Label frog 196 | Image 5: Actual Label frog, Predicted Label frog 197 | Image 6: Actual Label automobile, Predicted Label automobile 198 | Image 7: Actual Label frog, Predicted Label frog 199 | Image 8: Actual Label cat, Predicted Label cat 200 | Image 9: Actual Label automobile, Predicted Label automobile 201 | 202 | 203 | Deploy the Model on the Ryzen AI NPU 204 | ------------------------------------ 205 | 206 | To successfully run the model on the NPU, run the following setup steps: 207 | 208 | - Ensure ``RYZEN_AI_INSTALLATION_PATH`` points to ``path\to\ryzen-ai-sw-\``. If you installed Ryzen-AI software using the MSI installer, this variable should already be set. Ensure that the Ryzen-AI software package has not been moved post installation, in which case ``RYZEN_AI_INSTALLATION_PATH`` will have to be set again. 209 | 210 | - By default, the Ryzen AI Conda environment automatically sets the standard binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. However, explicitly passing the xclbin option in provider_options overrides the default setting. 211 | 212 | .. code-block:: 213 | 214 | parser = argparse.ArgumentParser() 215 | parser.add_argument('--ep', type=str, default ='cpu',choices = ['cpu','npu'], help='EP backend selection') 216 | opt = parser.parse_args() 217 | 218 | providers = ['CPUExecutionProvider'] 219 | provider_options = [{}] 220 | 221 | if opt.ep == 'npu': 222 | providers = ['VitisAIExecutionProvider'] 223 | cache_dir = Path(__file__).parent.resolve() 224 | provider_options = [{ 225 | 'cacheDir': str(cache_dir), 226 | 'cacheKey': 'modelcachekey', 227 | 'xclbin': 'path/to/xclbin' 228 | }] 229 | 230 | session = ort.InferenceSession(model.SerializeToString(), providers=providers, 231 | provider_options=provider_options) 232 | 233 | 234 | Run the ``predict.py`` with the ``--ep npu`` switch to run the custom ResNet model on the Ryzen AI NPU: 235 | 236 | 237 | .. code-block:: 238 | 239 | python predict.py --ep npu 240 | 241 | Typical output 242 | 243 | .. code-block:: 244 | 245 | [Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50% 246 | [Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1 247 | ... 
248 | Image 0: Actual Label cat, Predicted Label cat 249 | Image 1: Actual Label ship, Predicted Label ship 250 | Image 2: Actual Label ship, Predicted Label ship 251 | Image 3: Actual Label airplane, Predicted Label airplane 252 | Image 4: Actual Label frog, Predicted Label frog 253 | Image 5: Actual Label frog, Predicted Label frog 254 | Image 6: Actual Label automobile, Predicted Label truck 255 | Image 7: Actual Label frog, Predicted Label frog 256 | Image 8: Actual Label cat, Predicted Label cat 257 | Image 9: Actual Label automobile, Predicted Label automobile 258 | 259 | 260 | .. _dep-cpp: 261 | 262 | Deployment - C++ 263 | =========================== 264 | 265 | Prerequisites 266 | ------------- 267 | 268 | 1. Visual Studio 2022 Community edition, ensure "Desktop Development with C++" is installed 269 | 2. cmake (version >= 3.26) 270 | 3. opencv (version=4.6.0) required for the custom resnet example 271 | 272 | Install OpenCV 273 | -------------- 274 | 275 | It is recommended to build OpenCV from the source code and use static build. The default installation location is "\install" , the following instruction installs OpenCV in the location "C:\\opencv" as an example. You may first change the directory to where you want to clone the OpenCV repository. 276 | 277 | .. code-block:: bash 278 | 279 | git clone https://github.com/opencv/opencv.git -b 4.6.0 280 | cd opencv 281 | cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -G "Visual Studio 17 2022" "-DCMAKE_INSTALL_PREFIX=C:\opencv" "-DCMAKE_PREFIX_PATH=C:\opencv" -DCMAKE_BUILD_TYPE=Release -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_WITH_STATIC_CRT=OFF -B build 282 | cmake --build build --config Release 283 | cmake --install build --config Release 284 | 285 | The build files will be written to ``build\``. 286 | 287 | Build and Run Custom Resnet C++ sample 288 | -------------------------------------- 289 | 290 | The C++ source files, CMake list files and related artifacts are provided in the ``cpp/resnet_cifar/*`` folder. The source file ``cpp/resnet_cifar/resnet_cifar.cpp`` takes 10 images from the CIFAR-10 test set, converts them to .png format, preprocesses them, and performs model inference. The example has onnxruntime dependencies, that are provided in ``%RYZEN_AI_INSTALLATION_PATH%/onnxruntime/*``. 291 | 292 | Run the following command to build the resnet example. Assign ``-DOpenCV_DIR`` to the OpenCV build directory. 293 | 294 | .. code-block:: bash 295 | 296 | cd getting_started_resnet/cpp 297 | cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -DCMAKE_INSTALL_PREFIX=. -DCMAKE_PREFIX_PATH=. -B build -S resnet_cifar -DOpenCV_DIR="C:/opencv/build" -G "Visual Studio 17 2022" 298 | 299 | This should generate the build directory with the ``resnet_cifar.sln`` solution file along with other project files. Open the solution file using Visual Studio 2022 and build to compile. You can also use "Developer Command Prompt for VS 2022" to open the solution file in Visual Studio. 300 | 301 | .. code-block:: bash 302 | 303 | devenv build/resnet_cifar.sln 304 | 305 | Now to deploy our model, we will go back to the parent directory (getting_started_resnet) of this example. After compilation, the executable should be generated in ``cpp/build/Release/resnet_cifar.exe``. 
We will copy this application over to the parent directory: 306 | 307 | .. code-block:: bash 308 | 309 | cd .. 310 | xcopy cpp\build\Release\resnet_cifar.exe . 311 | 312 | We also need to copy the onnxruntime DLLs from the Vitis AI Execution Provider package to the current directory. The following command copies the required files into the current directory: 313 | 314 | .. code-block:: bash 315 | 316 | xcopy %RYZEN_AI_INSTALLATION_PATH%\onnxruntime\bin\* /E /I 317 | 318 | 319 | The generated C++ application takes 3 arguments: 320 | 321 | #. Path to the quantized ONNX model generated in Step 3 322 | #. The execution provider of choice (cpu or npu) 323 | #. vaip_config.json (pass None if running on CPU) 324 | 325 | 326 | Deploy the Model on the CPU 327 | **************************** 328 | 329 | To run the model on the CPU, use the following command: 330 | 331 | .. code-block:: bash 332 | 333 | resnet_cifar.exe models\resnet_quantized.onnx cpu 334 | 335 | Typical output: 336 | 337 | .. code-block:: bash 338 | 339 | model name:models\resnet_quantized.onnx 340 | ep:cpu 341 | Input Node Name/Shape (1): 342 | input : -1x3x32x32 343 | Output Node Name/Shape (1): 344 | output : -1x10 345 | Final results: 346 | Predicted label is cat and actual label is cat 347 | Predicted label is ship and actual label is ship 348 | Predicted label is ship and actual label is ship 349 | Predicted label is airplane and actual label is airplane 350 | Predicted label is frog and actual label is frog 351 | Predicted label is frog and actual label is frog 352 | Predicted label is truck and actual label is automobile 353 | Predicted label is frog and actual label is frog 354 | Predicted label is cat and actual label is cat 355 | Predicted label is automobile and actual label is automobile 356 | 357 | Deploy the Model on the NPU 358 | **************************** 359 | 360 | To successfully run the model on the NPU: 361 | 362 | - Ensure ``RYZEN_AI_INSTALLATION_PATH`` points to ``path\to\ryzen-ai-sw-\``. If you installed the Ryzen AI software using the MSI installer, this variable should already be set. If the Ryzen AI software package has been moved after installation, ``RYZEN_AI_INSTALLATION_PATH`` must be set again. 363 | 364 | - By default, the Ryzen AI Conda environment automatically sets the standard binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. However, explicitly passing the ``xclbin`` option in ``provider_options`` overrides the default setting. 365 | 366 | The following code block from ``resnet_cifar.cpp`` shows how ONNX Runtime is configured to deploy the model on the Ryzen AI NPU: 367 | 368 | .. code-block:: cpp 369 | 370 | auto session_options = Ort::SessionOptions(); 371 | 372 | auto cache_dir = std::filesystem::current_path().string(); 373 | 374 | if (ep == "npu") 375 | { 376 | auto options = 377 | std::unordered_map<std::string, std::string>{ {"cacheDir", cache_dir}, {"cacheKey", "modelcachekey"}, {"xclbin", "path/to/xclbin"} }; 378 | session_options.AppendExecutionProvider_VitisAI(options); 379 | } 380 | 381 | auto session = Ort::Session(env, model_name.data(), session_options); 382 | 383 | To run the model on the NPU, pass the ``npu`` flag as the execution provider argument. Use the following command to run the model on the NPU: 384 | 385 | .. code-block:: bash 386 | 387 | resnet_cifar.exe models\resnet_quantized.onnx npu 388 | 389 | Typical output: 390 | 391 | .. code-block:: 392 | 393 | [Vitis AI EP] No.
of Operators : CPU 2 IPU 398 99.50% 394 | [Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1 395 | ... 396 | Final results: 397 | Predicted label is cat and actual label is cat 398 | Predicted label is ship and actual label is ship 399 | Predicted label is ship and actual label is ship 400 | Predicted label is airplane and actual label is airplane 401 | Predicted label is frog and actual label is frog 402 | Predicted label is frog and actual label is frog 403 | Predicted label is truck and actual label is automobile 404 | Predicted label is frog and actual label is frog 405 | Predicted label is cat and actual label is cat 406 | Predicted label is automobile and actual label is automobile 407 | .. 408 | ------------ 409 | 410 | ##################################### 411 | License 412 | ##################################### 413 | 414 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 415 | -------------------------------------------------------------------------------- /docs/gpu/ryzenai_gpu.rst: -------------------------------------------------------------------------------- 1 | ########################### 2 | DirectML Flow 3 | ########################### 4 | 5 | ************* 6 | Prerequisites 7 | ************* 8 | 9 | - DirectX12 capable Windows OS (Windows 11 recommended) 10 | - Latest AMD `GPU device driver `_ installed 11 | - `Microsoft Olive `_ for model conversion and optimization 12 | - Latest `ONNX Runtime DirectML EP `_ 13 | 14 | You can ensure GPU driver and DirectX version from ``Windows Task Manager`` -> ``Performance`` -> ``GPU`` 15 | 16 | ****************************** 17 | Running models on Ryzen AI GPU 18 | ****************************** 19 | 20 | Running models on the Ryzen AI GPU is accomplished in two simple steps: 21 | 22 | **Model Conversion and Optimization**: After the model is trained, Microsoft Olive Optimizer can be used to convert the model to ONNX and optimize it for optimal target execution. 23 | 24 | For additional information, refer to the `Microsoft Olive Documentation `_ 25 | 26 | 27 | **Deployment**: Once the model is in the ONNX format, the ONNX Runtime DirectML EP (``DmlExecutionProvider``) is used to run the model on the AMD Ryzen AI GPU. 
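As an illustration, the following minimal sketch creates an ONNX Runtime inference session on the DirectML EP and runs a dummy input through it. The model file name and the input shape are placeholders for whatever the Olive optimization step produced, not part of the official example:

.. code-block:: python

    import numpy as np
    import onnxruntime as ort

    # Select the DirectML EP so the model runs on the Ryzen AI GPU.
    session = ort.InferenceSession(
        "resnet50_optimized.onnx",           # placeholder: model produced by Olive
        providers=["DmlExecutionProvider"],
    )

    # Run a dummy input; replace with real preprocessed data.
    input_name = session.get_inputs()[0].name
    dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
    outputs = session.run(None, {input_name: dummy})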
28 | 29 | For additional information, refer to the `ONNX Runtime documentation for the DirectML Execution Provider `_ 30 | 31 | 32 | ******** 33 | Examples 34 | ******** 35 | 36 | - Optimizing and running `ResNet on Ryzen AI GPU `_ 37 | 38 | 39 | ******************** 40 | Additional Resources 41 | ******************** 42 | 43 | 44 | - Article on how AMD and Black Magic Design worked together to accelerate `Davinci Resolve Studio `_ workload on AMD hardware: 45 | 46 | - `AI Accelerated Video Editing with DaVinci Resolve 18.6 & AMD Radeon Graphics `_ 47 | 48 | | 49 | 50 | - Blog posts on using the Ryzen AI Software for various generative AI workloads on GPU: 51 | 52 | - `Automatic1111 Stable Diffusion WebUI with DirectML Extension on AMD GPUs `_ 53 | 54 | - `Running Optimized Llama2 with Microsoft DirectML on AMD Radeon Graphics `_ 55 | 56 | - `AI-Assisted Mobile Workstation Workflows Powered by AMD Ryzen™ AI `_ 57 | -------------------------------------------------------------------------------- /docs/hybrid_oga.rst: -------------------------------------------------------------------------------- 1 | ############################ 2 | OnnxRuntime GenAI (OGA) Flow 3 | ############################ 4 | 5 | Ryzen AI Software supports deploying LLMs on Ryzen AI PCs using the native ONNX Runtime Generate (OGA) C++ or Python API. The OGA API is the lowest-level API available for building LLM applications on a Ryzen AI PC. This documentation covers the Hybrid execution mode for LLMs, which utilizes both the NPU and GPU 6 | 7 | **Note**: Refer to :doc:`npu_oga` for NPU only execution mode. 8 | 9 | ************************ 10 | Supported Configurations 11 | ************************ 12 | 13 | The Ryzen AI OGA flow supports Strix and Krackan Point processors. Phoenix (PHX) and Hawk (HPT) processors are not supported. 14 | 15 | 16 | ************ 17 | Requirements 18 | ************ 19 | 20 | - Install NPU Drivers and Ryzen AI MSI installer according to the :doc:`inst` 21 | - Install GPU device driver: Ensure GPU device driver https://www.amd.com/en/support is installed 22 | - Install Git for Windows (needed to download models from HF): https://git-scm.com/downloads 23 | 24 | ******************** 25 | Pre-optimized Models 26 | ******************** 27 | 28 | AMD provides a set of pre-optimized LLMs ready to be deployed with Ryzen AI Software and the supporting runtime for hybrid execution. 
These models can be found on Hugging Face: 29 | 30 | - https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid 31 | - https://huggingface.co/amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-fp16-onnx-hybrid 32 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid 33 | - https://huggingface.co/amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid 34 | - https://huggingface.co/amd/chatglm3-6b-awq-g128-int4-asym-fp16-onnx-hybrid 35 | - https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-fp16-onnx-hybrid 36 | - https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid 37 | - https://huggingface.co/amd/Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid/tree/main 38 | - https://huggingface.co/amd/Llama-3.1-8B-awq-g128-int4-asym-fp16-onnx-hybrid/tree/main 39 | - https://huggingface.co/amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid 40 | - https://huggingface.co/amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid 41 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.1-hybrid 42 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.2-hybrid 43 | - https://huggingface.co/amd/Mistral-7B-v0.3-hybrid 44 | - https://huggingface.co/amd/Llama-3.1-8B-Instruct-hybrid 45 | - https://huggingface.co/amd/CodeLlama-7b-instruct-g128-hybrid 46 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hybrid 47 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-asym-uint4-g128-lmhead-onnx-hybrid 48 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead-onnx-hybrid 49 | - https://huggingface.co/amd/AMD-OLMo-1B-SFT-DPO-hybrid 50 | - https://huggingface.co/amd/Qwen2-7B-awq-uint4-asym-g128-lmhead-fp16-onnx-hybrid 51 | - https://huggingface.co/amd/Qwen2-1.5B-awq-uint4-asym-global-g128-lmhead-g32-fp16-onnx-hybrid 52 | - https://huggingface.co/amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid 53 | 54 | 55 | The steps for deploying the pre-optimized models using Python or C++ are described in the following sections. 56 | 57 | ****************************** 58 | Hybrid Execution of OGA Models 59 | ****************************** 60 | 61 | Setup 62 | ===== 63 | 64 | Activate the Ryzen AI 1.4 Conda environment: 65 | 66 | .. code-block:: 67 | 68 | conda activate ryzen-ai-1.4.0 69 | 70 | Copy the required files in a local folder to run the LLMs from: 71 | 72 | .. code-block:: 73 | 74 | mkdir hybrid_run 75 | cd hybrid_run 76 | xcopy /Y /E "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\benchmark" . 77 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\amd_genai_prompt.txt" . 78 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnxruntime-genai.dll" . 79 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnx_custom_ops.dll" . 80 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzen_mm.dll" . 81 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzenai_onnx_utils.dll" . 82 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\DirectML.dll" . 83 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" . 84 | 85 | Download Models from HuggingFace 86 | ================================ 87 | 88 | Download the desired models from the list of pre-optimized models on Hugging Face: 89 | 90 | .. 
code-block:: 91 | 92 | # Make sure you have git-lfs installed (https://git-lfs.com) 93 | git lfs install 94 | git clone 95 | 96 | For example, for Llama-2-7b-chat: 97 | 98 | .. code-block:: 99 | 100 | git lfs install 101 | git clone https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid 102 | 103 | 104 | Enabling Performance Mode (Optional) 105 | ==================================== 106 | 107 | To run the LLMs in the best performance mode, follow these steps: 108 | 109 | - Go to ``Windows`` → ``Settings`` → ``System`` → ``Power`` and set the power mode to Best Performance. 110 | - Execute the following commands in the terminal: 111 | 112 | .. code-block:: 113 | 114 | cd C:\Windows\System32\AMD 115 | xrt-smi configure --pmode performance 116 | 117 | 118 | Sample C++ Program 119 | ================== 120 | 121 | The ``model_benchmark.exe`` test application provides a simple mechanism for running and evaluating Hybrid OGA models using the native OGA C++ APIs. The source code for this application can be used a reference implementation for how to integrate LLMs using the native OGA C++ APIs. 122 | 123 | The ``model_benchmark.exe`` test application can be used as follows: 124 | 125 | .. code-block:: 126 | 127 | # To see available options and default settings 128 | .\model_benchmark.exe -h 129 | 130 | # To run with default settings 131 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths 132 | 133 | # To show more informational output 134 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file --verbose 135 | 136 | # To run with given number of generated tokens 137 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -g $num_tokens 138 | 139 | # To run with given number of warmup iterations 140 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -w $num_warmup 141 | 142 | # To run with given number of iterations 143 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -r $num_iterations 144 | 145 | 146 | For example, for Llama-2-7b-chat: 147 | 148 | .. code-block:: 149 | 150 | .\model_benchmark.exe -i Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024" --verbose 151 | 152 | | 153 | 154 | **NOTE**: The C++ source code for the ``model_benchmark.exe`` executable can be found in the ``%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\c`` folder. This source code can be modified and recompiled if necessary using the commands below. 155 | 156 | .. code-block:: 157 | 158 | :: Copy project files 159 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\c" .\sources 160 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\include" .\sources\include 161 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\lib" .\sources\lib 162 | 163 | :: Build project 164 | cd sources 165 | cmake -G "Visual Studio 17 2022" -A x64 -S . 
-B build 166 | cmake --build build --config Release 167 | 168 | :: Copy runtime DLLs 169 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnxruntime-genai.dll" .\build\Release 170 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnx_custom_ops.dll" .\build\Release 171 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzen_mm.dll" .\build\Release 172 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzenai_onnx_utils.dll" .\build\Release 173 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\DirectML.dll" .\build\Release 174 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" .\build\Release 175 | 176 | 177 | Sample Python Scripts 178 | ===================== 179 | 180 | To run LLMs other than ChatGLM, use the following command: 181 | 182 | .. code-block:: 183 | 184 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir 185 | 186 | To run ChatGLM, use the following command: 187 | 188 | .. code-block:: 189 | 190 | pip install transformers==4.44.0 191 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\chatglm\model-generate-chatglm3.py" --model 192 | 193 | 194 | For example, for Llama-2-7b-chat: 195 | 196 | .. code-block:: 197 | 198 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid 199 | 200 | 201 | *********************** 202 | Using Fine-Tuned Models 203 | *********************** 204 | 205 | It is also possible to run fine-tuned versions of the pre-optimized OGA models. 206 | 207 | To do this, the fine-tuned models must first be prepared for execution with the OGA Hybrid flow. For instructions on how to do this, refer to the page about :doc:`oga_model_prepare`. 208 | 209 | Once a fine-tuned model has been prepared for Hybrid execution, it can be deployed by following the steps described above in this page. 210 | -------------------------------------------------------------------------------- /docs/icons.txt: -------------------------------------------------------------------------------- 1 | 2 | .. |warning| unicode:: U+26A0 .. Warning Sign 3 | .. |excl| unicode:: U+2757 .. Heavy Exclamation Mark 4 | .. |pen| unicode:: U+270E .. Lower Right Pencil 5 | .. |bulb| unicode:: U+1F4A1 .. Electric Light Bulb 6 | .. |folder| unicode:: U+1F4C1 .. File folder 7 | .. |clipboard| unicode:: U+1F4CB .. Clipboard 8 | .. |pushpin| unicode:: U+1F4CC .. Pushpin 9 | .. |roundpin| unicode:: U+1F4CD .. Round Pushpin 10 | .. |memo| unicode:: U+1F4DD .. Memo 11 | .. |info| unicode:: U+1F6C8 .. Circled Information Source 12 | .. |checkmark| unicode:: U+2705 .. White Heavy Check Mark 13 | .. |crossmark| unicode:: U+274C .. Red Cross Mark 14 | -------------------------------------------------------------------------------- /docs/images/rai-sw.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/docs/images/rai-sw.png -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | ########################## 2 | Ryzen AI Software 3 | ########################## 4 | 5 | AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. 
Ryzen AI software enables applications to run on the neural processing unit (NPU) built in the AMD XDNA™ architecture, as well as on the integrated GPU. This allows developers to build and deploy models trained in PyTorch or TensorFlow and run them directly on laptops powered by Ryzen AI using ONNX Runtime and the Vitis™ AI Execution Provider (EP). 6 | 7 | .. image:: images/rai-sw.png 8 | :align: center 9 | 10 | *********** 11 | Quick Start 12 | *********** 13 | 14 | - :ref:`Supported Configurations ` 15 | - :doc:`inst` 16 | - :doc:`examples` 17 | 18 | ************************* 19 | Development Flow Overview 20 | ************************* 21 | 22 | The Ryzen AI development flow does not require any modifications to the existing model training processes and methods. The pre-trained model can be used as the starting point of the Ryzen AI flow. 23 | 24 | Quantization 25 | ============ 26 | Quantization involves converting the AI model's parameters from floating-point to lower-precision representations, such as bfloat16 floating-point or 8-bit integer. Quantized models are more power-efficient, utilize less memory, and offer better performance. 27 | 28 | **AMD Quark** is a comprehensive cross-platform deep learning toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimize their models for deployment on a wide range of hardware backends, achieving significant performance gains without compromising accuracy. 29 | 30 | For more details, refer to the :doc:`model_quantization` page. 31 | 32 | Compilation and Deployment 33 | ========================== 34 | The AI model is deployed using the ONNX Runtime with either C++ or Python APIs. The Vitis AI Execution Provider included in the ONNX Runtime intelligently determines what portions of the AI model should run on the NPU, optimizing workloads to ensure optimal performance with lower power consumption. 35 | 36 | For more details, refer to the :doc:`modelrun` page. 37 | 38 | 39 | | 40 | | 41 | 42 | 43 | .. toctree:: 44 | :maxdepth: 1 45 | :hidden: 46 | 47 | relnotes.rst 48 | 49 | 50 | .. toctree:: 51 | :maxdepth: 1 52 | :hidden: 53 | :caption: Getting Started on the NPU 54 | 55 | inst.rst 56 | examples.rst 57 | 58 | .. toctree:: 59 | :maxdepth: 1 60 | :hidden: 61 | :caption: Running Models on the NPU 62 | 63 | model_quantization.rst 64 | modelrun.rst 65 | app_development.rst 66 | 67 | .. toctree:: 68 | :maxdepth: 1 69 | :hidden: 70 | :caption: Running LLMs on the NPU 71 | 72 | llm/overview.rst 73 | llm/high_level_python.rst 74 | llm/server_interface.rst 75 | hybrid_oga.rst 76 | oga_model_prepare.rst 77 | 78 | .. toctree:: 79 | :maxdepth: 1 80 | :hidden: 81 | :caption: Running Models on the GPU 82 | 83 | gpu/ryzenai_gpu.rst 84 | 85 | .. toctree:: 86 | :maxdepth: 1 87 | :hidden: 88 | :caption: Additional Features 89 | 90 | xrt_smi.rst 91 | ai_analyzer.rst 92 | ryzen_ai_libraries.rst 93 | 94 | 95 | .. toctree:: 96 | :maxdepth: 1 97 | :hidden: 98 | :caption: Additional Topics 99 | 100 | Model Zoo 101 | Licensing Information 102 | 103 | 104 | 105 | .. 106 | ------------ 107 | ##################################### 108 | License 109 | ##################################### 110 | 111 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 
112 | -------------------------------------------------------------------------------- /docs/inst.rst: -------------------------------------------------------------------------------- 1 | .. include:: /icons.txt 2 | 3 | ######################### 4 | Installation Instructions 5 | ######################### 6 | 7 | 8 | 9 | ************* 10 | Prerequisites 11 | ************* 12 | 13 | The Ryzen AI Software supports AMD processors with a Neural Processing Unit (NPU). Consult the release notes for the full list of :ref:`supported configurations `. 14 | 15 | The following dependencies must be present on the system before installing the Ryzen AI Software: 16 | 17 | .. list-table:: 18 | :widths: 25 25 19 | :header-rows: 1 20 | 21 | * - Dependencies 22 | - Version Requirement 23 | * - Windows 11 24 | - build >= 22621.3527 25 | * - Visual Studio 26 | - 2022 27 | * - cmake 28 | - version >= 3.26 29 | * - Anaconda or Miniconda 30 | - Latest version 31 | 32 | | 33 | 34 | |warning| **IMPORTANT**: 35 | 36 | - Visual Studio 2022 Community: ensure that "Desktop Development with C++" is installed 37 | 38 | - Anaconda or Miniconda: ensure that the following path is set in the System PATH variable: ``path\to\anaconda3\Scripts`` or ``path\to\miniconda3\Scripts`` (The System PATH variable should be set in the *System Variables* section of the *Environment Variables* window). 39 | 40 | | 41 | 42 | .. _install-driver: 43 | 44 | ******************* 45 | Install NPU Drivers 46 | ******************* 47 | 48 | - Download the NPU driver installation package :download:`NPU Driver ` 49 | 50 | - Install the NPU drivers by following these steps: 51 | 52 | - Extract the downloaded ``NPU_RAI1.4_GA_257_WHQL.zip`` zip file. 53 | - Open a terminal in administrator mode and execute the ``.\npu_sw_installer.exe`` exe file. 54 | 55 | - Ensure that NPU MCDM driver (Version:32.0.203.257, Date:3/12/2025) is correctly installed by opening ``Device Manager`` -> ``Neural processors`` -> ``NPU Compute Accelerator Device``. 56 | 57 | 58 | .. _install-bundled: 59 | 60 | ************************* 61 | Install Ryzen AI Software 62 | ************************* 63 | 64 | - Download the RyzenAI Software installer :download:`ryzen-ai-1.4.0.exe `. 65 | 66 | - Launch the MSI installer and follow the instructions on the installation wizard: 67 | 68 | - Accept the terms of the Licence agreement 69 | - Provide the destination folder for Ryzen AI installation (default: ``C:\Program Files\RyzenAI\1.4.0``) 70 | - Specify the name for the conda environment (default: ``ryzen-ai-1.4.0``) 71 | 72 | 73 | The Ryzen AI Software packages are now installed in the conda environment created by the installer. 74 | 75 | 76 | .. _quicktest: 77 | 78 | 79 | ********************* 80 | Test the Installation 81 | ********************* 82 | 83 | The Ryzen AI Software installation folder contains test to verify that the software is correctly installed. This installation test can be found in the ``quicktest`` subfolder. 84 | 85 | - Open a Conda command prompt (search for "Anaconda Prompt" in the Windows start menu) 86 | 87 | - Activate the Conda environment created by the Ryzen AI installer: 88 | 89 | .. code-block:: 90 | 91 | conda activate 92 | 93 | - Run the test: 94 | 95 | .. code-block:: 96 | 97 | cd %RYZEN_AI_INSTALLATION_PATH%/quicktest 98 | python quicktest.py 99 | 100 | 101 | - The quicktest.py script sets up the environment and runs a simple CNN model. On a successful run, you will see an output similar to the one shown below. 
This indicates that the model is running on the NPU and that the installation of the Ryzen AI Software was successful: 102 | 103 | .. code-block:: 104 | 105 | [Vitis AI EP] No. of Operators : CPU 2 NPU 398 106 | [Vitis AI EP] No. of Subgraphs : NPU 1 Actually running on NPU 1 107 | ... 108 | Test Passed 109 | ... 110 | 111 | 112 | .. note:: 113 | 114 | The full path to the Ryzen AI Software installation folder is stored in the ``RYZEN_AI_INSTALLATION_PATH`` environment variable. 115 | 116 | 117 | ************************** 118 | Other Installation Options 119 | ************************** 120 | 121 | Linux Installer 122 | ~~~~~~~~~~~~~~~ 123 | 124 | Compiling BF16 models requires more processing power than compiling INT8 models. If a larger BF16 model cannot be compiled on a Windows machine due to hardware limitations (e.g., insufficient RAM), an alternative Linux-based compilation flow is supported. More details can be found here: :doc:`rai_linux` 125 | 126 | 127 | 128 | Lightweight Installer 129 | ~~~~~~~~~~~~~~~~~~~~~ 130 | 131 | A lightweight installer is available with reduced features. It cannot be used for compiling BF16 models but fully supports compiling and running INT8 models and running LLM models. 132 | 133 | - Download the RyzenAI Software Runtime MSI installer :download:`ryzen-ai-rt-1.4.0.msi `. 134 | 135 | - Launch the MSI installer and follow the instructions on the installation wizard: 136 | 137 | - Accept the terms of the Licence agreement 138 | - Provide the destination folder for Ryzen AI installation 139 | - Specify the name for the conda environment 140 | 141 | 142 | 143 | .. 144 | ------------ 145 | 146 | ##################################### 147 | License 148 | ##################################### 149 | 150 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 151 | -------------------------------------------------------------------------------- /docs/licenses.rst: -------------------------------------------------------------------------------- 1 | Licensing Information 2 | ===================== 3 | 4 | Ryzen AI is released by Advanced Micro Devices, Inc. (AMD) and is subject to the licensing terms listed below. Some components may include third-party software that is subject to additional licenses. Review the following links for more information: 5 | 6 | - `AMD End User License Agreement `_ 7 | - `Third Party End User License Agreement `_ 8 | -------------------------------------------------------------------------------- /docs/llm/high_level_python.rst: -------------------------------------------------------------------------------- 1 | .. Heading guidelines 2 | .. # with overline, for parts 3 | .. * with overline, for chapters 4 | .. =, for sections 5 | .. -, for subsections 6 | .. ^, for subsubsections 7 | .. “, for paragraphs 8 | 9 | ##################### 10 | High-Level Python SDK 11 | ##################### 12 | 13 | A Python environment offers flexibility for experimenting with LLMs, profiling them, and integrating them into Python applications. We use the `Lemonade SDK `_ to get up and running quickly. 14 | 15 | To get started, follow these instructions. 16 | 17 | *************************** 18 | System-level pre-requisites 19 | *************************** 20 | 21 | You only need to do this once per computer: 22 | 23 | 1. Make sure your system has the recommended Ryzen AI driver installed as described in :ref:`install-driver`. 24 | 2. Download and install `Miniconda for Windows `_. 
25 | 3. Launch a terminal and call ``conda init``. 26 | 27 | 28 | ***************** 29 | Environment Setup 30 | ***************** 31 | 32 | To create and set up an environment, run these commands in your terminal: 33 | 34 | .. code-block:: bash 35 | 36 | conda create -n ryzenai-llm python=3.10 37 | conda activate ryzenai-llm 38 | pip install lemonade-sdk[llm-oga-hybrid] 39 | lemonade-install --ryzenai hybrid 40 | 41 | **************** 42 | Validation Tools 43 | **************** 44 | 45 | Now that you have completed installation, you can try prompting an LLM like this (where ``PROMPT`` is any prompt you like). 46 | 47 | Run this command in a terminal that has your environment activated: 48 | 49 | .. code-block:: bash 50 | 51 | lemonade -i amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 llm-prompt --max-new-tokens 64 -p PROMPT 52 | 53 | Each example linked in the :ref:`featured-llms` table also has `example commands `_ for validating the speed and accuracy of each model. 54 | 55 | ********** 56 | Python API 57 | ********** 58 | You can also run this code to try out the high-level Lemonade API in a Python script: 59 | 60 | .. code-block:: python 61 | 62 | from lemonade.api import from_pretrained 63 | 64 | model, tokenizer = from_pretrained( 65 | "amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid", recipe="oga-hybrid" 66 | ) 67 | 68 | input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids 69 | response = model.generate(input_ids, max_new_tokens=30) 70 | 71 | print(tokenizer.decode(response[0])) 72 | 73 | Each example linked in the :ref:`featured-llms` table also has an `example script `_ for streaming the text output of the LLM. 74 | 75 | ********** 76 | Next Steps 77 | ********** 78 | 79 | From here, you can check out `an example `_ or any of the other :ref:`featured-llms`. 80 | 81 | The examples pages also provide code for: 82 | 83 | #. Additional validation tools for measuring speed and accuracy. 84 | #. Streaming responses with the API. 85 | #. Integrating the API into applications. 86 | #. Launching the server interface from the Python environment. 87 | 88 | 89 | 90 | 91 | .. 92 | ------------ 93 | ##################################### 94 | License 95 | ##################################### 96 | 97 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 98 | -------------------------------------------------------------------------------- /docs/llm/overview.rst: -------------------------------------------------------------------------------- 1 | ######## 2 | Overview 3 | ######## 4 | 5 | ************************************ 6 | OGA-based Flow with Hybrid Execution 7 | ************************************ 8 | 9 | Ryzen AI Software supports deploying quantized 4-bit LLMs on Ryzen AI 300-series PCs. This solution uses a hybrid execution mode, which leverages both the NPU and integrated GPU (iGPU), and is built on the OnnxRuntime GenAI (OGA) framework. 10 | 11 | Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase. 12 | 13 | OGA is a multi-vendor generative AI framework from Microsoft that provides a convenient LLM interface for execution backends such as Ryzen AI. 
14 | 15 | Supported Configurations 16 | ======================== 17 | 18 | - Only Ryzen AI 300-series Strix Point (STX) and Krackan Point (KRK) processors support OGA-based hybrid execution. 19 | - Developers with Ryzen AI 7000- and 8000-series processors can get started using the CPU-based examples linked in the :ref:`featured-llms` table. 20 | - Windows 11 is the required operating system. 21 | 22 | 23 | ******************************* 24 | Development Interfaces 25 | ******************************* 26 | 27 | The Ryzen AI LLM software stack is available through three development interfaces, each suited for specific use cases as outlined in the sections below. All three interfaces are built on top of native OnnxRuntime GenAI (OGA) libraries, as shown in the :ref:`software-stack-table` diagram below. 28 | 29 | The high-level Python APIs, as well as the Server Interface, also leverage the Lemonade SDK, which is multi-vendor open-source software that provides everything necessary for quickly getting started with LLMs on OGA. 30 | 31 | A key benefit of both OGA and Lemonade is that software developed against their interfaces is portable to many other execution backends. 32 | 33 | .. _software-stack-table: 34 | 35 | .. flat-table:: Ryzen AI Software Stack 36 | :header-rows: 1 37 | :class: center-table 38 | 39 | * - Your Python Application 40 | - Your LLM Stack 41 | - Your Native Application 42 | * - `Lemonade Python API* <#high-level-python-sdk>`_ 43 | - `Lemonade Server Interface* <#server-interface-rest-api>`_ 44 | - :rspan:`1` `OGA C++ Headers <../hybrid_oga.html>`_ 45 | * - :cspan:`1` `OGA Python API* `_ 46 | * - :cspan:`2` `Custom AMD OnnxRuntime GenAI (OGA) `_ 47 | * - :cspan:`2` `AMD Ryzen AI Driver and Hardware `_ 48 | 49 | \* indicates open-source software (OSS). 50 | 51 | High-Level Python SDK 52 | ===================== 53 | 54 | The high-level Python SDK, Lemonade, allows you to get started using PyPI installation in approximately 5 minutes. 55 | 56 | This SDK allows you to: 57 | 58 | - Experiment with models in hybrid execution mode on Ryzen AI hardware. 59 | - Validate inference speed and task performance. 60 | - Integrate with Python apps using a high-level API. 61 | 62 | To get started in Python, follow these instructions: :doc:`high_level_python`. 63 | 64 | 65 | Server Interface (REST API) 66 | =========================== 67 | 68 | The Server Interface provides a convenient means to integrate with applications that: 69 | 70 | - Already support an LLM server interface, such as the Ollama server or OpenAI API. 71 | - Are written in any language (C++, C#, Javascript, etc.) that supports REST APIs. 72 | - Benefits from process isolation for the LLM backend. 73 | 74 | To get started with the server interface, follow these instructions: :doc:`server_interface`. 75 | 76 | For example applications that have been tested with Lemonade Server, see the `Lemonade Server Examples `_. 77 | 78 | 79 | OGA APIs for C++ Libraries and Python 80 | ===================================== 81 | 82 | Native C++ libraries for OGA are available to give full customizability for deployment into native applications. 83 | 84 | The Python bindings for OGA also provide a customizable interface for Python development. 85 | 86 | To get started with the OGA APIs, follow these instructions: :doc:`../hybrid_oga`. 87 | 88 | 89 | .. 
_featured-llms: 90 | 91 | ******************************* 92 | Featured LLMs 93 | ******************************* 94 | 95 | The following tables contain a curated list of LLMs that have been validated on Ryzen AI hybrid execution mode. The hybrid examples are built on top of OnnxRuntime GenAI (OGA). 96 | 97 | The comprehensive set of pre-optimized models for hybrid execution used in these examples are available in the `AMD hybrid collection on Hugging Face `_. It is also possible to run fine-tuned versions of the models listed (for example, fine-tuned versions of Llama2 or Llama3). For instructions on how to prepare a fine-tuned OGA model for hybrid execution, refer to :doc:`../oga_model_prepare`. 98 | 99 | .. _ryzen-ai-oga-featured-llms: 100 | 101 | .. flat-table:: Ryzen AI OGA Featured LLMs 102 | :header-rows: 2 103 | :class: llm-table 104 | 105 | * - 106 | - :cspan:`1` CPU Baseline (HF bfloat16) 107 | - :cspan:`3` Ryzen AI Hybrid (OGA int4) 108 | * - Model 109 | - Example 110 | - Validation 111 | - Example 112 | - TTFT Speedup 113 | - Tokens/S Speedup 114 | - Validation 115 | 116 | * - `DeepSeek-R1-Distill-Qwen-7B `_ 117 | - `Link `__ 118 | - 🟢 119 | - `Link `__ 120 | - 3.4x 121 | - 8.4x 122 | - 🟢 123 | * - `DeepSeek-R1-Distill-Llama-8B `_ 124 | - `Link `__ 125 | - 🟢 126 | - `Link `__ 127 | - 4.2x 128 | - 7.6x 129 | - 🟢 130 | * - `Llama-3.2-1B-Instruct `_ 131 | - `Link `__ 132 | - 🟢 133 | - `Link `__ 134 | - 1.9x 135 | - 5.1x 136 | - 🟢 137 | * - `Llama-3.2-3B-Instruct `_ 138 | - `Link `__ 139 | - 🟢 140 | - `Link `__ 141 | - 2.8x 142 | - 8.1x 143 | - 🟢 144 | * - `Phi-3-mini-4k-instruct `_ 145 | - `Link `__ 146 | - 🟢 147 | - `Link `__ 148 | - 3.6x 149 | - 7.8x 150 | - 🟢 151 | * - `Qwen1.5-7B-Chat `_ 152 | - `Link `__ 153 | - 🟢 154 | - `Link `__ 155 | - 4.0x 156 | - 7.3x 157 | - 🟢 158 | * - `Mistral-7B-Instruct-v0.3 `_ 159 | - `Link `__ 160 | - 🟢 161 | - `Link `__ 162 | - 5.0x 163 | - 8.1x 164 | - 🟢 165 | * - `Llama-3.1-8B-Instruct `_ 166 | - `Link `__ 167 | - 🟢 168 | - `Link `__ 169 | - 3.9x 170 | - 8.9x 171 | - 🟢 172 | 173 | The :ref:`ryzen-ai-oga-featured-llms` table was compiled using validation, benchmarking, and accuracy metrics as measured by the `ONNX TurnkeyML v6.1.0 `_ ``lemonade`` commands in each example link. After this table was created, the Lemonade SDK moved to the new location found `here `_. 174 | 175 | Data collection details: 176 | 177 | * All validation, performance, and accuracy metrics are collected on the same system configuration: 178 | 179 | * System: HP OmniBook Ultra Laptop 14z 180 | * Processor: AMD Ryzen AI 9 HX 375 W/ Radeon 890M 181 | * Memory: 32GB of RAM 182 | 183 | * The Hugging Face ``transformers`` framework is used as the baseline implementation for speedup and accuracy comparisons. 184 | 185 | * The baseline checkpoint is the original ``safetensors`` Hugging Face checkpoint linked in each table row, in the ``bfloat16`` data type. 186 | 187 | * All speedup numbers are the measured performance of the model with input sequence length (ISL) of ``1024`` and output sequence length (OSL) of ``64``, on the specified backend, divided by the measured performance of the baseline. 188 | * We assign the 🟢 validation score based on this criteria: all commands in the example guide ran successfully. 189 | 190 | 191 | ************************************** 192 | OGA-based Flow with NPU-only Execution 193 | ************************************** 194 | 195 | The primary OGA-based flow for LLMs employs a hybrid execution mode which leverages both the NPU and iGPU. 
AMD also provides support for an OGA-based flow where the iGPU is not solicited and where the compute-intensive operations are exclusively offloaded to the NPU. 196 | 197 | The OGA-based NPU-only execution mode is supported on STX and KRK platforms. 198 | 199 | To get started with the OGA-based NPU-only execution mode, follow these instructions :doc:`../npu_oga`. 200 | 201 | 202 | 203 | 204 | .. 205 | ------------ 206 | 207 | ##################################### 208 | License 209 | ##################################### 210 | 211 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 212 | -------------------------------------------------------------------------------- /docs/llm/server_interface.rst: -------------------------------------------------------------------------------- 1 | .. Heading guidelines 2 | .. # with overline, for parts 3 | .. * with overline, for chapters 4 | .. =, for sections 5 | .. -, for subsections 6 | .. ^, for subsubsections 7 | .. “, for paragraphs 8 | 9 | ########################### 10 | Server Interface (REST API) 11 | ########################### 12 | 13 | The Lemonade SDK offers a server interface that allows your application to load an LLM on Ryzen AI hardware in a process, and then communicate with this process using standard ``REST`` APIs. This allows applications written in any language (C#, JavaScript, Python, C++, etc.) to easily integrate with Ryzen AI LLMs. 14 | 15 | Server interfaces are used across the LLM ecosystem because they allow for no-code plug-and-play between the higher level of the application stack (GUIs, agents, RAG, etc.) with the LLM and hardware that have been abstracted by the server. 16 | 17 | For example, open source projects such as `Open WebUI <#open-webui-demo>`_ have out-of-box support for connecting to a variety of server interfaces, which in turn allows users to quickly start working with LLMs in a GUI. 18 | 19 | ************ 20 | Server Setup 21 | ************ 22 | 23 | Lemonade Server can be installed via the Lemonade Server Installer executable by following these steps: 24 | 25 | 1. Make sure your system has the recommended Ryzen AI driver installed as described in :ref:`install-driver`. 26 | 2. Download and install ``Lemonade_Server_Installer.exe`` from the `latest Lemonade release `_. 27 | 3. Launch the server by double-clicking the ``lemonade_server`` shortcut added to your desktop. 28 | 29 | See the `Lemonade Server README `_ for more details. 30 | 31 | ************ 32 | Server Usage 33 | ************ 34 | 35 | The Lemonade Server provides the following OpenAI-compatible endpoints: 36 | 37 | - POST ``/api/v0/chat/completions`` - Chat Completions (messages to completions) 38 | - POST ``/api/v0/completions`` - Text Completions (prompt to completion) 39 | - GET ``/api/v0/models`` - List available models 40 | 41 | Please refer to the `server specification `_ document in the Lemonade repository for details about the request and response formats for each endpoint. 42 | 43 | The `OpenAI API documentation `_ also has code examples for integrating streaming completions into an application. 44 | 45 | Open WebUI Demo 46 | =============== 47 | 48 | To experience the Lemonade Server, try using it with an OpenAI-compatible application, such as Open WebUI. 49 | 50 | Instructions: 51 | ------------- 52 | 53 | 1. **Launch Lemonade Server:** Double-click the lemon icon on your desktop. See `server setup <#server-setup>`_ for installation instructions. 54 | 55 | 2. 
**Install and Run Open WebUI:** In a terminal, install Open WebUI using the following commands: 56 | 57 | .. code-block:: bash 58 | 59 | conda create -n webui python=3.11 60 | conda activate webui 61 | pip install open-webui 62 | open-webui serve 63 | 64 | 3. **Launch Open WebUI**: In a browser, navigate to ``_. 65 | 66 | 4. **Connect Open WebUI to Lemonade Server:** In the top-right corner of the UI, click the profile icon and then: 67 | 68 | - Go to ``Settings`` → ``Connections``. 69 | - Click the ``+`` button to add our OpenAI-compatible connection. 70 | - In the URL field, enter ``http://localhost:8000/api/v0``, and in the key field put ``-``, then press save. 71 | 72 | **Done!** You are now able to run Open WebUI with Hybrid models. Feel free to choose any of the available “-Hybrid” models in the model selection menu. 73 | 74 | ********** 75 | Next Steps 76 | ********** 77 | 78 | - See `Lemonade Server Examples `_ to find applications that have been tested with Lemonade Server. 79 | - Check out the `Lemonade Server specification `_ to learn more about supported features. 80 | - Try out your Lemonade Server install with any application that uses the OpenAI chat completions API. 81 | 82 | 83 | .. 84 | ------------ 85 | ##################################### 86 | License 87 | ##################################### 88 | 89 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 90 | -------------------------------------------------------------------------------- /docs/model_quantization.rst: -------------------------------------------------------------------------------- 1 | ################## 2 | Model Quantization 3 | ################## 4 | 5 | **Model quantization** is the process of mapping high-precision weights/activations to a lower precision format, such as BF16/INT8, while maintaining model accuracy. This technique enhances the computational and memory efficiency of the model for deployment on NPU devices. It can be applied post-training, allowing existing models to be optimized without the need for retraining. 6 | 7 | The Ryzen AI compiler supports input models quantized to either INT8 or BF16 format: 8 | 9 | - CNN models: INT8 or BF16 10 | - Transformer models: BF16 11 | 12 | Quantization introduces several challenges, primarily revolving around the potential drop in model accuracy. Choosing the right quantization parameters—such as data type, bit-width, scaling factors, and the decision between per-channel or per-tensor quantization—adds layers of complexity to the design process. 13 | 14 | ********* 15 | AMD Quark 16 | ********* 17 | 18 | **AMD Quark** is a comprehensive cross-platform deep learning toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimize their models for deployment on a wide range of hardware backends, achieving significant performance gains without compromising accuracy. 19 | 20 | For more challenging model quantization needs **AMD Quark** supports advanced quantization technique like **Fast Finetuning** that helps recover the lost accuracy of the quantized model. 
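To make the flow concrete, the following is a minimal sketch of quantizing an ONNX model with Quark for ONNX. The ``quark.onnx`` import paths, the ``ModelQuantizer`` entry point, and the calibration data reader shown here are assumptions based on the Quark documentation linked in the next section; refer to that documentation for the authoritative API:

.. code-block:: python

    # Minimal sketch of an ONNX quantization flow with AMD Quark.
    # NOTE: the quark.onnx import paths and ModelQuantizer usage below are
    # assumptions; the calibration data reader is a placeholder.
    import numpy as np
    from onnxruntime.quantization import CalibrationDataReader
    from quark.onnx import ModelQuantizer                                   # assumed entry point
    from quark.onnx.quantization.config import Config, get_default_config   # assumed module path

    class RandomCalibrationReader(CalibrationDataReader):
        """Placeholder reader: feeds a few random batches for calibration."""
        def __init__(self, input_name, shape, num_batches=8):
            self._data = iter(
                [{input_name: np.random.rand(*shape).astype(np.float32)}
                 for _ in range(num_batches)]
            )

        def get_next(self):
            return next(self._data, None)

    # Pick a built-in configuration (for example, XINT8 for NPU-friendly INT8).
    config = Config(global_quant_config=get_default_config("XINT8"))

    quantizer = ModelQuantizer(config)
    quantizer.quantize_model(
        "model_fp32.onnx",        # input float model (placeholder name)
        "model_quantized.onnx",   # output quantized model
        RandomCalibrationReader("input", (1, 3, 224, 224)),
    )

In practice, the calibration data reader should return a representative sample of real pre-processed inputs rather than random data, since calibration quality directly affects the accuracy of the quantized model.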
21 | 22 | Documentation 23 | ============= 24 | The complete documentation for AMD Quark for Ryzen AI can be found here: https://quark.docs.amd.com/latest/supported_accelerators/ryzenai/index.html 25 | 26 | 27 | INT8 Examples 28 | ============= 29 | **AMD Quark** provides default configurations that support INT8 quantization. For example, the `XINT8` configuration uses symmetric INT8 activation and weight quantization with power-of-two scales and the MinMSE calibration method. 30 | The quantization configuration can be customized using the `QuantizationConfig` class. The following example shows how to set up the quantization configuration for INT8 quantization: 31 | 32 | .. code-block:: 33 | 34 | quant_config = QuantizationConfig(calibrate_method=PowerOfTwoMethod.MinMSE, 35 | activation_type=QuantType.QUInt8, 36 | weight_type=QuantType.QInt8, 37 | enable_npu_cnn=True, 38 | extra_options={'ActivationSymmetric': True}) 39 | config = Config(global_quant_config=quant_config) 40 | print("The configuration of the quantization is {}".format(config)) 41 | 42 | Alternatively, the `get_default_config('XINT8')` function returns the default configuration for INT8 quantization. 43 | 44 | For more details 45 | ~~~~~~~~~~~~~~~~ 46 | - `AMD Quark Tutorial `_ for Ryzen AI Deployment 47 | - Running an INT8 model on the NPU using the :doc:`Getting Started Tutorial ` 48 | - Advanced quantization techniques `Fast Finetuning and Cross Layer Equalization `_ for INT8 models 49 | 50 | 51 | BF16 Examples 52 | ============= 53 | **AMD Quark** provides default configurations that support BFLOAT16 (BF16) model quantization. BF16 is a 16-bit floating-point format with the same exponent size as FP32, allowing a wide dynamic range but with reduced precision to save memory and speed up computations. 54 | A BF16 model must have its QDQ nodes converted to Cast operations to run with the VAIML compiler. AMD Quark supports this conversion through the `BF16QDQToCast` configuration option. 55 | 56 | .. code-block:: 57 | 58 | quant_config = get_default_config("BF16") 59 | quant_config.extra_options["BF16QDQToCast"] = True 60 | config = Config(global_quant_config=quant_config) 61 | print("The configuration of the quantization is {}".format(config)) 62 | 63 | For more details 64 | ~~~~~~~~~~~~~~~~ 65 | - `Image Classification `_ using ResNet50 to run a BF16 model on the NPU 66 | - `Finetuned DistilBERT for Text Classification `_ 67 | - `Text Embedding Model Alibaba-NLP/gte-large-en-v1.5 `_ 68 | - Advanced quantization techniques `Fast Finetuning `_ for BF16 models. 69 | 70 | 71 | .. 72 | ------------ 73 | 74 | ##################################### 75 | License 76 | ##################################### 77 | 78 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 79 | -------------------------------------------------------------------------------- /docs/modelcompat.rst: -------------------------------------------------------------------------------- 1 | .. include:: icons.txt 2 | 3 | ################### 4 | Model Compatibility 5 | ################### 6 | 7 | The Ryzen AI Software supports deploying quantized models saved in the ONNX format. 8 | 9 | Currently, the NPU supports a subset of the ONNX operators. At runtime, the ONNX graph is automatically partitioned into multiple subgraphs by the Vitis AI ONNX Execution Provider (VAI EP). The subgraph(s) containing operators supported by the NPU are executed on the NPU.
The remaining subgraph(s) are executed on the CPU. This graph partitioning and deployment technique across CPU and NPU is fully automated by the VAI EP and is totally transparent to the end-user. 10 | 11 | |memo| **NOTE**: Models with ONNX opset 17 are recommended. If your model uses a different opset version, consider converting it using the `ONNX Version Converter `_ 12 | 13 | 14 | The Ryzen AI compiler supports input models quantized to either INT8 or BF16 format: 15 | 16 | - CNN models: INT8 or BF16 17 | - Transformer models: BF16 18 | 19 | BF16 models (CNN or Transformer) require processing power in terms of core count and memory, depending on model size. If a larger model cannot be compiled on a Windows machine due to hardware limitations (e.g., insufficient RAM), an alternative Linux-based compilation flow is supported. More details can be found here: . 20 | 21 | The list of the ONNX operators currently supported by the NPU is as follows: 22 | 23 | 24 | 25 | .. 26 | ------------ 27 | 28 | ##################################### 29 | License 30 | ##################################### 31 | 32 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 33 | -------------------------------------------------------------------------------- /docs/modelrun.rst: -------------------------------------------------------------------------------- 1 | .. include:: /icons.txt 2 | 3 | ################################ 4 | Model Compilation and Deployment 5 | ################################ 6 | 7 | ***************** 8 | Introduction 9 | ***************** 10 | 11 | The Ryzen AI Software supports compiling and deploying quantized model saved in the ONNX format. The ONNX graph is automatically partitioned into multiple subgraphs by the VitisAI Execution Provider (EP). The subgraph(s) containing operators supported by the NPU are executed on the NPU. The remaining subgraph(s) are executed on the CPU. This graph partitioning and deployment technique across CPU and NPU is fully automated by the VAI EP and is totally transparent to the end-user. 12 | 13 | |memo| **NOTE**: Models with ONNX opset 17 are recommended. If your model uses a different opset version, consider converting it using the `ONNX Version Converter `_ 14 | 15 | Models are compiled for the NPU by creating an ONNX inference session using the Vitis AI Execution Provider (VAI EP): 16 | 17 | .. code-block:: python 18 | 19 | providers = ['VitisAIExecutionProvider'] 20 | session = ort.InferenceSession( 21 | model, 22 | sess_options = sess_opt, 23 | providers = providers, 24 | provider_options = provider_options 25 | ) 26 | 27 | 28 | The ``provider_options`` parameter allows passing special options to the Vitis AI EP. 29 | 30 | .. list-table:: 31 | :widths: 20 35 32 | :header-rows: 1 33 | 34 | * - Provider Options 35 | - Description 36 | * - config_file 37 | - Configuration file to pass certain compile-specific options, used for BF16 compilation. 38 | * - xclbin 39 | - NPU binary file to specify NPU configuration, used for INT8 models. 40 | * - cache_dir 41 | - The path and name of the cache directory. 42 | * - cache_key 43 | - The subfolder in the cache directory where the compiled model is stored. 44 | * - encryptionKey 45 | - Used for generating an encrypted compiled model. 46 | 47 | Detailed usage of these options is discussed in the following sections of this page. 48 | 49 | 50 | .. 
_compile-bf16: 51 | 52 | ************************** 53 | Compiling BF16 models 54 | ************************** 55 | 56 | |memo| **NOTE**: For compiling large BF16 models a machine with at least 32GB of memory is recommended. The machine does not need to have an NPU. It is also possible to compile BF16 models on a Linux workstation. More details can be found here: :doc:`rai_linux` 57 | 58 | When compiling BF16 models, a compilation configuration file must be provided through the ``config_file`` provider options. 59 | 60 | .. code-block:: python 61 | 62 | providers = ['VitisAIExecutionProvider'] 63 | 64 | provider_options = [{ 65 | 'config_file': 'vai_ep_config.json' 66 | }] 67 | 68 | session = ort.InferenceSession( 69 | "resnet50.onnx", 70 | providers=providers, 71 | provider_options=provider_options 72 | ) 73 | 74 | 75 | By default, the configuration file for compiling BF16 models should contain the following: 76 | 77 | .. code-block:: json 78 | 79 | { 80 | "passes": [ 81 | { 82 | "name": "init", 83 | "plugin": "vaip-pass_init" 84 | }, 85 | { 86 | "name": "vaiml_partition", 87 | "plugin": "vaip-pass_vaiml_partition", 88 | "vaiml_config": {} 89 | } 90 | ] 91 | } 92 | 93 | 94 | Additional options can be specified in the ``vaiml_config`` section of the configuration file, as described below. 95 | 96 | **Performance Optimization** 97 | 98 | The default compilation optimization level is 1. The optimization level can be changed as follows: 99 | 100 | .. code-block:: json 101 | 102 | "vaiml_config": {"optimize_level": 2} 103 | 104 | Supported values: 1 (default), 2 105 | 106 | 107 | **Automatic FP32 to BF16 Conversion** 108 | 109 | If a FP32 model is used, the compiler will automatically cast it to BF16 if this option is enabled. For better control over accuracy, it is recommended to quantize the model to BF16 using Quark. 110 | 111 | .. code-block:: json 112 | 113 | "vaiml_config": {"enable_f32_to_bf16_conversion": true} 114 | 115 | Supported values: false (default), true 116 | 117 | 118 | **Optimizations for Transformer-Based Models** 119 | 120 | By default, the compiler vectorizes the data to optimize performance for CNN models. However, transformers perform best with unvectorized data. To better optimize transformer-based models, set: 121 | 122 | .. code-block:: json 123 | 124 | "vaiml_config": {"preferred_data_storage": "unvectorized"} 125 | 126 | Supported values: "vectorized" (default), "unvectorized" 127 | 128 | 129 | .. _compile-int8: 130 | 131 | ************************** 132 | Compiling INT8 models 133 | ************************** 134 | 135 | When compiling INT8 models, the NPU configuration must be specified through the ``xclbin`` provider option. This option is not required for BF16 models. 136 | 137 | There are two types of NPU configurations for INT8 models: standard and benchmark. Setting the NPU configuration involves specifying a specific ``.xclbin`` binary file, which is located in the Ryzen AI Software installation tree. 
138 | 139 | Depending on the target processor and binary type (standard/benchmark), the following ``.xclbin`` files should be used: 140 | 141 | **For STX/KRK APUs**: 142 | 143 | - Standard binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_Nx4_Overlay.xclbin`` 144 | - Benchmark binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_4x4_Overlay.xclbin`` 145 | 146 | **For PHX/HPT APUs**: 147 | 148 | - Standard binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\phoenix\1x4.xclbin`` 149 | - Benchmark binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\phoenix\4x4.xclbin`` 150 | 151 | Python example selecting the standard NPU configuration for STX/KRK: 152 | 153 | .. code-block:: python 154 | 155 | providers = ['VitisAIExecutionProvider'] 156 | 157 | provider_options = [{ 158 | 'xclbin': '{}\\voe-4.0-win_amd64\\xclbins\\strix\\AMD_AIE2P_Nx4_Overlay.xclbin'.format(os.environ["RYZEN_AI_INSTALLATION_PATH"]) 159 | }] 160 | 161 | session = ort.InferenceSession( 162 | "resnet50.onnx", 163 | providers=providers, 164 | provider_options=provider_options 165 | ) 166 | 167 | | 168 | 169 | By default, the Ryzen AI Conda environment automatically sets the standard binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. However, explicitly passing the xclbin option in the provider options overrides the environment variable. 170 | 171 | .. code-block:: 172 | 173 | > echo %XLNX_VART_FIRMWARE% 174 | C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_Nx4_Overlay.xclbin 175 | 176 | 177 | 178 | | 179 | 180 | ************************************ 181 | Managing Compiled Models 182 | ************************************ 183 | 184 | To avoid the overhead of recompiling models, it is very advantageous to save the compiled models and use these pre-compiled versions in the final application. Pre-compiled models can be loaded instantaneously and immediately executed on the NPU. This greatly improves the session creation time and overall end-user experience. 185 | 186 | The RyzenAI Software supports two mechanisms for saving and reloading compiled models: 187 | 188 | - VitisAI EP Cache 189 | - OnnxRuntime EP Context Cache 190 | 191 | .. _vitisai-ep-cache: 192 | 193 | VitisAI EP Cache 194 | ================ 195 | 196 | The VitisAI EP includes a built-in caching mechanism. This mechanism is enabled by default. When a model is compiled for the first time, it is automatically saved in the VitisAI EP cache directory. Any subsequent creation of an ONNX Runtime session using the same model will load the precompiled model from the cache directory, thereby reducing session creation time. 197 | 198 | The location of the VitisAI EP cache is specified with the ``cache_dir`` and ``cache_key`` provider options: 199 | 200 | - ``cache_dir`` - Specifies the path and name of the cache directory. 201 | - ``cache_key`` - Specifies the subfolder in the cache directory where the compiled model is stored. 202 | 203 | Python example: 204 | 205 | .. 
code-block:: python 206 | 207 | from pathlib import Path 208 | 209 | providers = ['VitisAIExecutionProvider'] 210 | cache_dir = Path(__file__).parent.resolve() 211 | provider_options = [{'cache_dir': str(cache_dir), 212 | 'cache_key': 'compiled_resnet50'}] 213 | 214 | session = ort.InferenceSession( 215 | "resnet50.onnx", 216 | providers=providers, 217 | provider_options=provider_options 218 | ) 219 | 220 | 221 | In the example above, the cache directory is set to the absolute path of the folder containing the script being executed. Once the session is created, the compiled model is saved inside a subdirectory named ``compiled_resnet50`` within the specified cache folder. 222 | 223 | Default Settings 224 | ---------------- 225 | In the current release, if ``cache_dir`` is not set, the default cache location is determined by the type of model: 226 | 227 | - INT8 models - ``C:\temp\%USERNAME%\vaip\.cache`` 228 | - BF16 models - The directory where the script or program is executed 229 | 230 | 231 | Disabling the Cache 232 | ------------------- 233 | To ignore cached models and force recompilation, unset the ``XLNX_ENABLE_CACHE`` environment variable before running the application: 234 | 235 | .. code-block:: 236 | 237 | set XLNX_ENABLE_CACHE= 238 | 239 | 240 | 241 | VitisAI EP Cache Encryption 242 | --------------------------- 243 | 244 | The contents of the VitisAI EP cache folder can be encrypted using AES256. Cache encryption is enabled by passing an encryption key through the VitisAI EP provider options. The same key must be used to decrypt the model when loading it from the cache. The key is a 256-bit value represented as a 64-digit hexadecimal string. 245 | 246 | Python example: 247 | 248 | .. code-block:: python 249 | 250 | session = onnxruntime.InferenceSession( 251 | "resnet50.onnx", 252 | providers=["VitisAIExecutionProvider"], 253 | provider_options=[{ 254 | "config_file": "/path/to/vaip_config.json", 255 | "encryptionKey": "89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2" 256 | }]) 257 | 258 | C++ example: 259 | 260 | .. code-block:: cpp 261 | 262 | std::basic_string<ORTCHAR_T> onnx_model_path = ORT_TSTR("resnet50.onnx"); // ORT_TSTR yields the wide-character path type expected on Windows 263 | Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50"); 264 | auto session_options = Ort::SessionOptions(); 265 | auto options = std::unordered_map<std::string, std::string>({}); // provider options are passed as string key/value pairs 266 | options["config_file"] = "/path/to/vaip_config.json"; 267 | options["encryptionKey"] = "89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2"; 268 | 269 | session_options.AppendExecutionProvider("VitisAI", options); 270 | auto session = Ort::Experimental::Session(env, onnx_model_path, session_options); 271 | 272 | As a result of encryption, the model generated in the cache directory cannot be opened with Netron. Additionally, dumping is disabled to prevent the leakage of sensitive information about the model. 273 | 274 | .. _ort-ep-context-cache: 275 | 276 | OnnxRuntime EP Context Cache 277 | ============================ 278 | 279 | The Vitis AI EP supports the ONNX Runtime EP context cache feature. This feature allows dumping and reloading a snapshot of the EP context before deployment. Currently, this feature is only available for INT8 models. 280 | 281 | The user can enable dumping of the EP context by setting the ``ep.context_enable`` session option to 1. 282 | 283 | The following options can be used for additional control (see the example sketch below): 284 | 285 | - ``ep.context_file_path`` – Specifies the output path for the dumped context model. 286 | - ``ep.context_embed_mode`` – Embeds the EP context into the ONNX model when set to 1. 
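As an illustration, the following is a minimal sketch of dumping and then reloading an unencrypted context model with these session options. It reuses the ``resnet50.onnx`` example from earlier on this page; the context file name ``resnet50_ctx.onnx`` is an arbitrary placeholder.

.. code-block:: python

    import onnxruntime as ort

    # Compilation session: compile the model and dump the EP context model
    session_options = ort.SessionOptions()
    session_options.add_session_config_entry('ep.context_enable', '1')
    session_options.add_session_config_entry('ep.context_file_path', 'resnet50_ctx.onnx')
    session_options.add_session_config_entry('ep.context_embed_mode', '1')
    session = ort.InferenceSession(
        path_or_bytes='resnet50.onnx',
        sess_options=session_options,
        providers=['VitisAIExecutionProvider']
    )

    # Deployment session: load the pre-compiled context model directly
    session = ort.InferenceSession(
        path_or_bytes='resnet50_ctx.onnx',
        providers=['VitisAIExecutionProvider']
    )

With ``ep.context_embed_mode`` set to 1, the compiled context is embedded directly into ``resnet50_ctx.onnx``, so only that single file needs to be shipped with the application.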
287 | 288 | For further details, refer to the official ONNX Runtime documentation: https://onnxruntime.ai/docs/execution-providers/EP-Context-Design.html 289 | 290 | 291 | EP Context Encryption 292 | --------------------- 293 | 294 | By default, the generated context model is unencrypted and can be used directly during inference. If needed, the context model can be encrypted using one of the methods described below. 295 | 296 | User-managed encryption 297 | ~~~~~~~~~~~~~~~~~~~~~~~ 298 | After the context model is generated, the developer can encrypt the generated file using a method of choice. At runtime, the encrypted file can be loaded by the application, decrypted in memory and passed as a serialized string to the inference session. This method gives complete control to the developer over the encryption process. 299 | 300 | EP-managed encryption 301 | ~~~~~~~~~~~~~~~~~~~~~~~ 302 | The Vitis AI EP encryption mechanism can be used to encrypt the context model. This is enabled by passing an encryption key via the ``encryptionKey`` provider option (discussed in the previous section). The model is encrypted using AES256. At runtime, the same encryption key must be provided to decrypt and load the context model. With this method, encryption and decryption is seamlessly managed by the VitisAI EP. 303 | 304 | Python example: 305 | 306 | .. code-block:: python 307 | 308 | # Compilation session 309 | session_options = ort.SessionOptions() 310 | session_options.add_session_config_entry('ep.context_enable', '1') 311 | session_options.add_session_config_entry('ep.context_file_path', 'context_model.onnx') 312 | session_options.add_session_config_entry('ep.context_embed_mode', '1') 313 | session = ort.InferenceSession( 314 | path_or_bytes='resnet50.onnx', 315 | sess_options=session_options, 316 | providers=['VitisAIExecutionProvider'], 317 | provider_options=[{'encryptionKey': '89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2'}] 318 | ) 319 | 320 | # Inference session 321 | session_options = ort.SessionOptions() 322 | session = ort.InferenceSession( 323 | path_or_bytes='context_model.onnx', 324 | sess_options=session_options, 325 | providers=['VitisAIExecutionProvider'], 326 | provider_options=[{'encryptionKey': '89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2'}] 327 | ) 328 | 329 | 330 | **NOTE**: When compiling with encryptionKey, ensure that any existing cache directory (either the default cache directory or the directory specified by the ``cache_dir`` provider option) is deleted before compiling. 331 | 332 | | 333 | 334 | ************************** 335 | Operator Assignment Report 336 | ************************** 337 | 338 | 339 | Vitis AI EP generates a file named ``vitisai_ep_report.json`` that provides a report on model operator assignments across CPU and NPU. This file is automatically generated in the cache directory if no explicit cache location is specified in the code. This report includes information such as the total number of nodes, the list of operator types in the model, and which nodes and operators runs on the NPU or on the CPU. Additionally, the report includes node statistics, such as input to a node, the applied operation, and output from the node. 340 | 341 | 342 | .. code-block:: 343 | 344 | { 345 | "deviceStat": [ 346 | { 347 | "name": "all", 348 | "nodeNum": 400, 349 | "supportedOpType": [ 350 | "::Add", 351 | "::Conv", 352 | ... 
353 | ] 354 | }, 355 | { 356 | "name": "CPU", 357 | "nodeNum": 2, 358 | "supportedOpType": [ 359 | "::DequantizeLinear", 360 | "::QuantizeLinear" 361 | ] 362 | }, 363 | { 364 | "name": "NPU", 365 | "nodeNum": 398, 366 | "supportedOpType": [ 367 | "::Add", 368 | "::Conv", 369 | ... 370 | ] 371 | ... 372 | 373 | .. 374 | ------------ 375 | 376 | ##################################### 377 | License 378 | ##################################### 379 | 380 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 381 | -------------------------------------------------------------------------------- /docs/npu_oga.rst: -------------------------------------------------------------------------------- 1 | :orphan: 2 | 3 | ###################### 4 | OGA NPU Execution Mode 5 | ###################### 6 | 7 | Ryzen AI Software supports deploying LLMs on Ryzen AI PCs using the native ONNX Runtime Generate (OGA) C++ or Python API. The OGA API is the lowest-level API available for building LLM applications on a Ryzen AI PC. This documentation covers the NPU execution mode for LLMs, which utilizes only the NPU. 8 | 9 | **Note**: Refer to :doc:`hybrid_oga` for Hybrid NPU + GPU execution mode. 10 | 11 | 12 | ************************ 13 | Supported Configurations 14 | ************************ 15 | 16 | The Ryzen AI OGA flow supports Strix and Krackan Point processors. Phoenix (PHX) and Hawk (HPT) processors are not supported. 17 | 18 | 19 | ************ 20 | Requirements 21 | ************ 22 | - Install NPU Drivers and Ryzen AI MSI installer according to the :doc:`inst` 23 | - Install Git for Windows (needed to download models from HF): https://git-scm.com/downloads 24 | 25 | 26 | ******************** 27 | Pre-optimized Models 28 | ******************** 29 | 30 | AMD provides a set of pre-optimized LLMs ready to be deployed with Ryzen AI Software and the supporting runtime for NPU execution. 
These models can be found on Hugging Face in the following collection: 31 | 32 | - https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix 33 | - https://huggingface.co/amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix 34 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix 35 | - https://huggingface.co/amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix 36 | - https://huggingface.co/amd/chatglm3-6b-awq-g128-int4-asym-bf16-onnx-ryzen-strix 37 | - https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix 38 | - https://huggingface.co/amd/Llama2-7b-chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix 39 | - https://huggingface.co/amd/Llama-3-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix 40 | - https://huggingface.co/amd/Llama-3.1-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix 41 | - https://huggingface.co/amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix 42 | - https://huggingface.co/amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix 43 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix 44 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-g128-int4-asym-bf16-onnx-ryzen-strix 45 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-7B-awq-g128-int4-asym-bf16-onnx-ryzen-strix 46 | - https://huggingface.co/amd/AMD-OLMo-1B-SFT-DPO-awq-g128-int4-asym-bf16-onnx-ryzen-strix 47 | 48 | The steps for deploying the pre-optimized models using C++ and python are described in the following sections. 49 | 50 | *************************** 51 | NPU Execution of OGA Models 52 | *************************** 53 | 54 | Setup 55 | ===== 56 | 57 | Activate the Ryzen AI 1.4 Conda environment: 58 | 59 | .. code-block:: 60 | 61 | conda activate ryzen-ai-1.4.0 62 | 63 | Create a folder to run the LLMs from, and copy the required files: 64 | 65 | .. code-block:: 66 | 67 | mkdir npu_run 68 | cd npu_run 69 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\exe" .\libs 70 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\libs\vaip_llm.json" libs 71 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\npu-llm\onnxruntime-genai.dll" libs 72 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitis_ai_custom_ops.dll" libs 73 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_shared.dll" libs 74 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitisai_ep.dll" libs 75 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\dyn_dispatch_core.dll" libs 76 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_vitisai.dll" libs 77 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\transaction.dll" libs 78 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" libs 79 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\xclbin.dll" libs 80 | 81 | 82 | Download Models from HuggingFace 83 | ================================ 84 | 85 | Download the desired models from the list of pre-optimized models on Hugging Face: 86 | 87 | .. code-block:: 88 | 89 | # Make sure you have git-lfs installed (https://git-lfs.com) 90 | git lfs install 91 | git clone 92 | 93 | For example, for Llama-2-7b: 94 | 95 | .. code-block:: 96 | 97 | git lfs install 98 | git clone https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix 99 | 100 | 101 | **NOTE**: Ensure the models are cloned in the ``npu_run`` folder. 
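Once a model has been downloaded, it can also be exercised directly through the native OGA (``onnxruntime-genai``) Python API referenced at the top of this page. The following is an illustrative sketch only, not part of the official examples: it assumes the ``onnxruntime_genai`` package from the Ryzen AI environment, that ``genai_config.json`` has been updated with the ``custom_ops_library`` path as described in the "Sample Python Scripts" section below, and that the bundled onnxruntime-genai version exposes the ``append_tokens`` generator API (older releases set ``params.input_ids`` and call ``compute_logits()`` instead).

.. code-block:: python

    import onnxruntime_genai as og

    # Folder of a downloaded pre-optimized model (example model from above)
    model_dir = "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix"

    model = og.Model(model_dir)            # reads genai_config.json and loads the ONNX model
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()     # incremental detokenization for streaming output

    input_tokens = tokenizer.encode("What is the capital of France?")

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=256)

    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)  # assumption: newer OGA API (see note above)

    # Generate and print one token at a time
    while not generator.is_done():
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        print(stream.decode(new_token), end="", flush=True)
    print()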
102 | 103 | 104 | Enabling Performance Mode (Optional) 105 | ==================================== 106 | 107 | To run the LLMs in the best performance mode, follow these steps: 108 | 109 | - Go to ``Windows`` → ``Settings`` → ``System`` → ``Power`` and set the power mode to Best Performance. 110 | - Execute the following commands in the terminal: 111 | 112 | .. code-block:: 113 | 114 | cd C:\Windows\System32\AMD 115 | xrt-smi configure --pmode performance 116 | 117 | 118 | 119 | Sample C++ Programs 120 | =================== 121 | 122 | The ``run_llm.exe`` test application provides a simple interface to run LLMs. The source code for this application can also be used a reference for how to integrate LLMs using the native OGA C++ APIs. 123 | 124 | It supports the following command line options:: 125 | 126 | -m: model path 127 | -f: prompt file 128 | -n: max new tokens 129 | -c: use chat template 130 | -t: input prompt token length 131 | -l: max length to be set in search options 132 | -h: help 133 | 134 | 135 | Example usage: 136 | 137 | .. code-block:: 138 | 139 | .\libs\run_llm.exe -m "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix" -f "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix\prompts.txt" -t "1024" -n 20 140 | 141 | | 142 | 143 | The ``model_benchmark.exe`` program can be used to profile the execution of LLMs and report various metrics. It supports the following command line options:: 144 | 145 | -i,--input_folder 146 | Path to the ONNX model directory to benchmark, compatible with onnxruntime-genai. 147 | -l,--prompt_length 148 | List of number of tokens in the prompt to use. 149 | -p,--prompt_file 150 | Name of prompt file (txt) expected in the input model directory. 151 | -g,--generation_length 152 | Number of tokens to generate. Default: 128 153 | -r,--repetitions 154 | Number of times to repeat the benchmark. Default: 5 155 | -w,--warmup 156 | Number of warmup runs before benchmarking. Default: 1 157 | -t,--cpu_util_time_interval 158 | Sampling time interval for peak cpu utilization calculation, in milliseconds. Default: 250 159 | -v,--verbose 160 | Show more informational output. 161 | -h,--help 162 | Show this help message and exit. 163 | 164 | 165 | For example, for Llama-2-7b: 166 | 167 | .. code-block:: 168 | 169 | .\libs\model_benchmark.exe -i "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix" -g 20 -p "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix\prompts.txt" -l "2048,1024,512,256,128" 170 | 171 | | 172 | 173 | **NOTE**: The C++ source code for the ``run_llm.exe`` and ``model_benchmark.exe`` executables can be found in the ``%RYZEN_AI_INSTALLATION_PATH%\npu-llm\cpp`` folder. This source code can be modified and recompiled using the commands below. 174 | 175 | .. code-block:: 176 | 177 | :: Copy project files 178 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\cpp" .\sources 179 | 180 | :: Build project 181 | cd sources 182 | cmake -G "Visual Studio 17 2022" -A x64 -S . 
-B build 183 | cmake --build build --config Release 184 | 185 | :: Copy executables in the "libs" folder 186 | xcopy /I build\Release .\libs 187 | 188 | :: Copy runtime dependencies in the "libs" folder 189 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\libs\vaip_llm.json" libs 190 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\npu-llm\onnxruntime-genai.dll" libs 191 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitis_ai_custom_ops.dll" libs 192 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_shared.dll" libs 193 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitisai_ep.dll" libs 194 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\dyn_dispatch_core.dll" libs 195 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_vitisai.dll" libs 196 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\transaction.dll" libs 197 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" libs 198 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\xclbin.dll" libs 199 | 200 | Sample Python Scripts 201 | ===================== 202 | 203 | In the model directory, open the ``genai_config.json`` file located in the folder of the downloaded model. Update the value of the "custom_ops_library" key with the path to the ``onnxruntime_vitis_ai_custom_ops.dll``, located in the ``npu_run\libs`` folder: 204 | 205 | .. code-block:: 206 | 207 | "session_options": { 208 | ... 209 | "custom_ops_library": "libs\\onnxruntime_vitis_ai_custom_ops.dll", 210 | ... 211 | } 212 | 213 | To run LLMs other than ChatGLM, use the following command: 214 | 215 | .. code-block:: 216 | 217 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir 218 | 219 | To run ChatGLM, use the following command: 220 | 221 | .. code-block:: 222 | 223 | pip install transformers==4.44.0 224 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\chatglm\model-generate-chatglm3.py" -m 225 | 226 | For example, for Llama-2-7b: 227 | 228 | .. code-block:: 229 | 230 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix 231 | 232 | 233 | 234 | *********************** 235 | Using Fine-Tuned Models 236 | *********************** 237 | 238 | It is also possible to run fine-tuned versions of the pre-optimized OGA models. 239 | 240 | To do this, the fine-tuned models must first be prepared for execution with the OGA NPU-only flow. For instructions on how to do this, refer to the page about :doc:`oga_model_prepare`. 241 | 242 | Once a fine-tuned model has been prepared for NPU-only execution, it can be deployed by following the steps described above in this page. 243 | -------------------------------------------------------------------------------- /docs/oga_model_prepare.rst: -------------------------------------------------------------------------------- 1 | #################### 2 | Preparing OGA Models 3 | #################### 4 | 5 | This section describes the process for preparing LLMs for deployment on a Ryzen AI PC using the hybrid or NPU-only execution mode. Currently, the flow supports only fine-tuned versions of the models already supported (as listed in :doc:`hybrid_oga` page). For example, fine-tuned versions of Llama2 or Llama3 can be used. However, different model families with architectures not supported by the hybrid flow cannot be used. 
6 | 7 | Preparing an LLM for deployment on a Ryzen AI PC involves two steps: 8 | 9 | 1. **Quantization**: The pretrained model is quantized to reduce its memory footprint and better map to the compute resources of the hardware accelerators. 10 | 2. **Postprocessing**: During postprocessing, the model is exported to the OGA format and then postprocessed for the NPU-only or Hybrid execution mode to obtain the final deployable model. 11 | 12 | ************ 13 | Quantization 14 | ************ 15 | 16 | Prerequisites 17 | ============= 18 | A Linux machine with AMD (e.g., AMD Instinct MI Series) or Nvidia GPUs 19 | 20 | Setup 21 | ===== 22 | 23 | 1. Create and activate a Conda environment:  24 | 25 | .. code-block:: 26 | 27 | conda create --name python=3.11 28 | conda activate 29 | 30 | 2. If using AMD GPUs, update PyTorch to use ROCm:  31 | 32 | .. code-block:: 33 | 34 | pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1 35 | python -c "import torch; print(torch.cuda.is_available())" # Must return `True` 36 | 37 | 3. Download :download:`AMD Quark 0.8 ` and unzip the archive 38 | 39 | 4. Install Quark:  40 | 41 | .. code-block:: 42 | 43 | cd 44 | pip install amd_quark-0.8+<>.whl 45 | 46 | 5. Install the other dependencies: 47 | 48 | .. code-block:: 49 | 50 | pip install datasets 51 | pip install transformers 52 | pip install accelerate 53 | pip install evaluate 54 | 55 | 56 | Some models may require a specific version of ``transformers``. For example, ChatGLM3 requires version 4.44.0. 57 | 58 | Generate Quantized Model 59 | ======================== 60 | 61 | Use the following command to run quantization. On a GPU-equipped Linux machine, quantization can take about 30-60 minutes. 62 | 63 | .. code-block:: 64 | 65 | cd examples/torch/language_modeling/llm_ptq/ 66 | 67 | python quantize_quark.py \ 68 | --model_dir "meta-llama/Llama-2-7b-chat-hf" \ 69 | --output_dir \ 70 | --quant_scheme w_uint4_per_group_asym \ 71 | --num_calib_data 128 \ 72 | --quant_algo awq \ 73 | --dataset pileval_for_awq_benchmark \ 74 | --model_export hf_format \ 75 | --data_type \ 76 | --exclude_layers 77 | 78 | 79 | - To generate an OGA model for the NPU-only execution mode, use ``--data_type float32`` 80 | - To generate an OGA model for the Hybrid execution mode, use ``--data_type float16`` 81 | - For a BF16 pretrained model, you can use ``--data_type bfloat16``. 82 | 83 | The quantized model is generated in the folder specified by ``--output_dir``. 84 | 85 | ************** 86 | Postprocessing 87 | ************** 88 | 89 | Copy the quantized model to the Windows PC with Ryzen AI installed, activate the Ryzen AI Conda environment, and execute the ``model_generate`` command to generate the final model. 90 | 91 | Generate the final model for Hybrid execution mode: 92 | 93 | .. code-block:: 94 | 95 | conda activate ryzen-ai- 96 | 97 | model_generate --hybrid 98 | 99 | 100 | Generate the final model for NPU execution mode: 101 | 102 | .. code-block:: 103 | 104 | conda activate ryzen-ai- 105 | 106 | model_generate --npu 107 | 108 | .. 109 | ------------ 110 | 111 | ##################################### 112 | License 113 | ##################################### 114 | 115 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 
116 | -------------------------------------------------------------------------------- /docs/rai_linux.rst: -------------------------------------------------------------------------------- 1 | :orphan: 2 | 3 | ########################## 4 | Ryzen AI Software on Linux 5 | ########################## 6 | 7 | This guide provides instructions for using Ryzen AI 1.4 on Linux for model compilation and followed by running inference on Windows. 8 | 9 | ************* 10 | Prerequisites 11 | ************* 12 | The following are the recommended system configuration for RyzenAI Linux installer 13 | 14 | .. list-table:: 15 | :widths: 25 25 16 | :header-rows: 1 17 | 18 | * - Dependencies 19 | - Version Requirement 20 | * - Ubuntu 21 | - 22.04 22 | * - RAM 23 | - 32GB or Higher 24 | * - CPU cores 25 | - >= 8 26 | * - Python 27 | - 3.10 or Higher 28 | 29 | 30 | ************************* 31 | Installation Instructions 32 | ************************* 33 | 34 | - Download the Ryzen AI Software Linux installer: :download:`ryzen_ai-1.4.0.tgz `. 35 | 36 | - Extract the .tgz using the following command: 37 | 38 | .. code-block:: 39 | 40 | tar -xvzf ryzen_ai-1.4.0.tgz 41 | 42 | - Run the installer with default settings. This will prompt to read and agree to the EULA: 43 | 44 | .. code-block:: 45 | 46 | cd ryzen_ai-1.4.0 47 | ./install_ryzen_ai_1_4.sh 48 | 49 | - After reading the EULA, re-run the installer with options to agree to the EULA and create a Python virtual environment: 50 | 51 | .. code-block:: 52 | 53 | ./install_ryzen_ai_1_4.sh -a yes -p -l 54 | 55 | - Activate the virtual environment to start using the Ryzen AI Software: 56 | 57 | .. code-block:: 58 | 59 | source /bin/activate 60 | 61 | 62 | ****************** 63 | Usage Instructions 64 | ****************** 65 | 66 | The process for model compilation on Linux is similar to that on Windows. Refer to the instructions provided in the :doc:`modelrun` page for complete details. 67 | 68 | Once the model has been successfully compiled on your Linux machine, proceed to copy the entire working directory to a Windows machine that operates on an STX-based system. 69 | 70 | Prior to running the compiled model on the Windows machine, ensure that all required prerequisites are satisfied as listed in the :doc:`inst` page. 71 | -------------------------------------------------------------------------------- /docs/relnotes.rst: -------------------------------------------------------------------------------- 1 | .. include:: /icons.txt 2 | 3 | ############# 4 | Release Notes 5 | ############# 6 | 7 | .. _supported-configurations: 8 | 9 | ************************ 10 | Supported Configurations 11 | ************************ 12 | 13 | Ryzen AI 1.4 Software supports AMD processors codenamed Phoenix, Hawk Point, Strix, Strix Halo, and Krackan Point. These processors can be found in the following Ryzen series: 14 | 15 | - Ryzen 200 Series 16 | - Ryzen 7000 Series, Ryzen PRO 7000 Series 17 | - Ryzen 8000 Series, Ryzen PRO 8000 Series 18 | - Ryzen AI 300 Series, Ryzen AI PRO Series, Ryzen AI Max 300 Series 19 | 20 | For a complete list of supported devices, refer to the `processor specifications `_ page (look for the "AMD Ryzen AI" column towards the right side of the table, and select "Available" from the pull-down menu). 21 | 22 | The rest of this document will refer to Phoenix as PHX, Hawk Point as HPT, Strix and Strix Halo as STX, and Krackan Point as KRK. 
23 | 24 | 25 | ************************* 26 | Model Compatibility Table 27 | ************************* 28 | 29 | The following table lists which types of models are supported on what hardware platforms. 30 | 31 | .. list-table:: 32 | :header-rows: 1 33 | 34 | * - Model Type 35 | - PHX/HPT 36 | - STX/KRK 37 | * - CNN INT8 38 | - |checkmark| 39 | - |checkmark| 40 | * - CNN BF16 41 | - 42 | - |checkmark| 43 | * - NLP BF16 44 | - 45 | - |checkmark| 46 | * - LLM (OGA) 47 | - 48 | - |checkmark| 49 | 50 | 51 | *********** 52 | Version 1.4 53 | *********** 54 | 55 | - New Features: 56 | 57 | - `New architecture support for Ryzen AI 300 series processors `_ 58 | - Unified support for LLMs, INT8, and BF16 models in a single release package 59 | - Public release for compilation of BF16 CNN and NLP models on Windows 60 | - `Public release of the LLM Hybrid OGA flow `_ 61 | - `LLM building flow for finetuned LLM `_ 62 | - Support for up to 16 hardware contexts on Ryzen AI 300 series processors 63 | - Vitis AI EP now supports the ONNX Runtime EP context cache feature (for custom handling of pre-compiled models) 64 | - Ryzen AI environment variables converted to VitisAI EP session options 65 | - Improved exception handling and fallback to CPU 66 | 67 | - `New Hybrid execution mode LLMs `_: 68 | 69 | - DeepSeek-R1-Distill-Llama-8B 70 | - DeepSeek-R1-Distill-Qwen-1.5B 71 | - DeepSeek-R1-Distill-Qwen-7B 72 | - Gemma2-2B 73 | - Qwen2-1.5B 74 | - Qwen2-7B 75 | - AMD-OLMO-1B-SFT-DPO 76 | - Mistral-7B-Instruct-v0.1 77 | - Mistral-7B-Instruct-v0.2 78 | - Mistral-7B-v0.3 79 | - Llama3.1-8B-Instruct 80 | - Codellama-7B-Instruct 81 | 82 | - :doc:`New BF16 model examples `: 83 | 84 | - Image classification 85 | - Finetuned DistilBERT for text classification 86 | - Text embedding model Alibaba-NLP/gte-large-en-v1.5 87 | 88 | - New Tools: 89 | 90 | - `Lemonade SDK `_ 91 | 92 | - `Lemonade Server `_: A server interface that uses the standard Open AI API, allowing applications in any language to integrate with Lemonade Server for local LLM deployment and compatibility with existing Open AI apps. 93 | - `Lemonade Python API `_: Offers High-Level API for easy integration of Lemonade LLMs into Python applications and Low-Level API for custom experiments with specific checkpoints, devices, and tools. 94 | - `Lemonade Command Line `_ Interface easily benchmark, measure accuracy, prompt or gather memory usage of your LLM. 95 | - `TurnkeyML `_ – Open-source tool that includes low-code APIs for general ONNX workflows. 96 | - `Digest AI `_ – A Model Ingestion and Analysis Tool in collaboration with the Linux Foundation. 97 | - `GAIA `_ – An open-source application designed for the quick setup and execution of generative AI applications on local PC hardware. 
98 | 99 | - Quark-torch: 100 | 101 | - Added ROUGE and METEOR evaluation metrics for LLMs 102 | - Support for evaluating ONNX models exported using OGA 103 | - Support for offline evaluation (evaluation without generation) for LLMs 104 | - Support for Hugging Face integration 105 | - Support for Gemma2 quantization using the OGA flow 106 | - Support for Llama-3.2 quantization with FP8 (weights, activation, and KV-cache) for the vision and language components 107 | 108 | - Quark-onnx: 109 | 110 | - Support compatibility with ONNX Runtime version 1.20.0, and 1.20.1 111 | - Support for microexponents (MX) data types, including MX4, MX6, and MX9 112 | - Support for BF16 data type for VAIML 113 | - Support for excluding pre and post-processing from quantization 114 | - Support for mixed precision with any data type 115 | - Support for Quarot rotation R1 algorithm 116 | - Support for microexponents and microscaling AdaQuant 117 | - Support for an auto-search algorithm to automatically find the best accuracy quantized model 118 | - Added tools for evaluating L2, PSNR, VMAF, and cosine 119 | 120 | - ONNX Runtime EP: 121 | 122 | - Support for Chinese characters in the ``filename/cache_dir/cache_key/xclbin`` 123 | - Support for ``int4/uint4`` data type 124 | - Support for configurable failure handling: CPU fallback or exception 125 | - Update for encrypt/decrypt feature 126 | 127 | - Known Issues: 128 | 129 | - Microsoft Windows Insider Program (WIP) users may see warnings or need to restart when running all applications concurrently. 130 | 131 | - NPU driver and workloads will continue to work. 132 | 133 | - Context creation may appear to be limited when some application do not close contexts quickly. 134 | 135 | 136 | *********** 137 | Version 1.3 138 | *********** 139 | 140 | - New Features: 141 | 142 | - Initial release of the Quark quantizer 143 | - Support for mixed precision data types 144 | - Compatibility with Copilot+ applications 145 | 146 | - Improved support for :doc:`LLMs using OGA ` 147 | 148 | - New EoU Tools: 149 | 150 | - CNN profiling tool for VAI-ML flow 151 | - Idle detection and suspension of contexts 152 | - Rebalance feature for AIE hardware resource optimization 153 | 154 | - NPU and Compiler: 155 | 156 | - New Op Support: 157 | 158 | - MAC 159 | - QResize Bilinear 160 | - LUT Q-Power 161 | - Expand 162 | - Q-Hsoftmax 163 | - A16 Q-Pad 164 | - Q-Reduce-Mean along H/W dimension 165 | - A16 Q-Global-AvgPool 166 | - A16 Padding with non-zero values 167 | - A16 Q-Sqrt 168 | - Support for XINT8/XINT16 MatMul and A16W16/A8W8 Q-MatMul 169 | 170 | - Performance Improvements: 171 | 172 | - Q-Conv, Q-Pool, Q-Add, Q-Mul, Q-InstanceNorm 173 | - Enhanced QDQ support for a range of operations 174 | - Enhanced the tiling algorithm 175 | - Improved graph-level optimization with extra transpose removal 176 | - Enhanced AT/MT fusion 177 | - Optimized memory usage and compile time 178 | - Improved compilation messages 179 | 180 | - Quark for PyTorch: 181 | 182 | - Model Support: 183 | 184 | - Examples of LLM PTQ, such as Llama3.2 and Llama3.2-Vision models 185 | - Example of YOLO-NAS detection model PTQ/QAT 186 | - Example of SDXL v1.0 with weight INT8 activation INT8 187 | 188 | - PyTorch Quantizer Enhancements: 189 | 190 | - Partial model quantization by user configuration under FX mode 191 | - Quantization of ConvTranspose2d in Eager Mode and FX mode 192 | - Advanced Quantization Algorithms with auto-generated configurations 193 | - Optimized Configuration with DataTypeSpec for ease of use 
194 | - Accelerated in-place replacement under Eager Mode 195 | - Loading configuration from file of algorithms and pre-optimizations 196 | 197 | - Quark for ONNX: 198 | 199 | - New Features: 200 | 201 | - Compatibility with ONNX Runtime version 1.18, 1.19 202 | - Support for int4, uint4, Microscaling data types 203 | - Quantization for arbitrary specified operators 204 | - Quantization type alignment of element-wise operators for mixed precision 205 | - ONNX graph cleaning 206 | - Int32 bias quantization 207 | 208 | - ONNX Quantizer Enhancements: 209 | 210 | - Fast fine-tuning support for the MatMul operator, BFP data type, and GPU acceleration 211 | - Improved ONNX quantization of LLM models 212 | - Optimized quantization of FP16 models 213 | - Custom operator compilation process 214 | - Default parameters for auto mixed precision 215 | - Optimized Ryzen AI workflow by aligning with hardware constraints of the NPU 216 | 217 | - ONNX Runtime EP: 218 | 219 | - Support for ONNX Runtime EP shared libraries 220 | - Python dependency removal 221 | - Memory optimization during the compile phase 222 | - Pattern API enhancement with multiple outputs and commutable arguments support 223 | 224 | - Known Issues: 225 | 226 | - Extended compile time for some models with BF16/BFP16 data types 227 | - LLM models with 4K sequence length may revert to CPU execution 228 | - Accuracy drop in some Transformer models using BF16/BFP16 data types, requiring Quark intervention 229 | 230 | *********** 231 | Version 1.2 232 | *********** 233 | 234 | - New features: 235 | 236 | - Support added for Strix Point NPUs 237 | - Support added for integrated GPU 238 | - Smart installer for Ryzen AI 1.2 239 | - NPU DPM based on power slider 240 | 241 | - New model support: 242 | 243 | - `LLM flow support `_ for multiple models in both PyTorch and ONNX flow (optimized model support will be released asynchronously) 244 | - SDXL-T with limited performance optimization 245 | 246 | - New EoU tools: 247 | 248 | - `AI Analyzer `_ : Analysis and visualization of model compilation and inference profiling 249 | - Platform/NPU inspection and management tool (`xrt-smi `_) 250 | - `Onnx Benchmarking tool `_ 251 | 252 | - New Demos: 253 | 254 | - NPU-GPU multi-model pipeline application `demo `_ 255 | 256 | - NPU and Compiler 257 | 258 | - New device support: Strix Nx4 and 4x4 Overlay 259 | - New Op support: 260 | 261 | - InstanceNorm 262 | - Silu 263 | - Floating scale quantization operators (INT8, INT16) 264 | - Support new rounding mode (Round to even) 265 | - Performance Improvement: 266 | 267 | - Reduced the model compilation time 268 | - Improved instruction loading 269 | - Improved synchronization in large overlay 270 | - Enhanced strided_slice performance 271 | - Enhanced convolution MT fusion 272 | - Enhanced convolution AT fusion 273 | - Enhanced data movement op performance 274 | - ONNX Quantizer updates 275 | 276 | - Improved usability with various features and tools, including weights-only quantization, graph optimization, dynamic shape fixing, and format transformations. 277 | - Improved the accuracy of quantized models through automatic mixed precision and enhanced AdaRound and AdaQuant techniques. 278 | - Enhanced support for the BFP data type, including more attributes and shape inference capability. 279 | - Optimized the NPU workflow by aligning with the hardware constraints of the NPU. 280 | - Supported compilation for Windows and Linux. 
281 | - Bugfix: 282 | 283 | - Fixed the problem where per-channel quantization is not compatible with onnxruntime 1.17. 284 | - Fixed the bug of CLE when conv with groups. 285 | - Fixed the bug of bias correction. 286 | - Pytorch Quantizer updates 287 | 288 | - Tiny value quantization protection. 289 | - Higher onnx version support in quantized model exporting. 290 | - Relu6 hardware constrains support. 291 | - Support of mean operation with keepdim=True. 292 | - Resolved issues: 293 | 294 | - NPU SW stack will fail to initialize when the system is out of memory. This could impact camera functionality when Microsoft Effect Pack is enabled. 295 | - If Microsoft Effects Pack is overloaded with other 4+ applications that use NPU to do inference, then camera functionality can be impacted. Can be fixed with a reboot. This will be fixed in the next release. 296 | 297 | *********** 298 | Version 1.1 299 | *********** 300 | 301 | - New model support: 302 | 303 | - Llama 2 7B with w4abf16 (3-bit and 4-bit) quantization (Beta) 304 | - Whisper base (EA access) 305 | 306 | - New EoU tools: 307 | 308 | - CNN Benchmarking tool on RyzenAI-SW Repo 309 | - Platform/NPU inspection and management tool 310 | 311 | Quantizer 312 | ========= 313 | 314 | - ONNX Quantizer: 315 | 316 | - Improved usability with various features and tools, including diverse parameter configurations, graph optimization, shape fixing, and format transformations. 317 | - Improved quantization accuracy through the implementation of experimental algorithmic improvements, including AdaRound and AdaQuant. 318 | - Optimized the NPU workflow by distinguishing between different targets and aligning with the hardware constraints of the NPU. 319 | - Introduced new utilities for model conversion. 320 | 321 | - PyTorch Quantizer: 322 | 323 | - Mixed data type quantization enhancement and bug fix. 324 | - Corner bug fixes for add, sub, and conv1d operations. 325 | - Tool for converting the S8S8 model to the U8S8 model. 326 | - Tool for converting the customized Q/DQ to onnxruntime contributed Q/DQ with the "microsoft" domain. 327 | - Tool for fixing a dynamic shapes model to fixed shape model. 328 | 329 | - Bug fixes 330 | 331 | - Fix for incorrect logging when simulating the LeakyRelu alpha value. 332 | - Fix for useless initializers not being cleaned up during optimization. 333 | - Fix for external data cannot be found when using use_external_data_format. 334 | - Fix for custom Ops cannot be registered due to GLIBC version mismatch 335 | 336 | NPU and Compiler 337 | ================ 338 | 339 | - New op support: 340 | 341 | - Support Channel-wie Prelu. 342 | - Gstiling with reverse = false. 343 | - Fixed issues: 344 | 345 | - Fixed Transpose-convolution and concat optimization issues. 346 | - Fixed Conv stride 3 corner case hang issue. 347 | - Performance improvement: 348 | 349 | - Updated Conv 1x1 stride 2x2 optimization. 350 | - Enhanced Conv 7x7 performance. 351 | - Improved padding performance. 352 | - Enhanced convolution MT fusion. 353 | - Improved the performance for NCHW layout model. 354 | - Enhanced the performance for eltwise-like op. 355 | - Enhanced Conv and eltwise AT fusion. 356 | - Improved the output convolution/transpose-convolution’s performance. 357 | - Enhanced the logging message for EoU. 
358 | 359 | 360 | ONNX Runtime EP 361 | =============== 362 | 363 | - End-2-End Application support on NPU 364 | 365 | - Enhanced existing support: Provided high-level APIs to enable seamless incorporation of pre/post-processing operations into the model to run on NPU 366 | - Two examples (resnet50 and yolov8) published to demonstrate the usage of these APIs to run end-to-end models on the NPU 367 | - Bug fixes for ONNXRT EP to support customers’ models 368 | 369 | Misc 370 | ==== 371 | 372 | - Contains mitigation for the following CVEs: CVE-2024-21974, CVE-2024-21975, CVE-2024-21976 373 | 374 | ************* 375 | Version 1.0.1 376 | ************* 377 | 378 | - Minor fix for Single click installation without given env name. 379 | - Perform improvement in the NPU driver. 380 | - Bug fix in elementwise subtraction in the compiler. 381 | - Runtime stability fixes for minor corner cases. 382 | - Quantizer update to resolve performance drop with default settings. 383 | 384 | *********** 385 | Version 1.0 386 | *********** 387 | Quantizer 388 | ========= 389 | 390 | - ONNX Quantizer 391 | 392 | - Support for ONNXRuntime 1.16. 393 | - Support for the Cross-Layer-Equalization (CLE) algorithm in quantization, which can balance the weights of consecutive Conv nodes to make it more quantize-friendly in per-tensor quantization. 394 | - Support for mixed precision quantization including UINT16/INT16/UINT32/INT32/FLOAT16/BFLOAT16, and support asymmetric quantization for BFLOAT16. 395 | - Support for the MinMSE method for INT16/UINT16/INT32/UINT32 quantization. 396 | - Support for quantization using the INT16 scale. 397 | - Support for unsigned ReLU in symmetric activation configuration. 398 | - Support for converting Float16 to Float32 during quantization. 399 | - Support for converting NCHW model to NHWC model during quantization. 400 | - Support for two more modes for MinMSE for better accuracy. The "All" mode computes the scales with all batches while the "MostCommon" mode computes the scale for each batch and uses the most common scales. 401 | - Support for the quantization of more operations: 402 | 403 | - PReLU, Sub, Max, DepthToSpace, SpaceToDepth, Slice, InstanceNormalization, and LpNormalization. 404 | - Non-4D ReduceMean. 405 | - Leakyrelu with arbitrary alpha. 406 | - Split by converting it to Slice. 407 | 408 | - Support for op fusing of InstanceNormalization and L2Normalization in NPU workflow. 409 | - Support for converting Clip to ReLU when the minimal value is 0. 410 | - Updated shift_bias, shift_read, and shift_write constraints in the NPU workflow and added an option "IPULimitationCheck" to disable it. 411 | - Support for disabling the op fusing of Conv + LeakyReLU/PReLU in the NPU workflow. 412 | - Support for logging for quantization configurations and summary information. 413 | - Support for removing initializer from input to support models converted from old version pytorch where weights are stored as inputs. 414 | - Added a recommended configuration for the IPU_Transformer platform. 415 | - New utilities: 416 | 417 | - Tool for converting the float16 model to the float32 model. 418 | - Tool for converting the NCHW model to the NHWC model. 419 | - Tool for quantized models with random input. 420 | 421 | - Three examples for quantization models from Timm, Torchvision, and ONNXRuntime modelzoo respectively. 422 | - Bugfixes: 423 | 424 | - Fix a bug that weights are quantized with the "NonOverflow" method when using the "MinMSE" method. 
425 | 426 | - Pytorch Quantizer 427 | 428 | - Support of some operations quantization in quantizer: inplace div, inplace sub 429 | - Log and document enhancement to emphasize fast-finetune 430 | - Timm models quantization script example 431 | - Bug fix for operators: clamp and prelu 432 | - QAT Support quantization of operations with multiple outputs 433 | - QAT EOU enhancements: significantly reduces the need for network modifications 434 | - QAT ONNX exporting enhancements: support more configurations 435 | - New QAT examples 436 | 437 | - TF2 Quantizer 438 | 439 | - Support for Tensorflow 2.11 and 2.12. 440 | - Support for the 'tf.linalg.matmul' operator. 441 | - Updated shift_bias constraints for NPU workflow. 442 | - Support for dumping models containing operations with multiple outputs. 443 | - Added an example of a sequential model. 444 | - Bugfixes: 445 | 446 | - Fix a bug that Hardsigmoid and Hardswish are not mapped to DPU without Batch Normalization. 447 | - Fix a bug when both align_pool and align_concat are used simultaneously. 448 | - Fix a bug in the sequential model when a layer has multiple consumers. 449 | 450 | - TF1 Quantizer 451 | 452 | - Update shift_bias constraints for NPU workflow. 453 | - Bugfixes: 454 | 455 | - Fix a bug in fast_finetune when the 'input_node' and 'quant_node' are inconsistent. 456 | - Fix a bug that AddV2 op identified as BiasAdd. 457 | - Fix a bug when the data type of the concat op is not float. 458 | - Fix a bug in split_large_kernel_pool when the stride is not equal to 1. 459 | 460 | ONNXRuntime Execution Provider 461 | ============================== 462 | 463 | - Support new OPs, such as PRelu, ReduceSum, LpNormlization, DepthToSpace(DCR). 464 | - Increase the percentage of model operators performed on the NPU. 465 | - Fixed some issues causing model operators allocation to CPU. 466 | - Improved report summary 467 | - Support the encryption of the VOE cache 468 | - End-2-End Application support on NPU 469 | 470 | - Enable running pre/post/custom ops on NPU, utilizing ONNX feature of E2E extensions. 471 | - Two examples published for yolov8 and resnet50, in which preprocessing custom op is added and runs on NPU. 472 | 473 | - Performance: latency improves by up to 18% and power savings by up to 35% by additionally running preprocessing on NPU apart from inference. 474 | - Multiple NPU overlays support 475 | 476 | - VOE configuration that supports both CNN-centric and GEMM-centric NPU overlays. 477 | - Increases number of ops that run on NPU, especially for models which have both GEMM and CNN ops. 478 | - Examples published for use with some of the vision transformer models. 479 | 480 | NPU and Compiler 481 | ============================== 482 | 483 | - New operators support 484 | 485 | - Global average pooling with large spatial dimensions 486 | - Single Activation (no fusion with conv2d, e.g. 
relu/single alpha PRelu) 487 | 488 | - Operator support enhancement 489 | 490 | - Enlarge the width dimension support range for depthwise-conv2d 491 | - Support more generic broadcast for element-wise like operator 492 | - Support output channel not aligned with 4B GStiling 493 | - Support Mul and LeakyRelu fusion 494 | - Concatenation’s redundant input elimination 495 | - Channel Augmentation for conv2d (3x3, stride=2) 496 | 497 | - Performance optimization 498 | 499 | - PDI partition refine to reduce the overhead for PDI swap 500 | - Enabled cost model for some specific models 501 | 502 | - Fixed asynchronous error in multiple thread scenario 503 | - Fixed known issue on tanh and transpose-conv2d hang issue 504 | 505 | Known Issues 506 | ============================== 507 | 508 | - Support for multiple applications is limited to up to eight 509 | - Windows Studio Effects should be disabled when using the Latency profile. To disable Windows Studio Effects, open **Settings > Bluetooth & devices > Camera**, select your primary camera, and then disable all camera effects. 510 | 511 | 512 | 513 | *********** 514 | Version 0.9 515 | *********** 516 | 517 | Quantizer 518 | ========= 519 | 520 | - Pytorch Quantizer 521 | 522 | - Dict input/output support for model forward function 523 | - Keywords argument support for model forward function 524 | - Matmul subroutine quantization support 525 | - Support of some operations in quantizer: softmax, div, exp, clamp 526 | - Support quantization of some non-standard conv2d. 527 | 528 | 529 | - ONNX Quantizer 530 | 531 | - Add support for Float16 and BFloat16 quantization. 532 | - Add C++ kernels for customized QuantizeLinear and DequantizeLinaer operations. 533 | - Support saving quantizer version info to the quantized models' producer field. 534 | - Support conversion of ReduceMean to AvgPool in NPU workflow. 535 | - Support conversion of BatchNorm to Conv in NPU workflow. 536 | - Support optimization of large kernel GlobalAvgPool and AvgPool operations in NPU workflow. 537 | - Supports hardware constraints check and adjustment of Gemm, Add, and Mul operations in NPU workflow. 538 | - Supports quantization for LayerNormalization, HardSigmoid, Erf, Div, and Tanh for NPU. 539 | 540 | ONNXRuntime Execution Provider 541 | ============================== 542 | 543 | - Support new OPs, such as Conv1d, LayerNorm, Clip, Abs, Unsqueeze, ConvTranspose. 544 | - Support pad and depad based on NPU subgraph’s inputs and outputs. 545 | - Support for U8S8 models quantized by ONNX quantizer. 546 | - Improve report summary tools. 547 | 548 | NPU and Compiler 549 | ================ 550 | 551 | - Supported exp/tanh/channel-shuffle/pixel-unshuffle/space2depth 552 | - Performance uplift of xint8 output softmax 553 | - Improve the partition messages for CPU/DPU 554 | - Improve the validation check for some operators 555 | - Accelerate the speed of compiling large models 556 | - Fix the elew/pool/dwc/reshape mismatch issue and fix the stride_slice hang issue 557 | - Fix str_w != str_h issue in Conv 558 | 559 | 560 | LLM 561 | === 562 | 563 | - Smoothquant for OPT1.3b, 2.7b, 6.7b, 13b models. 
564 | - Huggingface Optimum ORT Quantizer for ONNX and Pytorch dynamic quantizer for Pytorch 565 | - Enabled Flash attention v2 for larger prompts as a custom torch.nn.Module 566 | - Enabled all CPU ops in bfloat16 or float32 with Pytorch 567 | - int32 accumulator in AIE (previously int16) 568 | - DynamicQuantLinear op support in ONNX 569 | - Support different compute primitives for prefill/prompt and token phases 570 | - Zero copy of weights shared between different op primitives 571 | - Model saving after quantization and loading at runtime for both Pytorch and ONNX 572 | - Enabled profiling prefill/prompt and token time using local copy of OPT Model with additional timer instrumentation 573 | - Added demo mode script with greedy, stochastic and contrastive search options 574 | 575 | ASR 576 | === 577 | - Support Whipser-tiny 578 | - All GEMMs offloaded to AIE 579 | - Improved compile time 580 | - Improved WER 581 | 582 | Known issues 583 | ============ 584 | 585 | - Flow control OPs including "Loop", "If", "Reduce" not supported by VOE 586 | - Resizing OP in ONNX opset 10 or lower is not supported by VOE 587 | - Tensorflow 2.x quantizer supports models within tf.keras.model only 588 | - Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue 589 | - Running multiple concurrent models using temporal sharing on the 5x4 binary is not supported 590 | - Only batch sizes of 1 are supported 591 | - Only models with the pretrained weights setting = TRUE should be imported 592 | - Launching multiple processes on 4 1x4 binaries can cause hangs, especially when models have many sub-graphs 593 | 594 | | 595 | | 596 | 597 | *********** 598 | Version 0.8 599 | *********** 600 | 601 | Quantizer 602 | ========= 603 | 604 | - Pytorch Quantizer 605 | 606 | - Pytorch 1.13 and 2.0 support 607 | - Mixed precision quantization support, supporting float32/float16/bfloat16/intx mixed quantization 608 | - Support of bit-wise accuracy cross check between quantizer and ONNX-runtime 609 | - Split and chunk operators were automatically converted to slicing 610 | - Add support for BFP data type quantization 611 | - Support of some operations in quantizer: where, less, less_equal, greater, greater_equal, not, and, or, eq, maximum, minimum, sqrt, Elu, Reduction_min, argmin 612 | - QAT supports training on multiple GPUs 613 | - QAT supports operations with multiple inputs or outputs 614 | 615 | - ONNX Quantizer 616 | 617 | - Provided Python wheel file for installation 618 | - Support OnnxRuntime 1.15 619 | - Supports setting input shapes of random data reader 620 | - Supports random data reader in the dump model function 621 | - Supports saving the S8S8 model in U8S8 format for NPU 622 | - Supports simulation of Sigmoid, Swish, Softmax, AvgPool, GlobalAvgPool, ReduceMean and LeakyRelu for NPU 623 | - Supports node fusions for NPU 624 | 625 | ONNXRuntime Execution Provider  626 | ============================== 627 | 628 | - Supports for U8S8 quantized ONNX models 629 | - Improve the function of falling back to CPU EP 630 | - Improve AIE plugin framework 631 | 632 | - Supports LLM Demo 633 | - Supports Gemm ASR 634 | - Supports E2E AIE acceleration for Pre/Post ops 635 | - Improve the easy-of-use for partition and  deployment 636 | - Supports  models containing subgraphs 637 | - Supports report summary about OP assignment 638 | - Supports report summary about DPU subgraphs falling back to CPU 639 | - Improve log printing and troubleshooting tools. 
640 | - Upstreamed to ONNX Runtime Github repo for any data type support and bug fix 641 | 642 | NPU and Compiler 643 | ================ 644 | 645 | - Extended the support range of some operators 646 | 647 | - Larger input size: conv2d, dwc 648 | - Padding mode: pad 649 | - Broadcast: add 650 | - Variant dimension (non-NHWC shape): reshape, transpose, add 651 | - Support new operators, e.g. reducemax(min/sum/avg), argmax(min) 652 | - Enhanced multi-level fusion 653 | - Performance enhancement for some operators 654 | - Add quantization information validation 655 | - Improvement in device partition 656 | 657 | - User friendly message 658 | - Target-dependency check 659 | 660 | Demos 661 | ===== 662 | 663 | - New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip 664 | 665 | - LLM demo with OPT-1.3B/2.7B/6.7B 666 | - Automatic speech recognition demo with Whisper-tiny 667 | 668 | Known issues 669 | ============ 670 | - Flow control OPs including "Loop", "If", "Reduce" not supported by VOE 671 | - Resize OP in ONNX opset 10 or lower not supported by VOE 672 | - Tensorflow 2.x quantizer supports models within tf.keras.model only 673 | - Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue 674 | - Run multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported 675 | - Support batch size 1 only for NPU 676 | 677 | 678 | | 679 | | 680 | 681 | *********** 682 | Version 0.7 683 | *********** 684 | 685 | Quantizer 686 | ========= 687 | 688 | - Docker Containers 689 | 690 | - Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer 691 | - Provided GPU Docker files to build GPU dockers 692 | 693 | - Pytorch Quantizer 694 | 695 | - Supports multiple output conversion to slicing 696 | - Enhanced transpose OP optimization 697 | - Inspector support new IP targets for NPU 698 | 699 | - ONNX Quantizer 700 | 701 | - Provided Python wheel file for installation 702 | - Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer 703 | - Supports power-of-two quantization with both QDQ and QOP format 704 | - Supports Non-overflow and Min-MSE quantization methods 705 | - Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format. 706 | 707 | - Supports signed and unsigned configurations. 708 | - Supports symmetry and asymmetry configurations. 709 | - Supports per-tensor and per-channel configurations. 710 | - Supports bias quantization using int8 datatype for NPU. 711 | - Supports quantization parameters (scale) refinement for NPU. 712 | - Supports excluding certain operations from quantization for NPU. 713 | - Supports ONNX models larger than 2GB. 714 | - Supports using CUDAExecutionProvider for calibration in quantization 715 | - Open source and upstreamed to Microsoft Olive Github repo 716 | 717 | - TensorFlow 2.x Quantizer 718 | 719 | - Added support for exporting the quantized model ONNX format. 720 | - Added support for the keras.layers.Activation('leaky_relu') 721 | 722 | - TensorFlow 1.x Quantizer 723 | 724 | - Added support for folding Reshape and ResizeNearestNeighbor operators. 725 | - Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes. 726 | - Added support for quantizing Sum, StridedSlice, and Maximum operators. 
727 | - Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes. 728 | - Add support for setting the opset version in exporting ONNX format 729 | 730 | ONNX Runtime Execution Provider 731 | =============================== 732 | 733 | - Vitis ONNX Runtime Execution Provider (VOE) 734 | 735 | - Supports ONNX Opset version 18, ONNX Runtime 1.16.0, and ONNX version 1.13 736 | - Supports both C++ and Python APIs(Python version 3) 737 | - Supports deploy model with other EPs 738 | - Supports falling back to CPU EP 739 | - Open source and upstreamed to ONNX Runtime Github repo 740 | - Compiler 741 | 742 | - Multiple Level op fusion 743 | - Supports the same muti-output operator like chunk split 744 | - Supports split big pooling to small pooling 745 | - Supports 2-channel writeback feature for Hard-Sigmoid and Depthwise-Convolution 746 | - Supports 1-channel GStiling 747 | - Explicit pad-fix in CPU subgraph for 4-byte alignment 748 | - Tuning the performance for multiple models 749 | 750 | NPU 751 | === 752 | 753 | - Two configurations 754 | 755 | - Power Optimized Overlay 756 | 757 | - Suitable for smaller AI models (1x4.xclbin) 758 | - Supports spatial sharing, up to 4 concurrent AI workloads 759 | 760 | - Performance Optimized Overlay (5x4.xclbin) 761 | 762 | - Suitable for larger AI models 763 | 764 | Known issues 765 | ============ 766 | - Flow control OPs including "Loop", "If", "Reduce" are not supported by VOE 767 | - Resize OP in ONNX opset 10 or lower not supported by VOE 768 | - Tensorflow 2.x quantizer supports models within tf.keras.model only 769 | - Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue 770 | - Run multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported 771 | 772 | 773 | 774 | 775 | .. 776 | ------------ 777 | 778 | ##################################### 779 | License 780 | ##################################### 781 | 782 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice. 783 | -------------------------------------------------------------------------------- /docs/ryzen_ai_libraries.rst: -------------------------------------------------------------------------------- 1 | .. Copyright (C) 2023-2025 Advanced Micro Devices, Inc. All rights reserved. 2 | 3 | ##################### 4 | Ryzen AI CVML library 5 | ##################### 6 | 7 | The Ryzen AI CVML libraries build on top of the Ryzen AI drivers and execution infrastructure to provide powerful AI capabilities to C++ applications without having to worry about training specific AI models and integrating them to the Ryzen AI framework. 8 | 9 | Each Ryzen AI CVML library feature offers a simple C++ application programming interface (API) that can be easily incorporated into existing applications. 10 | 11 | The Ryzen AI CVML library is distributed through the RyzenAI-SW Github repository: https://github.com/amd/RyzenAI-SW/tree/main/Ryzen-AI-CVML-Library 12 | 13 | ************* 14 | Prerequisites 15 | ************* 16 | Ensure that the following software tools/packages are installed on the development system. 17 | 18 | 1. Visual Studio 2022 Community edition or newer, ensure “Desktop Development with C++” is installed 19 | 2. Cmake (version >= 3.18) 20 | 3. 
OpenCV (version 4.8.1 or newer) 21 | 22 | ************************************************** 23 | Building sample applications 24 | ************************************************** 25 | This section describes the steps to build the Ryzen AI CVML library sample applications. 26 | 27 | Navigate to the folder containing Ryzen AI samples 28 | ================================================== 29 | Download the Ryzen AI CVML sources, and go to the 'samples' sub-folder of the library. :: 30 | 31 | git clone https://github.com/amd/RyzenAI-SW.git -b main --depth 1 32 | chdir RyzenAI-SW\Ryzen-AI-CVML-Library\samples 33 | 34 | OpenCV libraries 35 | ================ 36 | The Ryzen AI CVML library samples make use of OpenCV, so set an environment variable to let the build scripts know where to find OpenCV. :: 37 | 38 | set OPENCV_INSTALL_ROOT= 39 | 40 | Build Instructions 41 | ================== 42 | Create a build folder and use CMake to build the sample(s). :: 43 | 44 | mkdir build-samples 45 | cmake -S %CD% -B %CD%\build-samples -DOPENCV_INSTALL_ROOT=%OPENCV_INSTALL_ROOT% 46 | cmake --build %CD%\build-samples --config Release 47 | 48 | The compiled sample application(s) will be placed in the various build-samples\\Release folder(s) under the 'samples' folder. 49 | 50 | ************************************************* 51 | Running sample applications 52 | ************************************************* 53 | This section describes how to execute the Ryzen AI CVML library sample applications. 54 | 55 | Update the console and/or system PATH 56 | ===================================== 57 | Ryzen AI CVML library applications need to be able to find the library files. One way to do this is to add the location of the libraries to the system or console PATH environment variable. 58 | 59 | In this example, the location of OpenCV's runtime libraries is also added to the PATH environment variable. :: 60 | 61 | set PATH=%PATH%;\windows 62 | set PATH=%PATH%;%OPENCV_INSTALL_ROOT%\x64\vc16\bin 63 | 64 | Adjust the above commands to match the actual locations of the Ryzen AI and OpenCV libraries, respectively. 65 | 66 | Select an input source/image/video 67 | ================================== 68 | Ryzen AI CVML library samples can accept a variety of image and video input formats, or even open the default camera on the system if "0" is specified as the input. 69 | 70 | In this example, a publicly available video file is used as the application's input. :: 71 | 72 | curl -o dancing.mp4 https://videos.pexels.com/video-files/4540332/4540332-hd_1920_1080_25fps.mp4 73 | 74 | Execute the sample application 75 | ============================== 76 | Finally, the previously built sample application can be executed with the selected input source. A scripted variant of these steps is sketched at the end of this page. :: 77 | 78 | build-samples\cvml-sample-depth-estimation\Release\cvml-sample-depth-estimation.exe -i dancing.mp4 79 | 80 | .. 81 | ------------ 82 | 83 | ##################################### 84 | License 85 | ##################################### 86 | 87 | Ryzen AI is licensed under the MIT License. Refer to the LICENSE file for the full license text and copyright notice.
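As referenced in the "Execute the sample application" section above, the following is a minimal, hypothetical sketch (not part of the Ryzen AI CVML library) that combines the input-selection and execution steps: it downloads the example video if it is not already present and then launches the depth-estimation sample. It assumes the console PATH has already been updated as described above and that the script is run from the 'samples' folder.

.. code-block:: python

    # run_depth_sample.py - hypothetical helper for launching a CVML sample.
    # Assumes the sample has been built into build-samples\ and PATH is set
    # so the Ryzen AI CVML and OpenCV runtime libraries can be found.
    import os
    import subprocess
    import urllib.request

    VIDEO_URL = "https://videos.pexels.com/video-files/4540332/4540332-hd_1920_1080_25fps.mp4"
    VIDEO_FILE = "dancing.mp4"
    SAMPLE_EXE = os.path.join(
        "build-samples", "cvml-sample-depth-estimation", "Release",
        "cvml-sample-depth-estimation.exe")

    # Fetch the publicly available test video if it is not already present.
    if not os.path.exists(VIDEO_FILE):
        urllib.request.urlretrieve(VIDEO_URL, VIDEO_FILE)

    # Run the sample against the video; pass "0" instead to use the default camera.
    subprocess.run([SAMPLE_EXE, "-i", VIDEO_FILE], check=True)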
88 | -------------------------------------------------------------------------------- /docs/sphinx/requirements.in: -------------------------------------------------------------------------------- 1 | rocm-docs-core==0.24.2 2 | -------------------------------------------------------------------------------- /docs/sphinx/requirements.txt: -------------------------------------------------------------------------------- 1 | # 2 | # This file is autogenerated by pip-compile with Python 3.8 3 | # by the following command: 4 | # 5 | # pip-compile --resolver=backtracking requirements.in 6 | # 7 | accessible-pygments==0.0.4 8 | # via pydata-sphinx-theme 9 | alabaster==0.7.13 10 | # via sphinx 11 | babel==2.12.1 12 | # via 13 | # pydata-sphinx-theme 14 | # sphinx 15 | beautifulsoup4==4.12.2 16 | # via pydata-sphinx-theme 17 | breathe==4.35.0 18 | # via rocm-docs-core 19 | certifi==2023.7.22 20 | # via requests 21 | cffi==1.15.1 22 | # via 23 | # cryptography 24 | # pynacl 25 | charset-normalizer==3.2.0 26 | # via requests 27 | click==8.1.7 28 | # via sphinx-external-toc 29 | cryptography==41.0.4 30 | # via pyjwt 31 | deprecated==1.2.14 32 | # via pygithub 33 | docutils==0.19 34 | # via 35 | # breathe 36 | # myst-parser 37 | # pydata-sphinx-theme 38 | # sphinx 39 | fastjsonschema==2.18.0 40 | # via rocm-docs-core 41 | fspath==20230629 42 | # via linuxdoc 43 | gitdb==4.0.10 44 | # via gitpython 45 | gitpython==3.1.37 46 | # via rocm-docs-core 47 | idna==3.4 48 | # via requests 49 | imagesize==1.4.1 50 | # via sphinx 51 | importlib-metadata==6.8.0 52 | # via sphinx 53 | importlib-resources==6.1.0 54 | # via rocm-docs-core 55 | jinja2==3.1.2 56 | # via 57 | # myst-parser 58 | # sphinx 59 | linuxdoc==20240924 60 | # via sphinx 61 | markdown-it-py==2.2.0 62 | # via 63 | # mdit-py-plugins 64 | # myst-parser 65 | markupsafe==2.1.3 66 | # via jinja2 67 | mdit-py-plugins==0.3.5 68 | # via myst-parser 69 | mdurl==0.1.2 70 | # via markdown-it-py 71 | myst-parser==1.0.0 72 | # via rocm-docs-core 73 | packaging==23.1 74 | # via 75 | # pydata-sphinx-theme 76 | # sphinx 77 | pycparser==2.21 78 | # via cffi 79 | pydata-sphinx-theme==0.14.1 80 | # via 81 | # rocm-docs-core 82 | # sphinx-book-theme 83 | pygithub==1.59.1 84 | # via rocm-docs-core 85 | pygments==2.16.1 86 | # via 87 | # accessible-pygments 88 | # pydata-sphinx-theme 89 | # sphinx 90 | pyjwt[crypto]==2.8.0 91 | # via pygithub 92 | pynacl==1.5.0 93 | # via pygithub 94 | pytz==2023.3.post1 95 | # via babel 96 | pyyaml==6.0.1 97 | # via 98 | # myst-parser 99 | # rocm-docs-core 100 | # sphinx-external-toc 101 | requests==2.31.0 102 | # via 103 | # pygithub 104 | # sphinx 105 | rocm-docs-core==0.24.2 106 | # via -r requirements.in 107 | six==1.17.0 108 | # via linuxdoc 109 | smmap==5.0.1 110 | # via gitdb 111 | snowballstemmer==2.2.0 112 | # via sphinx 113 | soupsieve==2.5 114 | # via beautifulsoup4 115 | sphinx==5.3.0 116 | # via 117 | # breathe 118 | # myst-parser 119 | # pydata-sphinx-theme 120 | # rocm-docs-core 121 | # sphinx-book-theme 122 | # sphinx-copybutton 123 | # sphinx-design 124 | # sphinx-external-toc 125 | # sphinx-notfound-page 126 | sphinx-book-theme==1.0.1 127 | # via rocm-docs-core 128 | sphinx-copybutton==0.5.2 129 | # via rocm-docs-core 130 | sphinx-design==0.5.0 131 | # via rocm-docs-core 132 | sphinx-external-toc==0.3.1 133 | # via rocm-docs-core 134 | sphinx-notfound-page==1.0.0 135 | # via rocm-docs-core 136 | sphinxcontrib-applehelp==1.0.4 137 | # via sphinx 138 | sphinxcontrib-devhelp==1.0.2 139 | # via sphinx 140 | 
sphinxcontrib-htmlhelp==2.0.1 141 | # via sphinx 142 | sphinxcontrib-jsmath==1.0.1 143 | # via sphinx 144 | sphinxcontrib-qthelp==1.0.3 145 | # via sphinx 146 | sphinxcontrib-serializinghtml==1.1.5 147 | # via sphinx 148 | typing-extensions==4.8.0 149 | # via pydata-sphinx-theme 150 | urllib3==2.0.5 151 | # via requests 152 | wrapt==1.15.0 153 | # via deprecated 154 | zipp==3.17.0 155 | # via 156 | # importlib-metadata 157 | # importlib-resources 158 | -------------------------------------------------------------------------------- /docs/xrt_smi.rst: -------------------------------------------------------------------------------- 1 | .. 2 | .. Heading guidelines 3 | .. 4 | .. # with overline, for parts 5 | .. * with overline, for chapters 6 | .. =, for sections 7 | .. -, for subsections 8 | .. ^, for subsubsections 9 | .. “, for paragraphs 10 | .. 11 | 12 | .. include:: /icons.txt 13 | 14 | ######################## 15 | NPU Management Interface 16 | ######################## 17 | 18 | ******************************* 19 | Introduction 20 | ******************************* 21 | 22 | The ``xrt-smi`` utility is a command-line interface to monitor and manage the NPU integrated in AMD CPUs. 23 | 24 | It is installed in ``C:\Windows\System32\AMD`` and can be invoked directly from within the conda environment created by the Ryzen AI Software installer. 25 | 26 | The ``xrt-smi`` utility currently supports three primary commands: 27 | 28 | - ``examine`` - generates reports related to the state of the AI PC and the NPU. 29 | - ``validate`` - executes sanity tests on the NPU. 30 | - ``configure`` - manages the performance level of the NPU. 31 | 32 | By default, the output of the ``xrt-smi examine`` and ``xrt-smi validate`` commands goes to the terminal. It can also be written to a file in JSON format as shown below: 33 | 34 | .. code-block:: shell 35 | 36 | xrt-smi examine -f JSON -o 37 | 38 | The utility also supports the following options, which can be used with any command: 39 | 40 | - ``--help`` - display help for xrt-smi or one of its subcommands 41 | - ``--version`` - report the versions of XRT, the driver, and the firmware 42 | - ``--verbose`` - turn on verbosity 43 | - ``--batch`` - enable batch mode (disables escape characters) 44 | - ``--force`` - when possible, force an operation (e.g., overwrite a file in examine or validate) 45 | 46 | The ``xrt-smi`` utility requires `Microsoft Visual C++ Redistributable `_ (version 2015 to 2022) to be installed. 47 | 48 | 49 | ******************************* 50 | Overview of Key Commands 51 | ******************************* 52 | 53 | .. list-table:: 54 | :widths: 35 65 55 | :header-rows: 1 56 | 57 | * - Command 58 | - Description 59 | * - examine 60 | - system config, device name 61 | * - examine --report platform 62 | - performance mode, power 63 | * - examine --report aie-partitions 64 | - hw contexts 65 | * - validate --run latency 66 | - latency test 67 | * - validate --run throughput 68 | - throughput test 69 | * - validate --run gemm 70 | - INT8 GEMM test reporting TOPS. This is a full-array test and should not be run while another workload is running. **NOTE**: This command is not supported on PHX and HPT NPUs. 71 | * - configure --pmode 72 | - set performance mode 73 | 74 | 75 | |memo| **NOTE**: The ``examine --report aie-partitions`` command reports runtime information. It should be used while a model is running on the NPU. You can run it in a loop to see live updates of the reported data, as illustrated in the sketch below.
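For scripted monitoring, the command can also be invoked from a small wrapper. The following is a minimal sketch (not part of the Ryzen AI Software) that polls ``xrt-smi examine --report aie-partitions`` once per second and reprints the report; the polling interval and the choice of report are arbitrary and can be adjusted as needed.

.. code-block:: python

    # poll_aie_partitions.py - minimal sketch for watching NPU partition activity.
    # Assumes xrt-smi is on the PATH (it is installed in C:\Windows\System32\AMD).
    import subprocess
    import time

    POLL_INTERVAL_S = 1.0  # arbitrary refresh interval

    while True:
        # Run the documented report command and capture its text output.
        result = subprocess.run(
            ["xrt-smi", "examine", "--report", "aie-partitions"],
            capture_output=True,
            text=True,
        )
        print(result.stdout)
        if result.returncode != 0:
            print(result.stderr)
        time.sleep(POLL_INTERVAL_S)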
76 | 77 | 78 | ******************************* 79 | xrt-smi examine 80 | ******************************* 81 | 82 | System Information 83 | ================== 84 | 85 | Reports OS/system information of the AI PC and confirm the presence of the AMD NPU. 86 | 87 | .. code-block:: shell 88 | 89 | xrt-smi examine 90 | 91 | Sample Command Line Output:: 92 | 93 | 94 | System Configuration 95 | OS Name : Windows NT 96 | Release : 26100 97 | Machine : x86_64 98 | CPU Cores : 20 99 | Memory : 32063 MB 100 | Distribution : Microsoft Windows 11 Enterprise 101 | Model : HP OmniBook Ultra Laptop 14-fd0xxx 102 | BIOS Vendor : HP 103 | BIOS Version : W81 Ver. 01.01.14 104 | 105 | XRT 106 | Version : 2.19.0 107 | Branch : HEAD 108 | Hash : f62307ddadf65b54acbed420a9f0edc415fefafc 109 | Hash Date : 2025-03-12 16:34:48 110 | NPU Driver Version : 32.0.203.257 111 | NPU Firmware Version : 1.0.7.97 112 | 113 | Device(s) Present 114 | |BDF |Name | 115 | |----------------|-----------| 116 | |[00c4:00:01.1] |NPU Strix | 117 | 118 | 119 | Sample JSON Output:: 120 | 121 | 122 | { 123 | "schema_version": { 124 | "schema": "JSON", 125 | "creation_date": "Tue Mar 18 22:43:38 2025 GMT" 126 | }, 127 | "system": { 128 | "host": { 129 | "os": { 130 | "sysname": "Windows NT", 131 | "release": "26100", 132 | "machine": "x86_64", 133 | "distribution": "Microsoft Windows 11 Enterprise", 134 | "model": "HP OmniBook Ultra Laptop 14-fd0xxx", 135 | "hostname": "XCOUDAYD02", 136 | "memory_bytes": "0x7d3f62000", 137 | "cores": "20", 138 | "bios_vendor": "HP", 139 | "bios_version": "W81 Ver. 01.01.14" 140 | }, 141 | "xrt": { 142 | "version": "2.19.0", 143 | "branch": "HEAD", 144 | "hash": "f62307ddadf65b54acbed420a9f0edc415fefafc", 145 | "build_date": "2025-03-12 16:34:48", 146 | "drivers": [ 147 | { 148 | "name": "NPU Driver", 149 | "version": "32.0.203.257" 150 | } 151 | ] 152 | }, 153 | "devices": [ 154 | { 155 | "bdf": "00c4:00:01.1", 156 | "device_class": "Ryzen", 157 | "name": "NPU Strix", 158 | "id": "0x0", 159 | "firmware_version": "1.0.7.97", 160 | "instance": "mgmt(inst=1)", 161 | "is_ready": "true" 162 | } 163 | ] 164 | } 165 | } 166 | } 167 | 168 | 169 | 170 | 171 | Platform Information 172 | ==================== 173 | 174 | Reports more detailed information about the NPU, such as the performance mode and power consumption. 175 | 176 | .. code-block:: shell 177 | 178 | xrt-smi examine --report platform 179 | 180 | Sample Command Line Output:: 181 | 182 | -------------------------- 183 | [00c5:00:01.1] : NPU Strix 184 | -------------------------- 185 | Platform 186 | Name : NPU Strix 187 | Performance Mode : Default 188 | 189 | Power : 1.277 Watts 190 | 191 | |memo| **NOTE**: Power reporting is not supported on PHX and HPT NPUs. Power reporting is only available on STX devices and onwards. 192 | 193 | NPU Partitions 194 | ============== 195 | 196 | Reports details about the NPU partition and column occupancy on the NPU. 197 | 198 | .. 
code-block:: shell 199 | 200 | xrt-smi examine --report aie-partitions 201 | 202 | Sample Command Line Output:: 203 | 204 | -------------------------- 205 | [00c5:00:01.1] : NPU Strix 206 | -------------------------- 207 | AIE Partitions 208 | Partition Index: 0 209 | Columns: [0, 1, 2, 3] 210 | HW Contexts: 211 | |PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency | 212 | |-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------| 213 | |20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 | 214 | 215 | 216 | NPU Context Bindings 217 | ==================== 218 | 219 | Reports details about the binding of columns to NPU HW contexts. 220 | 221 | .. code-block:: shell 222 | 223 | xrt-smi examine --report aie-partitions --verbose 224 | 225 | Sample Command Line Output:: 226 | 227 | Verbose: Enabling Verbosity 228 | Verbose: SubCommand: examine 229 | 230 | -------------------------- 231 | [00c5:00:01.1] : NPU Strix 232 | -------------------------- 233 | AIE Partitions 234 | Partition Index: 0 235 | Columns: [0, 1, 2, 3] 236 | HW Contexts: 237 | |PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency | 238 | |-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------| 239 | |20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 | 240 | 241 | AIE Columns 242 | |Column ||HW Context Slot | 243 | |--------||-----------------| 244 | |0 ||[1] | 245 | |1 ||[1] | 246 | |2 ||[1] | 247 | |3 ||[1] | 248 | 249 | 250 | 251 | 252 | 253 | ******************************* 254 | xrt-smi validate 255 | ******************************* 256 | 257 | Executing a Sanity Check on the NPU 258 | =================================== 259 | 260 | Runs a set of built-in NPU sanity tests, which includes latency, throughput, and gemm. 261 | 262 | Note: All tests are run in performance mode. 263 | 264 | - ``latency`` - This test executes no-op control code and measures the end-to-end latency on all columns. 265 | - ``throughput`` - This test loops input data from DDR through an MM2S Shim DMA channel back to DDR through an S2MM Shim DMA channel. The data movement within the AIE array follows the lowest-latency path, i.e., movement is restricted to the Shim tile. 266 | - ``gemm`` - The application deploys an INT8 GeMM kernel on all 32 cores. Each core stores its cycle count in the core data memory, where it is read by the firmware. The TOPS application uses the "XBUTIL" tool to capture the IPUHCLK while the workload runs. Once all cores have finished, the cycle counts from all cores are synced back to the host. Finally, the application uses the IPUHCLK, the core cycle counts, and the GeMM kernel size to calculate the TOPS. This is a full-array test and should not be run while another workload is running. **NOTE**: This command is not supported on PHX and HPT NPUs. 267 | - ``all`` - All applicable validate tests are executed (default). 268 | 269 | 270 | .. code-block:: shell 271 | 272 | xrt-smi validate --run all 273 | 274 | |memo| **NOTE**: Some sanity checks may fail if other applications (for example MEP, the Microsoft Experience Package) are also using the NPU.
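For automated smoke testing, the validate run can also be driven from a script. Below is a minimal, hypothetical sketch (not part of the Ryzen AI Software) that runs the full test suite and scans the console report for failed tests; the exact report wording may vary between XRT releases, so adjust the matching as needed.

.. code-block:: python

    # run_npu_sanity.py - hypothetical wrapper around 'xrt-smi validate --run all'.
    # Assumes xrt-smi is on the PATH and no other workload is using the NPU.
    import subprocess
    import sys

    result = subprocess.run(
        ["xrt-smi", "validate", "--run", "all"],
        capture_output=True,
        text=True,
    )
    print(result.stdout)

    # The console report marks each test with a 'Test Status' line, e.g. [PASSED].
    failed = [line for line in result.stdout.splitlines()
              if "Test Status" in line and "PASSED" not in line]

    if result.returncode != 0 or failed:
        print("NPU sanity check failed:")
        for line in failed:
            print(" ", line.strip())
        sys.exit(1)
    print("All NPU sanity tests passed.")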
275 | 276 | Sample Command Line Output:: 277 | 278 | 279 | Validate Device : [00c4:00:01.1] 280 | Platform : NPU Strix 281 | Power Mode : Performance 282 | ------------------------------------------------------------------------------- 283 | Test 1 [00c4:00:01.1] : gemm 284 | Details : TOPS: 51.3 285 | Test Status : [PASSED] 286 | ------------------------------------------------------------------------------- 287 | Test 2 [00c4:00:01.1] : latency 288 | Details : Average latency: 84.2 us 289 | Test Status : [PASSED] 290 | ------------------------------------------------------------------------------- 291 | Test 3 [00c4:00:01.1] : throughput 292 | Details : Average throughput: 59891.0 ops 293 | Test Status : [PASSED] 294 | ------------------------------------------------------------------------------- 295 | Validation completed. Please run the command '--verbose' option for more details 296 | 297 | ******************************* 298 | xrt-smi configure 299 | ******************************* 300 | 301 | Managing the Performance Level of the NPU 302 | ========================================= 303 | 304 | To set the performance level of the NPU, choose from the following modes: default, powersaver, balanced, performance, or turbo. Use the command below: 305 | 306 | .. code-block:: shell 307 | 308 | xrt-smi configure --pmode 309 | 310 | - ``default`` - adapts to the Windows Power Mode setting, which can be adjusted under System -> Power & battery -> Power mode. For finer control of the NPU settings, it is recommended to use the xrt-smi mode setting, which overrides the Windows Power mode and ensures optimal results. 311 | - ``powersaver`` - configures the NPU to prioritize power saving, preserving laptop battery life. 312 | - ``balanced`` - configures the NPU to provide a compromise between power saving and performance. 313 | - ``performance`` - configures the NPU to prioritize performance, consuming more power. 314 | - ``turbo`` - configures the NPU for maximum performance; requires AC power to be plugged in, otherwise ``performance`` mode is used. 315 | 316 | Example: Setting the NPU to high-performance mode 317 | 318 | .. code-block:: shell 319 | 320 | xrt-smi configure --pmode performance 321 | 322 | To check the current performance level, use the following command: 323 | 324 | .. code-block:: shell 325 | 326 | xrt-smi examine --report platform 327 | 328 | --------------------------------------------------------------------------------