├── .github
│   └── dependabot.yml
├── .nojekyll
├── .readthedocs.yaml
├── License
├── README.md
└── docs
    ├── _static
    │   └── llm-table.css
    ├── _templates
    │   └── flavors
    │       └── local
    │           ├── footer.jinja
    │           ├── header.jinja
    │           └── left-side-menu.jinja
    ├── ai_analyzer.rst
    ├── app_development.rst
    ├── conf.py
    ├── examples.rst
    ├── getstartex.rst
    ├── gpu
    │   └── ryzenai_gpu.rst
    ├── hybrid_oga.rst
    ├── icons.txt
    ├── images
    │   └── rai-sw.png
    ├── index.rst
    ├── inst.rst
    ├── licenses.rst
    ├── llm
    │   ├── high_level_python.rst
    │   ├── overview.rst
    │   └── server_interface.rst
    ├── model_quantization.rst
    ├── modelcompat.rst
    ├── modelrun.rst
    ├── npu_oga.rst
    ├── oga_model_prepare.rst
    ├── rai_linux.rst
    ├── relnotes.rst
    ├── ryzen_ai_libraries.rst
    ├── sphinx
    │   ├── requirements.in
    │   └── requirements.txt
    └── xrt_smi.rst
/.github/dependabot.yml:
--------------------------------------------------------------------------------
1 | # To get started with Dependabot version updates, you'll need to specify which
2 | # package ecosystems to update and where the package manifests are located.
3 | # Please see the documentation for all configuration options:
4 | # https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
5 |
6 | version: 2
7 | updates:
8 | - package-ecosystem: "pip" # See documentation for possible values
9 | directory: "/docs/sphinx" # Location of package manifests
10 | open-pull-requests-limit: 10
11 | schedule:
12 | interval: "daily"
13 | target-branch: "develop"
14 | labels:
15 | - "dependencies"
16 | reviewers:
17 | - "samjwu"
18 |
--------------------------------------------------------------------------------
/.nojekyll:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/.nojekyll
--------------------------------------------------------------------------------
/.readthedocs.yaml:
--------------------------------------------------------------------------------
1 | # Read the Docs configuration file
2 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
3 |
4 | version: 2
5 |
6 | sphinx:
7 | configuration: docs/conf.py
8 |
9 | formats: [htmlzip, pdf, epub]
10 |
11 | python:
12 | install:
13 | - requirements: docs/sphinx/requirements.txt
14 |
15 | build:
16 | os: ubuntu-22.04
17 | tools:
18 | python: "3.8"
19 |
--------------------------------------------------------------------------------
/License:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6 |
7 | The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
7 | # Ryzen AI Software
8 |
9 | Version 1.4
10 |
11 | # License
12 | Ryzen AI is licensed under [MIT License](https://github.com/amd/ryzen-ai-documentation/blob/main/License). Refer to the [LICENSE File](https://github.com/amd/ryzen-ai-documentation/blob/main/License) for the full license text and copyright notice.
13 |
14 | # Please Read: Important Legal Notices
15 | The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or
16 | otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY
17 | DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY
18 | PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
19 | AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
20 |
21 | ## AUTOMOTIVE APPLICATIONS DISCLAIMER
22 | AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS
23 | THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262 AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY.
24 |
25 | ## Copyright
26 |
27 | © Copyright 2024 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Ryzen, Vitis AI, and combinations thereof are trademarks of Advanced Micro Devices,
28 | Inc. AMBA, AMBA Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm Limited in the US and/or elsewhere. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
29 |
--------------------------------------------------------------------------------
/docs/_static/llm-table.css:
--------------------------------------------------------------------------------
1 | /* Software Stack Table */
2 |
3 | .center-table {
4 | margin-left: auto;
5 | margin-right: auto;
6 | text-align: center;
7 | }
8 |
9 | .center-table th,
10 | .center-table td {
11 | border: 1px solid #ffffff; /* Adds vertical and horizontal lines */
12 | }
13 |
14 | /* Supported Lemonade LLMs Table */
15 |
16 | /* Vertical lines for the first and second position in the first row */
17 | .llm-table thead tr:nth-of-type(1) th:nth-of-type(1),
18 | .llm-table thead tr:nth-of-type(1) th:nth-of-type(2) {
19 | border-right: 1px solid #ffffff; /* White color for the border */
20 | }
21 |
22 | /* Vertical lines for the first and third position in the second row */
23 | .llm-table thead tr:nth-of-type(2) th:nth-of-type(1),
24 | .llm-table thead tr:nth-of-type(2) th:nth-of-type(3) {
25 | border-right: 1px solid #ffffff; /* White color for the border */
26 | }
27 |
28 | /* Vertical lines for the first and third position in all other rows */
29 | .llm-table tbody tr td:nth-of-type(1),
30 | .llm-table tbody tr td:nth-of-type(3) {
31 | border-right: 1px solid #ffffff; /* White color for the border */
32 | }
33 |
34 | /* Remove horizontal line between the two heading rows */
35 | .llm-table thead tr:nth-of-type(1) th {
36 | border-bottom: none;
37 | }
38 |
39 | /* Supported DeepSeek LLMs Table */
40 |
41 | /* Add vertical border to the right of the models column */
42 | .deepseek-table td:nth-child(1) {
43 | border-right: 1px solid white;
44 | }
45 |
46 | /* Vertical line after the first column in the first heading row */
47 | .deepseek-table thead tr:nth-of-type(1) th:nth-of-type(1) {
48 | border-right: 1px solid #ffffff; /* White color for the border */
49 | }
50 |
51 | /* Vertical line after the first column in the second heading row */
52 | .deepseek-table thead tr:nth-of-type(2) th:nth-of-type(1){
53 | border-right: 1px solid #ffffff; /* White color for the border */
54 | }
55 |
56 | /* Vertical line after the first column in all body rows */
57 | .deepseek-table tbody tr td:nth-of-type(1){
58 | border-right: 1px solid #ffffff; /* White color for the border */
59 | }
60 |
61 | /* Add vertical lines around the hybrid_oga cell */
62 | .deepseek-table td:nth-child(2) {
63 | border-right: 1px solid white;
64 | }
65 |
66 | /* Remove right border between TTFT and TPS columns in the bottom two rows */
67 | .deepseek-table tbody tr:nth-child(2) td:nth-child(2),
68 | .deepseek-table tbody tr:nth-child(3) td:nth-child(2) {
69 | border-right: none;
70 | }
71 |
72 |
73 |
74 |
75 |
76 |
77 |
78 |
79 |
80 |
--------------------------------------------------------------------------------
/docs/_templates/flavors/local/footer.jinja:
--------------------------------------------------------------------------------
1 | {%
2 | set license_link = ("Ryzen AI Licenses and Disclaimers", "")
3 | %}
4 |
--------------------------------------------------------------------------------
/docs/_templates/flavors/local/header.jinja:
--------------------------------------------------------------------------------
1 | {% macro top_level_header(branch, latest_version, release_candidate_version) -%}
2 | Ryzen AI
3 | {%- endmacro -%}
4 |
5 | {%
6 | set nav_secondary_items = {
7 | "GitHub": theme_repository_url|replace("-docs", ""),
8 | "Community": "https://community.amd.com/t5/ai/ct-p/amd_ai",
9 | "Products": "https://www.amd.com/en/products/ryzen-ai"
10 | }
11 | %}
12 |
--------------------------------------------------------------------------------
/docs/_templates/flavors/local/left-side-menu.jinja:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/docs/_templates/flavors/local/left-side-menu.jinja
--------------------------------------------------------------------------------
/docs/ai_analyzer.rst:
--------------------------------------------------------------------------------
1 | ###########
2 | AI Analyzer
3 | ###########
4 |
5 | AMD AI Analyzer is a tool that supports analysis and visualization of model compilation and inference on Ryzen AI. The primary goal of the tool is to help users better understand how the models are processed by the hardware, and to identify performance bottlenecks that may be present during model inference. Using AI Analyzer, users can visualize graph and operator partitions between the NPU and CPU.
6 |
7 | Installation
8 | ~~~~~~~~~~~~
9 |
10 | If you installed the Ryzen AI software using the automatic installer, AI Analyzer is already installed in the conda environment.
11 |
12 | If you manually installed the software, you will need to install the AI Analyzer wheel file in your environment.
13 |
14 |
15 | .. code-block::
16 |
17 |    python -m pip install path\to\RyzenAI\installation\files\aianalyzer-<version>.whl
18 |
19 |
20 | Enabling Profiling and Visualization
21 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22 |
23 | Profiling and Visualization can be enabled by passing additional provider options to the ONNXRuntime Inference Session. An example is shown below:
24 |
25 | .. code-block::
26 |
27 | provider_options = [{
28 | 'config_file': 'vaip_config.json',
29 | 'cacheDir': str(cache_dir),
30 | 'cacheKey': 'modelcachekey',
31 | 'ai_analyzer_visualization': True,
32 | 'ai_analyzer_profiling': True,
33 | }]
34 | session = ort.InferenceSession(model.SerializeToString(), providers=providers,
35 | provider_options=provider_options)
36 |
37 |
38 | The ``ai_analyzer_profiling`` flag enables generation of artifacts related to the inference profile. The ``ai_analyzer_visualization`` flag enables generation of artifacts related to graph partitions and operator fusion. These artifacts are generated as .json files in the current run directory.
39 |
40 | AI Analyzer also supports native ONNX Runtime profiling, which can be used to analyze the parts of the session running on the CPU. Users can enable ONNX Runtime profiling through session options and pass it alongside the provider options as shown below:
41 |
42 | .. code-block::
43 |
44 | # Configure session options for profiling
45 | sess_options = rt.SessionOptions()
46 | sess_options.enable_profiling = True
47 |
48 | provider_options = [{
49 | 'config_file': 'vaip_config.json',
50 | 'cacheDir': str(cache_dir),
51 | 'cacheKey': 'modelcachekey',
52 | 'ai_analyzer_visualization': True,
53 | 'ai_analyzer_profiling': True,
54 | }]
55 |
56 | session = ort.InferenceSession(model.SerializeToString(), sess_options, providers=providers,
57 | provider_options=provider_options)
58 |
59 |
60 | Launching AI Analyzer
61 | ~~~~~~~~~~~~~~~~~~~~~
62 |
63 | Once the artifacts are generated, `aianalyzer` can be invoked through the command line as follows:
64 |
65 |
66 | .. code-block::
67 |
68 |    aianalyzer <logdir>
69 |
70 |
71 | **Positional Arguments**
72 |
73 | ``logdir``: Path to the folder containing generated artifacts
74 |
75 | Additional Options
76 |
77 | ``-v``, ``--version``: Show the version info and exit.
78 |
79 | ``-b ADDR``, ``--bind ADDR``: Hostname or IP address on which to listen, default is 'localhost'.
80 |
81 | ``-p PORT``, ``--port PORT``: TCP port on which to listen, default is '8000'.
82 |
83 | ``-n``, ``--no-browser``: Prevent the opening of the default url in the browser.
84 |
85 | ``-t TOKEN``, ``--token TOKEN``: Token used for authenticating first-time connections to the server. The default is to generate a new, random token. Setting to an empty string disables authentication altogether, which is NOT RECOMMENDED.
86 |
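For example, to load artifacts from a run directory on a non-default port without automatically opening a browser (the folder name ``my_run_artifacts`` is only an illustration, not a tool default):

.. code-block::

   aianalyzer my_run_artifacts --port 8080 --no-browser
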
87 |
88 |
89 | Features
90 | ~~~~~~~~
91 |
92 | AI Analyzer provides visibility into how your AI model is compiled and executed on Ryzen AI hardware. Its two main use cases are:
93 |
94 | 1. Analyzing how the model was partitioned and mapped onto Ryzen AI's CPU and NPU accelerator
95 | 2. Profiling model performance as it executes inferencing workloads
96 |
97 | When launched, the AI Analyzer server scans the folder specified with the ``logdir`` argument and detects and loads all files relevant to compilation and/or inferencing, per the ``ai_analyzer_visualization`` and ``ai_analyzer_profiling`` flags.
98 | 
99 | You can instruct the AI Analyzer server either to open a browser on the same host or to return a URL that you can load in a browser on any host.
100 |
101 |
102 | User Interface
103 | ~~~~~~~~~~~~~~
104 |
105 | AI Analyzer has the following three sections, as seen in the left-panel navigator:
106 | 
107 | 1. PARTITIONING - A breakdown of how your model was assigned to execute inference across the CPU and NPU
108 | 2. NPU INSIGHTS - A detailed look at how your model was optimized for inference execution on the NPU
109 | 3. PERFORMANCE - A breakdown of inference execution through the model
110 | 
111 | 
112 | These sections are described in more detail below.
113 |
114 |
115 |
116 | PARTITIONING
117 | @@@@@@@@@@@@
118 |
119 | This section is comprised of two pages: Summary and Graph.
120 |
121 | **Summary**
122 |
123 | The Summary page gives an overview of how the model's operators have been assigned to the Ryzen CPU and NPU, along with charts capturing GigaOp (GOP) offloading by operator type.
124 | 
125 | There is also a table titled "CPU Because" that shows the reasons why certain operators were not offloaded to the NPU.
126 |
127 | **Graph**
128 |
129 | The Graph page shows an interactive diagram of the partitioned ONNX model, illustrating how the layers are assigned to the Ryzen hardware.
130 |
131 |
132 |
133 | Toolbar
134 |
135 | - You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button
136 | - A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button
137 | - The model table can be shown and hidden via the "Show Table" toggle button.
138 | - Settings
139 |
140 | - Show Processor will separate operators that run on CPU and NPU respectively
141 | - Show Partition will separate operators running on the NPU by their respective NPU partition, if any
142 | - Show Instance Name will display the full hierarchical name for the operators in the ONNX model
143 |
144 | All objects in the graph have properties which can be viewed to the right of the graph.
145 |
146 |
147 |
148 | *Model Table*
149 |
150 | This table below the graph lists all objects in the partitioned ONNX model:
151 |
152 | - Processor (NPU or CPU)
153 | - Function (Layer)
154 | - Operator
155 | - Ports
156 | - NPU Partitions
157 |
158 |
159 | NPU INSIGHTS
160 | @@@@@@@@@@@@
161 |
162 | This section is comprised of three pages: Summary, Original Graph, and Optimized Graph.
163 |
164 |
165 |
166 | **Summary**
167 |
168 | The Summary page gives an overview of how your model was mapped to the AMD Ryzen NPU. Charts are displayed showing statistics on the number of operators and total GMACs that have been mapped to the NPU (and if necessary, back to CPU via the "Failsafe CPU" mechanism). The statistics are shown per operator type and per NPU partition.
169 |
170 |
171 |
172 | **Original Graph**
173 |
174 | This is an interactive graph representing your model lowered to supported NPU primitive operators, and broken up into partitions if necessary. As with the PARTITIONING graph, there is a companion table containing all of the model elements that will cross-probe to the graph view. The objects in the graph and table will also cross-probe to the PARTITIONING graph.
175 |
176 | Toolbar
177 |
178 | - You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button
179 | - A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button
180 | - A code viewer showing the MLIR source code with cross-probing can be shown/hidden via the "Show Code View" button
181 | - The table below can be shown and hidden via the "Show Table" toggle button.
182 | - Display options for the graph can be accessed with the "Settings" button
183 |
184 |
185 |
186 |
187 | **Optimized Graph**
188 |
189 | This page shows the final model that will be mapped to the NPU after all transformations and optimizations such as fusion and chaining. It will also report the operators that had to be moved back to the CPU via the "Failsafe CPU" mechanism. As usual, there is a companion table below that contains all of the graph's elements, and cross-selection is supported to and from the PARTITIONING graph and the Original Graph.
190 |
191 | Toolbar
192 |
193 | - You can choose to show/hide individual NPU partitions, if any, with the "Filter by Partition" button
194 | - A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button
195 | - The table below can be shown and hidden via the "Show Table" toggle button.
196 | - Display options for the graph can be accessed with the "Settings" button
197 |
198 |
199 | PERFORMANCE
200 | @@@@@@@@@@@
201 |
202 | This section is used to view the performance of your model on RyzenAI when running one or more inferences. It is comprised of two pages: Summary and Timeline.
203 |
204 |
205 |
206 | **Summary**
207 |
208 | The performance summary page shows several overall statistics on the inference(s) as well as charts breaking down runtime by operator. If you run with the ONNX Runtime profiler enabled, you will see overall inference time, including layers that run on the CPU. If you have NPU profiling enabled via the ``ai_analyzer_profiling`` flag, you will see numerous NPU-based statistics, including GOP and MAC efficiency and a chart of runtime per NPU operator type.
209 |
210 | The clock frequency field shows the assumed NPU clock frequency, but it can be edited. If you change the frequency, all timestamp data that is collected as clock cycles but displayed in time units will be adjusted accordingly.
211 |
212 |
213 | **Timeline**
214 |
215 | The Performance timeline shows a layer-by-layer breakdown of your model's execution. The upper section is a graphical depiction of layer execution across a timeline, while the lower section shows the same information in tabular format. It is important to note that the Timeline page shows one inference at a time, so if you have captured profiling data for two or more inferences, you can choose which one to display with the "Inferences" chooser.
216 |
217 |
218 |
219 | Within each inference, you can examine the overall model execution or the detailed NPU execution data by using the "Partition" chooser.
220 |
221 |
222 |
223 | Toolbar
224 |
225 | - A panel that displays properties for selected objects can be shown or hidden via the "Show Properties" toggle button
226 | - The table below can be shown and hidden via the "Show Table" toggle button.
227 | - The graphical timeline can be downloaded to SVG via the "Export to SVG" button
228 |
229 |
230 | ..
231 | ------------
232 |
233 | #####################################
234 | License
235 | #####################################
236 |
237 | Ryzen AI is licensed under `MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_. Refer to the `LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_ for the full license text and copyright notice.
238 |
239 |
--------------------------------------------------------------------------------
/docs/app_development.rst:
--------------------------------------------------------------------------------
1 | .. include:: /icons.txt
2 |
3 | #######################
4 | Application Development
5 | #######################
6 |
7 | This page captures requirements and recommendations for developers looking to create, package and distribute applications targeting NPU-enabled AMD processors.
8 |
9 |
10 |
11 | .. _driver-compatibility:
12 |
13 | *************************************
14 | VitisAI EP / NPU Driver Compatibility
15 | *************************************
16 |
17 | The VitisAI EP requires a compatible version of the NPU drivers. For each version of the VitisAI EP, compatible drivers are bounded by a minimum version and a maximum release date. NPU drivers are backward compatible with VitisAI EP versions released up to 3 years earlier. The maximum driver release date is therefore set to 3 years after the release date of the corresponding VitisAI EP.
18 |
19 | The table below summarizes the driver requirements for the different versions of the VitisAI EP.
20 |
21 | .. list-table::
22 | :header-rows: 1
23 |
24 | * - VitisAI EP version
25 | - Minimum NPU Driver version
26 | - Maximum NPU Driver release date
27 | * - 1.4
28 | - 32.0.203.257
29 | - March 25th, 2028
30 | * - 1.3.1
31 | - 32.0.203.242
32 | - January 17th, 2028
33 | * - 1.3
34 | - 32.0.203.237
35 | - November 26th, 2027
36 | * - 1.2
37 | - 32.0.201.204
38 | - July 30th, 2027
39 |
40 | The application must check that NPU drivers compatible with the version of the Vitis AI EP being used are installed.
41 |
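As an illustration of such a check, the sketch below (not AMD-provided code; the helper names are hypothetical) compares an installed driver version string against the minimum listed in the table above:

.. code-block:: python

   # Minimum NPU driver version required for each VitisAI EP version (from the table above).
   MIN_DRIVER_FOR_EP = {
       "1.4": "32.0.203.257",
       "1.3.1": "32.0.203.242",
       "1.3": "32.0.203.237",
       "1.2": "32.0.201.204",
   }

   def version_tuple(version):
       # "32.0.203.257" -> (32, 0, 203, 257) for element-wise comparison
       return tuple(int(part) for part in version.split("."))

   def driver_is_compatible(installed_driver, ep_version):
       minimum = MIN_DRIVER_FOR_EP[ep_version]
       return version_tuple(installed_driver) >= version_tuple(minimum)

   # Example: driver_is_compatible("32.0.203.260", "1.4") -> True

The maximum driver release date constraint can be checked the same way against the dates in the table.
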
42 | .. _apu-types:
43 |
44 | *****************
45 | APU Types
46 | *****************
47 |
48 | The Ryzen AI Software supports different types of NPU-enabled APUs. These APU types are referred to as PHX, HPT, STX and KRK.
49 |
50 | To programmatically determine the type of the local APU, it is possible to enumerate the PCI devices and check for an instance with a matching Hardware ID.
51 |
52 | .. list-table::
53 | :header-rows: 1
54 |
55 | * - Vendor
56 | - Device
57 | - Revision
58 | - APU Type
59 | * - 0x1022
60 | - 0x1502
61 | - 0x00
62 | - PHX or HPT
63 | * - 0x1022
64 | - 0x17F0
65 | - 0x00
66 | - STX
67 | * - 0x1022
68 | - 0x17F0
69 | - 0x10
70 | - STX
71 | * - 0x1022
72 | - 0x17F0
73 | - 0x11
74 | - STX
75 | * - 0x1022
76 | - 0x17F0
77 | - 0x20
78 | - KRK
79 |
80 | The application must check that it is running on an AMD processor with an NPU, and that the NPU type is supported by the version of the Vitis AI EP being used.
81 |
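As a sketch of this check (assuming a Windows host where PowerShell and WMI are available; this is illustrative code, not the AMD-provided sample referenced below):

.. code-block:: python

   import re
   import subprocess

   # Map (device ID, revision) pairs from the table above to an APU type.
   APU_TYPES = {
       ("1502", "00"): "PHX or HPT",
       ("17F0", "00"): "STX",
       ("17F0", "10"): "STX",
       ("17F0", "11"): "STX",
       ("17F0", "20"): "KRK",
   }

   def detect_apu_type():
       # Enumerate PnP devices and look for an AMD (0x1022) NPU device ID.
       output = subprocess.run(
           ["powershell", "-Command",
            "Get-CimInstance Win32_PnPEntity | Select-Object -ExpandProperty DeviceID"],
           capture_output=True, text=True).stdout
       for device, revision in re.findall(r"VEN_1022&DEV_(1502|17F0)&.*?REV_([0-9A-F]{2})", output):
           return APU_TYPES.get((device, revision))
       return None  # no supported NPU found
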
82 |
83 |
84 | ************************************
85 | Application Development Requirements
86 | ************************************
87 |
88 | ONNX-RT Session
89 | ===============
90 |
91 | The application should only use the Vitis AI Execution Provider if the following conditions are met:
92 |
93 | - The application is running on an AMD processor with an NPU type supported by the version of the Vitis AI EP being used. See the :ref:`list <apu-types>` above on this page.
94 | - NPU drivers compatible with the version of the Vitis AI EP being used are installed. See the :ref:`compatibility table <driver-compatibility>` above on this page.
95 |
96 | |memo| **NOTE**: Sample C++ code implementing the compatibility checks to be performed before using the VitisAI EP is provided here: https://github.com/amd/RyzenAI-SW/tree/main/utilities/npu_check
97 |
98 |
99 | VitisAI EP Provider Options
100 | ===========================
101 |
102 | For INT8 models, the application should detect which type of APU is present (PHX/HPT/STX/KRK) and set the ``xclbin`` provider option accordingly. Refer to the section about :ref:`compilation of INT8 models ` for details about this.
103 |
104 | For BF16 models, the application should set the ``config_file`` provider option to use the same file as the one which was used to precompile the BF16 model. Refer to the section about :ref:`compilation of BF16 models ` for details about this.
105 |
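A minimal sketch of this selection logic is shown below. The file paths are placeholders (not shipped defaults), and only the ``xclbin``, ``config_file``, ``cacheDir``, and ``cacheKey`` provider options described in this documentation are used:

.. code-block:: python

   import onnxruntime as ort

   def make_provider_options(model_type, apu_type):
       if model_type == "int8":
           # Placeholder mapping of APU type to the matching NPU binary (.xclbin).
           xclbin_by_apu = {
               "PHX or HPT": r"path\to\phx_hpt.xclbin",
               "STX": r"path\to\stx.xclbin",
               "KRK": r"path\to\krk.xclbin",
           }
           return [{"xclbin": xclbin_by_apu[apu_type],
                    "cacheDir": r"path\to\cache", "cacheKey": "modelcachekey"}]
       # BF16: reuse the config file that was used to precompile the model.
       return [{"config_file": r"path\to\vaip_config.json",
                "cacheDir": r"path\to\cache", "cacheKey": "modelcachekey"}]

   session = ort.InferenceSession("model.onnx",
                                  providers=["VitisAIExecutionProvider"],
                                  provider_options=make_provider_options("int8", "STX"))
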
106 |
107 | Cache Management
108 | ================
109 |
110 | Cache directories generated by the Vitis AI Execution Provider should not be reused across different versions of the Vitis AI EP or across different versions of the NPU drivers.
111 |
112 | The application should check the version of the Vitis AI EP and of the NPU drivers. If the application detects a version change, it should delete the cache, or create a new cache directory with a different name.
113 |
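One simple way to follow this recommendation is to fold both version strings into the cache key, so that a version change automatically produces a fresh cache directory (a sketch; the helper name is hypothetical):

.. code-block:: python

   def versioned_cache_key(base_key, ep_version, driver_version):
       # e.g. versioned_cache_key("modelcachekey", "1.4", "32.0.203.257")
       #      -> "modelcachekey_1.4_32.0.203.257"
       return f"{base_key}_{ep_version}_{driver_version}"
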
114 |
115 | Pre-Compiled Models
116 | ===================
117 |
118 | The deployment version of the VitisAI Execution Provider (EP) does not support the on-the-fly compilation of BF16 models. Applications utilizing BF16 models must include pre-compiled versions of these models. The VitisAI EP can then load the pre-compiled models and deploy them efficiently on the NPU.
119 |
120 | Although including pre-compiled versions of INT8 models is not mandatory, it is beneficial as it reduces session creation time and enhances the end-user experience.
121 |
122 | |
123 |
124 | **********************************
125 | Application Packaging Requirements
126 | **********************************
127 |
128 | |excl| **IMPORTANT**: A patched version of the ``%RYZEN_AI_INSTALLATION_PATH%\deployment`` folder is available for download at the following link: `Download Here `_. This patched ``deployment`` folder is designed to replace the one included in the official installation of Ryzen AI 1.4. The following instructions assume that the original ``deployment`` folder has been replaced with the updated version.
129 |
130 | A C++ application built on the Ryzen AI ONNX Runtime requires the following components to be included in its distribution package.
131 |
132 | .. rubric:: For INT8 models
133 |
134 | - DLLs:
135 |
136 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll
137 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll
138 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll
139 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll
140 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll
141 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll
142 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll
143 |
144 | - NPU Binary files (.xclbin) from the ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins`` folder
145 |
146 | - Recommended but not mandatory: pre-compiled models in the form of :ref:`Vitis AI EP cache folders ` or :ref:`Onnx Runtime EP context models `
147 |
148 | .. rubric:: For BF16 models
149 |
150 | - DLLs:
151 |
152 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll
153 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll
154 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll
155 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll
156 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll
157 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll
158 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll
159 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\flexmlrt\\flexmlrt.dll
160 |
161 | - Pre-compiled models in the form of :ref:`Vitis AI EP cache folders `
162 |
163 | .. rubric:: For Hybrid LLMs
164 |
165 | - DLLs:
166 |
167 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\onnx_custom_ops.dll
168 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\onnxruntime-genai.dll
169 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\ryzen_mm.dll
170 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\hybrid-llm\\ryzenai_onnx_utils.dll
171 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\DirectML.dll
172 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll
173 |
174 | .. rubric:: For NPU-only LLMs
175 |
176 | - DLLs:
177 |
178 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\npu-llm\\onnxruntime-genai.dll
179 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitis_ai_custom_ops.dll
180 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_shared.dll
181 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_vitisai_ep.dll
182 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\dyn_dispatch_core.dll
183 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime_providers_vitisai.dll
184 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\transaction.dll
185 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\onnxruntime.dll
186 | - %RYZEN_AI_INSTALLATION_PATH%\\deployment\\voe\\xclbin.dll
187 |
188 | - VAIP LLM configuration file: %RYZEN_AI_INSTALLATION_PATH%\\deployment\\npu-llm\\vaip_llm.json
189 |
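As a packaging sanity check (illustrative only, using the INT8 DLL list above; the helper name is hypothetical), an application can verify that the redistributed files are present next to its executable before enabling the NPU path:

.. code-block:: python

   from pathlib import Path

   # Redistributed VitisAI EP DLLs for INT8 models, as listed above.
   REQUIRED_DLLS = [
       "dyn_dispatch_core.dll", "onnxruntime.dll", "onnxruntime_providers_shared.dll",
       "onnxruntime_providers_vitisai.dll", "onnxruntime_vitisai_ep.dll",
       "transaction.dll", "xclbin.dll",
   ]

   def package_is_complete(app_dir):
       return all((Path(app_dir) / dll).exists() for dll in REQUIRED_DLLS)
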
190 |
191 | ..
192 | ------------
193 |
194 | #####################################
195 | License
196 | #####################################
197 |
198 | Ryzen AI is licensed under `MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_. Refer to the `LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_ for the full license text and copyright notice.
199 |
--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | #
4 | # Configuration file for the Sphinx documentation builder.
5 | #
6 | # This file does only contain a selection of the most common options. For a
7 | # full list see the documentation:
8 | # http://www.sphinx-doc.org/en/master/config
9 |
10 | # -- Path setup --------------------------------------------------------------
11 |
12 | # If extensions (or modules to document with autodoc) are in another directory,
13 | # add these directories to sys.path here. If the directory is relative to the
14 | # documentation root, use os.path.abspath to make it absolute, like shown here.
15 | #
16 | import os
17 | import sys
18 | import urllib.parse
19 | # import recommonmark
20 | # from recommonmark.transform import AutoStructify
21 | # from recommonmark.parser import CommonMarkParser
22 |
23 | # sys.path.insert(0, os.path.abspath('.'))
24 | sys.path.insert(0, os.path.abspath('_ext'))
25 | sys.path.insert(0, os.path.abspath('docs'))
26 |
27 | # -- Project information -----------------------------------------------------
28 |
29 | project = 'Ryzen AI Software'
30 | copyright = '2023-2024, Advanced Micro Devices, Inc'
31 | author = 'Advanced Micro Devices, Inc'
32 |
33 | # The short X.Y version
34 | version = '1.4'
35 | # The full version, including alpha/beta/rc tags
36 | release = '1.4'
37 | html_last_updated_fmt = 'March 24, 2025'
38 |
39 |
40 | # -- General configuration ---------------------------------------------------
41 |
42 | html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "ryzenai.docs.amd.com")
43 | html_context = {}
44 | if os.environ.get("READTHEDOCS", "") == "True":
45 | html_context["READTHEDOCS"] = True
46 |
47 | # If your documentation needs a minimal Sphinx version, state it here.
48 | #
49 | # needs_sphinx = '1.0'
50 |
51 | # Add any Sphinx extension module names here, as strings. They can be
52 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
53 | # ones.
54 | extensions = [
55 | 'sphinx.ext.graphviz',
56 | 'breathe',
57 | 'sphinx.ext.autodoc',
58 | 'sphinx.ext.doctest',
59 | 'sphinx.ext.intersphinx',
60 | 'sphinx.ext.todo',
61 | 'sphinx.ext.coverage',
62 | 'sphinx.ext.mathjax',
63 | 'sphinx.ext.ifconfig',
64 | 'sphinx.ext.viewcode',
65 | 'sphinx.ext.githubpages',
66 | 'linuxdoc.rstFlatTable',
67 | "notfound.extension",
68 | #'recommonmark',
69 | #'sphinx_markdown_tables',
70 | #'edit_on_github',
71 | # Auto-generate section labels.
72 | #'sphinx.ext.autosectionlabel',
73 | #'rst2pdf.pdfbuilder'
74 | ]
75 |
76 | graphviz_output_format = 'svg'
77 |
78 | # Prefix document path to section labels, otherwise autogenerated labels would look like 'heading'
79 | # rather than 'path/to/file:heading'
80 | autosectionlabel_prefix_document = True
81 |
82 |
83 |
84 | # Breathe Configuration
85 | breathe_projects = {
86 | "XRT":"../xml",
87 | }
88 |
89 |
90 |
91 | # Configuration for rst2pdf
92 | pdf_documents = [('index', u'', u'', u'AMD, Inc.'),]
93 | # index - master document
94 | # rst2pdf - name of the file that will be created
95 | # Sample rst2pdf doc - title of the pdf
96 | # Your Name - author name in the pdf
97 |
98 |
99 | # Configure 'Edit on GitHub' extension
100 | edit_on_github_project = '/amd/ryzen-ai-documentation'
101 | edit_on_github_branch = 'main/docs'
102 |
103 | # Add any paths that contain templates here, relative to this directory.
104 | templates_path = ['_templates']
105 |
106 | # Expand/Collapse functionality
107 | def setup(app):
108 | app.add_css_file('custom.css')
109 | app.add_css_file("llm-table.css")
110 |
111 |
112 | # The suffix(es) of source filenames.
113 | # You can specify multiple suffix as a list of string:
114 | #
115 | # source_suffix = ['.rst', '.md']
116 | source_suffix = {
117 | '.rst': 'restructuredtext',
118 | #'.txt': 'restructuredtext',
119 | '.md': 'markdown',
120 | }
121 |
122 | # For MD support
123 | source_parsers = {
124 | #'.md': CommonMarkParser,
125 | # myst_parser testing
126 | #'.md':
127 | }
128 |
129 | # The master toctree document.
130 | master_doc = 'index'
131 |
132 | # The language for content autogenerated by Sphinx. Refer to documentation
133 | # for a list of supported languages.
134 | #
135 | # This is also used if you do content translation via gettext catalogs.
136 | # Usually you set "language" from the command line for these cases.
137 | language = 'en'
138 |
139 | # List of patterns, relative to source directory, that match files and
140 | # directories to ignore when looking for source files.
141 | # This patterns also effect to html_static_path and html_extra_path
142 | exclude_patterns = ['include', 'api_rst', '_build', 'Thumbs.db', '.DS_Store']
143 |
144 | # The name of the Pygments (syntax highlighting) style to use.
145 | pygments_style = 'sphinx'
146 |
147 | # If true, `todo` and `todoList` produce output, else they produce nothing.
148 | todo_include_todos = False
149 |
150 | primary_domain = 'c'
151 | highlight_language = 'none'
152 |
153 |
154 | # -- Options for HTML output -------------------------------------------------
155 |
156 | # The theme to use for HTML and HTML Help pages. See the documentation for
157 | # a list of builtin themes.
158 | #
159 | ##html_theme = 'karma_sphinx_theme'
160 | html_theme = 'rocm_docs_theme'
161 | ##html_theme_path = ["./_themes"]
162 |
163 |
164 | # Theme options are theme-specific and customize the look and feel of a theme
165 | # further. For a list of options available for each theme, see the
166 | # documentation.
167 | #
168 | ##html_logo = '_static/xilinx-header-logo.svg'
169 | html_theme_options = {
170 | "link_main_doc": False,
171 | "flavor": "local"
172 | }
173 |
174 | # Add any paths that contain custom static files (such as style sheets) here,
175 | # relative to this directory. They are copied after the builtin static files,
176 | # so a file named "default.css" will overwrite the builtin "default.css".
177 | html_static_path = ["_static"]
178 | html_css_files = ["_static/llm-table.css"]
179 |
180 | # Custom sidebar templates, must be a dictionary that maps document names
181 | # to template names.
182 | #
183 | # The default sidebars (for documents that don't match any pattern) are
184 | # defined by theme itself. Builtin themes are using these templates by
185 | # default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
186 | # 'searchbox.html']``.
187 | #
188 | #html_sidebars = {
189 | # '**': [
190 | # 'about.html',
191 | # 'navigation.html',
192 | # 'relations.html',
193 | # 'searchbox.html',
194 | # 'donate.html',
195 | # ]}
196 |
197 |
198 | # -- Options for HTMLHelp output ---------------------------------------------
199 |
200 | # Output file base name for HTML help builder.
201 | htmlhelp_basename = 'ProjectName'
202 |
203 |
204 | # -- Options for LaTeX output ------------------------------------------------
205 | latex_engine = 'pdflatex'
206 | latex_elements = {
207 | # The paper size ('letterpaper' or 'a4paper').
208 | #
209 | 'papersize': 'letterpaper',
210 |
211 | # The font size ('10pt', '11pt' or '12pt').
212 | #
213 | 'pointsize': '12pt',
214 |
215 | # Additional stuff for the LaTeX preamble.
216 | #
217 | # 'preamble': '',
218 |
219 | # Latex figure (float) alignment
220 | #
221 | # 'figure_align': 'htbp',
222 | }
223 |
224 | # Grouping the document tree into LaTeX files. List of tuples
225 | # (source start file, target name, title,
226 | # author, documentclass [howto, manual, or own class]).
227 | latex_documents = [
228 | (master_doc, 'ryzenai.tex', 'Ryzen AI',
229 | 'AMD', 'manual'),
230 | ]
231 |
232 |
233 | # -- Options for manual page output ------------------------------------------
234 |
235 | # One entry per manual page. List of tuples
236 | # (source start file, name, description, authors, manual section).
237 | man_pages = [
238 | (master_doc, 'ryzenai.tex', 'Ryzen AI',
239 | [author], 1)
240 | ]
241 |
242 |
243 | # -- Options for Texinfo output ----------------------------------------------
244 |
245 | # Grouping the document tree into Texinfo files. List of tuples
246 | # (source start file, target name, title, author,
247 | # dir menu entry, description, category)
248 | texinfo_documents = [
249 | (master_doc, 'ENTER YOUR LIBRARY ID HERE. FOR EXAMPLE: xfopencv', 'ENTER YOUR LIBRARY PROJECT NAME HERE',
250 | author, 'AMD', 'One line description of project.',
251 | 'Miscellaneous'),
252 | ]
253 |
254 |
255 | # -- Options for Epub output -------------------------------------------------
256 |
257 | # Bibliographic Dublin Core info.
258 | epub_title = project
259 |
260 | # The unique identifier of the text. This can be a ISBN number
261 | # or the project homepage.
262 | #
263 | # epub_identifier = ''
264 |
265 | # A unique identification for the text.
266 | #
267 | # epub_uid = ''
268 |
269 | # A list of files that should not be packed into the epub file.
270 | epub_exclude_files = ['search.html']
271 |
272 |
273 |
274 |
275 | # -- Options for rinoh ------------------------------------------
276 |
277 |
278 | rinoh_documents = [dict(doc='index', # top-level file (index.rst)
279 | target='manual')] # output file (manual.pdf)
280 |
281 |
282 |
283 | # -- Notfound (404) extension settings
284 |
285 | if "READTHEDOCS" in os.environ:
286 | components = urllib.parse.urlparse(os.environ["READTHEDOCS_CANONICAL_URL"])
287 | notfound_urls_prefix = components.path
288 |
289 |
290 | # -- Extension configuration -------------------------------------------------
291 | # At the bottom of conf.py
292 | #def setup(app):
293 | # app.add_config_value('recommonmark_config', {
294 | # 'url_resolver': lambda url: github_doc_root + url,
295 | # 'auto_toc_tree_section': 'Contents',
296 | # }, True)
297 | # app.add_transform(AutoStructify)
298 |
299 |
300 | #################################################################################
301 | #License
302 | #Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
303 |
--------------------------------------------------------------------------------
/docs/examples.rst:
--------------------------------------------------------------------------------
1 | ##########################
2 | Examples, Demos, Tutorials
3 | ##########################
4 |
5 | This page introduces various demos, examples, and tutorials currently available with the Ryzen™ AI Software.
6 |
7 | *************************
8 | Getting Started Tutorials
9 | *************************
10 |
11 | NPU
12 | ~~~
13 |
14 | - The :doc:`Getting Started Tutorial <getstartex>` deploys a custom ResNet model demonstrating:
15 |
16 | - Pretrained model conversion to ONNX
17 | - Quantization using AMD Quark quantizer
18 | - Deployment using ONNX Runtime C++ and Python code
19 |
20 | - `Hello World Jupyter Notebook Tutorial `_
21 |
22 | - New BF16 Model examples:
23 |
24 | - `Image Classification `_
25 | - `Finetuned DistilBERT for Text Classification `_
26 | - `Text Embedding Model Alibaba-NLP/gte-large-en-v1.5 `_
27 |
28 | iGPU
29 | ~~~~
30 |
31 | - `ResNet50 on iGPU `_
32 |
33 |
34 | ************************************
35 | Other examples, demos, and tutorials
36 | ************************************
37 |
38 | - Refer to `RyzenAI-SW repo `_
39 |
40 |
41 |
42 | ..
43 | ------------
44 |
45 | #####################################
46 | License
47 | #####################################
48 |
49 | Ryzen AI is licensed under `MIT License <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_. Refer to the `LICENSE File <https://github.com/amd/ryzen-ai-documentation/blob/main/License>`_ for the full license text and copyright notice.
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/docs/getstartex.rst:
--------------------------------------------------------------------------------
1 | :orphan:
2 |
3 | ########################
4 | Getting Started Tutorial
5 | ########################
6 |
7 | This tutorial uses a fine-tuned version of the ResNet model (using the CIFAR-10 dataset) to demonstrate the process of preparing, quantizing, and deploying a model using Ryzen AI Software. The tutorial features deployment using both Python and C++ ONNX runtime code.
8 |
9 | .. note::
10 | In this documentation, "NPU" is used in descriptions, while "IPU" is retained in some of the tool's language, code, screenshots, and commands. This intentional
11 | distinction aligns with existing tool references and does not affect functionality. Avoid making replacements in the code.
12 |
13 | - The source code files can be downloaded from `this link `_. Alternatively, you can clone the RyzenAI-SW repo and change into the tutorial directory:
14 |
15 | .. code-block::
16 |
17 |    git clone https://github.com/amd/RyzenAI-SW.git
18 |    cd RyzenAI-SW/tutorial/getting_started_resnet
19 |
20 | |
21 |
22 | The following are the steps and the required files to run the example:
23 |
24 | .. list-table::
25 | :widths: 20 25 25
26 | :header-rows: 1
27 |
28 | * - Steps
29 | - Files Used
30 | - Description
31 | * - Installation
32 | - ``requirements.txt``
33 | - Install the necessary package for this example.
34 | * - Preparation
35 | - ``prepare_model_data.py``,
36 | ``resnet_utils.py``
37 | - The script ``prepare_model_data.py`` prepares the model and the data for the rest of the tutorial.
38 |
39 | 1. To prepare the model the script converts pre-trained PyTorch model to ONNX format.
40 | 2. To prepare the necessary data the script downloads and extracts CIFAR-10 dataset.
41 |
42 | * - Pretrained model
43 | - ``models/resnet_trained_for_cifar10.pt``
44 | - The ResNet model trained using CIFAR-10 is provided in .pt format.
45 | * - Quantization
46 | - ``resnet_quantize.py``
47 | - Convert the model to the NPU-deployable model by performing Post-Training Quantization flow using AMD Quark Quantization.
48 | * - Deployment - Python
49 | - ``predict.py``
50 | - Run the Quantized model using the ONNX Runtime code. We demonstrate running the model on both CPU and NPU.
51 | * - Deployment - C++
52 | - ``cpp/resnet_cifar/.``
53 | - This folder contains the source code ``resnet_cifar.cpp`` that demonstrates running inference using C++ APIs. We additionally provide the infrastructure (required libraries, CMake files and header files) required by the example.
54 |
55 |
56 | |
57 | |
58 |
59 | ************************
60 | Step 1: Install Packages
61 | ************************
62 |
63 | * Ensure that the Ryzen AI Software is correctly installed. For more details, see the :doc:`installation instructions <inst>`.
64 |
65 | * Use the conda environment created during the installation for the rest of the steps. This example requires a couple of additional packages. Run the following command to install them:
66 |
67 |
68 | .. code-block::
69 |
70 | python -m pip install -r requirements.txt
71 |
72 | |
73 | |
74 |
75 |
76 | **************************************
77 | Step 2: Prepare dataset and ONNX model
78 | **************************************
79 |
80 | In this example, we utilize a custom ResNet model fine-tuned using the CIFAR-10 dataset.
81 |
82 | The ``prepare_model_data.py`` script downloads the CIFAR-10 dataset in pickle format (for python) and binary format (for C++). This dataset will be used in the subsequent steps for quantization and inference. The script also exports the provided PyTorch model into ONNX format. The following snippet from the script shows how the ONNX model is exported:
83 |
84 | .. code-block::
85 |
86 | dummy_inputs = torch.randn(1, 3, 32, 32)
87 | input_names = ['input']
88 | output_names = ['output']
89 | dynamic_axes = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
90 | tmp_model_path = str(models_dir / "resnet_trained_for_cifar10.onnx")
91 | torch.onnx.export(
92 | model,
93 | dummy_inputs,
94 | tmp_model_path,
95 | export_params=True,
96 | opset_version=13,
97 | input_names=input_names,
98 | output_names=output_names,
99 | dynamic_axes=dynamic_axes,
100 | )
101 |
102 | Note the following settings for the ONNX conversion:
103 | 
104 | - Ryzen AI supports a batch size of 1, so the dummy input is fixed to batch_size=1 during model conversion.
105 | - The recommended ``opset_version`` of 13 is used.
106 |
107 | Run the following command to prepare the dataset and export the ONNX model:
108 |
109 | .. code-block::
110 |
111 | python prepare_model_data.py
112 |
113 | * The downloaded CIFAR-10 dataset is saved in the current directory at the following location: ``data/*``.
114 | * The ONNX model is generated at ``models/resnet_trained_for_cifar10.onnx``.
115 |
116 | |
117 | |
118 |
119 | **************************
120 | Step 3: Quantize the Model
121 | **************************
122 |
123 | Quantizing AI models from floating-point to 8-bit integers reduces the computational resources and memory footprint required for inference. This example uses the Quark for ONNX quantizer workflow. Quark takes the pre-trained float32 model from the previous step (``resnet_trained_for_cifar10.onnx``) and provides a quantized model.
124 |
125 | .. code-block::
126 |
127 | python resnet_quantize.py
128 |
129 | This generates a quantized model using the QDQ quant format with the default configuration. After the run completes, the quantized ONNX model ``resnet_quantized.onnx`` is saved to ``models/resnet_quantized.onnx``.
130 | 
131 | The :file:`resnet_quantize.py` file has a ``ModelQuantizer::quantize_model`` function that applies quantization to the model.
132 |
133 | .. code-block::
134 |
135 | from quark.onnx.quantization.config import (Config, get_default_config)
136 | from quark.onnx import ModelQuantizer
137 |
138 | # Get quantization configuration
139 | quant_config = get_default_config("XINT8")
140 | config = Config(global_quant_config=quant_config)
141 |
142 | # Create an ONNX quantizer
143 | quantizer = ModelQuantizer(config)
144 |
145 | # Quantize the ONNX model
146 | quantizer.quantize_model(input_model_path, output_model_path, dr)
147 |
148 | The parameters of this function are:
149 |
150 | * **input_model_path**: (String) The file path of the model to be quantized.
151 | * **output_model_path**: (String) The file path where the quantized model is saved.
152 | * **dr**: (Object or None) Calibration data reader that enumerates the calibration data and produces inputs for the original model. In this example, the CIFAR-10 dataset is used for calibration during the quantization process. A minimal sketch of such a reader is shown below.
153 |
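The sketch below assumes an ONNX Runtime-style data reader interface in which the quantizer repeatedly calls ``get_next()`` until it returns ``None`` (the class and argument names are illustrative, not from the tutorial sources):

.. code-block::

   import numpy as np

   class CifarCalibrationReader:
       def __init__(self, images, input_name="input"):
           # images: iterable of numpy arrays shaped (1, 3, 32, 32)
           self.iterator = iter(images)
           self.input_name = input_name

       def get_next(self):
           batch = next(self.iterator, None)
           if batch is None:
               return None  # signals the end of the calibration data
           return {self.input_name: batch.astype(np.float32)}

   # dr = CifarCalibrationReader(calibration_images)
   # quantizer.quantize_model(input_model_path, output_model_path, dr)
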
154 |
155 | |
156 | |
157 |
158 | ************************
159 | Step 4: Deploy the Model
160 | ************************
161 |
162 | We demonstrate deploying the quantized model using both Python and C++ APIs.
163 |
164 | * :ref:`Deployment - Python `
165 | * :ref:`Deployment - C++ `
166 |
167 | .. note::
168 |    During the Python and C++ deployment, the compiled model artifacts are saved in the cache folder named ``modelcachekey``. Ryzen AI does not support reusing compiled model artifacts across software versions, so if model artifacts from a previous version exist, delete the ``modelcachekey`` folder before the deployment steps (a small cleanup sketch is shown after this note).
169 |
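A small cleanup sketch (assuming the cache sits next to the script, as in ``predict.py`` below):

.. code-block::

   import shutil
   from pathlib import Path

   cache_dir = Path(__file__).parent.resolve() / "modelcachekey"
   if cache_dir.exists():
       shutil.rmtree(cache_dir)  # remove artifacts compiled by a previous version
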
170 |
171 | .. _dep-python:
172 |
173 | Deployment - Python
174 | ===========================
175 |
176 | The ``predict.py`` script is used to deploy the model. It extracts the first ten images from the CIFAR-10 test dataset and converts them to the .png format. The script then reads those ten images and classifies them by running the quantized custom ResNet model on the CPU or NPU.
177 |
178 | Deploy the Model on the CPU
179 | ----------------------------
180 |
181 | By default, ``predict.py`` runs the model on CPU.
182 |
183 | .. code-block::
184 |
185 | python predict.py
186 |
187 | Typical output
188 |
189 | .. code-block::
190 |
191 | Image 0: Actual Label cat, Predicted Label cat
192 | Image 1: Actual Label ship, Predicted Label ship
193 | Image 2: Actual Label ship, Predicted Label airplane
194 | Image 3: Actual Label airplane, Predicted Label airplane
195 | Image 4: Actual Label frog, Predicted Label frog
196 | Image 5: Actual Label frog, Predicted Label frog
197 | Image 6: Actual Label automobile, Predicted Label automobile
198 | Image 7: Actual Label frog, Predicted Label frog
199 | Image 8: Actual Label cat, Predicted Label cat
200 | Image 9: Actual Label automobile, Predicted Label automobile
201 |
202 |
203 | Deploy the Model on the Ryzen AI NPU
204 | ------------------------------------
205 |
206 | To successfully run the model on the NPU, run the following setup steps:
207 |
208 | - Ensure ``RYZEN_AI_INSTALLATION_PATH`` points to ``path\to\ryzen-ai-sw-<version>\``. If you installed the Ryzen AI software using the MSI installer, this variable should already be set. If the Ryzen AI software package has been moved after installation, ``RYZEN_AI_INSTALLATION_PATH`` must be set again.
209 |
210 | - By default, the Ryzen AI Conda environment automatically sets the standard binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. However, explicitly passing the xclbin option in provider_options overrides the default setting.
211 |
212 | .. code-block::
213 |
214 | parser = argparse.ArgumentParser()
215 | parser.add_argument('--ep', type=str, default ='cpu',choices = ['cpu','npu'], help='EP backend selection')
216 | opt = parser.parse_args()
217 |
218 | providers = ['CPUExecutionProvider']
219 | provider_options = [{}]
220 |
221 | if opt.ep == 'npu':
222 | providers = ['VitisAIExecutionProvider']
223 | cache_dir = Path(__file__).parent.resolve()
224 | provider_options = [{
225 | 'cacheDir': str(cache_dir),
226 | 'cacheKey': 'modelcachekey',
227 | 'xclbin': 'path/to/xclbin'
228 | }]
229 |
230 | session = ort.InferenceSession(model.SerializeToString(), providers=providers,
231 | provider_options=provider_options)
232 |
233 |
234 | Run the ``predict.py`` with the ``--ep npu`` switch to run the custom ResNet model on the Ryzen AI NPU:
235 |
236 |
237 | .. code-block::
238 |
239 | python predict.py --ep npu
240 |
241 | Typical output
242 |
243 | .. code-block::
244 |
245 | [Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
246 | [Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
247 | ...
248 | Image 0: Actual Label cat, Predicted Label cat
249 | Image 1: Actual Label ship, Predicted Label ship
250 | Image 2: Actual Label ship, Predicted Label ship
251 | Image 3: Actual Label airplane, Predicted Label airplane
252 | Image 4: Actual Label frog, Predicted Label frog
253 | Image 5: Actual Label frog, Predicted Label frog
254 | Image 6: Actual Label automobile, Predicted Label truck
255 | Image 7: Actual Label frog, Predicted Label frog
256 | Image 8: Actual Label cat, Predicted Label cat
257 | Image 9: Actual Label automobile, Predicted Label automobile
258 |
259 |
260 | .. _dep-cpp:
261 |
262 | Deployment - C++
263 | ===========================
264 |
265 | Prerequisites
266 | -------------
267 |
268 | 1. Visual Studio 2022 Community edition, ensure "Desktop Development with C++" is installed
269 | 2. cmake (version >= 3.26)
270 | 3. opencv (version=4.6.0) required for the custom resnet example
271 |
272 | Install OpenCV
273 | --------------
274 |
275 | It is recommended to build OpenCV from source and use a static build. The default installation location is "\install"; the following instructions install OpenCV in "C:\\opencv" as an example. First, change to the directory where you want to clone the OpenCV repository.
276 |
277 | .. code-block:: bash
278 |
279 | git clone https://github.com/opencv/opencv.git -b 4.6.0
280 | cd opencv
281 | cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -G "Visual Studio 17 2022" "-DCMAKE_INSTALL_PREFIX=C:\opencv" "-DCMAKE_PREFIX_PATH=C:\opencv" -DCMAKE_BUILD_TYPE=Release -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_WITH_STATIC_CRT=OFF -B build
282 | cmake --build build --config Release
283 | cmake --install build --config Release
284 |
285 | The build files will be written to ``build\``.
286 |
287 | Build and Run Custom Resnet C++ sample
288 | --------------------------------------
289 |
290 | The C++ source files, CMake list files and related artifacts are provided in the ``cpp/resnet_cifar/*`` folder. The source file ``cpp/resnet_cifar/resnet_cifar.cpp`` takes 10 images from the CIFAR-10 test set, converts them to .png format, preprocesses them, and performs model inference. The example has onnxruntime dependencies, which are provided in ``%RYZEN_AI_INSTALLATION_PATH%/onnxruntime/*``.
291 |
292 | Run the following command to build the resnet example. Assign ``-DOpenCV_DIR`` to the OpenCV build directory.
293 |
294 | .. code-block:: bash
295 |
296 | cd getting_started_resnet/cpp
297 | cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_CONFIGURATION_TYPES=Release -A x64 -T host=x64 -DCMAKE_INSTALL_PREFIX=. -DCMAKE_PREFIX_PATH=. -B build -S resnet_cifar -DOpenCV_DIR="C:/opencv/build" -G "Visual Studio 17 2022"
298 |
299 | This should generate the build directory with the ``resnet_cifar.sln`` solution file along with other project files. Open the solution file using Visual Studio 2022 and build it to compile the example. You can also use the "Developer Command Prompt for VS 2022" to open the solution file in Visual Studio.
300 |
301 | .. code-block:: bash
302 |
303 | devenv build/resnet_cifar.sln
304 |
305 | To deploy the model, go back to the parent directory of this example (``getting_started_resnet``). After compilation, the executable is generated at ``cpp/build/Release/resnet_cifar.exe``. Copy this application to the parent directory:
306 |
307 | .. code-block:: bash
308 |
309 | cd ..
310 | xcopy cpp\build\Release\resnet_cifar.exe .
311 |
312 | Additionally, the ONNX Runtime DLLs from the Ryzen AI installation must be copied to the current directory. The following command copies the required files into the current directory:
313 |
314 | .. code-block:: bash
315 |
316 | xcopy %RYZEN_AI_INSTALLATION_PATH%\onnxruntime\bin\* /E /I
317 |
318 |
319 | The C++ application that was generated takes 3 arguments:
320 |
321 | #. Path to the quantized ONNX model generated in Step 3
322 | #. The execution provider of choice (``cpu`` or ``npu``)
323 | #. vaip_config.json (pass None if running on CPU)
324 |
325 |
326 | Deploy the Model on the CPU
327 | ****************************
328 |
329 | To run the model on the CPU, use the following command:
330 |
331 | .. code-block:: bash
332 |
333 | resnet_cifar.exe models\resnet_quantized.onnx cpu
334 |
335 | Typical output:
336 |
337 | .. code-block:: bash
338 |
339 | model name:models\resnet_quantized.onnx
340 | ep:cpu
341 | Input Node Name/Shape (1):
342 | input : -1x3x32x32
343 | Output Node Name/Shape (1):
344 | output : -1x10
345 | Final results:
346 | Predicted label is cat and actual label is cat
347 | Predicted label is ship and actual label is ship
348 | Predicted label is ship and actual label is ship
349 | Predicted label is airplane and actual label is airplane
350 | Predicted label is frog and actual label is frog
351 | Predicted label is frog and actual label is frog
352 | Predicted label is truck and actual label is automobile
353 | Predicted label is frog and actual label is frog
354 | Predicted label is cat and actual label is cat
355 | Predicted label is automobile and actual label is automobile
356 |
357 | Deploy the Model on the NPU
358 | ****************************
359 |
360 | To successfully run the model on the NPU:
361 |
362 | - Ensure ``RYZEN_AI_INSTALLATION_PATH`` points to ``path\to\ryzen-ai-sw-\``. If you installed the Ryzen AI software using the MSI installer, this variable should already be set. If the Ryzen AI software package has been moved after installation, ``RYZEN_AI_INSTALLATION_PATH`` must be set again.
363 |
364 | - By default, the Ryzen AI Conda environment automatically sets the standard NPU binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. However, explicitly passing the ``xclbin`` option in the provider options overrides the default setting.
365 |
366 | The following code block from ``resnet_cifar.cpp`` shows how ONNX Runtime is configured to deploy the model on the Ryzen AI NPU:
367 |
368 | .. code-block:: cpp
369 |
370 |     auto session_options = Ort::SessionOptions();
371 |
372 |     auto cache_dir = std::filesystem::current_path().string();
373 |
374 |     if (ep == "npu")
375 |     {
376 |         auto options = std::unordered_map<std::string, std::string>{
377 |             {"cacheDir", cache_dir}, {"cacheKey", "modelcachekey"}, {"xclbin", "path/to/xclbin"}};
378 |         session_options.AppendExecutionProvider_VitisAI(options);
379 |     }
380 |
381 |     auto session = Ort::Session(env, model_name.data(), session_options);
382 |
383 | To run the model on the NPU, pass ``npu`` as the execution provider argument to the C++ application. Use the following command:
384 |
385 | .. code-block:: bash
386 |
387 | resnet_cifar.exe models\resnet_quantized.onnx npu
388 |
389 | Typical output:
390 |
391 | .. code-block::
392 |
393 | [Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
394 | [Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
395 | ...
396 | Final results:
397 | Predicted label is cat and actual label is cat
398 | Predicted label is ship and actual label is ship
399 | Predicted label is ship and actual label is ship
400 | Predicted label is airplane and actual label is airplane
401 | Predicted label is frog and actual label is frog
402 | Predicted label is frog and actual label is frog
403 | Predicted label is truck and actual label is automobile
404 | Predicted label is frog and actual label is frog
405 | Predicted label is cat and actual label is cat
406 | Predicted label is automobile and actual label is automobile
407 | ..
408 | ------------
409 |
410 | #####################################
411 | License
412 | #####################################
413 |
414 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
415 |
--------------------------------------------------------------------------------
/docs/gpu/ryzenai_gpu.rst:
--------------------------------------------------------------------------------
1 | ###########################
2 | DirectML Flow
3 | ###########################
4 |
5 | *************
6 | Prerequisites
7 | *************
8 |
9 | - DirectX12 capable Windows OS (Windows 11 recommended)
10 | - Latest AMD `GPU device driver `_ installed
11 | - `Microsoft Olive `_ for model conversion and optimization
12 | - Latest `ONNX Runtime DirectML EP `_
13 |
14 | You can verify the GPU driver and DirectX version from ``Windows Task Manager`` -> ``Performance`` -> ``GPU``.
15 |
16 | ******************************
17 | Running models on Ryzen AI GPU
18 | ******************************
19 |
20 | Running models on the Ryzen AI GPU is accomplished in two simple steps:
21 |
22 | **Model Conversion and Optimization**: After the model is trained, the Microsoft Olive Optimizer can be used to convert the model to ONNX and optimize it for the target execution environment.
23 |
24 | For additional information, refer to the `Microsoft Olive Documentation `_
25 |
26 |
27 | **Deployment**: Once the model is in the ONNX format, the ONNX Runtime DirectML EP (``DmlExecutionProvider``) is used to run the model on the AMD Ryzen AI GPU.
28 |
29 | For additional information, refer to the `ONNX Runtime documentation for the DirectML Execution Provider `_
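
A minimal Python sketch of this deployment step is shown below. It assumes the ``onnxruntime-directml`` package is installed and uses a hypothetical Olive-optimized model file named ``resnet50_opt.onnx`` with a standard ResNet input shape:

.. code-block:: python

    # Minimal sketch: run an ONNX model on the GPU through the DirectML EP.
    # The model file name and input shape are placeholders for illustration.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "resnet50_opt.onnx",
        providers=["DmlExecutionProvider"],  # selects the DirectML EP
    )

    input_name = session.get_inputs()[0].name
    dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {input_name: dummy_input})
    print(outputs[0].shape)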
30 |
31 |
32 | ********
33 | Examples
34 | ********
35 |
36 | - Optimizing and running `ResNet on Ryzen AI GPU `_
37 |
38 |
39 | ********************
40 | Additional Resources
41 | ********************
42 |
43 |
44 | - Article on how AMD and Blackmagic Design worked together to accelerate the `DaVinci Resolve Studio `_ workload on AMD hardware:
45 |
46 | - `AI Accelerated Video Editing with DaVinci Resolve 18.6 & AMD Radeon Graphics `_
47 |
48 | |
49 |
50 | - Blog posts on using the Ryzen AI Software for various generative AI workloads on GPU:
51 |
52 | - `Automatic1111 Stable Diffusion WebUI with DirectML Extension on AMD GPUs `_
53 |
54 | - `Running Optimized Llama2 with Microsoft DirectML on AMD Radeon Graphics `_
55 |
56 | - `AI-Assisted Mobile Workstation Workflows Powered by AMD Ryzen™ AI `_
57 |
--------------------------------------------------------------------------------
/docs/hybrid_oga.rst:
--------------------------------------------------------------------------------
1 | ############################
2 | OnnxRuntime GenAI (OGA) Flow
3 | ############################
4 |
5 | Ryzen AI Software supports deploying LLMs on Ryzen AI PCs using the native ONNX Runtime Generate (OGA) C++ or Python API. The OGA API is the lowest-level API available for building LLM applications on a Ryzen AI PC. This documentation covers the Hybrid execution mode for LLMs, which utilizes both the NPU and the integrated GPU (iGPU).
6 |
7 | **Note**: Refer to :doc:`npu_oga` for NPU only execution mode.
8 |
9 | ************************
10 | Supported Configurations
11 | ************************
12 |
13 | The Ryzen AI OGA flow supports Strix Point (STX) and Krackan Point (KRK) processors. Phoenix (PHX) and Hawk Point (HPT) processors are not supported.
14 |
15 |
16 | ************
17 | Requirements
18 | ************
19 |
20 | - Install the NPU drivers and the Ryzen AI MSI installer as described in :doc:`inst`
21 | - Install the GPU device driver: ensure the AMD GPU device driver from https://www.amd.com/en/support is installed
22 | - Install Git for Windows (needed to download models from Hugging Face): https://git-scm.com/downloads
23 |
24 | ********************
25 | Pre-optimized Models
26 | ********************
27 |
28 | AMD provides a set of pre-optimized LLMs ready to be deployed with Ryzen AI Software and the supporting runtime for hybrid execution. These models can be found on Hugging Face:
29 |
30 | - https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
31 | - https://huggingface.co/amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
32 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid
33 | - https://huggingface.co/amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid
34 | - https://huggingface.co/amd/chatglm3-6b-awq-g128-int4-asym-fp16-onnx-hybrid
35 | - https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-fp16-onnx-hybrid
36 | - https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid
37 | - https://huggingface.co/amd/Llama-3-8B-awq-g128-int4-asym-fp16-onnx-hybrid/tree/main
38 | - https://huggingface.co/amd/Llama-3.1-8B-awq-g128-int4-asym-fp16-onnx-hybrid/tree/main
39 | - https://huggingface.co/amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
40 | - https://huggingface.co/amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
41 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.1-hybrid
42 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.2-hybrid
43 | - https://huggingface.co/amd/Mistral-7B-v0.3-hybrid
44 | - https://huggingface.co/amd/Llama-3.1-8B-Instruct-hybrid
45 | - https://huggingface.co/amd/CodeLlama-7b-instruct-g128-hybrid
46 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hybrid
47 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-asym-uint4-g128-lmhead-onnx-hybrid
48 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-7B-awq-asym-uint4-g128-lmhead-onnx-hybrid
49 | - https://huggingface.co/amd/AMD-OLMo-1B-SFT-DPO-hybrid
50 | - https://huggingface.co/amd/Qwen2-7B-awq-uint4-asym-g128-lmhead-fp16-onnx-hybrid
51 | - https://huggingface.co/amd/Qwen2-1.5B-awq-uint4-asym-global-g128-lmhead-g32-fp16-onnx-hybrid
52 | - https://huggingface.co/amd/gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid
53 |
54 |
55 | The steps for deploying the pre-optimized models using Python or C++ are described in the following sections.
56 |
57 | ******************************
58 | Hybrid Execution of OGA Models
59 | ******************************
60 |
61 | Setup
62 | =====
63 |
64 | Activate the Ryzen AI 1.4 Conda environment:
65 |
66 | .. code-block::
67 |
68 | conda activate ryzen-ai-1.4.0
69 |
70 | Copy the required files into a local folder from which to run the LLMs:
71 |
72 | .. code-block::
73 |
74 | mkdir hybrid_run
75 | cd hybrid_run
76 | xcopy /Y /E "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\benchmark" .
77 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\amd_genai_prompt.txt" .
78 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnxruntime-genai.dll" .
79 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnx_custom_ops.dll" .
80 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzen_mm.dll" .
81 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzenai_onnx_utils.dll" .
82 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\DirectML.dll" .
83 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" .
84 |
85 | Download Models from HuggingFace
86 | ================================
87 |
88 | Download the desired models from the list of pre-optimized models on Hugging Face:
89 |
90 | .. code-block::
91 |
92 | # Make sure you have git-lfs installed (https://git-lfs.com)
93 | git lfs install
94 | git clone
95 |
96 | For example, for Llama-2-7b-chat:
97 |
98 | .. code-block::
99 |
100 | git lfs install
101 | git clone https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid
102 |
103 |
104 | Enabling Performance Mode (Optional)
105 | ====================================
106 |
107 | To run the LLMs in the best performance mode, follow these steps:
108 |
109 | - Go to ``Windows`` → ``Settings`` → ``System`` → ``Power`` and set the power mode to Best Performance.
110 | - Execute the following commands in the terminal:
111 |
112 | .. code-block::
113 |
114 | cd C:\Windows\System32\AMD
115 | xrt-smi configure --pmode performance
116 |
117 |
118 | Sample C++ Program
119 | ==================
120 |
121 | The ``model_benchmark.exe`` test application provides a simple mechanism for running and evaluating Hybrid OGA models using the native OGA C++ APIs. The source code for this application can be used as a reference implementation for integrating LLMs using the native OGA C++ APIs.
122 |
123 | The ``model_benchmark.exe`` test application can be used as follows:
124 |
125 | .. code-block::
126 |
127 | # To see available options and default settings
128 | .\model_benchmark.exe -h
129 |
130 | # To run with default settings
131 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths
132 |
133 | # To show more informational output
134 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file --verbose
135 |
136 | # To run with given number of generated tokens
137 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -g $num_tokens
138 |
139 | # To run with given number of warmup iterations
140 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -w $num_warmup
141 |
142 | # To run with given number of iterations
143 | .\model_benchmark.exe -i $path_to_model_dir -f $prompt_file -l $list_of_prompt_lengths -r $num_iterations
144 |
145 |
146 | For example, for Llama-2-7b-chat:
147 |
148 | .. code-block::
149 |
150 | .\model_benchmark.exe -i Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid -f amd_genai_prompt.txt -l "1024" --verbose
151 |
152 | |
153 |
154 | **NOTE**: The C++ source code for the ``model_benchmark.exe`` executable can be found in the ``%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\c`` folder. This source code can be modified and recompiled if necessary using the commands below.
155 |
156 | .. code-block::
157 |
158 | :: Copy project files
159 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\c" .\sources
160 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\include" .\sources\include
161 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\onnxruntime_genai\lib" .\sources\lib
162 |
163 | :: Build project
164 | cd sources
165 | cmake -G "Visual Studio 17 2022" -A x64 -S . -B build
166 | cmake --build build --config Release
167 |
168 | :: Copy runtime DLLs
169 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnxruntime-genai.dll" .\build\Release
170 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\onnx_custom_ops.dll" .\build\Release
171 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzen_mm.dll" .\build\Release
172 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\hybrid-llm\ryzenai_onnx_utils.dll" .\build\Release
173 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\DirectML.dll" .\build\Release
174 | xcopy /Y "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" .\build\Release
175 |
176 |
177 | Sample Python Scripts
178 | =====================
179 |
180 | To run LLMs other than ChatGLM, use the following command:
181 |
182 | .. code-block::
183 |
184 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir
185 |
186 | To run ChatGLM, use the following command:
187 |
188 | .. code-block::
189 |
190 | pip install transformers==4.44.0
191 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\chatglm\model-generate-chatglm3.py" --model
192 |
193 |
194 | For example, for Llama-2-7b-chat:
195 |
196 | .. code-block::
197 |
198 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir Llama-2-7b-chat-hf-awq-g128-int4-asym-fp16-onnx-hybrid
199 |
200 |
201 | ***********************
202 | Using Fine-Tuned Models
203 | ***********************
204 |
205 | It is also possible to run fine-tuned versions of the pre-optimized OGA models.
206 |
207 | To do this, the fine-tuned models must first be prepared for execution with the OGA Hybrid flow. For instructions on how to do this, refer to the page about :doc:`oga_model_prepare`.
208 |
209 | Once a fine-tuned model has been prepared for Hybrid execution, it can be deployed by following the steps described earlier on this page.
210 |
--------------------------------------------------------------------------------
/docs/icons.txt:
--------------------------------------------------------------------------------
1 |
2 | .. |warning| unicode:: U+26A0 .. Warning Sign
3 | .. |excl| unicode:: U+2757 .. Heavy Exclamation Mark
4 | .. |pen| unicode:: U+270E .. Lower Right Pencil
5 | .. |bulb| unicode:: U+1F4A1 .. Electric Light Bulb
6 | .. |folder| unicode:: U+1F4C1 .. File folder
7 | .. |clipboard| unicode:: U+1F4CB .. Clipboard
8 | .. |pushpin| unicode:: U+1F4CC .. Pushpin
9 | .. |roundpin| unicode:: U+1F4CD .. Round Pushpin
10 | .. |memo| unicode:: U+1F4DD .. Memo
11 | .. |info| unicode:: U+1F6C8 .. Circled Information Source
12 | .. |checkmark| unicode:: U+2705 .. White Heavy Check Mark
13 | .. |crossmark| unicode:: U+274C .. Red Cross Mark
14 |
--------------------------------------------------------------------------------
/docs/images/rai-sw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/amd/ryzen-ai-documentation/de198aa9295c834055eb64b0d47796dafec63203/docs/images/rai-sw.png
--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
1 | ##########################
2 | Ryzen AI Software
3 | ##########################
4 |
5 | AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. Ryzen AI software enables applications to run on the neural processing unit (NPU) built in the AMD XDNA™ architecture, as well as on the integrated GPU. This allows developers to build and deploy models trained in PyTorch or TensorFlow and run them directly on laptops powered by Ryzen AI using ONNX Runtime and the Vitis™ AI Execution Provider (EP).
6 |
7 | .. image:: images/rai-sw.png
8 | :align: center
9 |
10 | ***********
11 | Quick Start
12 | ***********
13 |
14 | - :ref:`Supported Configurations `
15 | - :doc:`inst`
16 | - :doc:`examples`
17 |
18 | *************************
19 | Development Flow Overview
20 | *************************
21 |
22 | The Ryzen AI development flow does not require any modifications to the existing model training processes and methods. The pre-trained model can be used as the starting point of the Ryzen AI flow.
23 |
24 | Quantization
25 | ============
26 | Quantization involves converting the AI model's parameters from floating-point to lower-precision representations, such as bfloat16 floating-point or 8-bit integer. Quantized models are more power-efficient, utilize less memory, and offer better performance.
27 |
28 | **AMD Quark** is a comprehensive cross-platform deep learning toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimize their models for deployment on a wide range of hardware backends, achieving significant performance gains without compromising accuracy.
29 |
30 | For more details, refer to the :doc:`model_quantization` page.
31 |
32 | Compilation and Deployment
33 | ==========================
34 | The AI model is deployed using the ONNX Runtime with either C++ or Python APIs. The Vitis AI Execution Provider included in the ONNX Runtime intelligently determines what portions of the AI model should run on the NPU, optimizing workloads to ensure optimal performance with lower power consumption.
35 |
36 | For more details, refer to the :doc:`modelrun` page.
37 |
38 |
39 | |
40 | |
41 |
42 |
43 | .. toctree::
44 | :maxdepth: 1
45 | :hidden:
46 |
47 | relnotes.rst
48 |
49 |
50 | .. toctree::
51 | :maxdepth: 1
52 | :hidden:
53 | :caption: Getting Started on the NPU
54 |
55 | inst.rst
56 | examples.rst
57 |
58 | .. toctree::
59 | :maxdepth: 1
60 | :hidden:
61 | :caption: Running Models on the NPU
62 |
63 | model_quantization.rst
64 | modelrun.rst
65 | app_development.rst
66 |
67 | .. toctree::
68 | :maxdepth: 1
69 | :hidden:
70 | :caption: Running LLMs on the NPU
71 |
72 | llm/overview.rst
73 | llm/high_level_python.rst
74 | llm/server_interface.rst
75 | hybrid_oga.rst
76 | oga_model_prepare.rst
77 |
78 | .. toctree::
79 | :maxdepth: 1
80 | :hidden:
81 | :caption: Running Models on the GPU
82 |
83 | gpu/ryzenai_gpu.rst
84 |
85 | .. toctree::
86 | :maxdepth: 1
87 | :hidden:
88 | :caption: Additional Features
89 |
90 | xrt_smi.rst
91 | ai_analyzer.rst
92 | ryzen_ai_libraries.rst
93 |
94 |
95 | .. toctree::
96 | :maxdepth: 1
97 | :hidden:
98 | :caption: Additional Topics
99 |
100 | Model Zoo
101 | Licensing Information
102 |
103 |
104 |
105 | ..
106 | ------------
107 | #####################################
108 | License
109 | #####################################
110 |
111 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
112 |
--------------------------------------------------------------------------------
/docs/inst.rst:
--------------------------------------------------------------------------------
1 | .. include:: /icons.txt
2 |
3 | #########################
4 | Installation Instructions
5 | #########################
6 |
7 |
8 |
9 | *************
10 | Prerequisites
11 | *************
12 |
13 | The Ryzen AI Software supports AMD processors with a Neural Processing Unit (NPU). Consult the release notes for the full list of :ref:`supported configurations `.
14 |
15 | The following dependencies must be present on the system before installing the Ryzen AI Software:
16 |
17 | .. list-table::
18 | :widths: 25 25
19 | :header-rows: 1
20 |
21 | * - Dependencies
22 | - Version Requirement
23 | * - Windows 11
24 | - build >= 22621.3527
25 | * - Visual Studio
26 | - 2022
27 | * - cmake
28 | - version >= 3.26
29 | * - Anaconda or Miniconda
30 | - Latest version
31 |
32 | |
33 |
34 | |warning| **IMPORTANT**:
35 |
36 | - Visual Studio 2022 Community: ensure that "Desktop Development with C++" is installed
37 |
38 | - Anaconda or Miniconda: ensure that the following path is set in the System PATH variable: ``path\to\anaconda3\Scripts`` or ``path\to\miniconda3\Scripts`` (The System PATH variable should be set in the *System Variables* section of the *Environment Variables* window).
39 |
40 | |
41 |
42 | .. _install-driver:
43 |
44 | *******************
45 | Install NPU Drivers
46 | *******************
47 |
48 | - Download the NPU driver installation package :download:`NPU Driver `
49 |
50 | - Install the NPU drivers by following these steps:
51 |
52 | - Extract the downloaded ``NPU_RAI1.4_GA_257_WHQL.zip`` zip file.
53 | - Open a terminal in administrator mode and execute the ``.\npu_sw_installer.exe`` exe file.
54 |
55 | - Ensure that NPU MCDM driver (Version:32.0.203.257, Date:3/12/2025) is correctly installed by opening ``Device Manager`` -> ``Neural processors`` -> ``NPU Compute Accelerator Device``.
56 |
57 |
58 | .. _install-bundled:
59 |
60 | *************************
61 | Install Ryzen AI Software
62 | *************************
63 |
64 | - Download the RyzenAI Software installer :download:`ryzen-ai-1.4.0.exe `.
65 |
66 | - Launch the MSI installer and follow the instructions on the installation wizard:
67 |
68 | - Accept the terms of the License agreement
69 | - Provide the destination folder for Ryzen AI installation (default: ``C:\Program Files\RyzenAI\1.4.0``)
70 | - Specify the name for the conda environment (default: ``ryzen-ai-1.4.0``)
71 |
72 |
73 | The Ryzen AI Software packages are now installed in the conda environment created by the installer.
74 |
75 |
76 | .. _quicktest:
77 |
78 |
79 | *********************
80 | Test the Installation
81 | *********************
82 |
83 | The Ryzen AI Software installation folder contains a test to verify that the software is correctly installed. This installation test can be found in the ``quicktest`` subfolder.
84 |
85 | - Open a Conda command prompt (search for "Anaconda Prompt" in the Windows start menu)
86 |
87 | - Activate the Conda environment created by the Ryzen AI installer:
88 |
89 | .. code-block::
90 |
91 | conda activate
92 |
93 | - Run the test:
94 |
95 | .. code-block::
96 |
97 | cd %RYZEN_AI_INSTALLATION_PATH%/quicktest
98 | python quicktest.py
99 |
100 |
101 | - The quicktest.py script sets up the environment and runs a simple CNN model. On a successful run, you will see an output similar to the one shown below. This indicates that the model is running on the NPU and that the installation of the Ryzen AI Software was successful:
102 |
103 | .. code-block::
104 |
105 | [Vitis AI EP] No. of Operators : CPU 2 NPU 398
106 | [Vitis AI EP] No. of Subgraphs : NPU 1 Actually running on NPU 1
107 | ...
108 | Test Passed
109 | ...
110 |
111 |
112 | .. note::
113 |
114 | The full path to the Ryzen AI Software installation folder is stored in the ``RYZEN_AI_INSTALLATION_PATH`` environment variable.
115 |
116 |
117 | **************************
118 | Other Installation Options
119 | **************************
120 |
121 | Linux Installer
122 | ~~~~~~~~~~~~~~~
123 |
124 | Compiling BF16 models requires more processing power than compiling INT8 models. If a larger BF16 model cannot be compiled on a Windows machine due to hardware limitations (e.g., insufficient RAM), an alternative Linux-based compilation flow is supported. More details can be found here: :doc:`rai_linux`
125 |
126 |
127 |
128 | Lightweight Installer
129 | ~~~~~~~~~~~~~~~~~~~~~
130 |
131 | A lightweight installer with reduced features is available. It cannot be used to compile BF16 models, but it fully supports compiling and running INT8 models as well as running LLMs.
132 |
133 | - Download the RyzenAI Software Runtime MSI installer :download:`ryzen-ai-rt-1.4.0.msi `.
134 |
135 | - Launch the MSI installer and follow the instructions on the installation wizard:
136 |
137 | - Accept the terms of the License agreement
138 | - Provide the destination folder for Ryzen AI installation
139 | - Specify the name for the conda environment
140 |
141 |
142 |
143 | ..
144 | ------------
145 |
146 | #####################################
147 | License
148 | #####################################
149 |
150 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
151 |
--------------------------------------------------------------------------------
/docs/licenses.rst:
--------------------------------------------------------------------------------
1 | Licensing Information
2 | =====================
3 |
4 | Ryzen AI is released by Advanced Micro Devices, Inc. (AMD) and is subject to the licensing terms listed below. Some components may include third-party software that is subject to additional licenses. Review the following links for more information:
5 |
6 | - `AMD End User License Agreement `_
7 | - `Third Party End User License Agreement `_
8 |
--------------------------------------------------------------------------------
/docs/llm/high_level_python.rst:
--------------------------------------------------------------------------------
1 | .. Heading guidelines
2 | .. # with overline, for parts
3 | .. * with overline, for chapters
4 | .. =, for sections
5 | .. -, for subsections
6 | .. ^, for subsubsections
7 | .. “, for paragraphs
8 |
9 | #####################
10 | High-Level Python SDK
11 | #####################
12 |
13 | A Python environment offers flexibility for experimenting with LLMs, profiling them, and integrating them into Python applications. We use the `Lemonade SDK `_ to get up and running quickly.
14 |
15 | To get started, follow these instructions.
16 |
17 | ***************************
18 | System-level pre-requisites
19 | ***************************
20 |
21 | You only need to do this once per computer:
22 |
23 | 1. Make sure your system has the recommended Ryzen AI driver installed as described in :ref:`install-driver`.
24 | 2. Download and install `Miniconda for Windows `_.
25 | 3. Launch a terminal and call ``conda init``.
26 |
27 |
28 | *****************
29 | Environment Setup
30 | *****************
31 |
32 | To create and set up an environment, run these commands in your terminal:
33 |
34 | .. code-block:: bash
35 |
36 | conda create -n ryzenai-llm python=3.10
37 | conda activate ryzenai-llm
38 | pip install lemonade-sdk[llm-oga-hybrid]
39 | lemonade-install --ryzenai hybrid
40 |
41 | ****************
42 | Validation Tools
43 | ****************
44 |
45 | Now that you have completed installation, you can try prompting an LLM like this (where ``PROMPT`` is any prompt you like).
46 |
47 | Run this command in a terminal that has your environment activated:
48 |
49 | .. code-block:: bash
50 |
51 | lemonade -i amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid oga-load --device hybrid --dtype int4 llm-prompt --max-new-tokens 64 -p PROMPT
52 |
53 | Each example linked in the :ref:`featured-llms` table also has `example commands `_ for validating the speed and accuracy of each model.
54 |
55 | **********
56 | Python API
57 | **********
58 | You can also run this code to try out the high-level Lemonade API in a Python script:
59 |
60 | .. code-block:: python
61 |
62 | from lemonade.api import from_pretrained
63 |
64 | model, tokenizer = from_pretrained(
65 | "amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid", recipe="oga-hybrid"
66 | )
67 |
68 | input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids
69 | response = model.generate(input_ids, max_new_tokens=30)
70 |
71 | print(tokenizer.decode(response[0]))
72 |
73 | Each example linked in the :ref:`featured-llms` table also has an `example script `_ for streaming the text output of the LLM.
74 |
75 | **********
76 | Next Steps
77 | **********
78 |
79 | From here, you can check out `an example `_ or any of the other :ref:`featured-llms`.
80 |
81 | The examples pages also provide code for:
82 |
83 | #. Additional validation tools for measuring speed and accuracy.
84 | #. Streaming responses with the API.
85 | #. Integrating the API into applications.
86 | #. Launching the server interface from the Python environment.
87 |
88 |
89 |
90 |
91 | ..
92 | ------------
93 | #####################################
94 | License
95 | #####################################
96 |
97 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
98 |
--------------------------------------------------------------------------------
/docs/llm/overview.rst:
--------------------------------------------------------------------------------
1 | ########
2 | Overview
3 | ########
4 |
5 | ************************************
6 | OGA-based Flow with Hybrid Execution
7 | ************************************
8 |
9 | Ryzen AI Software supports deploying quantized 4-bit LLMs on Ryzen AI 300-series PCs. This solution uses a hybrid execution mode, which leverages both the NPU and integrated GPU (iGPU), and is built on the OnnxRuntime GenAI (OGA) framework.
10 |
11 | Hybrid execution mode optimally partitions the model such that different operations are scheduled on NPU vs. iGPU. This minimizes time-to-first-token (TTFT) in the prefill-phase and maximizes token generation (tokens per second, TPS) in the decode phase.
12 |
13 | OGA is a multi-vendor generative AI framework from Microsoft that provides a convenient LLM interface for execution backends such as Ryzen AI.
14 |
15 | Supported Configurations
16 | ========================
17 |
18 | - Only Ryzen AI 300-series Strix Point (STX) and Krackan Point (KRK) processors support OGA-based hybrid execution.
19 | - Developers with Ryzen AI 7000- and 8000-series processors can get started using the CPU-based examples linked in the :ref:`featured-llms` table.
20 | - Windows 11 is the required operating system.
21 |
22 |
23 | *******************************
24 | Development Interfaces
25 | *******************************
26 |
27 | The Ryzen AI LLM software stack is available through three development interfaces, each suited for specific use cases as outlined in the sections below. All three interfaces are built on top of native OnnxRuntime GenAI (OGA) libraries, as shown in the :ref:`software-stack-table` diagram below.
28 |
29 | The high-level Python APIs, as well as the Server Interface, also leverage the Lemonade SDK, which is multi-vendor open-source software that provides everything necessary for quickly getting started with LLMs on OGA.
30 |
31 | A key benefit of both OGA and Lemonade is that software developed against their interfaces is portable to many other execution backends.
32 |
33 | .. _software-stack-table:
34 |
35 | .. flat-table:: Ryzen AI Software Stack
36 | :header-rows: 1
37 | :class: center-table
38 |
39 | * - Your Python Application
40 | - Your LLM Stack
41 | - Your Native Application
42 | * - `Lemonade Python API* <#high-level-python-sdk>`_
43 | - `Lemonade Server Interface* <#server-interface-rest-api>`_
44 | - :rspan:`1` `OGA C++ Headers <../hybrid_oga.html>`_
45 | * - :cspan:`1` `OGA Python API* `_
46 | * - :cspan:`2` `Custom AMD OnnxRuntime GenAI (OGA) `_
47 | * - :cspan:`2` `AMD Ryzen AI Driver and Hardware `_
48 |
49 | \* indicates open-source software (OSS).
50 |
51 | High-Level Python SDK
52 | =====================
53 |
54 | The high-level Python SDK, Lemonade, allows you to get started using PyPI installation in approximately 5 minutes.
55 |
56 | This SDK allows you to:
57 |
58 | - Experiment with models in hybrid execution mode on Ryzen AI hardware.
59 | - Validate inference speed and task performance.
60 | - Integrate with Python apps using a high-level API.
61 |
62 | To get started in Python, follow these instructions: :doc:`high_level_python`.
63 |
64 |
65 | Server Interface (REST API)
66 | ===========================
67 |
68 | The Server Interface provides a convenient means to integrate with applications that:
69 |
70 | - Already support an LLM server interface, such as the Ollama server or OpenAI API.
71 | - Are written in any language (C++, C#, Javascript, etc.) that supports REST APIs.
72 | - Benefit from process isolation for the LLM backend.
73 |
74 | To get started with the server interface, follow these instructions: :doc:`server_interface`.
75 |
76 | For example applications that have been tested with Lemonade Server, see the `Lemonade Server Examples `_.
77 |
78 |
79 | OGA APIs for C++ Libraries and Python
80 | =====================================
81 |
82 | Native C++ libraries for OGA are available to give full customizability for deployment into native applications.
83 |
84 | The Python bindings for OGA also provide a customizable interface for Python development.
85 |
86 | To get started with the OGA APIs, follow these instructions: :doc:`../hybrid_oga`.
87 |
88 |
89 | .. _featured-llms:
90 |
91 | *******************************
92 | Featured LLMs
93 | *******************************
94 |
95 | The following tables contain a curated list of LLMs that have been validated on Ryzen AI hybrid execution mode. The hybrid examples are built on top of OnnxRuntime GenAI (OGA).
96 |
97 | The comprehensive set of pre-optimized models for hybrid execution used in these examples is available in the `AMD hybrid collection on Hugging Face `_. It is also possible to run fine-tuned versions of the models listed (for example, fine-tuned versions of Llama2 or Llama3). For instructions on how to prepare a fine-tuned OGA model for hybrid execution, refer to :doc:`../oga_model_prepare`.
98 |
99 | .. _ryzen-ai-oga-featured-llms:
100 |
101 | .. flat-table:: Ryzen AI OGA Featured LLMs
102 | :header-rows: 2
103 | :class: llm-table
104 |
105 | * -
106 | - :cspan:`1` CPU Baseline (HF bfloat16)
107 | - :cspan:`3` Ryzen AI Hybrid (OGA int4)
108 | * - Model
109 | - Example
110 | - Validation
111 | - Example
112 | - TTFT Speedup
113 | - Tokens/S Speedup
114 | - Validation
115 |
116 | * - `DeepSeek-R1-Distill-Qwen-7B `_
117 | - `Link `__
118 | - 🟢
119 | - `Link `__
120 | - 3.4x
121 | - 8.4x
122 | - 🟢
123 | * - `DeepSeek-R1-Distill-Llama-8B `_
124 | - `Link `__
125 | - 🟢
126 | - `Link `__
127 | - 4.2x
128 | - 7.6x
129 | - 🟢
130 | * - `Llama-3.2-1B-Instruct `_
131 | - `Link `__
132 | - 🟢
133 | - `Link `__
134 | - 1.9x
135 | - 5.1x
136 | - 🟢
137 | * - `Llama-3.2-3B-Instruct `_
138 | - `Link `__
139 | - 🟢
140 | - `Link `__
141 | - 2.8x
142 | - 8.1x
143 | - 🟢
144 | * - `Phi-3-mini-4k-instruct `_
145 | - `Link `__
146 | - 🟢
147 | - `Link `__
148 | - 3.6x
149 | - 7.8x
150 | - 🟢
151 | * - `Qwen1.5-7B-Chat `_
152 | - `Link `__
153 | - 🟢
154 | - `Link `__
155 | - 4.0x
156 | - 7.3x
157 | - 🟢
158 | * - `Mistral-7B-Instruct-v0.3 `_
159 | - `Link `__
160 | - 🟢
161 | - `Link `__
162 | - 5.0x
163 | - 8.1x
164 | - 🟢
165 | * - `Llama-3.1-8B-Instruct `_
166 | - `Link `__
167 | - 🟢
168 | - `Link `__
169 | - 3.9x
170 | - 8.9x
171 | - 🟢
172 |
173 | The :ref:`ryzen-ai-oga-featured-llms` table was compiled using validation, benchmarking, and accuracy metrics as measured by the `ONNX TurnkeyML v6.1.0 `_ ``lemonade`` commands in each example link. After this table was created, the Lemonade SDK moved to the new location found `here `_.
174 |
175 | Data collection details:
176 |
177 | * All validation, performance, and accuracy metrics are collected on the same system configuration:
178 |
179 | * System: HP OmniBook Ultra Laptop 14z
180 | * Processor: AMD Ryzen AI 9 HX 375 W/ Radeon 890M
181 | * Memory: 32GB of RAM
182 |
183 | * The Hugging Face ``transformers`` framework is used as the baseline implementation for speedup and accuracy comparisons.
184 |
185 | * The baseline checkpoint is the original ``safetensors`` Hugging Face checkpoint linked in each table row, in the ``bfloat16`` data type.
186 |
187 | * All speedup numbers are the measured performance of the model with input sequence length (ISL) of ``1024`` and output sequence length (OSL) of ``64``, on the specified backend, divided by the measured performance of the baseline.
188 | * We assign the 🟢 validation score based on this criteria: all commands in the example guide ran successfully.
189 |
190 |
191 | **************************************
192 | OGA-based Flow with NPU-only Execution
193 | **************************************
194 |
195 | The primary OGA-based flow for LLMs employs a hybrid execution mode which leverages both the NPU and iGPU. AMD also supports an OGA-based flow where the iGPU is not used and the compute-intensive operations are offloaded exclusively to the NPU.
196 |
197 | The OGA-based NPU-only execution mode is supported on STX and KRK platforms.
198 |
199 | To get started with the OGA-based NPU-only execution mode, follow these instructions :doc:`../npu_oga`.
200 |
201 |
202 |
203 |
204 | ..
205 | ------------
206 |
207 | #####################################
208 | License
209 | #####################################
210 |
211 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
212 |
--------------------------------------------------------------------------------
/docs/llm/server_interface.rst:
--------------------------------------------------------------------------------
1 | .. Heading guidelines
2 | .. # with overline, for parts
3 | .. * with overline, for chapters
4 | .. =, for sections
5 | .. -, for subsections
6 | .. ^, for subsubsections
7 | .. “, for paragraphs
8 |
9 | ###########################
10 | Server Interface (REST API)
11 | ###########################
12 |
13 | The Lemonade SDK offers a server interface that allows your application to load an LLM on Ryzen AI hardware in a process, and then communicate with this process using standard ``REST`` APIs. This allows applications written in any language (C#, JavaScript, Python, C++, etc.) to easily integrate with Ryzen AI LLMs.
14 |
15 | Server interfaces are used across the LLM ecosystem because they allow no-code plug-and-play between the higher levels of the application stack (GUIs, agents, RAG, etc.) and the LLM and hardware that have been abstracted by the server.
16 |
17 | For example, open source projects such as `Open WebUI <#open-webui-demo>`_ have out-of-box support for connecting to a variety of server interfaces, which in turn allows users to quickly start working with LLMs in a GUI.
18 |
19 | ************
20 | Server Setup
21 | ************
22 |
23 | Lemonade Server can be installed via the Lemonade Server Installer executable by following these steps:
24 |
25 | 1. Make sure your system has the recommended Ryzen AI driver installed as described in :ref:`install-driver`.
26 | 2. Download and install ``Lemonade_Server_Installer.exe`` from the `latest Lemonade release `_.
27 | 3. Launch the server by double-clicking the ``lemonade_server`` shortcut added to your desktop.
28 |
29 | See the `Lemonade Server README `_ for more details.
30 |
31 | ************
32 | Server Usage
33 | ************
34 |
35 | The Lemonade Server provides the following OpenAI-compatible endpoints:
36 |
37 | - POST ``/api/v0/chat/completions`` - Chat Completions (messages to completions)
38 | - POST ``/api/v0/completions`` - Text Completions (prompt to completion)
39 | - GET ``/api/v0/models`` - List available models
40 |
41 | Please refer to the `server specification `_ document in the Lemonade repository for details about the request and response formats for each endpoint.
42 |
43 | The `OpenAI API documentation `_ also has code examples for integrating streaming completions into an application.
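
As an illustration, the following hedged Python sketch sends a request to the chat completions endpoint of a locally running Lemonade Server using the ``requests`` package. The base URL matches the one used in the Open WebUI instructions below; the model name is a placeholder and should be replaced with one of the names returned by the ``/api/v0/models`` endpoint.

.. code-block:: python

    # Hedged sketch: query a locally running Lemonade Server through its
    # OpenAI-compatible chat completions endpoint. The model name below is a
    # placeholder; use GET /api/v0/models to list the models on your system.
    import requests

    BASE_URL = "http://localhost:8000/api/v0"

    payload = {
        "model": "Llama-3.2-1B-Instruct-Hybrid",  # placeholder model name
        "messages": [
            {"role": "user", "content": "What is the NPU in a Ryzen AI PC?"}
        ],
    }

    response = requests.post(f"{BASE_URL}/chat/completions", json=payload)
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])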
44 |
45 | Open WebUI Demo
46 | ===============
47 |
48 | To experience the Lemonade Server, try using it with an OpenAI-compatible application, such as Open WebUI.
49 |
50 | Instructions:
51 | -------------
52 |
53 | 1. **Launch Lemonade Server:** Double-click the lemon icon on your desktop. See `server setup <#server-setup>`_ for installation instructions.
54 |
55 | 2. **Install and Run Open WebUI:** In a terminal, install Open WebUI using the following commands:
56 |
57 | .. code-block:: bash
58 |
59 | conda create -n webui python=3.11
60 | conda activate webui
61 | pip install open-webui
62 | open-webui serve
63 |
64 | 3. **Launch Open WebUI**: In a browser, navigate to ` `_.
65 |
66 | 4. **Connect Open WebUI to Lemonade Server:** In the top-right corner of the UI, click the profile icon and then:
67 |
68 | - Go to ``Settings`` → ``Connections``.
69 | - Click the ``+`` button to add our OpenAI-compatible connection.
70 | - In the URL field, enter ``http://localhost:8000/api/v0``, and in the key field put ``-``, then press save.
71 |
72 | **Done!** You are now able to run Open WebUI with Hybrid models. Feel free to choose any of the available “-Hybrid” models in the model selection menu.
73 |
74 | **********
75 | Next Steps
76 | **********
77 |
78 | - See `Lemonade Server Examples `_ to find applications that have been tested with Lemonade Server.
79 | - Check out the `Lemonade Server specification `_ to learn more about supported features.
80 | - Try out your Lemonade Server install with any application that uses the OpenAI chat completions API.
81 |
82 |
83 | ..
84 | ------------
85 | #####################################
86 | License
87 | #####################################
88 |
89 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
90 |
--------------------------------------------------------------------------------
/docs/model_quantization.rst:
--------------------------------------------------------------------------------
1 | ##################
2 | Model Quantization
3 | ##################
4 |
5 | **Model quantization** is the process of mapping high-precision weights/activations to a lower precision format, such as BF16/INT8, while maintaining model accuracy. This technique enhances the computational and memory efficiency of the model for deployment on NPU devices. It can be applied post-training, allowing existing models to be optimized without the need for retraining.
6 |
7 | The Ryzen AI compiler supports input models quantized to either INT8 or BF16 format:
8 |
9 | - CNN models: INT8 or BF16
10 | - Transformer models: BF16
11 |
12 | Quantization introduces several challenges, primarily revolving around the potential drop in model accuracy. Choosing the right quantization parameters—such as data type, bit-width, scaling factors, and the decision between per-channel or per-tensor quantization—adds layers of complexity to the design process.
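
To make these trade-offs concrete, the short sketch below illustrates plain symmetric per-tensor INT8 quantization with NumPy. It is a didactic example only; the actual scale selection, calibration, and per-channel handling are performed by the quantization toolkit described in the next section.

.. code-block:: python

    # Didactic sketch of symmetric per-tensor INT8 quantization: map float32
    # values to int8 with a single scale, then dequantize to inspect the error.
    import numpy as np

    w = np.random.randn(1000).astype(np.float32)   # example float32 weights
    scale = np.abs(w).max() / 127.0                # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    w_dequant = q.astype(np.float32) * scale       # values seen at inference time

    print("max abs quantization error:", np.abs(w - w_dequant).max())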
13 |
14 | *********
15 | AMD Quark
16 | *********
17 |
18 | **AMD Quark** is a comprehensive cross-platform deep learning toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimize their models for deployment on a wide range of hardware backends, achieving significant performance gains without compromising accuracy.
19 |
20 | For more challenging model quantization needs, **AMD Quark** supports advanced quantization techniques such as **Fast Finetuning**, which helps recover the accuracy lost during quantization.
21 |
22 | Documentation
23 | =============
24 | The complete documentation for AMD Quark for Ryzen AI can be found here: https://quark.docs.amd.com/latest/supported_accelerators/ryzenai/index.html
25 |
26 |
27 | INT8 Examples
28 | =============
29 | **AMD Quark** provides default configurations that support INT8 quantization. For example, the `XINT8` configuration uses symmetric INT8 activation and weight quantization with power-of-two scales and the MinMSE calibration method.
30 | The quantization configuration can be customized using the `QuantizationConfig` class. The following example shows how to set up the configuration for INT8 quantization:
31 |
32 | .. code-block::
33 |
34 | quant_config = QuantizationConfig(calibrate_method=PowerOfTwoMethod.MinMSE,
35 | activation_type=QuantType.QUInt8,
36 | weight_type=QuantType.QInt8,
37 | enable_npu_cnn=True,
38 | extra_options={'ActivationSymmetric': True})
39 | config = Config(global_quant_config=quant_config)
40 | print("The configuration of the quantization is {}".format(config))
41 |
42 | The user can use the `get_default_config('XINT8')` function to get the default configuration for INT8 quantization.
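
Once the configuration is defined, it is applied to an ONNX model with the Quark ONNX quantizer. The snippet below is a hedged sketch based on the AMD Quark examples; the import paths, the file names, and the calibration data reader are assumptions that may differ between Quark releases, so refer to the AMD Quark documentation linked above for the authoritative API.

.. code-block:: python

    # Hedged sketch: apply the XINT8 default configuration with AMD Quark.
    # Import paths and file names are assumptions; verify them against the
    # Quark documentation for the installed release.
    from quark.onnx import ModelQuantizer
    from quark.onnx.quantization.config import Config, get_default_config

    quant_config = get_default_config("XINT8")
    config = Config(global_quant_config=quant_config)

    quantizer = ModelQuantizer(config)
    quantizer.quantize_model(
        "resnet50_fp32.onnx",       # input float model (placeholder name)
        "resnet50_quantized.onnx",  # output quantized model (placeholder name)
        None,                       # replace with a calibration data reader over real samples
    )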
43 |
44 | For more details
45 | ~~~~~~~~~~~~~~~~
46 | - `AMD Quark Tutorial `_ for Ryzen AI Deployment
47 | - Running INT8 model on NPU using :doc:`Getting Started Tutorial `
48 | - Advanced quantization techniques `Fast Finetuning and Cross Layer Equalization `_ for INT8 model
49 |
50 |
51 | BF16 Examples
52 | =============
53 | **AMD Quark** provides default configurations that support BFLOAT16 (BF16) model quantization. BF16 is a 16-bit floating-point format with the same exponent size as FP32, allowing a wide dynamic range but with reduced precision, which saves memory and speeds up computations.
54 | The BF16 model needs to be converted from QDQ nodes to Cast operations to run with the VAIML compiler. AMD Quark supports this conversion with the configuration option `BF16QDQToCast`.
55 |
56 | .. code-block::
57 |
58 | quant_config = get_default_config("BF16")
59 | quant_config.extra_options["BF16QDQToCast"] = True
60 | config = Config(global_quant_config=quant_config)
61 | print("The configuration of the quantization is {}".format(config))
62 |
63 | For more details
64 | ~~~~~~~~~~~~~~~~
65 | - `Image Classification `_ using ResNet50 to run BF16 model on NPU
66 | - `Finetuned DistilBERT for Text Classification `_
67 | - `Text Embedding Model Alibaba-NLP/gte-large-en-v1.5 `_
68 | - Advanced quantization techniques `Fast Finetuning `_ for BF16 models.
69 |
70 |
71 | ..
72 | ------------
73 |
74 | #####################################
75 | License
76 | #####################################
77 |
78 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
79 |
--------------------------------------------------------------------------------
/docs/modelcompat.rst:
--------------------------------------------------------------------------------
1 | .. include:: icons.txt
2 |
3 | ###################
4 | Model Compatibility
5 | ###################
6 |
7 | The Ryzen AI Software supports deploying quantized models saved in the ONNX format.
8 |
9 | Currently, the NPU supports a subset of the ONNX operators. At runtime, the ONNX graph is automatically partitioned into multiple subgraphs by the Vitis AI ONNX Execution Provider (VAI EP). The subgraph(s) containing operators supported by the NPU are executed on the NPU. The remaining subgraph(s) are executed on the CPU. This graph partitioning and deployment technique across CPU and NPU is fully automated by the VAI EP and is totally transparent to the end-user.
10 |
11 | |memo| **NOTE**: Models with ONNX opset 17 are recommended. If your model uses a different opset version, consider converting it using the `ONNX Version Converter `_
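
For instance, a minimal sketch of converting an existing model to opset 17 with the ONNX version converter (the file names are placeholders):

.. code-block:: python

    # Sketch: convert an ONNX model to opset 17 using the ONNX version converter.
    import onnx
    from onnx import version_converter

    model = onnx.load("model_opset13.onnx")
    converted = version_converter.convert_version(model, 17)
    onnx.save(converted, "model_opset17.onnx")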
12 |
13 |
14 | The Ryzen AI compiler supports input models quantized to either INT8 or BF16 format:
15 |
16 | - CNN models: INT8 or BF16
17 | - Transformer models: BF16
18 |
19 | Compiling BF16 models (CNN or Transformer) requires significant processing power in terms of core count and memory, depending on the model size. If a larger model cannot be compiled on a Windows machine due to hardware limitations (e.g., insufficient RAM), an alternative Linux-based compilation flow is supported. More details can be found here: :doc:`rai_linux`.
20 |
21 | The list of the ONNX operators currently supported by the NPU is as follows:
22 |
23 |
24 |
25 | ..
26 | ------------
27 |
28 | #####################################
29 | License
30 | #####################################
31 |
32 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
33 |
--------------------------------------------------------------------------------
/docs/modelrun.rst:
--------------------------------------------------------------------------------
1 | .. include:: /icons.txt
2 |
3 | ################################
4 | Model Compilation and Deployment
5 | ################################
6 |
7 | *****************
8 | Introduction
9 | *****************
10 |
11 | The Ryzen AI Software supports compiling and deploying quantized models saved in the ONNX format. The ONNX graph is automatically partitioned into multiple subgraphs by the VitisAI Execution Provider (EP). The subgraph(s) containing operators supported by the NPU are executed on the NPU. The remaining subgraph(s) are executed on the CPU. This graph partitioning and deployment technique across CPU and NPU is fully automated by the VAI EP and is totally transparent to the end-user.
12 |
13 | |memo| **NOTE**: Models with ONNX opset 17 are recommended. If your model uses a different opset version, consider converting it using the `ONNX Version Converter `_
14 |
15 | Models are compiled for the NPU by creating an ONNX inference session using the Vitis AI Execution Provider (VAI EP):
16 |
17 | .. code-block:: python
18 |
19 | providers = ['VitisAIExecutionProvider']
20 | session = ort.InferenceSession(
21 | model,
22 | sess_options = sess_opt,
23 | providers = providers,
24 | provider_options = provider_options
25 | )
26 |
27 |
28 | The ``provider_options`` parameter allows passing special options to the Vitis AI EP.
29 |
30 | .. list-table::
31 | :widths: 20 35
32 | :header-rows: 1
33 |
34 | * - Provider Options
35 | - Description
36 | * - config_file
37 | - Configuration file to pass certain compile-specific options, used for BF16 compilation.
38 | * - xclbin
39 | - NPU binary file to specify NPU configuration, used for INT8 models.
40 | * - cache_dir
41 | - The path and name of the cache directory.
42 | * - cache_key
43 | - The subfolder in the cache directory where the compiled model is stored.
44 | * - encryptionKey
45 | - Used for generating an encrypted compiled model.
46 |
47 | Detailed usage of these options is discussed in the following sections of this page.
48 |
49 |
50 | .. _compile-bf16:
51 |
52 | **************************
53 | Compiling BF16 models
54 | **************************
55 |
56 | |memo| **NOTE**: For compiling large BF16 models a machine with at least 32GB of memory is recommended. The machine does not need to have an NPU. It is also possible to compile BF16 models on a Linux workstation. More details can be found here: :doc:`rai_linux`
57 |
58 | When compiling BF16 models, a compilation configuration file must be provided through the ``config_file`` provider options.
59 |
60 | .. code-block:: python
61 |
62 | providers = ['VitisAIExecutionProvider']
63 |
64 | provider_options = [{
65 | 'config_file': 'vai_ep_config.json'
66 | }]
67 |
68 | session = ort.InferenceSession(
69 | "resnet50.onnx",
70 | providers=providers,
71 | provider_options=provider_options
72 | )
73 |
74 |
75 | By default, the configuration file for compiling BF16 models should contain the following:
76 |
77 | .. code-block:: json
78 |
79 | {
80 | "passes": [
81 | {
82 | "name": "init",
83 | "plugin": "vaip-pass_init"
84 | },
85 | {
86 | "name": "vaiml_partition",
87 | "plugin": "vaip-pass_vaiml_partition",
88 | "vaiml_config": {}
89 | }
90 | ]
91 | }
92 |
93 |
94 | Additional options can be specified in the ``vaiml_config`` section of the configuration file, as described below.
95 |
96 | **Performance Optimization**
97 |
98 | The default compilation optimization level is 1. The optimization level can be changed as follows:
99 |
100 | .. code-block:: json
101 |
102 | "vaiml_config": {"optimize_level": 2}
103 |
104 | Supported values: 1 (default), 2
105 |
106 |
107 | **Automatic FP32 to BF16 Conversion**
108 |
109 | If a FP32 model is used, the compiler will automatically cast it to BF16 if this option is enabled. For better control over accuracy, it is recommended to quantize the model to BF16 using Quark.
110 |
111 | .. code-block:: json
112 |
113 | "vaiml_config": {"enable_f32_to_bf16_conversion": true}
114 |
115 | Supported values: false (default), true
116 |
117 |
118 | **Optimizations for Transformer-Based Models**
119 |
120 | By default, the compiler vectorizes the data to optimize performance for CNN models. However, transformers perform best with unvectorized data. To better optimize transformer-based models, set:
121 |
122 | .. code-block:: json
123 |
124 | "vaiml_config": {"preferred_data_storage": "unvectorized"}
125 |
126 | Supported values: "vectorized" (default), "unvectorized"
127 |
128 |
129 | .. _compile-int8:
130 |
131 | **************************
132 | Compiling INT8 models
133 | **************************
134 |
135 | When compiling INT8 models, the NPU configuration must be specified through the ``xclbin`` provider option. This option is not required for BF16 models.
136 |
137 | There are two types of NPU configurations for INT8 models: standard and benchmark. Setting the NPU configuration involves specifying a specific ``.xclbin`` binary file, which is located in the Ryzen AI Software installation tree.
138 |
139 | Depending on the target processor and binary type (standard/benchmark), the following ``.xclbin`` files should be used:
140 |
141 | **For STX/KRK APUs**:
142 |
143 | - Standard binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_Nx4_Overlay.xclbin``
144 | - Benchmark binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_4x4_Overlay.xclbin``
145 |
146 | **For PHX/HPT APUs**:
147 |
148 | - Standard binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\phoenix\1x4.xclbin``
149 | - Benchmark binary: ``%RYZEN_AI_INSTALLATION_PATH%\voe-4.0-win_amd64\xclbins\phoenix\4x4.xclbin``
150 |
151 | Python example selecting the standard NPU configuration for STX/KRK:
152 |
153 | .. code-block:: python
154 |
155 | providers = ['VitisAIExecutionProvider']
156 |
157 | provider_options = [{
158 | 'xclbin': '{}\\voe-4.0-win_amd64\\xclbins\\strix\\AMD_AIE2P_Nx4_Overlay.xclbin'.format(os.environ["RYZEN_AI_INSTALLATION_PATH"])
159 | }]
160 |
161 | session = ort.InferenceSession(
162 | "resnet50.onnx",
163 | providers=providers,
164 | provider_options=provider_options
165 | )
166 |
167 | |
168 |
169 | By default, the Ryzen AI Conda environment automatically sets the standard binary for all inference sessions through the ``XLNX_VART_FIRMWARE`` environment variable. Explicitly passing the ``xclbin`` provider option overrides this environment variable.
170 |
171 | .. code-block::
172 |
173 | > echo %XLNX_VART_FIRMWARE%
174 | C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64\xclbins\strix\AMD_AIE2P_Nx4_Overlay.xclbin
175 |
176 |
177 |
178 | |
179 |
180 | ************************************
181 | Managing Compiled Models
182 | ************************************
183 |
184 | To avoid the overhead of recompiling models, it is highly advantageous to save the compiled models and use these pre-compiled versions in the final application. Pre-compiled models load almost instantly and can be executed on the NPU right away, which greatly improves session creation time and the overall end-user experience.
185 |
186 | The Ryzen AI Software supports two mechanisms for saving and reloading compiled models:
187 |
188 | - VitisAI EP Cache
189 | - OnnxRuntime EP Context Cache
190 |
191 | .. _vitisai-ep-cache:
192 |
193 | VitisAI EP Cache
194 | ================
195 |
196 | The VitisAI EP includes a built-in caching mechanism. This mechanism is enabled by default. When a model is compiled for the first time, it is automatically saved in the VitisAI EP cache directory. Any subsequent creation of an ONNX Runtime session using the same model will load the precompiled model from the cache directory, thereby reducing session creation time.
197 |
198 | The location of the VitisAI EP cache is specified with the ``cache_dir`` and ``cache_key`` provider options:
199 |
200 | - ``cache_dir`` - Specifies the path and name of the cache directory.
201 | - ``cache_key`` - Specifies the subfolder in the cache directory where the compiled model is stored.
202 |
203 | Python example:
204 |
205 | .. code-block:: python
206 |
207 | from pathlib import Path
208 |
209 | providers = ['VitisAIExecutionProvider']
210 | cache_dir = Path(__file__).parent.resolve()
211 | provider_options = [{'cache_dir': str(cache_dir),
212 | 'cache_key': 'compiled_resnet50'}]
213 |
214 | session = ort.InferenceSession(
215 | "resnet50.onnx",
216 | providers=providers,
217 | provider_options=provider_options
218 | )
219 |
220 |
221 | In the example above, the cache directory is set to the absolute path of the folder containing the script being executed. Once the session is created, the compiled model is saved inside a subdirectory named ``compiled_resnet50`` within the specified cache folder.
222 |
223 | Default Settings
224 | ----------------
225 | In the current release, if ``cache_dir`` is not set, the default cache location is determined by the type of model:
226 |
227 | - INT8 models - ``C:\temp\%USERNAME%\vaip\.cache``
228 | - BF16 models - The directory where the script or program is executed
229 |
230 |
231 | Disabling the Cache
232 | -------------------
233 | To ignore cached models and force recompilation, unset the ``XLNX_ENABLE_CACHE`` environment variable before running the application:
234 |
235 | .. code-block::
236 |
237 | set XLNX_ENABLE_CACHE=
238 |
239 |
240 |
241 | VitisAI EP Cache Encryption
242 | ---------------------------
243 |
244 | The contents of the VitisAI EP cache folder can be encrypted using AES256. Cache encryption is enabled by passing an encryption key through the VAI EP provider options. The same key must be used to decrypt the model when loading it from the cache. The key is a 256-bit value represented as a 64-character hexadecimal string.
245 |
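246 | A suitable key can be generated with any cryptographically secure random source. For example, a minimal sketch using the Python standard library:
247 |
248 | .. code-block:: python
249 |
250 |     import secrets
251 |
252 |     # 32 random bytes = 256 bits, rendered as a 64-character hex string
253 |     print(secrets.token_hex(32))
254 |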
246 | Python example:
247 |
248 | .. code-block:: python
249 |
250 | session = onnxruntime.InferenceSession(
251 | "resnet50.onnx",
252 | providers=["VitisAIExecutionProvider"],
253 | provider_options=[{
254 | "config_file":"/path/to/vaip_config.json",
255 | "encryptionKey": "89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2"
256 | }])
257 |
258 | C++ example:
259 |
260 | .. code-block:: cpp
261 |
262 |     auto onnx_model_path = "resnet50.onnx";
263 | Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50");
264 | auto session_options = Ort::SessionOptions();
265 |     auto options = std::unordered_map<std::string, std::string>{};
266 | options["config_file"] = "/path/to/vaip_config.json";
267 | options["encryptionKey"] = "89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2";
268 |
269 | session_options.AppendExecutionProvider("VitisAI", options);
270 |     auto session = Ort::Experimental::Session(env, onnx_model_path, session_options);
271 |
272 | As a result of encryption, the model generated in the cache directory cannot be opened with Netron. Additionally, dumping is disabled to prevent the leakage of sensitive information about the model.
273 |
274 | .. _ort-ep-context-cache:
275 |
276 | OnnxRuntime EP Context Cache
277 | ============================
278 |
279 | The Vitis AI EP supports the ONNX Runtime EP context cache feature. This feature allows dumping and reloading a snapshot of the EP context before deployment. Currently, this feature is only available for INT8 models.
280 |
281 | The user can enable dumping of the EP context by setting the ``ep.context_enable`` session option to 1.
282 |
283 | The following options can be used for additional control:
284 |
285 | - ``ep.context_file_path`` – Specifies the output path for the dumped context model.
286 | - ``ep.context_embed_mode`` – Embeds the EP context into the ONNX model when set to 1.
287 |
288 | For further details, refer to the official ONNX Runtime documentation: https://onnxruntime.ai/docs/execution-providers/EP-Context-Design.html
289 |
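290 | For reference, a minimal sketch (model and file names are placeholders) showing how a context model can be dumped during compilation and then reloaded for inference:
291 |
292 | .. code-block:: python
293 |
294 |     import onnxruntime as ort
295 |
296 |     # Compilation session: dump the EP context next to the source model
297 |     session_options = ort.SessionOptions()
298 |     session_options.add_session_config_entry('ep.context_enable', '1')
299 |     session_options.add_session_config_entry('ep.context_file_path', 'context_model.onnx')
300 |     session_options.add_session_config_entry('ep.context_embed_mode', '1')
301 |     session = ort.InferenceSession(
302 |         path_or_bytes='resnet50.onnx',
303 |         sess_options=session_options,
304 |         providers=['VitisAIExecutionProvider']
305 |     )
306 |
307 |     # Inference session: reload the pre-compiled context model directly
308 |     session = ort.InferenceSession(
309 |         path_or_bytes='context_model.onnx',
310 |         providers=['VitisAIExecutionProvider']
311 |     )
312 |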
290 |
291 | EP Context Encryption
292 | ---------------------
293 |
294 | By default, the generated context model is unencrypted and can be used directly during inference. If needed, the context model can be encrypted using one of the methods described below.
295 |
296 | User-managed encryption
297 | ~~~~~~~~~~~~~~~~~~~~~~~
298 | After the context model is generated, the developer can encrypt the generated file using a method of choice. At runtime, the encrypted file can be loaded by the application, decrypted in memory and passed as a serialized string to the inference session. This method gives complete control to the developer over the encryption process.
299 |
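300 | As an illustration, the sketch below assumes a hypothetical ``decrypt_bytes()`` helper implementing the application's own decryption scheme; the decrypted model is passed to the inference session as a byte string:
301 |
302 | .. code-block:: python
303 |
304 |     import onnxruntime as ort
305 |
306 |     def decrypt_bytes(data: bytes) -> bytes:
307 |         # Hypothetical placeholder: apply the application's own decryption
308 |         # scheme and return the plain ONNX model bytes
309 |         raise NotImplementedError
310 |
311 |     # Load the encrypted context model and decrypt it in memory
312 |     with open("context_model.onnx.encrypted", "rb") as f:
313 |         model_bytes = decrypt_bytes(f.read())
314 |
315 |     # Pass the decrypted model as a serialized string to the session
316 |     session = ort.InferenceSession(
317 |         path_or_bytes=model_bytes,
318 |         providers=['VitisAIExecutionProvider']
319 |     )
320 |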
300 | EP-managed encryption
301 | ~~~~~~~~~~~~~~~~~~~~~~~
302 | The Vitis AI EP encryption mechanism can be used to encrypt the context model. This is enabled by passing an encryption key via the ``encryptionKey`` provider option (discussed in the previous section). The model is encrypted using AES256. At runtime, the same encryption key must be provided to decrypt and load the context model. With this method, encryption and decryption are seamlessly managed by the VitisAI EP.
303 |
304 | Python example:
305 |
306 | .. code-block:: python
307 |
308 | # Compilation session
309 | session_options = ort.SessionOptions()
310 | session_options.add_session_config_entry('ep.context_enable', '1')
311 | session_options.add_session_config_entry('ep.context_file_path', 'context_model.onnx')
312 | session_options.add_session_config_entry('ep.context_embed_mode', '1')
313 | session = ort.InferenceSession(
314 | path_or_bytes='resnet50.onnx',
315 | sess_options=session_options,
316 | providers=['VitisAIExecutionProvider'],
317 | provider_options=[{'encryptionKey': '89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2'}]
318 | )
319 |
320 | # Inference session
321 | session_options = ort.SessionOptions()
322 | session = ort.InferenceSession(
323 | path_or_bytes='context_model.onnx',
324 | sess_options=session_options,
325 | providers=['VitisAIExecutionProvider'],
326 | provider_options=[{'encryptionKey': '89703f950ed9f738d956f6769d7e45a385d3c988ca753838b5afbc569ebf35b2'}]
327 | )
328 |
329 |
330 | **NOTE**: When compiling with an ``encryptionKey``, ensure that any existing cache directory (either the default cache directory or the directory specified by the ``cache_dir`` provider option) is deleted before compiling.
331 |
332 | |
333 |
334 | **************************
335 | Operator Assignment Report
336 | **************************
337 |
338 |
339 | Vitis AI EP generates a file named ``vitisai_ep_report.json`` that provides a report on model operator assignments across the CPU and NPU. This file is automatically generated in the cache directory if no explicit cache location is specified in the code. The report includes information such as the total number of nodes, the list of operator types in the model, and which nodes and operators run on the NPU or on the CPU. Additionally, the report includes node statistics, such as the inputs to each node, the applied operation, and the node outputs.
340 |
341 |
342 | .. code-block::
343 |
344 | {
345 | "deviceStat": [
346 | {
347 | "name": "all",
348 | "nodeNum": 400,
349 | "supportedOpType": [
350 | "::Add",
351 | "::Conv",
352 | ...
353 | ]
354 | },
355 | {
356 | "name": "CPU",
357 | "nodeNum": 2,
358 | "supportedOpType": [
359 | "::DequantizeLinear",
360 | "::QuantizeLinear"
361 | ]
362 | },
363 | {
364 | "name": "NPU",
365 | "nodeNum": 398,
366 | "supportedOpType": [
367 | "::Add",
368 | "::Conv",
369 | ...
370 | ]
371 | ...
372 |
373 | ..
374 | ------------
375 |
376 | #####################################
377 | License
378 | #####################################
379 |
380 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
381 |
--------------------------------------------------------------------------------
/docs/npu_oga.rst:
--------------------------------------------------------------------------------
1 | :orphan:
2 |
3 | ######################
4 | OGA NPU Execution Mode
5 | ######################
6 |
7 | Ryzen AI Software supports deploying LLMs on Ryzen AI PCs using the native ONNX Runtime Generate (OGA) C++ or Python API. The OGA API is the lowest-level API available for building LLM applications on a Ryzen AI PC. This documentation covers the NPU execution mode for LLMs, which utilizes only the NPU.
8 |
9 | **Note**: Refer to :doc:`hybrid_oga` for Hybrid NPU + GPU execution mode.
10 |
11 |
12 | ************************
13 | Supported Configurations
14 | ************************
15 |
16 | The Ryzen AI OGA flow supports Strix and Krackan Point processors. Phoenix (PHX) and Hawk Point (HPT) processors are not supported.
17 |
18 |
19 | ************
20 | Requirements
21 | ************
22 | - Install the NPU driver and the Ryzen AI Software (MSI installer) as described in :doc:`inst`
23 | - Install Git for Windows (needed to download models from HF): https://git-scm.com/downloads
24 |
25 |
26 | ********************
27 | Pre-optimized Models
28 | ********************
29 |
30 | AMD provides a set of pre-optimized LLMs ready to be deployed with Ryzen AI Software and the supporting runtime for NPU execution. These models can be found on Hugging Face in the following collection:
31 |
32 | - https://huggingface.co/amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
33 | - https://huggingface.co/amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
34 | - https://huggingface.co/amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
35 | - https://huggingface.co/amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix
36 | - https://huggingface.co/amd/chatglm3-6b-awq-g128-int4-asym-bf16-onnx-ryzen-strix
37 | - https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix
38 | - https://huggingface.co/amd/Llama2-7b-chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix
39 | - https://huggingface.co/amd/Llama-3-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
40 | - https://huggingface.co/amd/Llama-3.1-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
41 | - https://huggingface.co/amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
42 | - https://huggingface.co/amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
43 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Llama-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
44 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-1.5B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
45 | - https://huggingface.co/amd/DeepSeek-R1-Distill-Qwen-7B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
46 | - https://huggingface.co/amd/AMD-OLMo-1B-SFT-DPO-awq-g128-int4-asym-bf16-onnx-ryzen-strix
47 |
48 | The steps for deploying the pre-optimized models using C++ and Python are described in the following sections.
49 |
50 | ***************************
51 | NPU Execution of OGA Models
52 | ***************************
53 |
54 | Setup
55 | =====
56 |
57 | Activate the Ryzen AI 1.4 Conda environment:
58 |
59 | .. code-block::
60 |
61 | conda activate ryzen-ai-1.4.0
62 |
63 | Create a folder to run the LLMs from, and copy the required files:
64 |
65 | .. code-block::
66 |
67 | mkdir npu_run
68 | cd npu_run
69 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\exe" .\libs
70 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\libs\vaip_llm.json" libs
71 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\npu-llm\onnxruntime-genai.dll" libs
72 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitis_ai_custom_ops.dll" libs
73 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_shared.dll" libs
74 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitisai_ep.dll" libs
75 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\dyn_dispatch_core.dll" libs
76 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_vitisai.dll" libs
77 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\transaction.dll" libs
78 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" libs
79 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\xclbin.dll" libs
80 |
81 |
82 | Download Models from HuggingFace
83 | ================================
84 |
85 | Download the desired models from the list of pre-optimized models on Hugging Face:
86 |
87 | .. code-block::
88 |
89 | # Make sure you have git-lfs installed (https://git-lfs.com)
90 | git lfs install
91 |  git clone <model URL>
92 |
93 | For example, for Llama-2-7b:
94 |
95 | .. code-block::
96 |
97 | git lfs install
98 | git clone https://huggingface.co/amd/Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix
99 |
100 |
101 | **NOTE**: Ensure the models are cloned in the ``npu_run`` folder.
102 |
103 |
104 | Enabling Performance Mode (Optional)
105 | ====================================
106 |
107 | To run the LLMs in the best performance mode, follow these steps:
108 |
109 | - Go to ``Windows`` → ``Settings`` → ``System`` → ``Power`` and set the power mode to Best Performance.
110 | - Execute the following commands in the terminal:
111 |
112 | .. code-block::
113 |
114 | cd C:\Windows\System32\AMD
115 | xrt-smi configure --pmode performance
116 |
117 |
118 |
119 | Sample C++ Programs
120 | ===================
121 |
122 | The ``run_llm.exe`` test application provides a simple interface to run LLMs. The source code for this application can also be used as a reference for integrating LLMs using the native OGA C++ APIs.
123 |
124 | It supports the following command line options::
125 |
126 | -m: model path
127 | -f: prompt file
128 | -n: max new tokens
129 | -c: use chat template
130 | -t: input prompt token length
131 | -l: max length to be set in search options
132 | -h: help
133 |
134 |
135 | Example usage:
136 |
137 | .. code-block::
138 |
139 | .\libs\run_llm.exe -m "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix" -f "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix\prompts.txt" -t "1024" -n 20
140 |
141 | |
142 |
143 | The ``model_benchmark.exe`` program can be used to profile the execution of LLMs and report various metrics. It supports the following command line options::
144 |
145 | -i,--input_folder
146 | Path to the ONNX model directory to benchmark, compatible with onnxruntime-genai.
147 | -l,--prompt_length
148 | List of number of tokens in the prompt to use.
149 | -p,--prompt_file
150 | Name of prompt file (txt) expected in the input model directory.
151 | -g,--generation_length
152 | Number of tokens to generate. Default: 128
153 | -r,--repetitions
154 | Number of times to repeat the benchmark. Default: 5
155 | -w,--warmup
156 | Number of warmup runs before benchmarking. Default: 1
157 | -t,--cpu_util_time_interval
158 | Sampling time interval for peak cpu utilization calculation, in milliseconds. Default: 250
159 | -v,--verbose
160 | Show more informational output.
161 | -h,--help
162 | Show this help message and exit.
163 |
164 |
165 | For example, for Llama-2-7b:
166 |
167 | .. code-block::
168 |
169 | .\libs\model_benchmark.exe -i "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix" -g 20 -p "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix\prompts.txt" -l "2048,1024,512,256,128"
170 |
171 | |
172 |
173 | **NOTE**: The C++ source code for the ``run_llm.exe`` and ``model_benchmark.exe`` executables can be found in the ``%RYZEN_AI_INSTALLATION_PATH%\npu-llm\cpp`` folder. This source code can be modified and recompiled using the commands below.
174 |
175 | .. code-block::
176 |
177 | :: Copy project files
178 | xcopy /E /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\cpp" .\sources
179 |
180 | :: Build project
181 | cd sources
182 | cmake -G "Visual Studio 17 2022" -A x64 -S . -B build
183 | cmake --build build --config Release
184 |
185 | :: Copy executables in the "libs" folder
186 | xcopy /I build\Release .\libs
187 |
188 | :: Copy runtime dependencies in the "libs" folder
189 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\npu-llm\libs\vaip_llm.json" libs
190 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\npu-llm\onnxruntime-genai.dll" libs
191 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitis_ai_custom_ops.dll" libs
192 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_shared.dll" libs
193 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_vitisai_ep.dll" libs
194 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\dyn_dispatch_core.dll" libs
195 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime_providers_vitisai.dll" libs
196 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\transaction.dll" libs
197 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\onnxruntime.dll" libs
198 | xcopy /I "%RYZEN_AI_INSTALLATION_PATH%\deployment\voe\xclbin.dll" libs
199 |
200 | Sample Python Scripts
201 | =====================
202 |
203 | Open the ``genai_config.json`` file located in the folder of the downloaded model. Update the value of the ``custom_ops_library`` key with the path to ``onnxruntime_vitis_ai_custom_ops.dll``, located in the ``npu_run\libs`` folder:
204 |
205 | .. code-block::
206 |
207 | "session_options": {
208 | ...
209 | "custom_ops_library": "libs\\onnxruntime_vitis_ai_custom_ops.dll",
210 | ...
211 | }
212 |
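213 | This edit can also be scripted. A minimal sketch, assuming the standard OGA layout where ``session_options`` sits under ``model.decoder`` (the model folder name is a placeholder):
214 |
215 | .. code-block:: python
216 |
217 |     import json, os
218 |
219 |     # genai_config.json of the downloaded model (placeholder folder name)
220 |     config_path = os.path.join(
221 |         "Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix", "genai_config.json")
222 |
223 |     with open(config_path) as f:
224 |         config = json.load(f)
225 |
226 |     # Point custom_ops_library at the DLL copied into npu_run\libs
227 |     config["model"]["decoder"]["session_options"]["custom_ops_library"] = \
228 |         os.path.join("libs", "onnxruntime_vitis_ai_custom_ops.dll")
229 |
230 |     with open(config_path, "w") as f:
231 |         json.dump(config, f, indent=4)
232 |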
213 | To run LLMs other than ChatGLM, use the following command:
214 |
215 | .. code-block::
216 |
217 |     python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir <path to model folder>
218 |
219 | To run ChatGLM, use the following command:
220 |
221 | .. code-block::
222 |
223 | pip install transformers==4.44.0
224 |     python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\chatglm\model-generate-chatglm3.py" -m <path to model folder>
225 |
226 | For example, for Llama-2-7b:
227 |
228 | .. code-block::
229 |
230 | python "%RYZEN_AI_INSTALLATION_PATH%\hybrid-llm\examples\python\llama3\run_model.py" --model_dir Llama-2-7b-hf-awq-g128-int4-asym-bf16-onnx-ryzen-strix
231 |
232 |
233 |
234 | ***********************
235 | Using Fine-Tuned Models
236 | ***********************
237 |
238 | It is also possible to run fine-tuned versions of the pre-optimized OGA models.
239 |
240 | To do this, the fine-tuned models must first be prepared for execution with the OGA NPU-only flow. For instructions on how to do this, refer to the page about :doc:`oga_model_prepare`.
241 |
242 | Once a fine-tuned model has been prepared for NPU-only execution, it can be deployed by following the steps described earlier on this page.
243 |
--------------------------------------------------------------------------------
/docs/oga_model_prepare.rst:
--------------------------------------------------------------------------------
1 | ####################
2 | Preparing OGA Models
3 | ####################
4 |
5 | This section describes the process for preparing LLMs for deployment on a Ryzen AI PC using the hybrid or NPU-only execution mode. Currently, the flow supports only fine-tuned versions of the models already supported (as listed on the :doc:`hybrid_oga` page). For example, fine-tuned versions of Llama2 or Llama3 can be used. However, model families with architectures not supported by the hybrid flow cannot be used.
6 |
7 | Preparing a LLM for deployment on a Ryzen AI PC involves 2 steps:
8 |
9 | 1. **Quantization**: The pretrained model is quantized to reduce its memory footprint and to better map it to the compute resources of the hardware accelerators.
10 | 2. **Postprocessing**: The quantized model is exported to the OGA format and then postprocessed for the NPU-only or Hybrid execution mode to obtain the final deployable model.
11 |
12 | ************
13 | Quantization
14 | ************
15 |
16 | Prerequisites
17 | =============
18 | A Linux machine with AMD GPUs (e.g., AMD Instinct MI Series) or NVIDIA GPUs
19 |
20 | Setup
21 | =====
22 |
23 | 1. Create and activate Conda Environment
24 |
25 | .. code-block::
26 |
27 |     conda create --name <env_name> python=3.11
28 |     conda activate <env_name>
29 |
30 | 2. If Using AMD GPUs, update PyTorch to use ROCm
31 |
32 | .. code-block::
33 |
34 | pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
35 | python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
36 |
37 | 3. Download :download:`AMD Quark 0.8 ` and unzip the archive
38 |
39 | 4. Install Quark:
40 |
41 | .. code-block::
42 |
43 | cd
44 | pip install amd_quark-0.8+<>.whl
45 |
46 | 5. Install other dependencies
47 |
48 | .. code-block::
49 |
50 | pip install datasets
51 | pip install transformers
52 | pip install accelerate
53 | pip install evaluate
54 |
55 |
56 | Some models may require a specific version of ``transformers``. For example, ChatGLM3 requires version 4.44.0.
57 |
58 | Generate Quantized Model
59 | ========================
60 |
61 | Use the following command to run quantization. On a GPU-equipped Linux machine, quantization can take about 30 to 60 minutes.
62 |
63 | .. code-block::
64 |
65 | cd examples/torch/language_modeling/llm_ptq/
66 |
67 | python quantize_quark.py \
68 | --model_dir "meta-llama/Llama-2-7b-chat-hf" \
69 |         --output_dir <output_dir> \
70 | --quant_scheme w_uint4_per_group_asym \
71 | --num_calib_data 128 \
72 | --quant_algo awq \
73 | --dataset pileval_for_awq_benchmark \
74 | --model_export hf_format \
75 |         --data_type <data_type> \
76 | --exclude_layers
77 |
78 |
79 | - To generate an OGA model for the NPU-only execution mode, use ``--data_type float32``
80 | - To generate an OGA model for the Hybrid execution mode, use ``--data_type float16``
81 | - For a BF16 pretrained model, you can use ``--data_type bfloat16``.
82 |
83 | The quantized model is generated in the specified output folder.
84 |
85 | **************
86 | Postprocessing
87 | **************
88 |
89 | Copy the quantized model to the Windows PC with Ryzen AI installed, activate the Ryzen AI Conda environment, and execute the ``model_generate`` command to generate the final model.
90 |
91 | Generate the final model for Hybrid execution mode:
92 |
93 | .. code-block::
94 |
95 | conda activate ryzen-ai-
96 |
97 | model_generate --hybrid
98 |
99 |
100 | Generate the final model for NPU execution mode:
101 |
102 | .. code-block::
103 |
104 | conda activate ryzen-ai-
105 |
106 | model_generate --npu
107 |
108 | ..
109 | ------------
110 |
111 | #####################################
112 | License
113 | #####################################
114 |
115 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
116 |
--------------------------------------------------------------------------------
/docs/rai_linux.rst:
--------------------------------------------------------------------------------
1 | :orphan:
2 |
3 | ##########################
4 | Ryzen AI Software on Linux
5 | ##########################
6 |
7 | This guide provides instructions for using Ryzen AI 1.4 on Linux for model compilation, followed by running inference on Windows.
8 |
9 | *************
10 | Prerequisites
11 | *************
12 | The following is the recommended system configuration for the Ryzen AI Linux installer:
13 |
14 | .. list-table::
15 | :widths: 25 25
16 | :header-rows: 1
17 |
18 | * - Dependencies
19 | - Version Requirement
20 | * - Ubuntu
21 | - 22.04
22 | * - RAM
23 | - 32GB or Higher
24 | * - CPU cores
25 | - >= 8
26 | * - Python
27 | - 3.10 or Higher
28 |
29 |
30 | *************************
31 | Installation Instructions
32 | *************************
33 |
34 | - Download the Ryzen AI Software Linux installer: :download:`ryzen_ai-1.4.0.tgz `.
35 |
36 | - Extract the .tgz using the following command:
37 |
38 | .. code-block::
39 |
40 | tar -xvzf ryzen_ai-1.4.0.tgz
41 |
42 | - Run the installer with default settings. This prompts you to read and agree to the EULA:
43 |
44 | .. code-block::
45 |
46 | cd ryzen_ai-1.4.0
47 | ./install_ryzen_ai_1_4.sh
48 |
49 | - After reading the EULA, re-run the installer with options to agree to the EULA and create a Python virtual environment:
50 |
51 | .. code-block::
52 |
53 | ./install_ryzen_ai_1_4.sh -a yes -p -l
54 |
55 | - Activate the virtual environment to start using the Ryzen AI Software:
56 |
57 | .. code-block::
58 |
59 |     source <venv_path>/bin/activate
60 |
61 |
62 | ******************
63 | Usage Instructions
64 | ******************
65 |
66 | The process for model compilation on Linux is similar to that on Windows. Refer to the instructions provided in the :doc:`modelrun` page for complete details.
67 |
68 | Once the model has been successfully compiled on your Linux machine, copy the entire working directory to a Windows machine with an STX-based processor.
69 |
70 | Prior to running the compiled model on the Windows machine, ensure that all required prerequisites are satisfied as listed in the :doc:`inst` page.
71 |
--------------------------------------------------------------------------------
/docs/relnotes.rst:
--------------------------------------------------------------------------------
1 | .. include:: /icons.txt
2 |
3 | #############
4 | Release Notes
5 | #############
6 |
7 | .. _supported-configurations:
8 |
9 | ************************
10 | Supported Configurations
11 | ************************
12 |
13 | Ryzen AI 1.4 Software supports AMD processors codenamed Phoenix, Hawk Point, Strix, Strix Halo, and Krackan Point. These processors can be found in the following Ryzen series:
14 |
15 | - Ryzen 200 Series
16 | - Ryzen 7000 Series, Ryzen PRO 7000 Series
17 | - Ryzen 8000 Series, Ryzen PRO 8000 Series
18 | - Ryzen AI 300 Series, Ryzen AI PRO Series, Ryzen AI Max 300 Series
19 |
20 | For a complete list of supported devices, refer to the `processor specifications `_ page (look for the "AMD Ryzen AI" column towards the right side of the table, and select "Available" from the pull-down menu).
21 |
22 | The rest of this document will refer to Phoenix as PHX, Hawk Point as HPT, Strix and Strix Halo as STX, and Krackan Point as KRK.
23 |
24 |
25 | *************************
26 | Model Compatibility Table
27 | *************************
28 |
29 | The following table lists which types of models are supported on what hardware platforms.
30 |
31 | .. list-table::
32 | :header-rows: 1
33 |
34 | * - Model Type
35 | - PHX/HPT
36 | - STX/KRK
37 | * - CNN INT8
38 | - |checkmark|
39 | - |checkmark|
40 | * - CNN BF16
41 | -
42 | - |checkmark|
43 | * - NLP BF16
44 | -
45 | - |checkmark|
46 | * - LLM (OGA)
47 | -
48 | - |checkmark|
49 |
50 |
51 | ***********
52 | Version 1.4
53 | ***********
54 |
55 | - New Features:
56 |
57 | - `New architecture support for Ryzen AI 300 series processors `_
58 | - Unified support for LLMs, INT8, and BF16 models in a single release package
59 | - Public release for compilation of BF16 CNN and NLP models on Windows
60 | - `Public release of the LLM Hybrid OGA flow `_
61 | - `LLM building flow for finetuned LLM `_
62 | - Support for up to 16 hardware contexts on Ryzen AI 300 series processors
63 | - Vitis AI EP now supports the ONNX Runtime EP context cache feature (for custom handling of pre-compiled models)
64 | - Ryzen AI environment variables converted to VitisAI EP session options
65 | - Improved exception handling and fallback to CPU
66 |
67 | - `New Hybrid execution mode LLMs `_:
68 |
69 | - DeepSeek-R1-Distill-Llama-8B
70 | - DeepSeek-R1-Distill-Qwen-1.5B
71 | - DeepSeek-R1-Distill-Qwen-7B
72 | - Gemma2-2B
73 | - Qwen2-1.5B
74 | - Qwen2-7B
75 | - AMD-OLMO-1B-SFT-DPO
76 | - Mistral-7B-Instruct-v0.1
77 | - Mistral-7B-Instruct-v0.2
78 | - Mistral-7B-v0.3
79 | - Llama3.1-8B-Instruct
80 | - Codellama-7B-Instruct
81 |
82 | - :doc:`New BF16 model examples `:
83 |
84 | - Image classification
85 | - Finetuned DistilBERT for text classification
86 | - Text embedding model Alibaba-NLP/gte-large-en-v1.5
87 |
88 | - New Tools:
89 |
90 | - `Lemonade SDK `_
91 |
92 | - `Lemonade Server `_: A server interface that uses the standard Open AI API, allowing applications in any language to integrate with Lemonade Server for local LLM deployment and compatibility with existing Open AI apps.
93 | - `Lemonade Python API `_: Offers High-Level API for easy integration of Lemonade LLMs into Python applications and Low-Level API for custom experiments with specific checkpoints, devices, and tools.
94 |     - `Lemonade Command Line `_ Interface: Easily benchmark, measure accuracy, prompt, or gather memory usage of your LLM.
95 | - `TurnkeyML `_ – Open-source tool that includes low-code APIs for general ONNX workflows.
96 | - `Digest AI `_ – A Model Ingestion and Analysis Tool in collaboration with the Linux Foundation.
97 | - `GAIA `_ – An open-source application designed for the quick setup and execution of generative AI applications on local PC hardware.
98 |
99 | - Quark-torch:
100 |
101 | - Added ROUGE and METEOR evaluation metrics for LLMs
102 | - Support for evaluating ONNX models exported using OGA
103 | - Support for offline evaluation (evaluation without generation) for LLMs
104 | - Support for Hugging Face integration
105 | - Support for Gemma2 quantization using the OGA flow
106 | - Support for Llama-3.2 quantization with FP8 (weights, activation, and KV-cache) for the vision and language components
107 |
108 | - Quark-onnx:
109 |
110 | - Support compatibility with ONNX Runtime version 1.20.0, and 1.20.1
111 | - Support for microexponents (MX) data types, including MX4, MX6, and MX9
112 | - Support for BF16 data type for VAIML
113 | - Support for excluding pre and post-processing from quantization
114 | - Support for mixed precision with any data type
115 | - Support for Quarot rotation R1 algorithm
116 | - Support for microexponents and microscaling AdaQuant
117 | - Support for an auto-search algorithm to automatically find the best accuracy quantized model
118 | - Added tools for evaluating L2, PSNR, VMAF, and cosine
119 |
120 | - ONNX Runtime EP:
121 |
122 | - Support for Chinese characters in the ``filename/cache_dir/cache_key/xclbin``
123 | - Support for ``int4/uint4`` data type
124 | - Support for configurable failure handling: CPU fallback or exception
125 | - Update for encrypt/decrypt feature
126 |
127 | - Known Issues:
128 |
129 | - Microsoft Windows Insider Program (WIP) users may see warnings or need to restart when running all applications concurrently.
130 |
131 | - NPU driver and workloads will continue to work.
132 |
133 |   - Context creation may appear to be limited when some applications do not close contexts quickly.
134 |
135 |
136 | ***********
137 | Version 1.3
138 | ***********
139 |
140 | - New Features:
141 |
142 | - Initial release of the Quark quantizer
143 | - Support for mixed precision data types
144 | - Compatibility with Copilot+ applications
145 |
146 | - Improved support for :doc:`LLMs using OGA `
147 |
148 | - New EoU Tools:
149 |
150 | - CNN profiling tool for VAI-ML flow
151 | - Idle detection and suspension of contexts
152 | - Rebalance feature for AIE hardware resource optimization
153 |
154 | - NPU and Compiler:
155 |
156 | - New Op Support:
157 |
158 | - MAC
159 | - QResize Bilinear
160 | - LUT Q-Power
161 | - Expand
162 | - Q-Hsoftmax
163 | - A16 Q-Pad
164 | - Q-Reduce-Mean along H/W dimension
165 | - A16 Q-Global-AvgPool
166 | - A16 Padding with non-zero values
167 | - A16 Q-Sqrt
168 | - Support for XINT8/XINT16 MatMul and A16W16/A8W8 Q-MatMul
169 |
170 | - Performance Improvements:
171 |
172 | - Q-Conv, Q-Pool, Q-Add, Q-Mul, Q-InstanceNorm
173 | - Enhanced QDQ support for a range of operations
174 | - Enhanced the tiling algorithm
175 | - Improved graph-level optimization with extra transpose removal
176 | - Enhanced AT/MT fusion
177 | - Optimized memory usage and compile time
178 | - Improved compilation messages
179 |
180 | - Quark for PyTorch:
181 |
182 | - Model Support:
183 |
184 | - Examples of LLM PTQ, such as Llama3.2 and Llama3.2-Vision models
185 | - Example of YOLO-NAS detection model PTQ/QAT
186 | - Example of SDXL v1.0 with weight INT8 activation INT8
187 |
188 | - PyTorch Quantizer Enhancements:
189 |
190 | - Partial model quantization by user configuration under FX mode
191 | - Quantization of ConvTranspose2d in Eager Mode and FX mode
192 | - Advanced Quantization Algorithms with auto-generated configurations
193 | - Optimized Configuration with DataTypeSpec for ease of use
194 | - Accelerated in-place replacement under Eager Mode
195 | - Loading configuration from file of algorithms and pre-optimizations
196 |
197 | - Quark for ONNX:
198 |
199 | - New Features:
200 |
201 | - Compatibility with ONNX Runtime version 1.18, 1.19
202 | - Support for int4, uint4, Microscaling data types
203 | - Quantization for arbitrary specified operators
204 | - Quantization type alignment of element-wise operators for mixed precision
205 | - ONNX graph cleaning
206 | - Int32 bias quantization
207 |
208 | - ONNX Quantizer Enhancements:
209 |
210 | - Fast fine-tuning support for the MatMul operator, BFP data type, and GPU acceleration
211 | - Improved ONNX quantization of LLM models
212 | - Optimized quantization of FP16 models
213 | - Custom operator compilation process
214 | - Default parameters for auto mixed precision
215 | - Optimized Ryzen AI workflow by aligning with hardware constraints of the NPU
216 |
217 | - ONNX Runtime EP:
218 |
219 | - Support for ONNX Runtime EP shared libraries
220 | - Python dependency removal
221 | - Memory optimization during the compile phase
222 | - Pattern API enhancement with multiple outputs and commutable arguments support
223 |
224 | - Known Issues:
225 |
226 | - Extended compile time for some models with BF16/BFP16 data types
227 | - LLM models with 4K sequence length may revert to CPU execution
228 | - Accuracy drop in some Transformer models using BF16/BFP16 data types, requiring Quark intervention
229 |
230 | ***********
231 | Version 1.2
232 | ***********
233 |
234 | - New features:
235 |
236 | - Support added for Strix Point NPUs
237 | - Support added for integrated GPU
238 | - Smart installer for Ryzen AI 1.2
239 | - NPU DPM based on power slider
240 |
241 | - New model support:
242 |
243 | - `LLM flow support `_ for multiple models in both PyTorch and ONNX flow (optimized model support will be released asynchronously)
244 | - SDXL-T with limited performance optimization
245 |
246 | - New EoU tools:
247 |
248 | - `AI Analyzer `_ : Analysis and visualization of model compilation and inference profiling
249 | - Platform/NPU inspection and management tool (`xrt-smi `_)
250 | - `Onnx Benchmarking tool `_
251 |
252 | - New Demos:
253 |
254 | - NPU-GPU multi-model pipeline application `demo `_
255 |
256 | - NPU and Compiler
257 |
258 | - New device support: Strix Nx4 and 4x4 Overlay
259 | - New Op support:
260 |
261 | - InstanceNorm
262 | - Silu
263 | - Floating scale quantization operators (INT8, INT16)
264 | - Support new rounding mode (Round to even)
265 | - Performance Improvement:
266 |
267 | - Reduced the model compilation time
268 | - Improved instruction loading
269 | - Improved synchronization in large overlay
270 | - Enhanced strided_slice performance
271 | - Enhanced convolution MT fusion
272 | - Enhanced convolution AT fusion
273 | - Enhanced data movement op performance
274 | - ONNX Quantizer updates
275 |
276 | - Improved usability with various features and tools, including weights-only quantization, graph optimization, dynamic shape fixing, and format transformations.
277 | - Improved the accuracy of quantized models through automatic mixed precision and enhanced AdaRound and AdaQuant techniques.
278 | - Enhanced support for the BFP data type, including more attributes and shape inference capability.
279 | - Optimized the NPU workflow by aligning with the hardware constraints of the NPU.
280 | - Supported compilation for Windows and Linux.
281 | - Bugfix:
282 |
283 | - Fixed the problem where per-channel quantization is not compatible with onnxruntime 1.17.
284 | - Fixed the bug of CLE when conv with groups.
285 | - Fixed the bug of bias correction.
286 | - Pytorch Quantizer updates
287 |
288 | - Tiny value quantization protection.
289 | - Higher onnx version support in quantized model exporting.
290 |   - Relu6 hardware constraints support.
291 | - Support of mean operation with keepdim=True.
292 | - Resolved issues:
293 |
294 | - NPU SW stack will fail to initialize when the system is out of memory. This could impact camera functionality when Microsoft Effect Pack is enabled.
295 |   - If the Microsoft Effects Pack is overloaded with 4+ other applications that use the NPU for inference, camera functionality can be impacted. This can be fixed with a reboot and will be addressed in the next release.
296 |
297 | ***********
298 | Version 1.1
299 | ***********
300 |
301 | - New model support:
302 |
303 | - Llama 2 7B with w4abf16 (3-bit and 4-bit) quantization (Beta)
304 | - Whisper base (EA access)
305 |
306 | - New EoU tools:
307 |
308 | - CNN Benchmarking tool on RyzenAI-SW Repo
309 | - Platform/NPU inspection and management tool
310 |
311 | Quantizer
312 | =========
313 |
314 | - ONNX Quantizer:
315 |
316 | - Improved usability with various features and tools, including diverse parameter configurations, graph optimization, shape fixing, and format transformations.
317 | - Improved quantization accuracy through the implementation of experimental algorithmic improvements, including AdaRound and AdaQuant.
318 | - Optimized the NPU workflow by distinguishing between different targets and aligning with the hardware constraints of the NPU.
319 | - Introduced new utilities for model conversion.
320 |
321 | - PyTorch Quantizer:
322 |
323 | - Mixed data type quantization enhancement and bug fix.
324 | - Corner bug fixes for add, sub, and conv1d operations.
325 | - Tool for converting the S8S8 model to the U8S8 model.
326 | - Tool for converting the customized Q/DQ to onnxruntime contributed Q/DQ with the "microsoft" domain.
327 | - Tool for fixing a dynamic shapes model to fixed shape model.
328 |
329 | - Bug fixes
330 |
331 | - Fix for incorrect logging when simulating the LeakyRelu alpha value.
332 | - Fix for useless initializers not being cleaned up during optimization.
333 | - Fix for external data cannot be found when using use_external_data_format.
334 | - Fix for custom Ops cannot be registered due to GLIBC version mismatch
335 |
336 | NPU and Compiler
337 | ================
338 |
339 | - New op support:
340 |
341 |   - Support channel-wise PRelu.
342 | - Gstiling with reverse = false.
343 | - Fixed issues:
344 |
345 | - Fixed Transpose-convolution and concat optimization issues.
346 | - Fixed Conv stride 3 corner case hang issue.
347 | - Performance improvement:
348 |
349 | - Updated Conv 1x1 stride 2x2 optimization.
350 | - Enhanced Conv 7x7 performance.
351 | - Improved padding performance.
352 | - Enhanced convolution MT fusion.
353 | - Improved the performance for NCHW layout model.
354 | - Enhanced the performance for eltwise-like op.
355 | - Enhanced Conv and eltwise AT fusion.
356 | - Improved the output convolution/transpose-convolution’s performance.
357 | - Enhanced the logging message for EoU.
358 |
359 |
360 | ONNX Runtime EP
361 | ===============
362 |
363 | - End-2-End Application support on NPU
364 |
365 | - Enhanced existing support: Provided high-level APIs to enable seamless incorporation of pre/post-processing operations into the model to run on NPU
366 | - Two examples (resnet50 and yolov8) published to demonstrate the usage of these APIs to run end-to-end models on the NPU
367 | - Bug fixes for ONNXRT EP to support customers’ models
368 |
369 | Misc
370 | ====
371 |
372 | - Contains mitigation for the following CVEs: CVE-2024-21974, CVE-2024-21975, CVE-2024-21976
373 |
374 | *************
375 | Version 1.0.1
376 | *************
377 |
378 | - Minor fix for Single click installation without given env name.
379 | - Performance improvements in the NPU driver.
380 | - Bug fix in elementwise subtraction in the compiler.
381 | - Runtime stability fixes for minor corner cases.
382 | - Quantizer update to resolve performance drop with default settings.
383 |
384 | ***********
385 | Version 1.0
386 | ***********
387 | Quantizer
388 | =========
389 |
390 | - ONNX Quantizer
391 |
392 | - Support for ONNXRuntime 1.16.
393 | - Support for the Cross-Layer-Equalization (CLE) algorithm in quantization, which can balance the weights of consecutive Conv nodes to make it more quantize-friendly in per-tensor quantization.
394 | - Support for mixed precision quantization including UINT16/INT16/UINT32/INT32/FLOAT16/BFLOAT16, and support asymmetric quantization for BFLOAT16.
395 | - Support for the MinMSE method for INT16/UINT16/INT32/UINT32 quantization.
396 | - Support for quantization using the INT16 scale.
397 | - Support for unsigned ReLU in symmetric activation configuration.
398 | - Support for converting Float16 to Float32 during quantization.
399 | - Support for converting NCHW model to NHWC model during quantization.
400 | - Support for two more modes for MinMSE for better accuracy. The "All" mode computes the scales with all batches while the "MostCommon" mode computes the scale for each batch and uses the most common scales.
401 | - Support for the quantization of more operations:
402 |
403 | - PReLU, Sub, Max, DepthToSpace, SpaceToDepth, Slice, InstanceNormalization, and LpNormalization.
404 | - Non-4D ReduceMean.
405 | - Leakyrelu with arbitrary alpha.
406 | - Split by converting it to Slice.
407 |
408 | - Support for op fusing of InstanceNormalization and L2Normalization in NPU workflow.
409 | - Support for converting Clip to ReLU when the minimal value is 0.
410 | - Updated shift_bias, shift_read, and shift_write constraints in the NPU workflow and added an option "IPULimitationCheck" to disable it.
411 | - Support for disabling the op fusing of Conv + LeakyReLU/PReLU in the NPU workflow.
412 | - Support for logging for quantization configurations and summary information.
413 | - Support for removing initializer from input to support models converted from old version pytorch where weights are stored as inputs.
414 | - Added a recommended configuration for the IPU_Transformer platform.
415 | - New utilities:
416 |
417 | - Tool for converting the float16 model to the float32 model.
418 | - Tool for converting the NCHW model to the NHWC model.
419 | - Tool for quantized models with random input.
420 |
421 | - Three examples for quantization models from Timm, Torchvision, and ONNXRuntime modelzoo respectively.
422 | - Bugfixes:
423 |
424 | - Fix a bug that weights are quantized with the "NonOverflow" method when using the "MinMSE" method.
425 |
426 | - Pytorch Quantizer
427 |
428 | - Support of some operations quantization in quantizer: inplace div, inplace sub
429 | - Log and document enhancement to emphasize fast-finetune
430 | - Timm models quantization script example
431 | - Bug fix for operators: clamp and prelu
432 | - QAT Support quantization of operations with multiple outputs
433 | - QAT EOU enhancements: significantly reduces the need for network modifications
434 | - QAT ONNX exporting enhancements: support more configurations
435 | - New QAT examples
436 |
437 | - TF2 Quantizer
438 |
439 | - Support for Tensorflow 2.11 and 2.12.
440 | - Support for the 'tf.linalg.matmul' operator.
441 | - Updated shift_bias constraints for NPU workflow.
442 | - Support for dumping models containing operations with multiple outputs.
443 | - Added an example of a sequential model.
444 | - Bugfixes:
445 |
446 | - Fix a bug that Hardsigmoid and Hardswish are not mapped to DPU without Batch Normalization.
447 | - Fix a bug when both align_pool and align_concat are used simultaneously.
448 | - Fix a bug in the sequential model when a layer has multiple consumers.
449 |
450 | - TF1 Quantizer
451 |
452 | - Update shift_bias constraints for NPU workflow.
453 | - Bugfixes:
454 |
455 | - Fix a bug in fast_finetune when the 'input_node' and 'quant_node' are inconsistent.
456 | - Fix a bug that AddV2 op identified as BiasAdd.
457 | - Fix a bug when the data type of the concat op is not float.
458 | - Fix a bug in split_large_kernel_pool when the stride is not equal to 1.
459 |
460 | ONNXRuntime Execution Provider
461 | ==============================
462 |
463 | - Support new OPs, such as PRelu, ReduceSum, LpNormlization, DepthToSpace(DCR).
464 | - Increase the percentage of model operators performed on the NPU.
465 | - Fixed some issues causing model operators allocation to CPU.
466 | - Improved report summary
467 | - Support the encryption of the VOE cache
468 | - End-2-End Application support on NPU
469 |
470 | - Enable running pre/post/custom ops on NPU, utilizing ONNX feature of E2E extensions.
471 | - Two examples published for yolov8 and resnet50, in which preprocessing custom op is added and runs on NPU.
472 |
473 | - Performance: latency improves by up to 18% and power savings by up to 35% by additionally running preprocessing on NPU apart from inference.
474 | - Multiple NPU overlays support
475 |
476 | - VOE configuration that supports both CNN-centric and GEMM-centric NPU overlays.
477 | - Increases number of ops that run on NPU, especially for models which have both GEMM and CNN ops.
478 | - Examples published for use with some of the vision transformer models.
479 |
480 | NPU and Compiler
481 | ==============================
482 |
483 | - New operators support
484 |
485 | - Global average pooling with large spatial dimensions
486 | - Single Activation (no fusion with conv2d, e.g. relu/single alpha PRelu)
487 |
488 | - Operator support enhancement
489 |
490 | - Enlarge the width dimension support range for depthwise-conv2d
491 | - Support more generic broadcast for element-wise like operator
492 | - Support output channel not aligned with 4B GStiling
493 | - Support Mul and LeakyRelu fusion
494 | - Concatenation’s redundant input elimination
495 | - Channel Augmentation for conv2d (3x3, stride=2)
496 |
497 | - Performance optimization
498 |
499 | - PDI partition refine to reduce the overhead for PDI swap
500 | - Enabled cost model for some specific models
501 |
502 | - Fixed asynchronous error in multiple thread scenario
503 | - Fixed known issue on tanh and transpose-conv2d hang issue
504 |
505 | Known Issues
506 | ==============================
507 |
508 | - Support for multiple applications is limited to up to eight
509 | - Windows Studio Effects should be disabled when using the Latency profile. To disable Windows Studio Effects, open **Settings > Bluetooth & devices > Camera**, select your primary camera, and then disable all camera effects.
510 |
511 |
512 |
513 | ***********
514 | Version 0.9
515 | ***********
516 |
517 | Quantizer
518 | =========
519 |
520 | - Pytorch Quantizer
521 |
522 | - Dict input/output support for model forward function
523 | - Keywords argument support for model forward function
524 | - Matmul subroutine quantization support
525 | - Support of some operations in quantizer: softmax, div, exp, clamp
526 | - Support quantization of some non-standard conv2d.
527 |
528 |
529 | - ONNX Quantizer
530 |
531 | - Add support for Float16 and BFloat16 quantization.
532 |   - Add C++ kernels for customized QuantizeLinear and DequantizeLinear operations.
533 | - Support saving quantizer version info to the quantized models' producer field.
534 | - Support conversion of ReduceMean to AvgPool in NPU workflow.
535 | - Support conversion of BatchNorm to Conv in NPU workflow.
536 | - Support optimization of large kernel GlobalAvgPool and AvgPool operations in NPU workflow.
537 | - Supports hardware constraints check and adjustment of Gemm, Add, and Mul operations in NPU workflow.
538 | - Supports quantization for LayerNormalization, HardSigmoid, Erf, Div, and Tanh for NPU.
539 |
540 | ONNXRuntime Execution Provider
541 | ==============================
542 |
543 | - Support new OPs, such as Conv1d, LayerNorm, Clip, Abs, Unsqueeze, ConvTranspose.
544 | - Support pad and depad based on NPU subgraph’s inputs and outputs.
545 | - Support for U8S8 models quantized by ONNX quantizer.
546 | - Improve report summary tools.
547 |
548 | NPU and Compiler
549 | ================
550 |
551 | - Supported exp/tanh/channel-shuffle/pixel-unshuffle/space2depth
552 | - Performance uplift of xint8 output softmax
553 | - Improve the partition messages for CPU/DPU
554 | - Improve the validation check for some operators
555 | - Accelerate the speed of compiling large models
556 | - Fix the elew/pool/dwc/reshape mismatch issue and fix the stride_slice hang issue
557 | - Fix str_w != str_h issue in Conv
558 |
559 |
560 | LLM
561 | ===
562 |
563 | - Smoothquant for OPT1.3b, 2.7b, 6.7b, 13b models.
564 | - Huggingface Optimum ORT Quantizer for ONNX and Pytorch dynamic quantizer for Pytorch
565 | - Enabled Flash attention v2 for larger prompts as a custom torch.nn.Module
566 | - Enabled all CPU ops in bfloat16 or float32 with Pytorch
567 | - int32 accumulator in AIE (previously int16)
568 | - DynamicQuantLinear op support in ONNX
569 | - Support different compute primitives for prefill/prompt and token phases
570 | - Zero copy of weights shared between different op primitives
571 | - Model saving after quantization and loading at runtime for both Pytorch and ONNX
572 | - Enabled profiling prefill/prompt and token time using local copy of OPT Model with additional timer instrumentation
573 | - Added demo mode script with greedy, stochastic and contrastive search options
574 |
575 | ASR
576 | ===
577 | - Support Whisper-tiny
578 | - All GEMMs offloaded to AIE
579 | - Improved compile time
580 | - Improved WER
581 |
582 | Known issues
583 | ============
584 |
585 | - Flow control OPs including "Loop", "If", "Reduce" not supported by VOE
586 | - Resizing OP in ONNX opset 10 or lower is not supported by VOE
587 | - Tensorflow 2.x quantizer supports models within tf.keras.model only
588 | - Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue
589 | - Running multiple concurrent models using temporal sharing on the 5x4 binary is not supported
590 | - Only batch sizes of 1 are supported
591 | - Only models with the pretrained weights setting = TRUE should be imported
592 | - Launching multiple processes on 4 1x4 binaries can cause hangs, especially when models have many sub-graphs
593 |
594 | |
595 | |
596 |
597 | ***********
598 | Version 0.8
599 | ***********
600 |
601 | Quantizer
602 | =========
603 |
604 | - Pytorch Quantizer
605 |
606 | - Pytorch 1.13 and 2.0 support
607 | - Mixed precision quantization support, supporting float32/float16/bfloat16/intx mixed quantization
608 | - Support of bit-wise accuracy cross check between quantizer and ONNX-runtime
609 | - Split and chunk operators were automatically converted to slicing
610 | - Add support for BFP data type quantization
611 | - Support of some operations in quantizer: where, less, less_equal, greater, greater_equal, not, and, or, eq, maximum, minimum, sqrt, Elu, Reduction_min, argmin
612 | - QAT supports training on multiple GPUs
613 | - QAT supports operations with multiple inputs or outputs
614 |
615 | - ONNX Quantizer
616 |
617 | - Provided Python wheel file for installation
618 | - Support OnnxRuntime 1.15
619 | - Supports setting input shapes of random data reader
620 | - Supports random data reader in the dump model function
621 | - Supports saving the S8S8 model in U8S8 format for NPU
622 | - Supports simulation of Sigmoid, Swish, Softmax, AvgPool, GlobalAvgPool, ReduceMean and LeakyRelu for NPU
623 | - Supports node fusions for NPU
624 |
625 | ONNXRuntime Execution Provider
626 | ==============================
627 |
628 | - Supports for U8S8 quantized ONNX models
629 | - Improve the function of falling back to CPU EP
630 | - Improve AIE plugin framework
631 |
632 | - Supports LLM Demo
633 | - Supports Gemm ASR
634 | - Supports E2E AIE acceleration for Pre/Post ops
635 |   - Improve the ease-of-use for partition and deployment
636 | - Supports models containing subgraphs
637 | - Supports report summary about OP assignment
638 | - Supports report summary about DPU subgraphs falling back to CPU
639 | - Improve log printing and troubleshooting tools.
640 | - Upstreamed to ONNX Runtime Github repo for any data type support and bug fix
641 |
642 | NPU and Compiler
643 | ================
644 |
645 | - Extended the support range of some operators
646 |
647 | - Larger input size: conv2d, dwc
648 | - Padding mode: pad
649 | - Broadcast: add
650 | - Variant dimension (non-NHWC shape): reshape, transpose, add
651 | - Support new operators, e.g. reducemax(min/sum/avg), argmax(min)
652 | - Enhanced multi-level fusion
653 | - Performance enhancement for some operators
654 | - Add quantization information validation
655 | - Improvement in device partition
656 |
657 | - User friendly message
658 | - Target-dependency check
659 |
660 | Demos
661 | =====
662 |
663 | - New Demos link: https://account.amd.com/en/forms/downloads/ryzen-ai-software-platform-xef.html?filename=transformers_2308.zip
664 |
665 | - LLM demo with OPT-1.3B/2.7B/6.7B
666 | - Automatic speech recognition demo with Whisper-tiny
667 |
668 | Known issues
669 | ============
670 | - Flow control OPs including "Loop", "If", "Reduce" not supported by VOE
671 | - Resize OP in ONNX opset 10 or lower not supported by VOE
672 | - Tensorflow 2.x quantizer supports models within tf.keras.model only
673 | - Running quantizer docker in WSL on Ryzen AI laptops may encounter OOM (Out-of-memory) issue
674 | - Running multiple concurrent models by temporal sharing on the Performance optimized overlay (5x4.xclbin) is not supported
675 | - Support batch size 1 only for NPU
676 |
677 |
678 | |
679 | |
680 |
681 | ***********
682 | Version 0.7
683 | ***********
684 |
685 | Quantizer
686 | =========
687 |
688 | - Docker Containers
689 |
690 | - Provided CPU dockers for Pytorch, Tensorflow 1.x, and Tensorflow 2.x quantizer
691 | - Provided GPU Docker files to build GPU dockers
692 |
693 | - Pytorch Quantizer
694 |
695 | - Supports multiple output conversion to slicing
696 | - Enhanced transpose OP optimization
697 | - Inspector support new IP targets for NPU
698 |
699 | - ONNX Quantizer
700 |
701 | - Provided Python wheel file for installation
702 | - Supports quantizing ONNX models for NPU as a plugin for the ONNX Runtime native quantizer
703 | - Supports power-of-two quantization with both QDQ and QOP format
704 | - Supports Non-overflow and Min-MSE quantization methods
705 | - Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.
706 |
707 | - Supports signed and unsigned configurations.
708 | - Supports symmetry and asymmetry configurations.
709 | - Supports per-tensor and per-channel configurations.
710 | - Supports bias quantization using int8 datatype for NPU.
711 | - Supports quantization parameters (scale) refinement for NPU.
712 | - Supports excluding certain operations from quantization for NPU.
713 | - Supports ONNX models larger than 2GB.
714 | - Supports using CUDAExecutionProvider for calibration in quantization
715 | - Open source and upstreamed to Microsoft Olive Github repo
716 |
717 | - TensorFlow 2.x Quantizer
718 |
719 | - Added support for exporting the quantized model to ONNX format.
720 | - Added support for keras.layers.Activation('leaky_relu')
721 |
722 | - TensorFlow 1.x Quantizer
723 |
724 | - Added support for folding Reshape and ResizeNearestNeighbor operators.
725 | - Added support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
726 | - Added support for quantizing Sum, StridedSlice, and Maximum operators.
727 | - Added support for setting the input shape of the model, which is useful in deploying models with undefined input shapes.
728 | - Added support for setting the opset version when exporting to ONNX format
729 |
730 | ONNX Runtime Execution Provider
731 | ===============================
732 |
733 | - Vitis ONNX Runtime Execution Provider (VOE)
734 |
735 | - Supports ONNX Opset version 18, ONNX Runtime 1.16.0, and ONNX version 1.13
736 | - Supports both C++ and Python APIs (Python version 3)
737 | - Supports deploying models with other EPs
738 | - Supports falling back to the CPU EP (see the hedged usage sketch after this list)
739 | - Open source and upstreamed to ONNX Runtime Github repo
740 | - Compiler
741 |
742 | - Multiple Level op fusion
743 | - Supports multi-output operators such as chunk and split
744 | - Supports splitting large pooling operations into smaller ones
745 | - Supports 2-channel writeback feature for Hard-Sigmoid and Depthwise-Convolution
746 | - Supports 1-channel GStiling
747 | - Explicit pad-fix in CPU subgraph for 4-byte alignment
748 | - Tuned performance for multiple models
749 |
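    | As a hedged illustration of the EP fallback noted above, the following minimal Python sketch uses the standard ONNX Runtime provider-selection API to prefer the Vitis AI EP and fall back to the CPU EP; the model path is hypothetical and provider options are omitted for brevity.
    | 
    | .. code-block:: python
    | 
    |    # Hedged sketch: prefer the Vitis AI EP and fall back to the CPU EP for
    |    # operators that cannot run on the NPU. "model.onnx" is a hypothetical path.
    |    import onnxruntime as ort
    | 
    |    session = ort.InferenceSession(
    |        "model.onnx",
    |        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    |    )
    |    print(session.get_providers())  # lists the EPs active for this session
    | 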
750 | NPU
751 | ===
752 |
753 | - Two configurations
754 |
755 | - Power Optimized Overlay (1x4.xclbin)
756 |
757 | - Suitable for smaller AI models
758 | - Supports spatial sharing, up to 4 concurrent AI workloads
759 |
760 | - Performance Optimized Overlay (5x4.xclbin)
761 |
762 | - Suitable for larger AI models
763 |
764 | Known issues
765 | ============
766 | - Flow control OPs including "Loop", "If", and "Reduce" are not supported by VOE
767 | - Resize OP in ONNX opset 10 or lower is not supported by VOE
768 | - The Tensorflow 2.x quantizer supports models within tf.keras.model only
769 | - Running the quantizer docker in WSL on Ryzen AI laptops may encounter an OOM (out-of-memory) issue
770 | - Running multiple concurrent models through temporal sharing on the performance-optimized overlay (5x4.xclbin) is not supported
771 |
772 |
773 |
774 |
775 | ..
776 | ------------
777 |
778 | #####################################
779 | License
780 | #####################################
781 |
782 | Ryzen AI is licensed under `MIT License `_ . Refer to the `LICENSE File `_ for the full license text and copyright notice.
783 |
--------------------------------------------------------------------------------
/docs/ryzen_ai_libraries.rst:
--------------------------------------------------------------------------------
1 | .. Copyright (C) 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
2 |
3 | #####################
4 | Ryzen AI CVML library
5 | #####################
6 |
7 | The Ryzen AI CVML libraries build on top of the Ryzen AI drivers and execution infrastructure to provide powerful AI capabilities to C++ applications, without requiring developers to train specific AI models or integrate them into the Ryzen AI framework.
8 |
9 | Each Ryzen AI CVML library feature offers a simple C++ application programming interface (API) that can be easily incorporated into existing applications.
10 |
11 | The Ryzen AI CVML library is distributed through the RyzenAI-SW Github repository: https://github.com/amd/RyzenAI-SW/tree/main/Ryzen-AI-CVML-Library
12 |
13 | *************
14 | Prerequisites
15 | *************
16 | Ensure that the following software tools/packages are installed on the development system.
17 |
18 | 1. Visual Studio 2022 Community edition or newer, with the “Desktop Development with C++” workload installed
19 | 2. CMake (version >= 3.18)
20 | 3. OpenCV (version 4.8.1 or newer)
21 |
22 | **************************************************
23 | Building sample applications
24 | **************************************************
25 | This section describes the steps to build Ryzen AI CVML library sample applications.
26 |
27 | Navigate to the folder containing Ryzen AI samples
28 | ==================================================
29 | Download the Ryzen AI CVML sources, and go to the 'samples' sub-folder of the library. ::
30 |
31 | git clone https://github.com/amd/RyzenAI-SW.git -b main --depth 1
32 | chdir RyzenAI-SW\Ryzen-AI-CVML-Library\samples
33 |
34 | OpenCV libraries
35 | ================
36 | Ryzen AI CVML library samples make use of OpenCV, so set an environment variable to let the build scripts know where to find OpenCV. ::
37 |
38 | set OPENCV_INSTALL_ROOT=<path to OpenCV installation>
39 |
40 | Build Instructions
41 | ==================
42 | Create a build folder and use CMake to build the sample(s). ::
43 |
44 | mkdir build-samples
45 | cmake -S %CD% -B %CD%\build-samples -DOPENCV_INSTALL_ROOT=%OPENCV_INSTALL_ROOT%
46 | cmake --build %CD%\build-samples --config Release
47 |
48 | The compiled sample application(s) will be placed in the various build-samples\\Release folder(s) under the 'samples' folder.
49 |
50 | *************************************************
51 | Running sample applications
52 | *************************************************
53 | This section describes how to execute Ryzen AI CVML library sample applications.
54 |
55 | Update the console and/or system PATH
56 | =====================================
57 | Ryzen AI CVML library applications need to be able to find the library files. One way to do this is to add the location of the libraries to the system or console PATH environment variable.
58 |
59 | In this example, the location of OpenCV's runtime libraries is also added to the PATH environment variable. ::
60 |
61 | set PATH=%PATH%;<path to Ryzen AI CVML library>\windows
62 | set PATH=%PATH%;%OPENCV_INSTALL_ROOT%\x64\vc16\bin
63 |
64 | Adjust the aforementioned commands to match the actual location of Ryzen AI and OpenCV libraries, respectively.
65 |
66 | Select an input source/image/video
67 | ==================================
68 | Ryzen AI CVML library samples can accept a variety of image and video input formats, or even open the default camera on the system if "0" is specified as an input.
69 |
70 | In this example, a publicly available video file is used for the application's input. ::
71 |
72 | curl -o dancing.mp4 https://videos.pexels.com/video-files/4540332/4540332-hd_1920_1080_25fps.mp4
73 |
74 | Execute the sample application
75 | ==============================
76 | Finally, the previously built sample application can be executed with the selected input source. ::
77 |
78 | build-samples\cvml-sample-depth-estimation\Release\cvml-sample-depth-estimation.exe -i dancing.mp4
79 |
80 | ..
81 | ------------
82 |
83 | #####################################
84 | License
85 | #####################################
86 |
87 | Ryzen AI is licensed under MIT License. Refer to the LICENSE file for the full license text and copyright notice.
88 |
--------------------------------------------------------------------------------
/docs/sphinx/requirements.in:
--------------------------------------------------------------------------------
1 | rocm-docs-core==0.24.2
2 |
--------------------------------------------------------------------------------
/docs/sphinx/requirements.txt:
--------------------------------------------------------------------------------
1 | #
2 | # This file is autogenerated by pip-compile with Python 3.8
3 | # by the following command:
4 | #
5 | # pip-compile --resolver=backtracking requirements.in
6 | #
7 | accessible-pygments==0.0.4
8 | # via pydata-sphinx-theme
9 | alabaster==0.7.13
10 | # via sphinx
11 | babel==2.12.1
12 | # via
13 | # pydata-sphinx-theme
14 | # sphinx
15 | beautifulsoup4==4.12.2
16 | # via pydata-sphinx-theme
17 | breathe==4.35.0
18 | # via rocm-docs-core
19 | certifi==2023.7.22
20 | # via requests
21 | cffi==1.15.1
22 | # via
23 | # cryptography
24 | # pynacl
25 | charset-normalizer==3.2.0
26 | # via requests
27 | click==8.1.7
28 | # via sphinx-external-toc
29 | cryptography==41.0.4
30 | # via pyjwt
31 | deprecated==1.2.14
32 | # via pygithub
33 | docutils==0.19
34 | # via
35 | # breathe
36 | # myst-parser
37 | # pydata-sphinx-theme
38 | # sphinx
39 | fastjsonschema==2.18.0
40 | # via rocm-docs-core
41 | fspath==20230629
42 | # via linuxdoc
43 | gitdb==4.0.10
44 | # via gitpython
45 | gitpython==3.1.37
46 | # via rocm-docs-core
47 | idna==3.4
48 | # via requests
49 | imagesize==1.4.1
50 | # via sphinx
51 | importlib-metadata==6.8.0
52 | # via sphinx
53 | importlib-resources==6.1.0
54 | # via rocm-docs-core
55 | jinja2==3.1.2
56 | # via
57 | # myst-parser
58 | # sphinx
59 | linuxdoc==20240924
60 | # via sphinx
61 | markdown-it-py==2.2.0
62 | # via
63 | # mdit-py-plugins
64 | # myst-parser
65 | markupsafe==2.1.3
66 | # via jinja2
67 | mdit-py-plugins==0.3.5
68 | # via myst-parser
69 | mdurl==0.1.2
70 | # via markdown-it-py
71 | myst-parser==1.0.0
72 | # via rocm-docs-core
73 | packaging==23.1
74 | # via
75 | # pydata-sphinx-theme
76 | # sphinx
77 | pycparser==2.21
78 | # via cffi
79 | pydata-sphinx-theme==0.14.1
80 | # via
81 | # rocm-docs-core
82 | # sphinx-book-theme
83 | pygithub==1.59.1
84 | # via rocm-docs-core
85 | pygments==2.16.1
86 | # via
87 | # accessible-pygments
88 | # pydata-sphinx-theme
89 | # sphinx
90 | pyjwt[crypto]==2.8.0
91 | # via pygithub
92 | pynacl==1.5.0
93 | # via pygithub
94 | pytz==2023.3.post1
95 | # via babel
96 | pyyaml==6.0.1
97 | # via
98 | # myst-parser
99 | # rocm-docs-core
100 | # sphinx-external-toc
101 | requests==2.31.0
102 | # via
103 | # pygithub
104 | # sphinx
105 | rocm-docs-core==0.24.2
106 | # via -r requirements.in
107 | six==1.17.0
108 | # via linuxdoc
109 | smmap==5.0.1
110 | # via gitdb
111 | snowballstemmer==2.2.0
112 | # via sphinx
113 | soupsieve==2.5
114 | # via beautifulsoup4
115 | sphinx==5.3.0
116 | # via
117 | # breathe
118 | # myst-parser
119 | # pydata-sphinx-theme
120 | # rocm-docs-core
121 | # sphinx-book-theme
122 | # sphinx-copybutton
123 | # sphinx-design
124 | # sphinx-external-toc
125 | # sphinx-notfound-page
126 | sphinx-book-theme==1.0.1
127 | # via rocm-docs-core
128 | sphinx-copybutton==0.5.2
129 | # via rocm-docs-core
130 | sphinx-design==0.5.0
131 | # via rocm-docs-core
132 | sphinx-external-toc==0.3.1
133 | # via rocm-docs-core
134 | sphinx-notfound-page==1.0.0
135 | # via rocm-docs-core
136 | sphinxcontrib-applehelp==1.0.4
137 | # via sphinx
138 | sphinxcontrib-devhelp==1.0.2
139 | # via sphinx
140 | sphinxcontrib-htmlhelp==2.0.1
141 | # via sphinx
142 | sphinxcontrib-jsmath==1.0.1
143 | # via sphinx
144 | sphinxcontrib-qthelp==1.0.3
145 | # via sphinx
146 | sphinxcontrib-serializinghtml==1.1.5
147 | # via sphinx
148 | typing-extensions==4.8.0
149 | # via pydata-sphinx-theme
150 | urllib3==2.0.5
151 | # via requests
152 | wrapt==1.15.0
153 | # via deprecated
154 | zipp==3.17.0
155 | # via
156 | # importlib-metadata
157 | # importlib-resources
158 |
--------------------------------------------------------------------------------
/docs/xrt_smi.rst:
--------------------------------------------------------------------------------
1 | ..
2 | .. Heading guidelines
3 | ..
4 | .. # with overline, for parts
5 | .. * with overline, for chapters
6 | .. =, for sections
7 | .. -, for subsections
8 | .. ^, for subsubsections
9 | .. “, for paragraphs
10 | ..
11 |
12 | .. include:: /icons.txt
13 |
14 | ########################
15 | NPU Management Interface
16 | ########################
17 |
18 | *******************************
19 | Introduction
20 | *******************************
21 |
22 | The ``xrt-smi`` utility is a command-line interface to monitor and manage the NPU integrated in AMD CPUs.
23 |
24 | It is installed in ``C:\Windows\System32\AMD`` and it can be directly invoked from within the conda environment created by the Ryzen AI Software installer.
25 |
26 | The ``xrt-smi`` utility currently supports three primary commands:
27 |
28 | - ``examine`` - generates reports related to the state of the AI PC and the NPU.
29 | - ``validate`` - executes sanity tests on the NPU.
30 | - ``configure`` - manages the performance level of the NPU.
31 |
32 | By default, the output of the ``xrt-smi examine`` and ``xrt-smi validate`` commands goes to the terminal. It can also be written to a file in JSON format as shown below:
33 |
34 | .. code-block:: shell
35 |
36 |    xrt-smi examine -f JSON -o <output_file.json>
37 |
38 | The utility also supports the following options, which can be used with any command:
39 |
40 | - ``--help`` - displays help for xrt-smi or one of its subcommands
41 | - ``--version`` - reports the version of XRT, the driver, and the firmware
42 | - ``--verbose`` - turns on verbosity
43 | - ``--batch`` - enables batch mode (disables escape characters)
44 | - ``--force`` - forces an operation when possible, e.g., overwriting a file during examine or validate
45 |
46 | The ``xrt-smi`` utility requires `Microsoft Visual C++ Redistributable `_ (version 2015 to 2022) to be installed.
47 |
48 |
49 | *******************************
50 | Overview of Key Commands
51 | *******************************
52 |
53 | .. list-table::
54 | :widths: 35 65
55 | :header-rows: 1
56 |
57 | * - Command
58 | - Description
59 | * - examine
60 | - system config, device name
61 | * - examine --report platform
62 | - performance mode, power
63 | * - examine --report aie-partitions
64 | - hw contexts
65 | * - validate --run latency
66 | - latency test
67 | * - validate --run throughput
68 | - throughput test
69 | * - validate --run gemm
70 | - INT8 GEMM test TOPS. This is a full array test and it should not be run while another workload is running. **NOTE**: This command is not supported on PHX and HPT NPUs.
71 | * - configure --pmode
72 | - set performance mode
73 |
74 |
75 | |memo| **NOTE**: The ``examine --report aie-partitions`` command reports runtime information. It should be used while a model is running on the NPU. You can run it in a loop to see live updates of the reported data (a scripting sketch is included in the NPU Partitions section below).
76 |
77 |
78 | *******************************
79 | xrt-smi examine
80 | *******************************
81 |
82 | System Information
83 | ==================
84 |
85 | Reports OS/system information of the AI PC and confirms the presence of the AMD NPU.
86 |
87 | .. code-block:: shell
88 |
89 | xrt-smi examine
90 |
91 | Sample Command Line Output::
92 |
93 |
94 | System Configuration
95 | OS Name : Windows NT
96 | Release : 26100
97 | Machine : x86_64
98 | CPU Cores : 20
99 | Memory : 32063 MB
100 | Distribution : Microsoft Windows 11 Enterprise
101 | Model : HP OmniBook Ultra Laptop 14-fd0xxx
102 | BIOS Vendor : HP
103 | BIOS Version : W81 Ver. 01.01.14
104 |
105 | XRT
106 | Version : 2.19.0
107 | Branch : HEAD
108 | Hash : f62307ddadf65b54acbed420a9f0edc415fefafc
109 | Hash Date : 2025-03-12 16:34:48
110 | NPU Driver Version : 32.0.203.257
111 | NPU Firmware Version : 1.0.7.97
112 |
113 | Device(s) Present
114 | |BDF |Name |
115 | |----------------|-----------|
116 | |[00c4:00:01.1] |NPU Strix |
117 |
118 |
119 | Sample JSON Output::
120 |
121 |
122 | {
123 | "schema_version": {
124 | "schema": "JSON",
125 | "creation_date": "Tue Mar 18 22:43:38 2025 GMT"
126 | },
127 | "system": {
128 | "host": {
129 | "os": {
130 | "sysname": "Windows NT",
131 | "release": "26100",
132 | "machine": "x86_64",
133 | "distribution": "Microsoft Windows 11 Enterprise",
134 | "model": "HP OmniBook Ultra Laptop 14-fd0xxx",
135 | "hostname": "XCOUDAYD02",
136 | "memory_bytes": "0x7d3f62000",
137 | "cores": "20",
138 | "bios_vendor": "HP",
139 | "bios_version": "W81 Ver. 01.01.14"
140 | },
141 | "xrt": {
142 | "version": "2.19.0",
143 | "branch": "HEAD",
144 | "hash": "f62307ddadf65b54acbed420a9f0edc415fefafc",
145 | "build_date": "2025-03-12 16:34:48",
146 | "drivers": [
147 | {
148 | "name": "NPU Driver",
149 | "version": "32.0.203.257"
150 | }
151 | ]
152 | },
153 | "devices": [
154 | {
155 | "bdf": "00c4:00:01.1",
156 | "device_class": "Ryzen",
157 | "name": "NPU Strix",
158 | "id": "0x0",
159 | "firmware_version": "1.0.7.97",
160 | "instance": "mgmt(inst=1)",
161 | "is_ready": "true"
162 | }
163 | ]
164 | }
165 | }
166 | }
167 |
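    | As a hedged example of consuming the JSON report, the following minimal Python sketch parses a report saved with ``xrt-smi examine -f JSON -o <output_file.json>`` and prints the NPU driver version and device readiness; the ``report.json`` file name is an illustrative assumption.
    | 
    | .. code-block:: python
    | 
    |    # Hedged sketch: parse a JSON report produced by
    |    # "xrt-smi examine -f JSON -o report.json" (the file name is illustrative).
    |    import json
    | 
    |    with open("report.json") as f:
    |        report = json.load(f)
    | 
    |    host = report["system"]["host"]
    |    for driver in host["xrt"]["drivers"]:
    |        print(driver["name"], driver["version"])             # e.g. NPU Driver 32.0.203.257
    |    for device in host["devices"]:
    |        print(device["name"], "ready:", device["is_ready"])  # e.g. NPU Strix ready: true
    | 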
168 |
169 |
170 |
171 | Platform Information
172 | ====================
173 |
174 | Reports more detailed information about the NPU, such as the performance mode and power consumption.
175 |
176 | .. code-block:: shell
177 |
178 | xrt-smi examine --report platform
179 |
180 | Sample Command Line Output::
181 |
182 | --------------------------
183 | [00c5:00:01.1] : NPU Strix
184 | --------------------------
185 | Platform
186 | Name : NPU Strix
187 | Performance Mode : Default
188 |
189 | Power : 1.277 Watts
190 |
191 | |memo| **NOTE**: Power reporting is not supported on PHX and HPT NPUs; it is only available on STX devices and onwards.
192 |
193 | NPU Partitions
194 | ==============
195 |
196 | Reports details about the NPU partition and column occupancy on the NPU.
197 |
198 | .. code-block:: shell
199 |
200 | xrt-smi examine --report aie-partitions
201 |
202 | Sample Command Line Output::
203 |
204 | --------------------------
205 | [00c5:00:01.1] : NPU Strix
206 | --------------------------
207 | AIE Partitions
208 | Partition Index: 0
209 | Columns: [0, 1, 2, 3]
210 | HW Contexts:
211 | |PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency |
212 | |-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------|
213 | |20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 |
214 |
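    | As noted in the overview, this report can be polled in a loop while a model is running to observe live updates. The following is a minimal, hedged Python sketch; wrapping the documented command with ``subprocess`` and the one-second interval are illustrative choices.
    | 
    | .. code-block:: python
    | 
    |    # Hedged sketch: poll the AIE partition report once per second and print it.
    |    # Stop with Ctrl+C. The polling interval is an arbitrary illustrative choice.
    |    import subprocess
    |    import time
    | 
    |    while True:
    |        result = subprocess.run(
    |            ["xrt-smi", "examine", "--report", "aie-partitions"],
    |            capture_output=True, text=True,
    |        )
    |        print(result.stdout)
    |        time.sleep(1)
    | 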
215 |
216 | NPU Context Bindings
217 | ====================
218 |
219 | Reports details about the binding of columns to NPU HW contexts.
220 |
221 | .. code-block:: shell
222 |
223 | xrt-smi examine --report aie-partitions --verbose
224 |
225 | Sample Command Line Output::
226 |
227 | Verbose: Enabling Verbosity
228 | Verbose: SubCommand: examine
229 |
230 | --------------------------
231 | [00c5:00:01.1] : NPU Strix
232 | --------------------------
233 | AIE Partitions
234 | Partition Index: 0
235 | Columns: [0, 1, 2, 3]
236 | HW Contexts:
237 | |PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency |
238 | |-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------|
239 | |20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 |
240 |
241 | AIE Columns
242 | |Column ||HW Context Slot |
243 | |--------||-----------------|
244 | |0 ||[1] |
245 | |1 ||[1] |
246 | |2 ||[1] |
247 | |3 ||[1] |
248 |
249 |
250 |
251 |
252 |
253 | *******************************
254 | xrt-smi validate
255 | *******************************
256 |
257 | Executing a Sanity Check on the NPU
258 | ===================================
259 |
260 | Runs a set of built-in NPU sanity tests, including latency, throughput, and gemm.
261 |
262 | Note: All tests are run in performance mode.
263 |
264 | - ``latency`` - this test executes a no-op control code and measures the end-to-end latency on all columns
265 | - ``throughput`` - this test loops the input data back from DDR through an MM2S Shim DMA channel and returns it to DDR through an S2MM Shim DMA channel. The data movement within the AIE array follows the lowest-latency path, i.e., movement is restricted to just the Shim tile.
266 | - ``gemm`` - an INT8 GeMM kernel is deployed on all 32 cores by the application. Each core stores its cycle count in the core data memory, and the firmware reads it back. The TOPS application uses the "XBUTIL" tool to capture the IPUHCLK frequency while the workload runs. Once all cores have finished executing, the cycle counts from all cores are synced back to the host. Finally, the application uses the IPUHCLK frequency, the core cycle counts, and the GeMM kernel size to calculate the TOPS. This is a full-array test and should not be run while another workload is running. **NOTE**: This command is not supported on PHX and HPT NPUs.
267 | - ``all`` - all applicable validate tests are executed (default)
268 |
269 |
270 | .. code-block:: shell
271 |
272 | xrt-smi validate --run all
273 |
274 | |memo| **NOTE**: Some sanity checks may fail if other applications (for example MEP, Microsoft Experience Package) are also using the NPU.
275 |
276 | Sample Command Line Output::
277 |
278 |
279 | Validate Device : [00c4:00:01.1]
280 | Platform : NPU Strix
281 | Power Mode : Performance
282 | -------------------------------------------------------------------------------
283 | Test 1 [00c4:00:01.1] : gemm
284 | Details : TOPS: 51.3
285 | Test Status : [PASSED]
286 | -------------------------------------------------------------------------------
287 | Test 2 [00c4:00:01.1] : latency
288 | Details : Average latency: 84.2 us
289 | Test Status : [PASSED]
290 | -------------------------------------------------------------------------------
291 | Test 3 [00c4:00:01.1] : throughput
292 | Details : Average throughput: 59891.0 ops
293 | Test Status : [PASSED]
294 | -------------------------------------------------------------------------------
295 | Validation completed. Please run the command '--verbose' option for more details
296 |
297 | *******************************
298 | xrt-smi configure
299 | *******************************
300 |
301 | Managing the Performance Level of the NPU
302 | =========================================
303 |
304 | To set the performance level of the NPU, choose one of the following modes: default, powersaver, balanced, performance, or turbo. Use the command below:
305 |
306 | .. code-block:: shell
307 |
308 |    xrt-smi configure --pmode <mode>
309 |
310 | - ``default`` - adapts to the Windows Power Mode setting, which can be adjusted under System -> Power & battery -> Power mode. For finer control of the NPU settings, it is recommended to use the xrt-smi mode setting, which overrides the Windows Power mode and ensures optimal results.
311 | - ``powersaver`` - configures the NPU to prioritize power saving, preserving laptop battery life.
312 | - ``balanced`` - configures the NPU to provide a compromise between power saving and performance.
313 | - ``performance`` - configures the NPU to prioritize performance, consuming more power.
314 | - ``turbo`` - configures the NPU for maximum performance; requires AC power to be plugged in, otherwise ``performance`` mode is used.
315 |
316 | Example: Setting the NPU to high-performance mode
317 |
318 | .. code-block:: shell
319 |
320 | xrt-smi configure --pmode performance
321 |
322 | To check the current performance level, use the following command:
323 |
324 | .. code-block:: shell
325 |
326 | xrt-smi examine --report platform
327 |
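    | For scripted workflows, for example switching to ``performance`` mode before a benchmark run and restoring ``default`` afterwards, the documented commands can be wrapped in a small script. The following is a hedged Python sketch; the benchmark placeholder is hypothetical.
    | 
    | .. code-block:: python
    | 
    |    # Hedged sketch: temporarily switch the NPU to performance mode around a
    |    # workload, then restore the default mode and report the current setting.
    |    import subprocess
    | 
    |    def set_pmode(mode: str) -> None:
    |        subprocess.run(["xrt-smi", "configure", "--pmode", mode], check=True)
    | 
    |    set_pmode("performance")
    |    try:
    |        pass  # the actual workload or benchmark would run here (hypothetical)
    |    finally:
    |        set_pmode("default")
    |        subprocess.run(["xrt-smi", "examine", "--report", "platform"], check=True)
    | 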
328 |
--------------------------------------------------------------------------------