├── README.md
├── nv.chart.py
├── nv.conf
└── python_modules
    └── pynvml.py

/README.md:
--------------------------------------------------------------------------------
# README #

## Overview ##

- [About](#about)
- [Requirements](#requirements)
- [Installation](#installation)
    - [General](#general)
    - [Installation Example](#installation-example)
- [Options](#options)
    - [Memory Clock Factor](#memory-clock-factor)
    - [Legacy Mode](#legacy-mode)
- [Charts](#charts)
- [Known Bugs/Issues](#known-bugsissues)
- [FAQ](#faq)
- [License](#license)
- [Version History](#version-history)
- [Contact](#contact)
- [Screenshots](#screenshots)


## About ##

[NetData](https://github.com/firehol/netdata/) plugin that polls Nvidia GPU data.

![nv plugin screenshot](http://semper.space/netdata_nv/screenshot00.png "Netdata nv plugin")


## Requirements ##

* Nvidia driver installed (this plugin uses the NVML library)
* nvidia-ml-py Python package (Python NVML wrapper)


## Installation ##

### General ###
The path to the NetData installation referred to in this readme is `/usr/libexec/netdata/`. Depending on the NetData installation, the path may differ, e.g. `/usr/lib/x86_64-linux-gnu/netdata`.

Install the nvidia-ml-py Python package via `pip install nvidia-ml-py`, or copy the `pynvml.py` file from the nvidia-ml-py package (https://pypi.python.org/pypi/nvidia-ml-py) to `/usr/libexec/netdata/python.d/python_modules/`.

**IMPORTANT**: Version 7.352.0 of the nvidia-ml-py package does not work with Python >=3.2; see the [Known Bugs/Issues](#known-bugsissues) section of this readme.

With a default NetData installation, copy the `nv.chart.py` script to `/usr/libexec/netdata/python.d/` and the `nv.conf` config file to `/etc/netdata/python.d/`.

Then restart NetData to activate the plugin.

To disable the nv plugin, edit `/etc/netdata/python.d.conf` and add `nv: no`.


### Installation Example ###

Example for a standard NetData installation under Ubuntu. It works with Python >=2.6 and >=3.2, since the fixed `pynvml.py` from this repo is used:

```
cd /tmp/

git clone https://github.com/coraxx/netdata_nv_plugin --depth 1

sudo cp netdata_nv_plugin/nv.chart.py /usr/libexec/netdata/python.d/

sudo cp netdata_nv_plugin/python_modules/pynvml.py /usr/libexec/netdata/python.d/python_modules/

sudo cp netdata_nv_plugin/nv.conf /etc/netdata/python.d/

sudo service netdata restart
```


## Options ##

Options are set in the `nv.conf` file.

### Memory Clock Factor ###

Set `nvMemFactor: 2` in the `nv.conf` file if you want to display the effective ("real") memory clock speed. [DDR RAM](https://en.wikipedia.org/wiki/DDR_SDRAM#Double_data_rate_.28DDR.29_SDRAM_specification) transfers data twice per clock cycle, so its effective rate is double the clock NVML reports; the plugin multiplies the reported memory clock by `nvMemFactor`. Default is `1`.


### Legacy Mode ###

With older GPUs like my Nvidia GeForce 9600m gt, the load and clock frequencies cannot be read by the NVML lib. Only temperature and memory usage are displayed.

Set `legacy: True` in the `nv.conf` file to poll GPU and memory load/frequency via the nvidia-settings application (also installed with the Nvidia driver).

*Tested under Ubuntu 16.04*

**IMPORTANT:** This legacy mode only works with a running X session, so it will **not** work on headless clients. Also, when the X session is not running as root, which is usually the case on e.g. Ubuntu, you **must** allow `root` to connect to the X session. To do so, execute this command in a terminal as the user of the X session (i.e. the user logged in to the desktop environment, e.g. GNOME):

`xhost +local:root`

Don't forget to restart NetData afterwards, e.g. with `sudo service netdata restart`.

For the sake of completeness: to revoke root's access to the X session again, execute:

`xhost -local:root`


## Charts ##

Depending on the graphics card, the following information is collected:

- GPU, memory, encoder, and decoder load
- Free and used memory
- ECC errors (only for cards equipped with ECC memory, e.g. Quadro cards)
- Temperature
- Fan speed
- Clock frequency for GPU core, SM and memory

Readouts for units (S-class systems) are implemented but not tested yet. These add:

- Intake, exhaust and board temperatures
- PSU current, voltage and power
- Fan speed (rpm)


## Known Bugs/Issues ##

Bugs:
* No known bugs at the moment

Issues:
* While making this plugin fit for Python 3, I encountered an old Python 2 style `print` statement in nvidia-ml-py's `pynvml.py` file. Line 1671, `print c_count.value`, must be changed to `print(c_count.value)` for the module to be importable under Python 3.
  You can apply this fix yourself or use the fixed version from this repo.


## FAQ ##

> Why does only one of my two whatever values show up in the whatever graph/chart?

Probably because the value that is not drawn is zero. For example, with two graphics cards where only one is under load, chances are that NetData only draws the one with load > 0%. As soon as the other one also returns values > 0, it will be drawn as well.
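
> How can I apply the `print` fix from the Known Bugs/Issues section without editing `pynvml.py` by hand?

A small script can do the substitution. The sketch below is not part of this repo: `patch_print_statement` and `patch_file` are hypothetical helper names, and it assumes the offending line reads exactly `print c_count.value`, as described in Known Bugs/Issues. It keeps the original indentation and leaves the file untouched if the statement is not found.

```python
import re
import sys

# Matches the Python 2 print statement from Known Bugs/Issues on a line of its own,
# capturing the original indentation in group 1. [ \t]* is used instead of \s* so
# the trailing newline is never consumed.
PY2_PRINT = re.compile(r"^([ \t]*)print c_count\.value[ \t]*$", re.MULTILINE)


def patch_print_statement(source):
    """Return (patched_source, number_of_replacements)."""
    return PY2_PRINT.subn(r"\1print(c_count.value)", source)


def patch_file(path):
    """Patch the file in place; return how many lines were changed."""
    with open(path) as f:
        source = f.read()
    patched, count = patch_print_statement(source)
    if count:  # only rewrite the file if something changed
        with open(path, "w") as f:
            f.write(patched)
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. sudo python patch_pynvml.py /usr/libexec/netdata/python.d/python_modules/pynvml.py
    print("replaced {0} line(s)".format(patch_file(sys.argv[1])))
```

Running it a second time is harmless: the already fixed `print(c_count.value)` no longer matches, so zero lines are replaced. Restart NetData afterwards.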


## License ##

The MIT License (MIT)
Copyright (c) 2016 Jan Arnold

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Version History ##

v0.6:
* added encoder and decoder utilization
* fixed memError/eccCounter assignment issue

v0.5:
* fixed legacy mode (consult [Legacy Mode](#legacy-mode) for detailed information on usage)

v0.4:
* debug, info and error message cleanup (still a lot of debug messages since legacy mode is not working yet)
* more detailed error reporting for pynvml import
* added nvMemFactor setting to config
* changed default chart order

v0.3:
* potential bugs fixed
* fit for Python >=2.6 and >=3.2 (see known bugs section)

v0.2:
* code cleanup (thanks to @paulfantom for the feedback)

v0.1:
* initial release


## Contact ##

Who do I talk to?

* Repo owner or admin
* Other community or team contact


## Screenshots ##

* Nvidia GeForce GTX 980 equipped PC running Ubuntu 16.04:

![nv plugin screenshot 1](http://semper.space/netdata_nv/screenshot01.png "Netdata nv plugin")


* Nvidia GeForce 9600m gt/9400m with legacy mode on a MacBook Pro late 2008 running Ubuntu 16.04:

![nv plugin screenshot 2](http://semper.space/netdata_nv/screenshot02.png "Netdata nv plugin")

--------------------------------------------------------------------------------
/nv.chart.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
NetData plugin for Nvidia GPU stats.

Requirements:
# - Nvidia driver installed (this plugin uses the NVML library)
# - nvidia-ml-py Python package (Python NVML wrapper) installed or copy the 'pynvml.py' file
#   from the 'nvidia-ml-py' package (https://pypi.python.org/pypi/nvidia-ml-py/7.352.0) to
#   '/usr/libexec/netdata/python.d/python_modules/'. For use with Python >=3.2 please see known bugs
#   in the README file.


The MIT License (MIT)
Copyright (c) 2016 Jan Arnold

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions
of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

# @Title          : nv.chart
# @Project        :
# @Description    : NetData plugin for Nvidia GPU stats
# @Author         : Jan Arnold
# @Email          : jan.arnold (at) coraxx.net
# @Copyright      : Copyright (C) 2016 Jan Arnold
# @License        : MIT
# @Credits        :
# @Maintainer     : Jan Arnold
# @Date           : 2018/11/02
# @Version        : 0.6
# @Status         : stable
# @Usage          : automatically processed by netdata
# @Notes          : With default NetData installation put this file under
#                 : /usr/libexec/netdata/python.d/
#                 : and the config file under /etc/netdata/python.d/
# @Python_version : 2.7.12 and 3.5.2
"""
# ======================================================================================================================
from bases.FrameworkServices.SimpleService import SimpleService
from subprocess import Popen, PIPE
from re import findall

try:
    import pynvml
except ImportError as e:
    # There is no 'self' (and thus no self.error) at module level, so the hint is passed on
    # with the re-raised exception instead.
    raise ImportError("Please install pynvml: pip install nvidia-ml-py ({0})".format(e))
except SyntaxError as e:
    raise SyntaxError(
        "Please fix line 1671 in pynvml.py file from the nvidia-ml-py package. 'print c_count.value' must be "
        "'print(c_count.value)' to be compatible with Python >=3.2 ({0})".format(e))

## Plugin settings
update_every = 1
priority = 60000
retries = 10

ORDER = ['load', 'memory', 'frequency', 'temperature', 'fan', 'ecc_errors']

CHARTS = {
    'memory': {
        'options': [None, 'Memory', 'MB', 'Memory', 'nv.memory', 'stacked'],
        'lines': [
            # generated dynamically
        ]},
    'load': {
        'options': [None, 'Load', '%', 'Load', 'nv.load', 'line'],
        'lines': [
            # generated dynamically
        ]},
    'ecc_errors': {
        'options': [None, 'ECC errors', 'counts', 'ECC', 'nv.ecc', 'line'],
        'lines': [
            # generated dynamically
        ]},
    'temperature': {
        'options': [None, 'GPU temperature', 'C', 'Temperature', 'nv.temperature', 'line'],
        'lines': [
            # generated dynamically
        ]},
    'fan': {
        'options': [None, 'Fan speed', '%', 'Fans', 'nv.fan', 'line'],
        'lines': [
            # generated dynamically
        ]},
    'frequency': {
        'options': [None, 'Frequency', 'MHz', 'Frequency', 'nv.frequency', 'line'],
        'lines': [
            # generated dynamically
        ]}
}


class Service(SimpleService):
    def __init__(self, configuration=None, name=None):
        SimpleService.__init__(self, configuration=configuration, name=name)

        # Chart
        self.order = ORDER
        self.definitions = CHARTS

    def check(self):
        ## Check legacy mode
        try:
            self.legacy = self.configuration['legacy']
            if self.legacy == '':
                raise KeyError
            if self.legacy is True:
                self.info('Legacy mode set to True')
        except KeyError:
            self.legacy = False
            self.info("No legacy mode specified. Setting to 'False'")

        ## The effective clock of DDR (double data rate) RAM is double the reported one.
        ## Set nvMemFactor = 2 in the conf for the 'real' memory clock
        try:
            self.nvMemFactor = int(self.configuration['nvMemFactor'])
            self.info("'nvMemFactor' set to:", str(self.nvMemFactor))
        except KeyError:
            self.info("No 'nvMemFactor' configured. Setting to 1")
            self.nvMemFactor = 1
        except (TypeError, ValueError) as e:
            # int() fails for empty or non-numeric values
            self.error("nvMemFactor in config file is not an int. Setting 'nvMemFactor' to 1", str(e))
            self.nvMemFactor = 1

        ## Initialize NVML
        try:
            pynvml.nvmlInit()
        except Exception as e:
            # NVML is not initialized here, so nvmlShutdown() must not be called (it would raise as well)
            self.error("pynvml could not be initialized", str(e))
            return False
        try:
            self.info("Nvidia Driver Version:", str(pynvml.nvmlSystemGetDriverVersion()))
        except Exception as e:
            self.error("Nvidia driver version could not be read", str(e))
            pynvml.nvmlShutdown()
            return False

        ## Get number of graphic cards
        try:
            self.unitCount = pynvml.nvmlUnitGetCount()
            self.deviceCount = pynvml.nvmlDeviceGetCount()
            self.debug("Unit count:", str(self.unitCount))
            self.debug("Device count", str(self.deviceCount))
        except Exception as e:
            self.error('Error getting number of Nvidia GPUs', str(e))
            pynvml.nvmlShutdown()
            return False

        ## Get graphic card names, e.g. "GeForce GTX 980 [0] | GeForce GTX 980 [1]"
        data = self._get_data()
        name = ' | '.join(str(data["device_name_" + str(i)]) + " [{0}]".format(i) for i in range(self.deviceCount))
        self.info('Graphics Card(s) found:', name)
        for chart in self.definitions:
            self.definitions[chart]['options'][1] = self.definitions[chart]['options'][1] + ' for ' + name
        ## Dynamically add lines
        for i in range(self.deviceCount):
            gpuIdx = str(i)
            ## Memory
            if data['device_mem_used_' + gpuIdx] is not None:
                self.definitions['memory']['lines'].append(['device_mem_free_' + gpuIdx, 'free [{0}]'.format(i), 'absolute', 1, 1024**2])
                self.definitions['memory']['lines'].append(['device_mem_used_' + gpuIdx, 'used [{0}]'.format(i), 'absolute', 1, 1024**2])
                # self.definitions['memory']['lines'].append(['device_mem_total_' + gpuIdx, 'GPU:{0} total'.format(i), 'absolute', -1, 1024**2])

            ## Load/usage
            if data['device_load_gpu_' + gpuIdx] is not None:
                self.definitions['load']['lines'].append(['device_load_gpu_' + gpuIdx, 'gpu [{0}]'.format(i), 'absolute'])
                self.definitions['load']['lines'].append(['device_load_mem_' + gpuIdx, 'memory [{0}]'.format(i), 'absolute'])

            ## Encoder Utilization
            if data['device_load_enc_' + gpuIdx] is not None:
                self.definitions['load']['lines'].append(['device_load_enc_' + gpuIdx, 'enc [{0}]'.format(i), 'absolute'])

            ## Decoder Utilization
            if data['device_load_dec_' + gpuIdx] is not None:
                self.definitions['load']['lines'].append(['device_load_dec_' + gpuIdx, 'dec [{0}]'.format(i), 'absolute'])

            ## ECC errors
            if data['device_ecc_errors_L1_CACHE_VOLATILE_CORRECTED_' + gpuIdx] is not None:
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L1_CACHE_VOLATILE_CORRECTED_' + gpuIdx, 'L1 Cache Volatile Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L1_CACHE_VOLATILE_UNCORRECTED_' + gpuIdx, 'L1 Cache Volatile Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L1_CACHE_AGGREGATE_CORRECTED_' + gpuIdx, 'L1 Cache Aggregate Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L1_CACHE_AGGREGATE_UNCORRECTED_' + gpuIdx, 'L1 Cache Aggregate Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L2_CACHE_VOLATILE_CORRECTED_' + gpuIdx, 'L2 Cache Volatile Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L2_CACHE_VOLATILE_UNCORRECTED_' + gpuIdx, 'L2 Cache Volatile Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L2_CACHE_AGGREGATE_CORRECTED_' + gpuIdx, 'L2 Cache Aggregate Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_L2_CACHE_AGGREGATE_UNCORRECTED_' + gpuIdx, 'L2 Cache Aggregate Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_DEVICE_MEMORY_VOLATILE_CORRECTED_' + gpuIdx, 'Device Memory Volatile Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_DEVICE_MEMORY_VOLATILE_UNCORRECTED_' + gpuIdx, 'Device Memory Volatile Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_DEVICE_MEMORY_AGGREGATE_CORRECTED_' + gpuIdx, 'Device Memory Aggregate Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_DEVICE_MEMORY_AGGREGATE_UNCORRECTED_' + gpuIdx, 'Device Memory Aggregate Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_REGISTER_FILE_VOLATILE_CORRECTED_' + gpuIdx, 'Register File Volatile Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_REGISTER_FILE_VOLATILE_UNCORRECTED_' + gpuIdx, 'Register File Volatile Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_REGISTER_FILE_AGGREGATE_CORRECTED_' + gpuIdx, 'Register File Aggregate Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_REGISTER_FILE_AGGREGATE_UNCORRECTED_' + gpuIdx, 'Register File Aggregate Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_TEXTURE_MEMORY_VOLATILE_CORRECTED_' + gpuIdx, 'Texture Memory Volatile Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_TEXTURE_MEMORY_VOLATILE_UNCORRECTED_' + gpuIdx, 'Texture Memory Volatile Uncorrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_TEXTURE_MEMORY_AGGREGATE_CORRECTED_' + gpuIdx, 'Texture Memory Aggregate Corrected [{0}]'.format(i), 'absolute'])
                self.definitions['ecc_errors']['lines'].append(['device_ecc_errors_TEXTURE_MEMORY_AGGREGATE_UNCORRECTED_' + gpuIdx, 'Texture Memory Aggregate Uncorrected [{0}]'.format(i), 'absolute'])

            ## Temperature
            if data['device_temp_' + gpuIdx] is not None:
                self.definitions['temperature']['lines'].append(['device_temp_' + gpuIdx, 'GPU:{0}'.format(i), 'absolute'])

            ## Fan
            if data['device_fanspeed_' + gpuIdx] is not None:
                self.definitions['fan']['lines'].append(['device_fanspeed_' + gpuIdx, 'GPU:{0}'.format(i), 'absolute'])

            ## GPU and Memory frequency
            if data['device_core_clock_' + gpuIdx] is not None:
                self.definitions['frequency']['lines'].append(['device_core_clock_' + gpuIdx, 'core [{0}]'.format(i), 'absolute'])
                self.definitions['frequency']['lines'].append(['device_mem_clock_' + gpuIdx, 'memory [{0}]'.format(i), 'absolute'])
            ## SM frequency, usually same as GPU - handled extra here because of legacy mode
            if data['device_sm_clock_' + gpuIdx] is not None:
                self.definitions['frequency']['lines'].append(['device_sm_clock_' + gpuIdx, 'sm [{0}]'.format(i), 'absolute'])

        ## Check if GPU Units are installed and add charts
        if self.unitCount:
            self.order.append('unit_fan')
            self.order.append('unit_psu')
            for i in range(self.unitCount):
                gpuIdx = str(i)
                if data['unit_temp_intake_' + gpuIdx] is not None:
                    self.definitions['temperature']['lines'].append(['unit_temp_intake_' + gpuIdx, 'intake (unit {0})'.format(i), 'absolute'])
                    self.definitions['temperature']['lines'].append(['unit_temp_exhaust_' + gpuIdx, 'exhaust (unit {0})'.format(i), 'absolute'])
                    self.definitions['temperature']['lines'].append(['unit_temp_board_' + gpuIdx, 'board (unit {0})'.format(i), 'absolute'])
                if data['unit_fan_speed_' + gpuIdx] is not None:
                    self.definitions['unit_fan'] = {
                        'options': [None, 'Unit fan', 'rpm', 'Unit Fans', 'nv.unit', 'line'],
                        'lines': [['unit_fan_speed_' + gpuIdx, 'Unit{0}'.format(i), 'absolute']]}
                if data['unit_psu_current_' + gpuIdx] is not None:
                    self.definitions['unit_psu'] = {
                        'options': [None, 'Unit PSU', 'mixed', 'Unit PSU', 'nv.unit', 'line'],
                        'lines': [
                            ['unit_psu_current_' + gpuIdx, 'current (A) (unit {0})'.format(i), 'absolute'],
                            ['unit_psu_power_' + gpuIdx, 'power (W) (unit {0})'.format(i), 'absolute'],
                            ['unit_psu_voltage_' + gpuIdx, 'voltage (V) (unit {0})'.format(i), 'absolute']]}
        return True

    def _get_data(self):
        data = {}

        if self.deviceCount:
            for i in range(self.deviceCount):
                gpuIdx = str(i)
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                name = pynvml.nvmlDeviceGetName(handle)
                brand = pynvml.nvmlDeviceGetBrand(handle)

                ### Get data ###
                ## Memory usage (mem_* stay None if NVML cannot read them, e.g. for legacy mode)
                try:
                    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
                    mem_total, mem_used, mem_free = mem.total, mem.used, mem.free
                except Exception as e:
                    self.debug(str(e))
                    mem_total = mem_used = mem_free = None

                ## ECC errors
                try:
                    eccErrors = {}
                    eccCounterType = ['VOLATILE_ECC', 'AGGREGATE_ECC']
                    memErrorType = ['ERROR_TYPE_CORRECTED', 'ERROR_TYPE_UNCORRECTED']
                    memoryLocationType = ['L1_CACHE', 'L2_CACHE', 'DEVICE_MEMORY', 'REGISTER_FILE', 'TEXTURE_MEMORY']
                    for memoryLocation in range(5):
                        # Fresh dicts per location and counter type; reusing one dict would make
                        # every entry reference the same (last read) set of counters.
                        _eccCounter = {}
                        for eccCounter in range(2):
                            _memError = {}
                            for memError in range(2):
                                _memError[memErrorType[memError]] = pynvml.nvmlDeviceGetMemoryErrorCounter(handle, memError, eccCounter, memoryLocation)
                            _eccCounter[eccCounterType[eccCounter]] = _memError
                        eccErrors[memoryLocationType[memoryLocation]] = _eccCounter
                except Exception as e:
                    self.debug(str(e))
                    eccErrors = None

                ## Temperature
                try:
                    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
                except Exception as e:
                    self.debug(str(e))
                    temp = None

                ## Fan
                try:
                    fanspeed = pynvml.nvmlDeviceGetFanSpeed(handle)
                except Exception as e:
                    self.debug(str(e))
                    fanspeed = None

                ## GPU and Memory Utilization
                try:
                    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
                    gpu_util = util.gpu
                    mem_util = util.memory
                except Exception as e:
                    self.debug(str(e))
                    gpu_util = None
                    mem_util = None

                ## Encoder Utilization
                try:
                    encoder = pynvml.nvmlDeviceGetEncoderUtilization(handle)
                    enc_util = encoder[0]
                except Exception as e:
                    self.debug(str(e))
                    enc_util = None

                ## Decoder Utilization
                try:
                    decoder = pynvml.nvmlDeviceGetDecoderUtilization(handle)
                    dec_util = decoder[0]
                except Exception as e:
                    self.debug(str(e))
                    dec_util = None

                ## Clock frequencies
                try:
                    clock_core = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
                    clock_sm = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
                    clock_mem = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM) * self.nvMemFactor
                except Exception as e:
                    self.debug(str(e))
                    clock_core = None
                    clock_sm = None
                    clock_mem = None

                ### Packing data ###
                self.debug("Device", gpuIdx, ":", str(name))
                data["device_name_" + gpuIdx] = name

                self.debug("Brand:", str(brand))

                self.debug(str(name), "Temp      :", str(temp))
                data["device_temp_" + gpuIdx] = temp

                self.debug(str(name), "Mem total :", str(mem_total), 'bytes')
                data["device_mem_total_" + gpuIdx] = mem_total

                self.debug(str(name), "Mem used  :", str(mem_used), 'bytes')
                data["device_mem_used_" + gpuIdx] = mem_used

                self.debug(str(name), "Mem free  :", str(mem_free), 'bytes')
                data["device_mem_free_" + gpuIdx] = mem_free

                self.debug(str(name), "Load GPU  :", str(gpu_util), '%')
                data["device_load_gpu_" + gpuIdx] = gpu_util

                self.debug(str(name), "Load MEM  :", str(mem_util), '%')
                data["device_load_mem_" + gpuIdx] = mem_util

                self.debug(str(name), "Load ENC  :", str(enc_util), '%')
                data["device_load_enc_" + gpuIdx] = enc_util

                self.debug(str(name), "Load DEC  :", str(dec_util), '%')
                data["device_load_dec_" + gpuIdx] = dec_util

                self.debug(str(name), "Core clock:", str(clock_core), 'MHz')
                data["device_core_clock_" + gpuIdx] = clock_core

                self.debug(str(name), "SM clock  :", str(clock_sm), 'MHz')
                data["device_sm_clock_" + gpuIdx] = clock_sm

                self.debug(str(name), "Mem clock :", str(clock_mem), 'MHz')
                data["device_mem_clock_" + gpuIdx] = clock_mem

                self.debug(str(name), "Fan speed :", str(fanspeed), '%')
                data["device_fanspeed_" + gpuIdx] = fanspeed

                self.debug(str(name), "ECC errors:", str(eccErrors))
                if eccErrors is not None:
                    data["device_ecc_errors_L1_CACHE_VOLATILE_CORRECTED_" + gpuIdx] = eccErrors["L1_CACHE"]["VOLATILE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_L1_CACHE_VOLATILE_UNCORRECTED_" + gpuIdx] = eccErrors["L1_CACHE"]["VOLATILE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_L1_CACHE_AGGREGATE_CORRECTED_" + gpuIdx] = eccErrors["L1_CACHE"]["AGGREGATE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_L1_CACHE_AGGREGATE_UNCORRECTED_" + gpuIdx] = eccErrors["L1_CACHE"]["AGGREGATE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_L2_CACHE_VOLATILE_CORRECTED_" + gpuIdx] = eccErrors["L2_CACHE"]["VOLATILE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_L2_CACHE_VOLATILE_UNCORRECTED_" + gpuIdx] = eccErrors["L2_CACHE"]["VOLATILE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_L2_CACHE_AGGREGATE_CORRECTED_" + gpuIdx] = eccErrors["L2_CACHE"]["AGGREGATE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_L2_CACHE_AGGREGATE_UNCORRECTED_" + gpuIdx] = eccErrors["L2_CACHE"]["AGGREGATE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_DEVICE_MEMORY_VOLATILE_CORRECTED_" + gpuIdx] = eccErrors["DEVICE_MEMORY"]["VOLATILE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_DEVICE_MEMORY_VOLATILE_UNCORRECTED_" + gpuIdx] = eccErrors["DEVICE_MEMORY"]["VOLATILE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_DEVICE_MEMORY_AGGREGATE_CORRECTED_" + gpuIdx] = eccErrors["DEVICE_MEMORY"]["AGGREGATE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_DEVICE_MEMORY_AGGREGATE_UNCORRECTED_" + gpuIdx] = eccErrors["DEVICE_MEMORY"]["AGGREGATE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_REGISTER_FILE_VOLATILE_CORRECTED_" + gpuIdx] = eccErrors["REGISTER_FILE"]["VOLATILE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_REGISTER_FILE_VOLATILE_UNCORRECTED_" + gpuIdx] = eccErrors["REGISTER_FILE"]["VOLATILE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_REGISTER_FILE_AGGREGATE_CORRECTED_" + gpuIdx] = eccErrors["REGISTER_FILE"]["AGGREGATE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_REGISTER_FILE_AGGREGATE_UNCORRECTED_" + gpuIdx] = eccErrors["REGISTER_FILE"]["AGGREGATE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_TEXTURE_MEMORY_VOLATILE_CORRECTED_" + gpuIdx] = eccErrors["TEXTURE_MEMORY"]["VOLATILE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_TEXTURE_MEMORY_VOLATILE_UNCORRECTED_" + gpuIdx] = eccErrors["TEXTURE_MEMORY"]["VOLATILE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                    data["device_ecc_errors_TEXTURE_MEMORY_AGGREGATE_CORRECTED_" + gpuIdx] = eccErrors["TEXTURE_MEMORY"]["AGGREGATE_ECC"]["ERROR_TYPE_CORRECTED"]
                    data["device_ecc_errors_TEXTURE_MEMORY_AGGREGATE_UNCORRECTED_" + gpuIdx] = eccErrors["TEXTURE_MEMORY"]["AGGREGATE_ECC"]["ERROR_TYPE_UNCORRECTED"]
                else:
                    data["device_ecc_errors_L1_CACHE_VOLATILE_CORRECTED_" + gpuIdx] = None

        ## Get unit (S-class Nvidia cards) data
        if self.unitCount:
            for i in range(self.unitCount):
                gpuIdx = str(i)
                handle = pynvml.nvmlUnitGetHandleByIndex(i)

                try:
                    fan = pynvml.nvmlUnitGetFanSpeedInfo(handle)
                    fan_speed = fan.speed  # Fan speed (RPM)
                    fan_state = fan.state  # Flag that indicates whether fan is working properly
                except Exception as e:
                    self.debug(str(e))
                    fan_speed = None
                    fan_state = None

                try:
                    psu = pynvml.nvmlUnitGetPsuInfo(handle)
                    psu_current = psu.current  # PSU current (A)
                    psu_power = psu.power      # PSU power draw (W)
                    psu_state = psu.state      # The power supply state
                    psu_voltage = psu.voltage  # PSU voltage (V)
                except Exception as e:
                    self.debug(str(e))
                    psu_current = None
                    psu_power = None
                    psu_state = None
                    psu_voltage = None

                try:
                    temp_intake = pynvml.nvmlUnitGetTemperature(handle, 0)   # Temperature at intake in C
                    temp_exhaust = pynvml.nvmlUnitGetTemperature(handle, 1)  # Temperature at exhaust in C
                    temp_board = pynvml.nvmlUnitGetTemperature(handle, 2)    # Temperature on board in C
                except Exception as e:
                    self.debug(str(e))
                    temp_intake = None
                    temp_exhaust = None
                    temp_board = None

                self.debug('Unit fan speed:', str(fan_speed))
data["unit_fan_speed_" + gpuIdx] = fan_speed 444 | 445 | self.debug('Unit fan state:',str(fan_state)) 446 | data["unit_fan_state_" + gpuIdx] = fan_state 447 | 448 | self.debug('Unit PSU current:',str(psu_current)) 449 | data["unit_psu_current_" + gpuIdx] = psu_current 450 | 451 | self.debug('Unit PSU power:', str(psu_power)) 452 | data["unit_psu_power_" + gpuIdx] = psu_power 453 | 454 | self.debug('Unit PSU state:', str(psu_state)) 455 | data["unit_psu_state_" + gpuIdx] = psu_state 456 | 457 | self.debug('Unit PSU voltage:', str(psu_voltage)) 458 | data["unit_psu_voltage_" + gpuIdx] = psu_voltage 459 | 460 | self.debug('Unit temp intake:', str(temp_intake)) 461 | data["unit_temp_intake_" + gpuIdx] = temp_intake 462 | 463 | self.debug('Unit temp exhaust:', str(temp_exhaust)) 464 | data["unit_temp_exhaust_" + gpuIdx] = temp_exhaust 465 | 466 | self.debug('Unit temp board:', str(temp_board)) 467 | data["unit_temp_board_" + gpuIdx] = temp_board 468 | 469 | ## Get data via legacy mode 470 | if self.legacy: 471 | try: 472 | output, error = Popen( 473 | [ 474 | "nvidia-settings", 475 | "-c", ":0", 476 | "-q", "GPUUtilization", 477 | "-q", "GPUCurrentClockFreqs", 478 | "-q", "GPUCoreTemp", 479 | "-q", "TotalDedicatedGPUMemory", 480 | "-q", "UsedDedicatedGPUMemory" 481 | ], 482 | shell=False, 483 | stdout=PIPE,stderr=PIPE).communicate() 484 | output = repr(str(output)) 485 | if len(output) < 800: 486 | raise Exception('Error in fetching data from nvidia-settings ' + output) 487 | self.debug(str(error), output) 488 | except Exception as e: 489 | self.error(str(e)) 490 | self.error('Setting legacy mode to False') 491 | self.legacy = False 492 | return data 493 | for i in range(self.deviceCount): 494 | gpuIdx = str(i) 495 | if data["device_temp_" + gpuIdx] is None: 496 | coreTemp = findall('GPUCoreTemp.*?(gpu:\d*).*?\s(\d*)', output)[i][1] 497 | try: 498 | data["device_temp_" + gpuIdx] = int(coreTemp) 499 | self.debug('Using legacy temp for GPU {0}: {1}'.format(gpuIdx, 
coreTemp)) 500 | except Exception as e: 501 | self.debug(str(e), "skipping device_temp_" + gpuIdx) 502 | if data["device_mem_used_" + gpuIdx] is None: 503 | try: 504 | memUsed = findall(r'UsedDedicatedGPUMemory.*?(gpu:\d*).*?\s(\d*)', output)[i][1] 505 | data["device_mem_used_" + gpuIdx] = int(memUsed) 506 | self.debug('Using legacy mem_used for GPU {0}: {1}'.format(gpuIdx, memUsed)) 507 | except Exception as e: 508 | self.debug(str(e), "skipping device_mem_used_" + gpuIdx) 509 | if data["device_load_gpu_" + gpuIdx] is None: 510 | try: 511 | gpu_util = findall(r'(gpu:\d*).*?graphics=(\d*),.*?memory=(\d*)', output)[i][1] 512 | data["device_load_gpu_" + gpuIdx] = int(gpu_util) 513 | self.debug('Using legacy load_gpu for GPU {0}: {1}'.format(gpuIdx, gpu_util)) 514 | except Exception as e: 515 | self.debug(str(e), "skipping device_load_gpu_" + gpuIdx) 516 | if data["device_load_mem_" + gpuIdx] is None: 517 | try: 518 | mem_util = findall(r'(gpu:\d*).*?graphics=(\d*),.*?memory=(\d*)', output)[i][2] 519 | data["device_load_mem_" + gpuIdx] = int(mem_util) 520 | self.debug('Using legacy load_mem for GPU {0}: {1}'.format(gpuIdx, mem_util)) 521 | except Exception as e: 522 | self.debug(str(e), "skipping device_load_mem_" + gpuIdx) 523 | if data["device_core_clock_" + gpuIdx] is None: 524 | try: 525 | clock_core = findall(r'GPUCurrentClockFreqs.*?(gpu:\d*).*?(\d*),(\d*)', output)[i][1] 526 | data["device_core_clock_" + gpuIdx] = int(clock_core) 527 | self.debug('Using legacy core_clock for GPU {0}: {1}'.format(gpuIdx, clock_core)) 528 | except Exception as e: 529 | self.debug(str(e), "skipping device_core_clock_" + gpuIdx) 530 | if data["device_mem_clock_" + gpuIdx] is None: 531 | try: 532 | clock_mem = findall(r'GPUCurrentClockFreqs.*?(gpu:\d*).*?(\d*),(\d*)', output)[i][2] 533 | data["device_mem_clock_" + gpuIdx] = int(clock_mem) 534 | self.debug('Using legacy mem_clock for GPU {0}: {1}'.format(gpuIdx, clock_mem)) 535 | except Exception as e: 536 | self.debug(str(e), "skipping
device_mem_clock_" + gpuIdx) 537 | 538 | return data 539 | -------------------------------------------------------------------------------- /nv.conf: -------------------------------------------------------------------------------- 1 | # netdata python.d.plugin configuration for nv 2 | # 3 | # Set "legacy: True" for old Nvidia graphics cards. 4 | # 5 | # With older GPUs like my Nvidia GeForce 9600m gt, the load and clock frequencies 6 | # cannot be read by the NVML lib. Only temperature and memory usage are displayed. 7 | # 8 | # Set "legacy: True" in the "nv.conf" file to poll GPU and memory load/frequency 9 | # via the nvidia-settings application (also installed with the Nvidia driver). 10 | # 11 | # IMPORTANT: This legacy mode only works with a running X session, so it will NOT 12 | # work on headless clients. Also, when the X session is not run by root, which is 13 | # usually the case when running e.g. Ubuntu, you **must** allow "root" to connect to 14 | # the X session. You can do that by executing this command in a terminal as the user 15 | # of the X session (i.e. the user logged in to your desktop environment, e.g. GNOME; 16 | # tested under Ubuntu 16.04): 17 | # 18 | # $ xhost +local:root 19 | # 20 | # Don't forget to restart NetData afterwards with e.g.
"sudo service netdata restart" 21 | # 22 | # For the sake of completeness: If you want to disable the root access to the X session 23 | # again, execute: 24 | # 25 | # $ xhost -local:root 26 | # 27 | # Set "nvMemFactor" to 2 if you want to display "the real clock speed" 28 | # (https://en.wikipedia.org/wiki/DDR_SDRAM#Double_data_rate_.28DDR.29_SDRAM_specification) 29 | 30 | update_every : 1 31 | retries : 10 32 | priority : 20000 33 | 34 | legacy : False 35 | nvMemFactor : 1 36 | -------------------------------------------------------------------------------- /python_modules/pynvml.py: -------------------------------------------------------------------------------- 1 | ##### 2 | # Copyright (c) 2011-2015, NVIDIA Corporation. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without 5 | # modification, are permitted provided that the following conditions are met: 6 | # 7 | # * Redistributions of source code must retain the above copyright notice, 8 | # this list of conditions and the following disclaimer. 9 | # * Redistributions in binary form must reproduce the above copyright 10 | # notice, this list of conditions and the following disclaimer in the 11 | # documentation and/or other materials provided with the distribution. 12 | # * Neither the name of the NVIDIA Corporation nor the names of its 13 | # contributors may be used to endorse or promote products derived from 14 | # this software without specific prior written permission. 15 | # 16 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 17 | # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 18 | # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 19 | # ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE 20 | # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 21 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 22 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 23 | # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 24 | # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 25 | # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF 26 | # THE POSSIBILITY OF SUCH DAMAGE. 27 | ##### 28 | 29 | ## 30 | # Python bindings for the NVML library 31 | ## 32 | from ctypes import * 33 | from ctypes.util import find_library 34 | import sys 35 | import os 36 | import threading 37 | import string 38 | 39 | ## C Type mappings ## 40 | ## Enums 41 | _nvmlEnableState_t = c_uint 42 | NVML_FEATURE_DISABLED = 0 43 | NVML_FEATURE_ENABLED = 1 44 | 45 | _nvmlBrandType_t = c_uint 46 | NVML_BRAND_UNKNOWN = 0 47 | NVML_BRAND_QUADRO = 1 48 | NVML_BRAND_TESLA = 2 49 | NVML_BRAND_NVS = 3 50 | NVML_BRAND_GRID = 4 51 | NVML_BRAND_GEFORCE = 5 52 | NVML_BRAND_COUNT = 6 53 | 54 | _nvmlTemperatureThresholds_t = c_uint 55 | NVML_TEMPERATURE_THRESHOLD_SHUTDOWN = 0 56 | NVML_TEMPERATURE_THRESHOLD_SLOWDOWN = 1 57 | NVML_TEMPERATURE_THRESHOLD_COUNT = 1 58 | 59 | _nvmlTemperatureSensors_t = c_uint 60 | NVML_TEMPERATURE_GPU = 0 61 | NVML_TEMPERATURE_COUNT = 1 62 | 63 | _nvmlComputeMode_t = c_uint 64 | NVML_COMPUTEMODE_DEFAULT = 0 65 | NVML_COMPUTEMODE_EXCLUSIVE_THREAD = 1 66 | NVML_COMPUTEMODE_PROHIBITED = 2 67 | NVML_COMPUTEMODE_EXCLUSIVE_PROCESS = 3 68 | NVML_COMPUTEMODE_COUNT = 4 69 | 70 | _nvmlMemoryLocation_t = c_uint 71 | NVML_MEMORY_LOCATION_L1_CACHE = 0 72 | NVML_MEMORY_LOCATION_L2_CACHE = 1 73 | NVML_MEMORY_LOCATION_DEVICE_MEMORY = 2 74 | NVML_MEMORY_LOCATION_REGISTER_FILE = 3 75 | NVML_MEMORY_LOCATION_TEXTURE_MEMORY = 4 76 | NVML_MEMORY_LOCATION_COUNT = 5 77 | 78 | # These are deprecated, instead use 
_nvmlMemoryErrorType_t 79 | _nvmlEccBitType_t = c_uint 80 | NVML_SINGLE_BIT_ECC = 0 81 | NVML_DOUBLE_BIT_ECC = 1 82 | NVML_ECC_ERROR_TYPE_COUNT = 2 83 | 84 | _nvmlEccCounterType_t = c_uint 85 | NVML_VOLATILE_ECC = 0 86 | NVML_AGGREGATE_ECC = 1 87 | NVML_ECC_COUNTER_TYPE_COUNT = 2 88 | 89 | _nvmlMemoryErrorType_t = c_uint 90 | NVML_MEMORY_ERROR_TYPE_CORRECTED = 0 91 | NVML_MEMORY_ERROR_TYPE_UNCORRECTED = 1 92 | NVML_MEMORY_ERROR_TYPE_COUNT = 2 93 | 94 | _nvmlClockType_t = c_uint 95 | NVML_CLOCK_GRAPHICS = 0 96 | NVML_CLOCK_SM = 1 97 | NVML_CLOCK_MEM = 2 98 | NVML_CLOCK_COUNT = 3 99 | 100 | _nvmlDriverModel_t = c_uint 101 | NVML_DRIVER_WDDM = 0 102 | NVML_DRIVER_WDM = 1 103 | 104 | _nvmlPstates_t = c_uint 105 | NVML_PSTATE_0 = 0 106 | NVML_PSTATE_1 = 1 107 | NVML_PSTATE_2 = 2 108 | NVML_PSTATE_3 = 3 109 | NVML_PSTATE_4 = 4 110 | NVML_PSTATE_5 = 5 111 | NVML_PSTATE_6 = 6 112 | NVML_PSTATE_7 = 7 113 | NVML_PSTATE_8 = 8 114 | NVML_PSTATE_9 = 9 115 | NVML_PSTATE_10 = 10 116 | NVML_PSTATE_11 = 11 117 | NVML_PSTATE_12 = 12 118 | NVML_PSTATE_13 = 13 119 | NVML_PSTATE_14 = 14 120 | NVML_PSTATE_15 = 15 121 | NVML_PSTATE_UNKNOWN = 32 122 | 123 | _nvmlInforomObject_t = c_uint 124 | NVML_INFOROM_OEM = 0 125 | NVML_INFOROM_ECC = 1 126 | NVML_INFOROM_POWER = 2 127 | NVML_INFOROM_COUNT = 3 128 | 129 | _nvmlReturn_t = c_uint 130 | NVML_SUCCESS = 0 131 | NVML_ERROR_UNINITIALIZED = 1 132 | NVML_ERROR_INVALID_ARGUMENT = 2 133 | NVML_ERROR_NOT_SUPPORTED = 3 134 | NVML_ERROR_NO_PERMISSION = 4 135 | NVML_ERROR_ALREADY_INITIALIZED = 5 136 | NVML_ERROR_NOT_FOUND = 6 137 | NVML_ERROR_INSUFFICIENT_SIZE = 7 138 | NVML_ERROR_INSUFFICIENT_POWER = 8 139 | NVML_ERROR_DRIVER_NOT_LOADED = 9 140 | NVML_ERROR_TIMEOUT = 10 141 | NVML_ERROR_IRQ_ISSUE = 11 142 | NVML_ERROR_LIBRARY_NOT_FOUND = 12 143 | NVML_ERROR_FUNCTION_NOT_FOUND = 13 144 | NVML_ERROR_CORRUPTED_INFOROM = 14 145 | NVML_ERROR_GPU_IS_LOST = 15 146 | NVML_ERROR_RESET_REQUIRED = 16 147 | NVML_ERROR_OPERATING_SYSTEM = 17 148 | 
NVML_ERROR_LIB_RM_VERSION_MISMATCH = 18 149 | NVML_ERROR_UNKNOWN = 999 150 | 151 | _nvmlFanState_t = c_uint 152 | NVML_FAN_NORMAL = 0 153 | NVML_FAN_FAILED = 1 154 | 155 | _nvmlLedColor_t = c_uint 156 | NVML_LED_COLOR_GREEN = 0 157 | NVML_LED_COLOR_AMBER = 1 158 | 159 | _nvmlGpuOperationMode_t = c_uint 160 | NVML_GOM_ALL_ON = 0 161 | NVML_GOM_COMPUTE = 1 162 | NVML_GOM_LOW_DP = 2 163 | 164 | _nvmlPageRetirementCause_t = c_uint 165 | NVML_PAGE_RETIREMENT_CAUSE_DOUBLE_BIT_ECC_ERROR = 0 166 | NVML_PAGE_RETIREMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS = 1 167 | NVML_PAGE_RETIREMENT_CAUSE_COUNT = 2 168 | 169 | _nvmlRestrictedAPI_t = c_uint 170 | NVML_RESTRICTED_API_SET_APPLICATION_CLOCKS = 0 171 | NVML_RESTRICTED_API_SET_AUTO_BOOSTED_CLOCKS = 1 172 | NVML_RESTRICTED_API_COUNT = 2 173 | 174 | _nvmlBridgeChipType_t = c_uint 175 | NVML_BRIDGE_CHIP_PLX = 0 176 | NVML_BRIDGE_CHIP_BRO4 = 1 177 | NVML_MAX_PHYSICAL_BRIDGE = 128 178 | 179 | _nvmlValueType_t = c_uint 180 | NVML_VALUE_TYPE_DOUBLE = 0 181 | NVML_VALUE_TYPE_UNSIGNED_INT = 1 182 | NVML_VALUE_TYPE_UNSIGNED_LONG = 2 183 | NVML_VALUE_TYPE_UNSIGNED_LONG_LONG = 3 184 | NVML_VALUE_TYPE_COUNT = 4 185 | 186 | _nvmlPerfPolicyType_t = c_uint 187 | NVML_PERF_POLICY_POWER = 0 188 | NVML_PERF_POLICY_THERMAL = 1 189 | NVML_PERF_POLICY_COUNT = 2 190 | 191 | _nvmlSamplingType_t = c_uint 192 | NVML_TOTAL_POWER_SAMPLES = 0 193 | NVML_GPU_UTILIZATION_SAMPLES = 1 194 | NVML_MEMORY_UTILIZATION_SAMPLES = 2 195 | NVML_ENC_UTILIZATION_SAMPLES = 3 196 | NVML_DEC_UTILIZATION_SAMPLES = 4 197 | NVML_PROCESSOR_CLK_SAMPLES = 5 198 | NVML_MEMORY_CLK_SAMPLES = 6 199 | NVML_SAMPLINGTYPE_COUNT = 7 200 | 201 | _nvmlPcieUtilCounter_t = c_uint 202 | NVML_PCIE_UTIL_TX_BYTES = 0 203 | NVML_PCIE_UTIL_RX_BYTES = 1 204 | NVML_PCIE_UTIL_COUNT = 2 205 | 206 | _nvmlGpuTopologyLevel_t = c_uint 207 | NVML_TOPOLOGY_INTERNAL = 0 208 | NVML_TOPOLOGY_SINGLE = 10 209 | NVML_TOPOLOGY_MULTIPLE = 20 210 | NVML_TOPOLOGY_HOSTBRIDGE = 30 211 | NVML_TOPOLOGY_CPU = 40 212 | 
NVML_TOPOLOGY_SYSTEM = 50 213 | 214 | # C preprocessor defined values 215 | nvmlFlagDefault = 0 216 | nvmlFlagForce = 1 217 | 218 | # buffer size 219 | NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE = 16 220 | NVML_DEVICE_UUID_BUFFER_SIZE = 80 221 | NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE = 81 222 | NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE = 80 223 | NVML_DEVICE_NAME_BUFFER_SIZE = 64 224 | NVML_DEVICE_SERIAL_BUFFER_SIZE = 30 225 | NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE = 32 226 | NVML_DEVICE_PCI_BUS_ID_BUFFER_SIZE = 16 227 | 228 | NVML_VALUE_NOT_AVAILABLE_ulonglong = c_ulonglong(-1) 229 | NVML_VALUE_NOT_AVAILABLE_uint = c_uint(-1) 230 | 231 | ## Lib loading ## 232 | nvmlLib = None 233 | libLoadLock = threading.Lock() 234 | _nvmlLib_refcount = 0 # Incremented on each nvmlInit and decremented on nvmlShutdown 235 | 236 | ## Error Checking ## 237 | class NVMLError(Exception): 238 | _valClassMapping = dict() 239 | # List of currently known error codes 240 | _errcode_to_string = { 241 | NVML_ERROR_UNINITIALIZED: "Uninitialized", 242 | NVML_ERROR_INVALID_ARGUMENT: "Invalid Argument", 243 | NVML_ERROR_NOT_SUPPORTED: "Not Supported", 244 | NVML_ERROR_NO_PERMISSION: "Insufficient Permissions", 245 | NVML_ERROR_ALREADY_INITIALIZED: "Already Initialized", 246 | NVML_ERROR_NOT_FOUND: "Not Found", 247 | NVML_ERROR_INSUFFICIENT_SIZE: "Insufficient Size", 248 | NVML_ERROR_INSUFFICIENT_POWER: "Insufficient External Power", 249 | NVML_ERROR_DRIVER_NOT_LOADED: "Driver Not Loaded", 250 | NVML_ERROR_TIMEOUT: "Timeout", 251 | NVML_ERROR_IRQ_ISSUE: "Interrupt Request Issue", 252 | NVML_ERROR_LIBRARY_NOT_FOUND: "NVML Shared Library Not Found", 253 | NVML_ERROR_FUNCTION_NOT_FOUND: "Function Not Found", 254 | NVML_ERROR_CORRUPTED_INFOROM: "Corrupted infoROM", 255 | NVML_ERROR_GPU_IS_LOST: "GPU is lost", 256 | NVML_ERROR_RESET_REQUIRED: "GPU requires restart", 257 | NVML_ERROR_OPERATING_SYSTEM: "The operating system has blocked the request.", 258 | NVML_ERROR_LIB_RM_VERSION_MISMATCH: "RM has detected an 
NVML/RM version mismatch.", 259 | NVML_ERROR_UNKNOWN: "Unknown Error", 260 | } 261 | def __new__(typ, value): 262 | ''' 263 | Maps value to a proper subclass of NVMLError. 264 | See _extractNVMLErrorsAsClasses function for more details 265 | ''' 266 | if typ == NVMLError: 267 | typ = NVMLError._valClassMapping.get(value, typ) 268 | obj = Exception.__new__(typ) 269 | obj.value = value 270 | return obj 271 | def __str__(self): 272 | try: 273 | if self.value not in NVMLError._errcode_to_string: 274 | NVMLError._errcode_to_string[self.value] = str(nvmlErrorString(self.value)) 275 | return NVMLError._errcode_to_string[self.value] 276 | except NVMLError_Uninitialized: 277 | return "NVML Error with code %d" % self.value 278 | def __eq__(self, other): 279 | return self.value == other.value 280 | 281 | def _extractNVMLErrorsAsClasses(): 282 | ''' 283 | Generates a hierarchy of classes on top of NVMLError class. 284 | 285 | Each NVML Error gets a new NVMLError subclass. This way try/except blocks can filter appropriate 286 | exceptions more easily. 287 | 288 | NVMLError is a parent class. Each NVML_ERROR_* gets its own subclass. 289 | e.g. NVML_ERROR_ALREADY_INITIALIZED will be turned into NVMLError_AlreadyInitialized 290 | ''' 291 | this_module = sys.modules[__name__] 292 | nvmlErrorsNames = filter(lambda x: x.startswith("NVML_ERROR_"), dir(this_module)) 293 | for err_name in nvmlErrorsNames: 294 | # e.g.
Turn NVML_ERROR_ALREADY_INITIALIZED into NVMLError_AlreadyInitialized 295 | class_name = "NVMLError_" + string.capwords(err_name.replace("NVML_ERROR_", ""), "_").replace("_", "") 296 | err_val = getattr(this_module, err_name) 297 | def gen_new(val): 298 | def new(typ): 299 | obj = NVMLError.__new__(typ, val) 300 | return obj 301 | return new 302 | new_error_class = type(class_name, (NVMLError,), {'__new__': gen_new(err_val)}) 303 | new_error_class.__module__ = __name__ 304 | setattr(this_module, class_name, new_error_class) 305 | NVMLError._valClassMapping[err_val] = new_error_class 306 | _extractNVMLErrorsAsClasses() 307 | 308 | def _nvmlCheckReturn(ret): 309 | if (ret != NVML_SUCCESS): 310 | raise NVMLError(ret) 311 | return ret 312 | 313 | ## Function access ## 314 | _nvmlGetFunctionPointer_cache = dict() # function pointers are cached to prevent unnecessary libLoadLock locking 315 | def _nvmlGetFunctionPointer(name): 316 | global nvmlLib 317 | 318 | if name in _nvmlGetFunctionPointer_cache: 319 | return _nvmlGetFunctionPointer_cache[name] 320 | 321 | libLoadLock.acquire() 322 | try: 323 | # ensure library was loaded 324 | if (nvmlLib == None): 325 | raise NVMLError(NVML_ERROR_UNINITIALIZED) 326 | try: 327 | _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name) 328 | return _nvmlGetFunctionPointer_cache[name] 329 | except AttributeError: 330 | raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND) 331 | finally: 332 | # lock is always freed 333 | libLoadLock.release() 334 | 335 | ## Alternative object 336 | # Allows the object to be printed 337 | # Allows mismatched types to be assigned 338 | # - like None when the Structure variant requires c_uint 339 | class nvmlFriendlyObject(object): 340 | def __init__(self, dictionary): 341 | for x in dictionary: 342 | setattr(self, x, dictionary[x]) 343 | def __str__(self): 344 | return self.__dict__.__str__() 345 | 346 | def nvmlStructToFriendlyObject(struct): 347 | d = {} 348 | for x in struct._fields_: 349 | key = x[0] 
350 | value = getattr(struct, key) 351 | d[key] = value 352 | obj = nvmlFriendlyObject(d) 353 | return obj 354 | 355 | # pack the object so it can be passed to the NVML library 356 | def nvmlFriendlyObjectToStruct(obj, model): 357 | for x in model._fields_: 358 | key = x[0] 359 | value = obj.__dict__[key] 360 | setattr(model, key, value) 361 | return model 362 | 363 | ## Unit structures 364 | class struct_c_nvmlUnit_t(Structure): 365 | pass # opaque handle 366 | c_nvmlUnit_t = POINTER(struct_c_nvmlUnit_t) 367 | 368 | class _PrintableStructure(Structure): 369 | """ 370 | Abstract class that produces nicer __str__ output than ctypes.Structure. 371 | e.g. instead of: 372 | >>> print str(obj) 373 | <class_name object at 0x...> 374 | this class will print 375 | class_name(field_name: formatted_value, field_name: formatted_value) 376 | 377 | _fmt_ dictionary of <str _field_ name> -> <str format> 378 | e.g. class that has _field_ 'hex_value', c_uint could be formatted with 379 | _fmt_ = {"hex_value" : "%08X"} 380 | to produce nicer output. 381 | Default formatting string for all fields can be set with key "" like: 382 | _fmt_ = {"" : "%d MHz"} # e.g. all values are numbers in MHz. 383 | If not set it's assumed to be just "%s" 384 | 385 | Exact format of returned str from this class is subject to change in the future.
386 | """ 387 | _fmt_ = {} 388 | def __str__(self): 389 | result = [] 390 | for x in self._fields_: 391 | key = x[0] 392 | value = getattr(self, key) 393 | fmt = "%s" 394 | if key in self._fmt_: 395 | fmt = self._fmt_[key] 396 | elif "" in self._fmt_: 397 | fmt = self._fmt_[""] 398 | result.append(("%s: " + fmt) % (key, value)) 399 | return self.__class__.__name__ + "(" + ", ".join(result) + ")" 400 | 401 | class c_nvmlUnitInfo_t(_PrintableStructure): 402 | _fields_ = [ 403 | ('name', c_char * 96), 404 | ('id', c_char * 96), 405 | ('serial', c_char * 96), 406 | ('firmwareVersion', c_char * 96), 407 | ] 408 | 409 | class c_nvmlLedState_t(_PrintableStructure): 410 | _fields_ = [ 411 | ('cause', c_char * 256), 412 | ('color', _nvmlLedColor_t), 413 | ] 414 | 415 | class c_nvmlPSUInfo_t(_PrintableStructure): 416 | _fields_ = [ 417 | ('state', c_char * 256), 418 | ('current', c_uint), 419 | ('voltage', c_uint), 420 | ('power', c_uint), 421 | ] 422 | 423 | class c_nvmlUnitFanInfo_t(_PrintableStructure): 424 | _fields_ = [ 425 | ('speed', c_uint), 426 | ('state', _nvmlFanState_t), 427 | ] 428 | 429 | class c_nvmlUnitFanSpeeds_t(_PrintableStructure): 430 | _fields_ = [ 431 | ('fans', c_nvmlUnitFanInfo_t * 24), 432 | ('count', c_uint) 433 | ] 434 | 435 | ## Device structures 436 | class struct_c_nvmlDevice_t(Structure): 437 | pass # opaque handle 438 | c_nvmlDevice_t = POINTER(struct_c_nvmlDevice_t) 439 | 440 | class nvmlPciInfo_t(_PrintableStructure): 441 | _fields_ = [ 442 | ('busId', c_char * 16), 443 | ('domain', c_uint), 444 | ('bus', c_uint), 445 | ('device', c_uint), 446 | ('pciDeviceId', c_uint), 447 | 448 | # Added in 2.285 449 | ('pciSubSystemId', c_uint), 450 | ('reserved0', c_uint), 451 | ('reserved1', c_uint), 452 | ('reserved2', c_uint), 453 | ('reserved3', c_uint), 454 | ] 455 | _fmt_ = { 456 | 'domain' : "0x%04X", 457 | 'bus' : "0x%02X", 458 | 'device' : "0x%02X", 459 | 'pciDeviceId' : "0x%08X", 460 | 'pciSubSystemId' : "0x%08X", 461 | } 462 | 463 |
class c_nvmlMemory_t(_PrintableStructure): 464 | _fields_ = [ 465 | ('total', c_ulonglong), 466 | ('free', c_ulonglong), 467 | ('used', c_ulonglong), 468 | ] 469 | _fmt_ = {'': "%d B"} 470 | 471 | class c_nvmlBAR1Memory_t(_PrintableStructure): 472 | _fields_ = [ 473 | ('bar1Total', c_ulonglong), 474 | ('bar1Free', c_ulonglong), 475 | ('bar1Used', c_ulonglong), 476 | ] 477 | _fmt_ = {'': "%d B"} 478 | 479 | # On Windows with the WDDM driver, usedGpuMemory is reported as None 480 | # Code that processes this structure should check for None, I.E. 481 | # 482 | # if (info.usedGpuMemory == None): 483 | # # TODO handle the error 484 | # pass 485 | # else: 486 | # print("Using %d MiB of memory" % (info.usedGpuMemory / 1024 / 1024)) 487 | # 488 | # See NVML documentation for more information 489 | class c_nvmlProcessInfo_t(_PrintableStructure): 490 | _fields_ = [ 491 | ('pid', c_uint), 492 | ('usedGpuMemory', c_ulonglong), 493 | ] 494 | _fmt_ = {'usedGpuMemory': "%d B"} 495 | 496 | class c_nvmlBridgeChipInfo_t(_PrintableStructure): 497 | _fields_ = [ 498 | ('type', _nvmlBridgeChipType_t), 499 | ('fwVersion', c_uint), 500 | ] 501 | 502 | class c_nvmlBridgeChipHierarchy_t(_PrintableStructure): 503 | _fields_ = [ 504 | ('bridgeCount', c_uint), 505 | ('bridgeChipInfo', c_nvmlBridgeChipInfo_t * 128), 506 | ] 507 | 508 | class c_nvmlEccErrorCounts_t(_PrintableStructure): 509 | _fields_ = [ 510 | ('l1Cache', c_ulonglong), 511 | ('l2Cache', c_ulonglong), 512 | ('deviceMemory', c_ulonglong), 513 | ('registerFile', c_ulonglong), 514 | ] 515 | 516 | class c_nvmlUtilization_t(_PrintableStructure): 517 | _fields_ = [ 518 | ('gpu', c_uint), 519 | ('memory', c_uint), 520 | ] 521 | _fmt_ = {'': "%d %%"} 522 | 523 | # Added in 2.285 524 | class c_nvmlHwbcEntry_t(_PrintableStructure): 525 | _fields_ = [ 526 | ('hwbcId', c_uint), 527 | ('firmwareVersion', c_char * 32), 528 | ] 529 | 530 | class c_nvmlValue_t(Union): 531 | _fields_ = [ 532 | ('dVal', c_double), 533 | ('uiVal', c_uint), 534 | 
('ulVal', c_ulong), 535 | ('ullVal', c_ulonglong), 536 | ] 537 | 538 | class c_nvmlSample_t(_PrintableStructure): 539 | _fields_ = [ 540 | ('timeStamp', c_ulonglong), 541 | ('sampleValue', c_nvmlValue_t), 542 | ] 543 | 544 | class c_nvmlViolationTime_t(_PrintableStructure): 545 | _fields_ = [ 546 | ('referenceTime', c_ulonglong), 547 | ('violationTime', c_ulonglong), 548 | ] 549 | 550 | ## Event structures 551 | class struct_c_nvmlEventSet_t(Structure): 552 | pass # opaque handle 553 | c_nvmlEventSet_t = POINTER(struct_c_nvmlEventSet_t) 554 | 555 | nvmlEventTypeSingleBitEccError = 0x0000000000000001 556 | nvmlEventTypeDoubleBitEccError = 0x0000000000000002 557 | nvmlEventTypePState = 0x0000000000000004 558 | nvmlEventTypeXidCriticalError = 0x0000000000000008 559 | nvmlEventTypeClock = 0x0000000000000010 560 | nvmlEventTypeNone = 0x0000000000000000 561 | nvmlEventTypeAll = ( 562 | nvmlEventTypeNone | 563 | nvmlEventTypeSingleBitEccError | 564 | nvmlEventTypeDoubleBitEccError | 565 | nvmlEventTypePState | 566 | nvmlEventTypeClock | 567 | nvmlEventTypeXidCriticalError 568 | ) 569 | 570 | ## Clock Throttle Reasons defines 571 | nvmlClocksThrottleReasonGpuIdle = 0x0000000000000001 572 | nvmlClocksThrottleReasonApplicationsClocksSetting = 0x0000000000000002 573 | nvmlClocksThrottleReasonUserDefinedClocks = nvmlClocksThrottleReasonApplicationsClocksSetting # deprecated, use nvmlClocksThrottleReasonApplicationsClocksSetting 574 | nvmlClocksThrottleReasonSwPowerCap = 0x0000000000000004 575 | nvmlClocksThrottleReasonHwSlowdown = 0x0000000000000008 576 | nvmlClocksThrottleReasonUnknown = 0x8000000000000000 577 | nvmlClocksThrottleReasonNone = 0x0000000000000000 578 | nvmlClocksThrottleReasonAll = ( 579 | nvmlClocksThrottleReasonNone | 580 | nvmlClocksThrottleReasonGpuIdle | 581 | nvmlClocksThrottleReasonApplicationsClocksSetting | 582 | nvmlClocksThrottleReasonSwPowerCap | 583 | nvmlClocksThrottleReasonHwSlowdown | 584 | nvmlClocksThrottleReasonUnknown 585 | ) 586 | 587 | 
class c_nvmlEventData_t(_PrintableStructure): 588 | _fields_ = [ 589 | ('device', c_nvmlDevice_t), 590 | ('eventType', c_ulonglong), 591 | ('eventData', c_ulonglong) 592 | ] 593 | _fmt_ = {'eventType': "0x%08X"} 594 | 595 | class c_nvmlAccountingStats_t(_PrintableStructure): 596 | _fields_ = [ 597 | ('gpuUtilization', c_uint), 598 | ('memoryUtilization', c_uint), 599 | ('maxMemoryUsage', c_ulonglong), 600 | ('time', c_ulonglong), 601 | ('startTime', c_ulonglong), 602 | ('isRunning', c_uint), 603 | ('reserved', c_uint * 5) 604 | ] 605 | 606 | ## C function wrappers ## 607 | def nvmlInit(): 608 | _LoadNvmlLibrary() 609 | 610 | # 611 | # Initialize the library 612 | # 613 | fn = _nvmlGetFunctionPointer("nvmlInit_v2") 614 | ret = fn() 615 | _nvmlCheckReturn(ret) 616 | 617 | # Atomically update refcount 618 | global _nvmlLib_refcount 619 | libLoadLock.acquire() 620 | _nvmlLib_refcount += 1 621 | libLoadLock.release() 622 | return None 623 | 624 | def _LoadNvmlLibrary(): 625 | ''' 626 | Load the library if it isn't loaded already 627 | ''' 628 | global nvmlLib 629 | 630 | if (nvmlLib == None): 631 | # lock to ensure only one caller loads the library 632 | libLoadLock.acquire() 633 | 634 | try: 635 | # ensure the library still isn't loaded 636 | if (nvmlLib == None): 637 | try: 638 | if (sys.platform[:3] == "win"): 639 | # cdecl calling convention 640 | # load nvml.dll from %ProgramFiles%/NVIDIA Corporation/NVSMI/nvml.dll 641 | nvmlLib = CDLL(os.path.join(os.getenv("ProgramFiles", "C:/Program Files"), "NVIDIA Corporation/NVSMI/nvml.dll")) 642 | else: 643 | # assume linux 644 | nvmlLib = CDLL("libnvidia-ml.so.1") 645 | except OSError as ose: 646 | _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND) 647 | if (nvmlLib == None): 648 | _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND) 649 | finally: 650 | # lock is always freed 651 | libLoadLock.release() 652 | 653 | def nvmlShutdown(): 654 | # 655 | # Leave the library loaded, but shutdown the interface 656 | # 657 | fn = 
_nvmlGetFunctionPointer("nvmlShutdown") 658 | ret = fn() 659 | _nvmlCheckReturn(ret) 660 | 661 | # Atomically update refcount 662 | global _nvmlLib_refcount 663 | libLoadLock.acquire() 664 | if (0 < _nvmlLib_refcount): 665 | _nvmlLib_refcount -= 1 666 | libLoadLock.release() 667 | return None 668 | 669 | # Added in 2.285 670 | def nvmlErrorString(result): 671 | fn = _nvmlGetFunctionPointer("nvmlErrorString") 672 | fn.restype = c_char_p # otherwise return is an int 673 | ret = fn(result) 674 | return ret 675 | 676 | # Added in 2.285 677 | def nvmlSystemGetNVMLVersion(): 678 | c_version = create_string_buffer(NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE) 679 | fn = _nvmlGetFunctionPointer("nvmlSystemGetNVMLVersion") 680 | ret = fn(c_version, c_uint(NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE)) 681 | _nvmlCheckReturn(ret) 682 | return c_version.value 683 | 684 | # Added in 2.285 685 | def nvmlSystemGetProcessName(pid): 686 | c_name = create_string_buffer(1024) 687 | fn = _nvmlGetFunctionPointer("nvmlSystemGetProcessName") 688 | ret = fn(c_uint(pid), c_name, c_uint(1024)) 689 | _nvmlCheckReturn(ret) 690 | return c_name.value 691 | 692 | def nvmlSystemGetDriverVersion(): 693 | c_version = create_string_buffer(NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE) 694 | fn = _nvmlGetFunctionPointer("nvmlSystemGetDriverVersion") 695 | ret = fn(c_version, c_uint(NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE)) 696 | _nvmlCheckReturn(ret) 697 | return c_version.value 698 | 699 | # Added in 2.285 700 | def nvmlSystemGetHicVersion(): 701 | c_count = c_uint(0) 702 | hics = None 703 | fn = _nvmlGetFunctionPointer("nvmlSystemGetHicVersion") 704 | 705 | # get the count 706 | ret = fn(byref(c_count), None) 707 | 708 | # this should only fail with insufficient size 709 | if ((ret != NVML_SUCCESS) and 710 | (ret != NVML_ERROR_INSUFFICIENT_SIZE)): 711 | raise NVMLError(ret) 712 | 713 | # if there are no hics 714 | if (c_count.value == 0): 715 | return [] 716 | 717 | hic_array = c_nvmlHwbcEntry_t * c_count.value 718 | 
hics = hic_array() 719 | ret = fn(byref(c_count), hics) 720 | _nvmlCheckReturn(ret) 721 | return hics 722 | 723 | ## Unit get functions 724 | def nvmlUnitGetCount(): 725 | c_count = c_uint() 726 | fn = _nvmlGetFunctionPointer("nvmlUnitGetCount") 727 | ret = fn(byref(c_count)) 728 | _nvmlCheckReturn(ret) 729 | return c_count.value 730 | 731 | def nvmlUnitGetHandleByIndex(index): 732 | c_index = c_uint(index) 733 | unit = c_nvmlUnit_t() 734 | fn = _nvmlGetFunctionPointer("nvmlUnitGetHandleByIndex") 735 | ret = fn(c_index, byref(unit)) 736 | _nvmlCheckReturn(ret) 737 | return unit 738 | 739 | def nvmlUnitGetUnitInfo(unit): 740 | c_info = c_nvmlUnitInfo_t() 741 | fn = _nvmlGetFunctionPointer("nvmlUnitGetUnitInfo") 742 | ret = fn(unit, byref(c_info)) 743 | _nvmlCheckReturn(ret) 744 | return c_info 745 | 746 | def nvmlUnitGetLedState(unit): 747 | c_state = c_nvmlLedState_t() 748 | fn = _nvmlGetFunctionPointer("nvmlUnitGetLedState") 749 | ret = fn(unit, byref(c_state)) 750 | _nvmlCheckReturn(ret) 751 | return c_state 752 | 753 | def nvmlUnitGetPsuInfo(unit): 754 | c_info = c_nvmlPSUInfo_t() 755 | fn = _nvmlGetFunctionPointer("nvmlUnitGetPsuInfo") 756 | ret = fn(unit, byref(c_info)) 757 | _nvmlCheckReturn(ret) 758 | return c_info 759 | 760 | def nvmlUnitGetTemperature(unit, type): 761 | c_temp = c_uint() 762 | fn = _nvmlGetFunctionPointer("nvmlUnitGetTemperature") 763 | ret = fn(unit, c_uint(type), byref(c_temp)) 764 | _nvmlCheckReturn(ret) 765 | return c_temp.value 766 | 767 | def nvmlUnitGetFanSpeedInfo(unit): 768 | c_speeds = c_nvmlUnitFanSpeeds_t() 769 | fn = _nvmlGetFunctionPointer("nvmlUnitGetFanSpeedInfo") 770 | ret = fn(unit, byref(c_speeds)) 771 | _nvmlCheckReturn(ret) 772 | return c_speeds 773 | 774 | # added to API 775 | def nvmlUnitGetDeviceCount(unit): 776 | c_count = c_uint(0) 777 | # query the unit to determine device count 778 | fn = _nvmlGetFunctionPointer("nvmlUnitGetDevices") 779 | ret = fn(unit, byref(c_count), None) 780 | if (ret == 
NVML_ERROR_INSUFFICIENT_SIZE):
        ret = NVML_SUCCESS
    _nvmlCheckReturn(ret)
    return c_count.value

def nvmlUnitGetDevices(unit):
    c_count = c_uint(nvmlUnitGetDeviceCount(unit))
    device_array = c_nvmlDevice_t * c_count.value
    c_devices = device_array()
    fn = _nvmlGetFunctionPointer("nvmlUnitGetDevices")
    ret = fn(unit, byref(c_count), c_devices)
    _nvmlCheckReturn(ret)
    return c_devices

## Device get functions
def nvmlDeviceGetCount():
    c_count = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetCount_v2")
    ret = fn(byref(c_count))
    _nvmlCheckReturn(ret)
    return c_count.value

def nvmlDeviceGetHandleByIndex(index):
    c_index = c_uint(index)
    device = c_nvmlDevice_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetHandleByIndex_v2")
    ret = fn(c_index, byref(device))
    _nvmlCheckReturn(ret)
    return device

def nvmlDeviceGetHandleBySerial(serial):
    c_serial = c_char_p(serial)
    device = c_nvmlDevice_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetHandleBySerial")
    ret = fn(c_serial, byref(device))
    _nvmlCheckReturn(ret)
    return device

def nvmlDeviceGetHandleByUUID(uuid):
    c_uuid = c_char_p(uuid)
    device = c_nvmlDevice_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetHandleByUUID")
    ret = fn(c_uuid, byref(device))
    _nvmlCheckReturn(ret)
    return device

def nvmlDeviceGetHandleByPciBusId(pciBusId):
    c_busId = c_char_p(pciBusId)
    device = c_nvmlDevice_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetHandleByPciBusId_v2")
    ret = fn(c_busId, byref(device))
    _nvmlCheckReturn(ret)
    return device

def nvmlDeviceGetName(handle):
    c_name = create_string_buffer(NVML_DEVICE_NAME_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetName")
    ret = fn(handle, c_name, c_uint(NVML_DEVICE_NAME_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_name.value

def nvmlDeviceGetBoardId(handle):
    c_id = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetBoardId")
    ret = fn(handle, byref(c_id))
    _nvmlCheckReturn(ret)
    return c_id.value

def nvmlDeviceGetMultiGpuBoard(handle):
    c_multiGpu = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMultiGpuBoard")
    ret = fn(handle, byref(c_multiGpu))
    _nvmlCheckReturn(ret)
    return c_multiGpu.value

def nvmlDeviceGetBrand(handle):
    c_type = _nvmlBrandType_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetBrand")
    ret = fn(handle, byref(c_type))
    _nvmlCheckReturn(ret)
    return c_type.value

def nvmlDeviceGetSerial(handle):
    c_serial = create_string_buffer(NVML_DEVICE_SERIAL_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSerial")
    ret = fn(handle, c_serial, c_uint(NVML_DEVICE_SERIAL_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_serial.value

def nvmlDeviceGetCpuAffinity(handle, cpuSetSize):
    affinity_array = c_ulonglong * cpuSetSize
    c_affinity = affinity_array()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetCpuAffinity")
    ret = fn(handle, cpuSetSize, byref(c_affinity))
    _nvmlCheckReturn(ret)
    return c_affinity

def nvmlDeviceSetCpuAffinity(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetCpuAffinity")
    ret = fn(handle)
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceClearCpuAffinity(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceClearCpuAffinity")
    ret = fn(handle)
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceGetMinorNumber(handle):
    c_minor_number = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMinorNumber")
    ret = fn(handle, byref(c_minor_number))
    _nvmlCheckReturn(ret)
    return c_minor_number.value

def nvmlDeviceGetUUID(handle):
    c_uuid = create_string_buffer(NVML_DEVICE_UUID_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetUUID")
    ret = fn(handle, c_uuid, c_uint(NVML_DEVICE_UUID_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_uuid.value

def nvmlDeviceGetInforomVersion(handle, infoRomObject):
    c_version = create_string_buffer(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetInforomVersion")
    ret = fn(handle, _nvmlInforomObject_t(infoRomObject),
             c_version, c_uint(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_version.value

# Added in 4.304
def nvmlDeviceGetInforomImageVersion(handle):
    c_version = create_string_buffer(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetInforomImageVersion")
    ret = fn(handle, c_version, c_uint(NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_version.value

# Added in 4.304
def nvmlDeviceGetInforomConfigurationChecksum(handle):
    c_checksum = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetInforomConfigurationChecksum")
    ret = fn(handle, byref(c_checksum))
    _nvmlCheckReturn(ret)
    return c_checksum.value

# Added in 4.304
def nvmlDeviceValidateInforom(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceValidateInforom")
    ret = fn(handle)
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceGetDisplayMode(handle):
    c_mode = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDisplayMode")
    ret = fn(handle, byref(c_mode))
    _nvmlCheckReturn(ret)
    return c_mode.value

def nvmlDeviceGetDisplayActive(handle):
    c_mode = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDisplayActive")
    ret = fn(handle, byref(c_mode))
    _nvmlCheckReturn(ret)
    return c_mode.value

def nvmlDeviceGetPersistenceMode(handle):
    c_state = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPersistenceMode")
    ret = fn(handle, byref(c_state))
    _nvmlCheckReturn(ret)
    return c_state.value

def nvmlDeviceGetPciInfo(handle):
    c_info = nvmlPciInfo_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPciInfo_v2")
    ret = fn(handle, byref(c_info))
    _nvmlCheckReturn(ret)
    return c_info

def nvmlDeviceGetClockInfo(handle, type):
    c_clock = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetClockInfo")
    ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
    _nvmlCheckReturn(ret)
    return c_clock.value

# Added in 2.285
def nvmlDeviceGetMaxClockInfo(handle, type):
    c_clock = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMaxClockInfo")
    ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
    _nvmlCheckReturn(ret)
    return c_clock.value

# Added in 4.304
def nvmlDeviceGetApplicationsClock(handle, type):
    c_clock = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetApplicationsClock")
    ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
    _nvmlCheckReturn(ret)
    return c_clock.value

# Added in 5.319
def nvmlDeviceGetDefaultApplicationsClock(handle, type):
    c_clock = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDefaultApplicationsClock")
    ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
    _nvmlCheckReturn(ret)
    return c_clock.value

# Added in 4.304
def nvmlDeviceGetSupportedMemoryClocks(handle):
    # first call to get the size
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSupportedMemoryClocks")
    ret = fn(handle, byref(c_count), None)

    if (ret == NVML_SUCCESS):
        # special case, no clocks
        return []
    elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
        # typical case
        clocks_array = c_uint * c_count.value
        c_clocks = clocks_array()

        # make the call again
        ret = fn(handle, byref(c_count), c_clocks)
        _nvmlCheckReturn(ret)

        clocks = []
        for i in range(c_count.value):
            clocks.append(c_clocks[i])

        return clocks
    else:
        # error case
        raise NVMLError(ret)

# Added in 4.304
def nvmlDeviceGetSupportedGraphicsClocks(handle, memoryClockMHz):
    # first call to get the size
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSupportedGraphicsClocks")
    ret = fn(handle, c_uint(memoryClockMHz), byref(c_count), None)

    if (ret == NVML_SUCCESS):
        # special case, no clocks
        return []
    elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
        # typical case
        clocks_array = c_uint * c_count.value
        c_clocks = clocks_array()

        # make the call again
        ret = fn(handle, c_uint(memoryClockMHz), byref(c_count), c_clocks)
        _nvmlCheckReturn(ret)

        clocks = []
        for i in range(c_count.value):
            clocks.append(c_clocks[i])

        return clocks
    else:
        # error case
        raise NVMLError(ret)

def nvmlDeviceGetFanSpeed(handle):
    c_speed = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetFanSpeed")
    ret = fn(handle, byref(c_speed))
    _nvmlCheckReturn(ret)
    return c_speed.value

def nvmlDeviceGetTemperature(handle, sensor):
    c_temp = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetTemperature")
    ret = fn(handle, _nvmlTemperatureSensors_t(sensor), byref(c_temp))
    _nvmlCheckReturn(ret)
    return c_temp.value

def nvmlDeviceGetTemperatureThreshold(handle, threshold):
    c_temp = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetTemperatureThreshold")
    ret = fn(handle, _nvmlTemperatureThresholds_t(threshold), byref(c_temp))
    _nvmlCheckReturn(ret)
    return c_temp.value

# DEPRECATED use nvmlDeviceGetPerformanceState
def nvmlDeviceGetPowerState(handle):
    c_pstate = _nvmlPstates_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerState")
    ret = fn(handle, byref(c_pstate))
    _nvmlCheckReturn(ret)
    return c_pstate.value

def nvmlDeviceGetPerformanceState(handle):
    c_pstate = _nvmlPstates_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPerformanceState")
    ret = fn(handle, byref(c_pstate))
    _nvmlCheckReturn(ret)
    return c_pstate.value

def nvmlDeviceGetPowerManagementMode(handle):
    c_pcapMode = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerManagementMode")
    ret = fn(handle, byref(c_pcapMode))
    _nvmlCheckReturn(ret)
    return c_pcapMode.value

def nvmlDeviceGetPowerManagementLimit(handle):
    c_limit = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerManagementLimit")
    ret = fn(handle, byref(c_limit))
    _nvmlCheckReturn(ret)
    return c_limit.value

# Added in 4.304
def nvmlDeviceGetPowerManagementLimitConstraints(handle):
    c_minLimit = c_uint()
    c_maxLimit = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerManagementLimitConstraints")
    ret = fn(handle, byref(c_minLimit), byref(c_maxLimit))
    _nvmlCheckReturn(ret)
    return [c_minLimit.value, c_maxLimit.value]

# Added in 4.304
def nvmlDeviceGetPowerManagementDefaultLimit(handle):
    c_limit = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerManagementDefaultLimit")
    ret = fn(handle, byref(c_limit))
    _nvmlCheckReturn(ret)
    return c_limit.value

# Added in 331
def nvmlDeviceGetEnforcedPowerLimit(handle):
    c_limit = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetEnforcedPowerLimit")
    ret = fn(handle, byref(c_limit))
    _nvmlCheckReturn(ret)
    return c_limit.value

def nvmlDeviceGetPowerUsage(handle):
    # NVML reports the current power draw in milliwatts
    c_milliwatts = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerUsage")
    ret = fn(handle, byref(c_milliwatts))
    _nvmlCheckReturn(ret)
    return c_milliwatts.value

# Added in 4.304
def nvmlDeviceGetGpuOperationMode(handle):
    c_currState = _nvmlGpuOperationMode_t()
    c_pendingState = _nvmlGpuOperationMode_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetGpuOperationMode")
    ret = fn(handle, byref(c_currState), byref(c_pendingState))
    _nvmlCheckReturn(ret)
    return [c_currState.value, c_pendingState.value]

# Added in 4.304
def nvmlDeviceGetCurrentGpuOperationMode(handle):
    return nvmlDeviceGetGpuOperationMode(handle)[0]

# Added in 4.304
def nvmlDeviceGetPendingGpuOperationMode(handle):
    return nvmlDeviceGetGpuOperationMode(handle)[1]

def nvmlDeviceGetMemoryInfo(handle):
    c_memory = c_nvmlMemory_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo")
    ret = fn(handle, byref(c_memory))
    _nvmlCheckReturn(ret)
    return c_memory

def nvmlDeviceGetBAR1MemoryInfo(handle):
    c_bar1_memory = c_nvmlBAR1Memory_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetBAR1MemoryInfo")
    ret = fn(handle, byref(c_bar1_memory))
    _nvmlCheckReturn(ret)
    return c_bar1_memory

def nvmlDeviceGetComputeMode(handle):
    c_mode = _nvmlComputeMode_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeMode")
    ret = fn(handle, byref(c_mode))
    _nvmlCheckReturn(ret)
    return c_mode.value

def nvmlDeviceGetEccMode(handle):
    c_currState = _nvmlEnableState_t()
    c_pendingState = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetEccMode")
    ret = fn(handle, byref(c_currState), byref(c_pendingState))
    _nvmlCheckReturn(ret)
    return [c_currState.value, c_pendingState.value]

# added to API
def nvmlDeviceGetCurrentEccMode(handle):
    return nvmlDeviceGetEccMode(handle)[0]

# added to API
def nvmlDeviceGetPendingEccMode(handle):
    return nvmlDeviceGetEccMode(handle)[1]

def nvmlDeviceGetTotalEccErrors(handle, errorType, counterType):
    c_count = c_ulonglong()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetTotalEccErrors")
    ret = fn(handle, _nvmlMemoryErrorType_t(errorType),
             _nvmlEccCounterType_t(counterType), byref(c_count))
    _nvmlCheckReturn(ret)
    return c_count.value

# This is deprecated, instead use nvmlDeviceGetMemoryErrorCounter
def nvmlDeviceGetDetailedEccErrors(handle, errorType, counterType):
    c_counts = c_nvmlEccErrorCounts_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDetailedEccErrors")
    ret = fn(handle, _nvmlMemoryErrorType_t(errorType),
             _nvmlEccCounterType_t(counterType), byref(c_counts))
    _nvmlCheckReturn(ret)
    return c_counts

# Added in 4.304
def nvmlDeviceGetMemoryErrorCounter(handle, errorType, counterType, locationType):
    c_count = c_ulonglong()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryErrorCounter")
    ret = fn(handle,
             _nvmlMemoryErrorType_t(errorType),
             _nvmlEccCounterType_t(counterType),
             _nvmlMemoryLocation_t(locationType),
             byref(c_count))
    _nvmlCheckReturn(ret)
    return c_count.value

def nvmlDeviceGetUtilizationRates(handle):
    c_util = c_nvmlUtilization_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetUtilizationRates")
    ret = fn(handle, byref(c_util))
    _nvmlCheckReturn(ret)
    return c_util

def nvmlDeviceGetEncoderUtilization(handle):
    c_util = c_uint()
    c_samplingPeriod = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetEncoderUtilization")
    ret = fn(handle, byref(c_util), byref(c_samplingPeriod))
    _nvmlCheckReturn(ret)
    return [c_util.value, c_samplingPeriod.value]

def nvmlDeviceGetDecoderUtilization(handle):
    c_util = c_uint()
    c_samplingPeriod = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDecoderUtilization")
    ret = fn(handle, byref(c_util), byref(c_samplingPeriod))
    _nvmlCheckReturn(ret)
    return [c_util.value, c_samplingPeriod.value]

def nvmlDeviceGetPcieReplayCounter(handle):
    c_replay = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPcieReplayCounter")
    ret = fn(handle, byref(c_replay))
    _nvmlCheckReturn(ret)
    return c_replay.value

def nvmlDeviceGetDriverModel(handle):
    c_currModel = _nvmlDriverModel_t()
    c_pendingModel = _nvmlDriverModel_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetDriverModel")
    ret = fn(handle, byref(c_currModel), byref(c_pendingModel))
    _nvmlCheckReturn(ret)
    return [c_currModel.value, c_pendingModel.value]

# added to API
def nvmlDeviceGetCurrentDriverModel(handle):
    return nvmlDeviceGetDriverModel(handle)[0]

# added to API
def nvmlDeviceGetPendingDriverModel(handle):
    return nvmlDeviceGetDriverModel(handle)[1]

# Added in 2.285
def nvmlDeviceGetVbiosVersion(handle):
    c_version = create_string_buffer(NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetVbiosVersion")
    ret = fn(handle, c_version, c_uint(NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE))
    _nvmlCheckReturn(ret)
    return c_version.value

# Added in 2.285
def nvmlDeviceGetComputeRunningProcesses(handle):
    # first call to get the size
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses")
    ret = fn(handle, byref(c_count), None)

    if (ret == NVML_SUCCESS):
        # special case, no running processes
        return []
    elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
        # typical case
        # oversize the array in case more processes are created
        c_count.value = c_count.value * 2 + 5
        proc_array = c_nvmlProcessInfo_t * c_count.value
        c_procs = proc_array()

        # make the call again
        ret = fn(handle, byref(c_count), c_procs)
        _nvmlCheckReturn(ret)

        procs = []
        for i in range(c_count.value):
            # use an alternative struct for this object
            obj = nvmlStructToFriendlyObject(c_procs[i])
            if (obj.usedGpuMemory == NVML_VALUE_NOT_AVAILABLE_ulonglong.value):
                # special case for WDDM on Windows, see comment above
                obj.usedGpuMemory = None
            procs.append(obj)

        return procs
    else:
        # error case
        raise NVMLError(ret)

def nvmlDeviceGetGraphicsRunningProcesses(handle):
    # first call to get the size
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses")
    ret = fn(handle, byref(c_count), None)

    if (ret == NVML_SUCCESS):
        # special case, no running processes
        return []
    elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
        # typical case
        # oversize the array in case more processes are created
        c_count.value = c_count.value * 2 + 5
        proc_array = c_nvmlProcessInfo_t * c_count.value
        c_procs = proc_array()

        # make the call again
        ret = fn(handle, byref(c_count), c_procs)
        _nvmlCheckReturn(ret)

        procs = []
        for i in range(c_count.value):
            # use an alternative struct for this object
            obj = nvmlStructToFriendlyObject(c_procs[i])
            if (obj.usedGpuMemory == NVML_VALUE_NOT_AVAILABLE_ulonglong.value):
                # special case for WDDM on Windows, see comment above
                obj.usedGpuMemory = None
            procs.append(obj)

        return procs
    else:
        # error case
        raise NVMLError(ret)

# Throws NVML_ERROR_NOT_SUPPORTED if hardware doesn't support setting auto boosted clocks
def nvmlDeviceGetAutoBoostedClocksEnabled(handle):
    c_isEnabled = _nvmlEnableState_t()
    c_defaultIsEnabled = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAutoBoostedClocksEnabled")
    ret = fn(handle, byref(c_isEnabled), byref(c_defaultIsEnabled))
    _nvmlCheckReturn(ret)
    return [c_isEnabled.value, c_defaultIsEnabled.value]

## Set functions
def nvmlUnitSetLedState(unit, color):
    fn = _nvmlGetFunctionPointer("nvmlUnitSetLedState")
    ret = fn(unit, _nvmlLedColor_t(color))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceSetPersistenceMode(handle, mode):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetPersistenceMode")
    ret = fn(handle, _nvmlEnableState_t(mode))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceSetComputeMode(handle, mode):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetComputeMode")
    ret = fn(handle, _nvmlComputeMode_t(mode))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceSetEccMode(handle, mode):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetEccMode")
    ret = fn(handle, _nvmlEnableState_t(mode))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceClearEccErrorCounts(handle, counterType):
    fn = _nvmlGetFunctionPointer("nvmlDeviceClearEccErrorCounts")
    ret = fn(handle, _nvmlEccCounterType_t(counterType))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceSetDriverModel(handle, model):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetDriverModel")
    ret = fn(handle, _nvmlDriverModel_t(model))
    _nvmlCheckReturn(ret)
    return None

# Throws NVML_ERROR_NOT_SUPPORTED if hardware doesn't support setting auto boosted clocks
def nvmlDeviceSetAutoBoostedClocksEnabled(handle, enabled):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetAutoBoostedClocksEnabled")
    ret = fn(handle, _nvmlEnableState_t(enabled))
    _nvmlCheckReturn(ret)
    return None

# Throws NVML_ERROR_NOT_SUPPORTED if hardware doesn't support setting auto boosted clocks
def nvmlDeviceSetDefaultAutoBoostedClocksEnabled(handle, enabled, flags):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetDefaultAutoBoostedClocksEnabled")
    ret = fn(handle, _nvmlEnableState_t(enabled), c_uint(flags))
    _nvmlCheckReturn(ret)
    return None

# Added in 4.304
def nvmlDeviceSetApplicationsClocks(handle, maxMemClockMHz, maxGraphicsClockMHz):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetApplicationsClocks")
    ret = fn(handle, c_uint(maxMemClockMHz), c_uint(maxGraphicsClockMHz))
    _nvmlCheckReturn(ret)
    return None

# Added in 4.304
def nvmlDeviceResetApplicationsClocks(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceResetApplicationsClocks")
    ret = fn(handle)
    _nvmlCheckReturn(ret)
    return None

# Added in 4.304
def nvmlDeviceSetPowerManagementLimit(handle, limit):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetPowerManagementLimit")
    ret = fn(handle, c_uint(limit))
    _nvmlCheckReturn(ret)
    return None

# Added in 4.304
def nvmlDeviceSetGpuOperationMode(handle, mode):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetGpuOperationMode")
    ret = fn(handle, _nvmlGpuOperationMode_t(mode))
    _nvmlCheckReturn(ret)
    return None

# Added in 2.285
def nvmlEventSetCreate():
    fn = _nvmlGetFunctionPointer("nvmlEventSetCreate")
    eventSet = c_nvmlEventSet_t()
    ret = fn(byref(eventSet))
    _nvmlCheckReturn(ret)
    return eventSet

# Added in 2.285
def nvmlDeviceRegisterEvents(handle, eventTypes, eventSet):
    fn = _nvmlGetFunctionPointer("nvmlDeviceRegisterEvents")
    ret = fn(handle, c_ulonglong(eventTypes), eventSet)
    _nvmlCheckReturn(ret)
    return None

# Added in 2.285
def nvmlDeviceGetSupportedEventTypes(handle):
    c_eventTypes = c_ulonglong()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSupportedEventTypes")
    ret = fn(handle, byref(c_eventTypes))
    _nvmlCheckReturn(ret)
    return c_eventTypes.value

# Added in 2.285
# raises NVML_ERROR_TIMEOUT exception on timeout
def nvmlEventSetWait(eventSet, timeoutms):
    fn = _nvmlGetFunctionPointer("nvmlEventSetWait")
    data = c_nvmlEventData_t()
    ret = fn(eventSet, byref(data), c_uint(timeoutms))
    _nvmlCheckReturn(ret)
    return data

# Added in 2.285
def nvmlEventSetFree(eventSet):
    fn = _nvmlGetFunctionPointer("nvmlEventSetFree")
    ret = fn(eventSet)
    _nvmlCheckReturn(ret)
    return None

# Added in 3.295
def nvmlDeviceOnSameBoard(handle1, handle2):
    fn = _nvmlGetFunctionPointer("nvmlDeviceOnSameBoard")
    onSameBoard = c_int()
    ret = fn(handle1, handle2, byref(onSameBoard))
    _nvmlCheckReturn(ret)
    return (onSameBoard.value != 0)

# Added in 3.295
def nvmlDeviceGetCurrPcieLinkGeneration(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetCurrPcieLinkGeneration")
    gen = c_uint()
    ret = fn(handle, byref(gen))
    _nvmlCheckReturn(ret)
    return gen.value

# Added in 3.295
def nvmlDeviceGetMaxPcieLinkGeneration(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMaxPcieLinkGeneration")
    gen = c_uint()
    ret = fn(handle, byref(gen))
    _nvmlCheckReturn(ret)
    return gen.value

# Added in 3.295
def nvmlDeviceGetCurrPcieLinkWidth(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetCurrPcieLinkWidth")
    width = c_uint()
    ret = fn(handle, byref(width))
    _nvmlCheckReturn(ret)
    return width.value

# Added in 3.295
def nvmlDeviceGetMaxPcieLinkWidth(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMaxPcieLinkWidth")
    width = c_uint()
    ret = fn(handle, byref(width))
    _nvmlCheckReturn(ret)
    return width.value

# Added in 4.304
def nvmlDeviceGetSupportedClocksThrottleReasons(handle):
    c_reasons = c_ulonglong()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSupportedClocksThrottleReasons")
    ret = fn(handle, byref(c_reasons))
    _nvmlCheckReturn(ret)
    return c_reasons.value

# Added in 4.304
def nvmlDeviceGetCurrentClocksThrottleReasons(handle):
    c_reasons = c_ulonglong()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetCurrentClocksThrottleReasons")
    ret = fn(handle, byref(c_reasons))
    _nvmlCheckReturn(ret)
    return c_reasons.value

# Added in 5.319
def nvmlDeviceGetIndex(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetIndex")
    c_index = c_uint()
    ret = fn(handle, byref(c_index))
    _nvmlCheckReturn(ret)
    return c_index.value

# Added in 5.319
def nvmlDeviceGetAccountingMode(handle):
    c_mode = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAccountingMode")
    ret = fn(handle, byref(c_mode))
    _nvmlCheckReturn(ret)
    return c_mode.value

def nvmlDeviceSetAccountingMode(handle, mode):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetAccountingMode")
    ret = fn(handle, _nvmlEnableState_t(mode))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceClearAccountingPids(handle):
    fn = _nvmlGetFunctionPointer("nvmlDeviceClearAccountingPids")
    ret = fn(handle)
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceGetAccountingStats(handle, pid):
    stats = c_nvmlAccountingStats_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAccountingStats")
    ret = fn(handle, c_uint(pid), byref(stats))
    _nvmlCheckReturn(ret)
    if (stats.maxMemoryUsage == NVML_VALUE_NOT_AVAILABLE_ulonglong.value):
        # special case for WDDM on Windows, see comment above
        stats.maxMemoryUsage = None
    return stats

def nvmlDeviceGetAccountingPids(handle):
    count = c_uint(nvmlDeviceGetAccountingBufferSize(handle))
    pids = (c_uint * count.value)()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAccountingPids")
    ret = fn(handle, byref(count), pids)
    _nvmlCheckReturn(ret)
    # list() so Python 3 callers get a list, as on Python 2, not a lazy map object
    return list(map(int, pids[0:count.value]))

def nvmlDeviceGetAccountingBufferSize(handle):
    bufferSize = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAccountingBufferSize")
    ret = fn(handle, byref(bufferSize))
    _nvmlCheckReturn(ret)
    return int(bufferSize.value)

def nvmlDeviceGetRetiredPages(device, sourceFilter):
    c_source = _nvmlPageRetirementCause_t(sourceFilter)
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetRetiredPages")

    # First call will get the size
    ret = fn(device, c_source, byref(c_count), None)

    # this should only fail with insufficient size
    if ((ret != NVML_SUCCESS) and
        (ret != NVML_ERROR_INSUFFICIENT_SIZE)):
        raise NVMLError(ret)

    # call again with a buffer
    # oversize the array for the rare cases where additional pages
    # are retired between NVML calls
    c_count.value = c_count.value * 2 + 5
    page_array = c_ulonglong * c_count.value
    c_pages = page_array()
    ret = fn(device, c_source, byref(c_count), c_pages)
    _nvmlCheckReturn(ret)
    # list() so Python 3 callers get a list, as on Python 2
    return list(map(int, c_pages[0:c_count.value]))

def nvmlDeviceGetRetiredPagesPendingStatus(device):
    c_pending = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetRetiredPagesPendingStatus")
    ret = fn(device, byref(c_pending))
    _nvmlCheckReturn(ret)
    return int(c_pending.value)

def nvmlDeviceGetAPIRestriction(device, apiType):
    c_permission = _nvmlEnableState_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetAPIRestriction")
    ret = fn(device, _nvmlRestrictedAPI_t(apiType), byref(c_permission))
    _nvmlCheckReturn(ret)
    return int(c_permission.value)

def nvmlDeviceSetAPIRestriction(handle, apiType, isRestricted):
    fn = _nvmlGetFunctionPointer("nvmlDeviceSetAPIRestriction")
    ret = fn(handle, _nvmlRestrictedAPI_t(apiType), _nvmlEnableState_t(isRestricted))
    _nvmlCheckReturn(ret)
    return None

def nvmlDeviceGetBridgeChipInfo(handle):
    bridgeHierarchy = c_nvmlBridgeChipHierarchy_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetBridgeChipInfo")
    ret = fn(handle, byref(bridgeHierarchy))
    _nvmlCheckReturn(ret)
    return bridgeHierarchy

def nvmlDeviceGetSamples(device, sampling_type, timeStamp):
    c_sampling_type = _nvmlSamplingType_t(sampling_type)
    c_time_stamp = c_ulonglong(timeStamp)
    c_sample_count = c_uint(0)
    c_sample_value_type = _nvmlValueType_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetSamples")

    ## First Call gets the size
    ret = fn(device, c_sampling_type, c_time_stamp, byref(c_sample_value_type), byref(c_sample_count), None)

    # Stop if this fails
    if (ret != NVML_SUCCESS):
        raise NVMLError(ret)

    sampleArray = c_nvmlSample_t * c_sample_count.value
    c_samples = sampleArray()
    ret = fn(device, c_sampling_type, c_time_stamp, byref(c_sample_value_type), byref(c_sample_count), c_samples)
    _nvmlCheckReturn(ret)
    return (c_sample_value_type.value, c_samples[0:c_sample_count.value])

def nvmlDeviceGetViolationStatus(device, perfPolicyType):
    c_perfPolicy_type = _nvmlPerfPolicyType_t(perfPolicyType)
    c_violTime = c_nvmlViolationTime_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetViolationStatus")

    ## Invoke the method to get violation time
    ret = fn(device, c_perfPolicy_type, byref(c_violTime))
    _nvmlCheckReturn(ret)
    return c_violTime

def nvmlDeviceGetPcieThroughput(device, counter):
    c_util = c_uint()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetPcieThroughput")
    ret = fn(device, _nvmlPcieUtilCounter_t(counter), byref(c_util))
    _nvmlCheckReturn(ret)
    return c_util.value

def nvmlSystemGetTopologyGpuSet(cpuNumber):
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlSystemGetTopologyGpuSet")

    # First call will get the size
    ret = fn(cpuNumber, byref(c_count), None)

    if ret != NVML_SUCCESS:
        raise NVMLError(ret)

    # call again with a buffer
    device_array = c_nvmlDevice_t * c_count.value
    c_devices = device_array()
    ret = fn(cpuNumber, byref(c_count), c_devices)
    _nvmlCheckReturn(ret)
    # map(None, ...) only works on Python 2; list() gives the same result on 2 and 3
    return list(c_devices[0:c_count.value])

def nvmlDeviceGetTopologyNearestGpus(device, level):
    c_count = c_uint(0)
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetTopologyNearestGpus")

    # First call will get the size
    ret = fn(device, level, byref(c_count), None)

    if ret != NVML_SUCCESS:
        raise NVMLError(ret)

    # call again with a buffer
    device_array = c_nvmlDevice_t * c_count.value
    c_devices = device_array()
    ret = fn(device, level, byref(c_count), c_devices)
    _nvmlCheckReturn(ret)
    # map(None, ...) only works on Python 2; list() gives the same result on 2 and 3
    return list(c_devices[0:c_count.value])

def nvmlDeviceGetTopologyCommonAncestor(device1, device2):
    c_level = _nvmlGpuTopologyLevel_t()
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetTopologyCommonAncestor")
    ret = fn(device1, device2, byref(c_level))
    _nvmlCheckReturn(ret)
    return c_level.value

--------------------------------------------------------------------------------
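Several getters in pynvml.py (nvmlDeviceGetSupportedMemoryClocks, nvmlDeviceGetComputeRunningProcesses, nvmlDeviceGetRetiredPages, among others) share a two-call pattern: call the NVML function once with a NULL buffer to learn the element count, allocate a ctypes array of that size (the process and retired-page getters oversize it to `count * 2 + 5` in case the set grows between calls), then call again and read back only as many entries as NVML reports. The sketch below isolates that pattern so it runs without a GPU. `query_array`, `fake_query` and the clock values are hypothetical stand-ins, not part of pynvml, and the stand-in receives `pointer(c_count)` rather than `byref(c_count)` so that plain Python can write the count back. The return codes mirror nvml.h (`NVML_SUCCESS = 0`, `NVML_ERROR_INSUFFICIENT_SIZE = 7`); check them against your NVML version.

```python
from ctypes import c_uint, pointer

# Return codes as defined in nvml.h (assumed here; pynvml defines the same names)
NVML_SUCCESS = 0
NVML_ERROR_INSUFFICIENT_SIZE = 7


class NVMLError(Exception):
    """Hypothetical stand-in for pynvml.NVMLError."""


# Hypothetical data reported by the stand-in below
_FAKE_CLOCKS = [405, 810, 3505]


def fake_query(count_ptr, buf):
    """Mimics an NVML getter: report the needed size, or fill buf if it is big enough."""
    needed = len(_FAKE_CLOCKS)
    if buf is None or count_ptr[0] < needed:
        count_ptr[0] = needed
        return NVML_SUCCESS if needed == 0 else NVML_ERROR_INSUFFICIENT_SIZE
    for i, value in enumerate(_FAKE_CLOCKS):
        buf[i] = value
    count_ptr[0] = needed  # NVML writes back how many entries it actually filled
    return NVML_SUCCESS


def query_array(fn, elem_type, oversize=False):
    """Two-call pattern: ask for the size first, then fetch into a sized ctypes array."""
    c_count = c_uint(0)
    ret = fn(pointer(c_count), None)
    if ret == NVML_SUCCESS:
        return []  # nothing to report
    if ret != NVML_ERROR_INSUFFICIENT_SIZE:
        raise NVMLError(ret)
    if oversize:
        # same headroom as the process and retired-page getters
        c_count.value = c_count.value * 2 + 5
    buf = (elem_type * c_count.value)()
    ret = fn(pointer(c_count), buf)
    if ret != NVML_SUCCESS:
        raise NVMLError(ret)
    return [buf[i] for i in range(c_count.value)]


print(query_array(fake_query, c_uint))                 # [405, 810, 3505]
print(query_array(fake_query, c_uint, oversize=True))  # [405, 810, 3505]
```

Reading back only `c_count.value` entries after the second call is what makes oversizing safe: the unused tail of the array is never returned.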
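The handle lookups in pynvml.py (nvmlDeviceGetHandleBySerial, nvmlDeviceGetHandleByUUID, nvmlDeviceGetHandleByPciBusId) pass their argument straight to `c_char_p`, and the string getters (nvmlDeviceGetName, nvmlDeviceGetSerial, nvmlDeviceGetUUID, nvmlDeviceGetVbiosVersion) return `create_string_buffer(...).value`. On Python 3, `c_char_p` accepts only bytes (or an integer address) and `.value` comes back as bytes, so callers have to encode and decode themselves. A minimal sketch of that conversion; `to_c_string` and `from_c_buffer` are hypothetical helpers, not part of pynvml, and the ASCII/UTF-8 choices are assumptions.

```python
from ctypes import c_char_p, create_string_buffer


def to_c_string(text):
    """Hypothetical helper: c_char_p needs bytes on Python 3, so encode str first."""
    if isinstance(text, bytes):
        return c_char_p(text)
    return c_char_p(text.encode("ascii"))


def from_c_buffer(buf):
    """Hypothetical helper: a string buffer's .value is bytes on Python 3."""
    return buf.value.decode("utf-8", errors="replace")


# Simulate what a getter such as nvmlDeviceGetName hands back
name_buf = create_string_buffer(b"GeForce 9600M GT", 64)
print(from_c_buffer(name_buf))        # GeForce 9600M GT
print(to_c_string("GPU-1234").value)  # b'GPU-1234'
```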