├── README.md
└── spec.md


/README.md:
--------------------------------------------------------------------------------
1 | **This project is inactive. Please look at OpenTelemetry instead.**
2 |
3 |
4 | spec
5 | ====
6 |
7 | This is the metrics2.0 spec, shown at http://metrics20.org/spec/
8 |
9 | If you want to make contributions, please open pull requests, issues, etc. here.
10 |
11 | Thank you.
12 |
--------------------------------------------------------------------------------
/spec.md:
--------------------------------------------------------------------------------
1 | # Metrics 2.0 spec
2 |
3 | This specification lives at [github.com/metrics20/spec](https://github.com/metrics20/spec), which is where change requests can be made.
4 |
5 | ## Table of Contents
6 | 1. [Role of the spec](#role-of-the-spec)
7 | 2. [Glossary](#glossary)
8 | 3. [Data model](#data-model)
9 | 4. [Tags](#tags)
10 |     * [Tag keys](#tag-keys)
11 |     * [Tag values: unit](#tag-values-unit)
12 |     * [Tag values: stat](#tag-values-stat)
13 |     * [Tag values: mtype](#tag-values-mtype)
14 | 5. [Examples](#examples)
15 |     * [Example 1: Swift proxy-server timings](#example-1-swift-proxy-server-timings)
16 |     * [Example 2: Disk space](#example-2-disk-space)
17 |
18 | ## Role of the spec
19 |
20 | * foster adoption of shared terminology and accurate classification
21 | * make metrics more self-describing
22 |
23 | The end goal is tooling interoperability, correctness and user friendliness. See http://metrics20.org/ for more details.
24 |
25 | The spec does not dictate transport protocols or storage mechanisms (except that it imposes minimum requirements to support the spec),
26 | since that's an area in heavy flux and spans a broad technical spectrum where
27 | varying tradeoffs make sense (e.g.
simplicity vs high performance), though the metrics2.0 project and website
28 | also aim to bring projects together under shared implementations and formats (see http://metrics20.org/implementations/).
29 |
30 | ## Glossary
31 |
32 | Tag
33 | : A piece of text that describes or identifies a metric. It can either be in key=value form (e.g. 'env=prod') or just a value without a key (e.g. 'prod'), which can be more convenient if you don't do key-based searches. You should be able to mix both styles even for a given metric, but using the key=value format as much as possible is highly encouraged. If not explicitly specified, a tag can be assumed to be intrinsic. Tag values should always be non-empty strings, except for unit.
34 |
35 | (Tag) Key
36 | : Describes the dimension or property being measured. Can be very useful for specifying aggregations, grouping, filtering and searching.
37 |
38 | Intrinsic tag
39 | : A tag value that describes the thing being measured in a fundamental way. Changing this tag means we're measuring something else or measuring in a different way, so the timeseries identifier changes and we're talking about a different series (e.g. mtype, unit).
40 |
41 | Extrinsic tag
42 | : A tag value that provides information about the thing being measured, or the data, but that can change over time without changing the timeseries identifier (e.g. line, agent).
43 | This is optional and implementation specific (e.g. are changes over time tracked, or just the current state?).
44 |
45 | Meta tag
46 | : Synonym for extrinsic tag.
47 |
48 | Key
49 | : In many systems, a string is also used to uniquely identify a timeseries, in addition to tags.
50 |
51 |
52 | ## Data model
53 |
54 | * metrics have tags, optionally meta tags, and optionally a key
55 | * the metric (timeseries) identity is directly tied to the metric key (if any) and tag values (but not their order).
56 | * this means that for a different key or a different set of tags we're talking about a different stream of data.
57 | * meta tags can change without affecting the metric identity.
58 | * tags should be chosen to make the metric as self-describing as possible; a key can be used for information that is hard to fit in tags. It is OK for there to be semantic overlap between tags and the key (e.g. putting information in the key that is also in tags).
59 | * characters that should be allowed: (at least) alphanumerics (case sensitive), underscore, hyphen, dot, forward slash. (units can contain slash, ip/hostname can contain dot)
60 |
61 |
62 | ## Tags
63 |
64 | * the following tags are **mandatory**: unit, mtype.
65 | * try to add as many tags to your metrics as possible, using all tags in the table below as appropriate or coming up with your own as needed.
66 | * if the tag key you're looking for is not in the spec yet, open a ticket to try to get it added.
67 | * keys should be fitting, descriptive and short, in that order of importance.
68 | * try to keep everything lowercase except where upper casing is sensible (e.g. units and prefixes that use capitals (M, B, ...) or commonly capitalized terms such as http verbs (GET/PUT))
69 |
70 |
71 | ### Tag keys
72 |
73 | Tag key     | use
74 | ------------|-----:
75 | host        | physical or virtual machine
76 | http_method | the http method, like PUT, GET, etc.
77 | http_code   | 200, 404, etc.
78 | device      | block device, network device, ...
79 | unit        | the unit something is expressed in (b/s, MB, etc). See below.
80 | what        | the thing being measured, if the other tags don't suffice. Often the same as the metric key.
81 | type        | further describes the metric. type is a very generic word, only use it if you really don't know anything better.
82 | result      | values: ok, fail, ...
(for http requests, http_code is probably more useful)
83 | stat        | to clarify the statistical view
84 | bin_max     | if your metrics are separated into bins by some numeric value, the upper limit of a bin (like (statsd) histograms)
85 | direction   | in/out (not 'tx' or 'rx', more consistent)
86 | mtype       | type of metric in terms of how the data should be interpreted. See below.
87 | unit        | the unit in which the magnitude is measured. See below.
88 | file        | file (that generated a metric)
89 | line        | line (that generated a metric)
90 | env         | environment
91 |
92 |
93 | ### Special tag values
94 |
95 | Value   | Meaning
96 | --------|-----:
97 | `_sum_` | represents the sum of all other (would-be) metrics summed across this tag. (equivalence)
98 | `_avg_` | represents the avg of all other (would-be) metrics averaged across this tag. (equivalence)
99 |
100 |
101 |
102 | ### Tag values: unit
103 |
104 | * Units in metrics 2.0 are the union of [all SI units and prefixes](http://en.wikipedia.org/wiki/International_System_of_Units) and [IEC prefixes](http://en.wikipedia.org/wiki/Binary_prefix), extended with units commonly used in IT (the "extensions").
105 | * The extensions are designed to be intuitive (i.e. as commonly used by [strftime](http://strftime.org)); however, they are never to conflict with SI or IEC.
106 | * Note that unit can be an empty string for unitless data (e.g. the ratio of two series with the same unit, or a probability.
See "open question" further down)
107 |
108 | #### Commonly used SI units
109 |
110 | Unit  | Meaning
111 | ------|-----:
112 | s     | second (time)
113 | Hz    | hertz (frequency, 1/s)
114 |
115 | For the full listing, see the SI website.
116 |
117 | #### SI and IEC prefixes
118 |
119 | * [SI decimal prefixes](http://en.wikipedia.org/wiki/SI_prefix)
120 | * [IEC binary prefixes](http://en.wikipedia.org/wiki/Binary_prefix)
121 |
122 | The most common ones are in the table below:
123 |
124 | Prefix | Meaning
125 | ------|---------:
126 | n     | nano, 10^-9
127 | μ     | micro, 10^-6
128 | m     | milli, 10^-3
129 | c     | centi, 10^-2
130 | d     | deci, 10^-1
131 | k     | kilo, 10^3
132 | M     | mega, 10^6
133 | G     | giga, 10^9
134 | T     | tera, 10^12
135 | P     | peta, 10^15
136 | Ki    | kibi, 1024
137 | Mi    | mebi, 1024^2
138 | Gi    | gibi, 1024^3
139 | Ti    | tebi, 1024^4
140 | Pi    | pebi, 1024^5
141 | Ei    | exbi, 1024^6
142 |
143 | #### Extensions
144 |
145 | Symbol  | Meaning
146 | --------|-----:
147 | b       | [bit](http://en.wikipedia.org/wiki/Bit#Unit_and_symbol)
148 | B       | [byte](http://en.wikipedia.org/wiki/Bit#Unit_and_symbol)
149 | M       | minute (strftime)
150 | h       | hour (strftime)
151 | d       | day (strftime)
152 | w       | week (strftime)
153 | mo      | month (not 'm' like in strftime because that would conflict with SI)
154 | err     | errors
155 | warn    | warnings
156 | conn    | connections
157 | event   | events (TCP events etc)
158 | ino     | inodes
159 | email   | email messages
160 | jiff    | jiffies (i.e. for cpu usage)
161 | job     | job (as in job queue)
162 | file    | file (not 'F', that's farad)
163 | load    | cpu load
164 | metric  | a metric line like in the statsd or graphite protocol
165 | msg     | message (like in message queues)
166 | P       | probability (between 0 and 1)
167 | page    | page (as in memory segment)
168 | pckt    | network packet
169 | process | process
170 | req     | http requests, database queries, etc
171 | sock    | sockets
172 | thread  | thread
173 | ticket  | upload tickets, kerberos tickets, ...
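
As a rough illustration of the prefix tables above, the following Python sketch maps each listed SI and IEC prefix to its multiplier. The names `PREFIX_MULTIPLIERS` and `to_base` are hypothetical and not part of the spec. Because some symbols (M, d, P) appear both as prefixes and as extension units, the sketch assumes the caller passes the prefix explicitly instead of parsing it out of a unit string.

```python
# Hypothetical lookup table: multipliers for the prefixes listed in the spec.
PREFIX_MULTIPLIERS = {
    # SI decimal prefixes
    "n": 1e-9, "μ": 1e-6, "m": 1e-3, "c": 1e-2, "d": 1e-1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15,
    # IEC binary prefixes
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6,
}


def to_base(value, prefix):
    """Scale a value expressed with `prefix` to the unprefixed unit."""
    if prefix not in PREFIX_MULTIPLIERS:
        raise ValueError("unknown prefix: " + prefix)
    return value * PREFIX_MULTIPLIERS[prefix]


# 2 KiB expressed in bytes, 3 MB expressed in bytes.
print(to_base(2, "Ki"), to_base(3, "M"))
```

Keeping the prefix separate from the base unit sidesteps the question of whether a bare `M` means mega or minute, which a unit-string parser would have to resolve by some additional convention.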
174 |
175 | Any combination of a prefix with any of the units is supported, e.g. kHz, MB/s, etc.
176 | Note that for consistency and clarity, 'Mb/s' should be used instead of 'Mbps', and so forth for similar network metrics. [^Mbps]
177 |
178 | ### Tag values: stat
179 |
180 | Symbol  | Meaning
181 | --------|-----:
182 | min     | lowest value seen
183 | max     | highest value seen
184 | mean    | arithmetic mean
185 | std     | standard deviation
186 | *_NUM   | the NUM percentile of the stat
187 |
188 | ### Tag values: mtype
189 |
190 | Symbol    | Meaning
191 | ----------|-----:
192 | rate      | a number per second (implies that unit ends in '/s')
193 | count     | a number per a given interval (such as a statsd flushInterval)
194 | gauge     | values at each point in time
195 | counter   | keeps increasing over time (but might wrap/reset at some point), i.e. a gauge with the added notion of "I usually want to derive this to see the rate"
196 | timestamp | value represents a unix timestamp, so basically a gauge or counter, but we know we can also render the "age" at each point.
197 |
198 | ### Open question: representing things that alter the meaning of unit and mtype
199 |
200 | * Something in percent. Do we add _pct to the unit or mtype tag? Or normalized=100?
201 | * Probability, commonly expressed as P, is not really a unit; a probability has no unit. How do we specify that the number is a probability or that the range is between 0 and 1?
202 | * CPU load can also be considered unitless.
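
The data model and tag rules described in the earlier sections (mandatory `unit` and `mtype` tags, non-empty values except for unit, identity independent of tag order, meta tags excluded from identity) could be sketched in Python as follows. The function name `series_id`, the sorted `key=value` serialization and the `;` separator for an optional key are assumptions made for illustration; the spec does not prescribe any identifier format.

```python
MANDATORY_TAGS = ("unit", "mtype")


def series_id(tags, key=None):
    """Build an order-independent identifier from intrinsic tags.

    Meta (extrinsic) tags are deliberately not a parameter: they must not
    influence the identity of the series.
    """
    missing = [name for name in MANDATORY_TAGS if name not in tags]
    if missing:
        raise ValueError("missing mandatory tags: " + ", ".join(missing))
    for name, value in tags.items():
        # Tag values must be non-empty strings, except for unit.
        if value == "" and name != "unit":
            raise ValueError("empty value for tag " + name)
    # Sorting makes the id independent of the order tags were supplied in.
    body = ",".join(f"{name}={value}" for name, value in sorted(tags.items()))
    # Assumed convention: an optional key is prepended with ';'.
    return f"{key};{body}" if key else body


tags = {"what": "disk_space", "host": "dfs4", "mtype": "gauge", "unit": "B"}
meta = {"agent": "diamond2"}  # can change without changing the id
print(series_id(tags))
```

Sorting is only one way to get order independence; a storage system could just as well hash a canonicalized tag set.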
203 |
204 |
205 | ## Examples
206 |
207 | ### Example 1: Swift proxy-server timings
208 |
209 | This comes from the structured_metrics toolkit, which upgrades a metric from the traditional form:
210 | ```
211 | stats.timers.dfsproxy1.proxy-server.object.GET.206.timing.upper_90
212 | ```
213 |
214 | Into: [^target_type]
215 | ```
216 | {
217 |     what=response_time
218 |     http_code=206
219 |     http_method=GET
220 |     host=dfsproxy1
221 |     service=proxy-server
222 |     stat=upper_90
223 |     swift_type=object
224 |     target_type=gauge
225 |     unit=ms
226 | }
227 | ```
228 |
229 |
230 | ### Example 2: Disk space
231 |
232 | A hypothetical monitoring agent "diamond2" could submit native metrics 2.0 to track used disk space on a given mountpoint (file system) on a given server, like so: [^target_type]
233 |
234 | ```
235 | {
236 |     mountpoint=/srv/node/dfs3
237 |     what=disk_space
238 |     host=dfs4
239 |     target_type=gauge
240 |     type=used
241 |     unit=B
242 | }
243 | meta: {
244 |     agent=diamond2
245 | }
246 | ```
247 |
248 | A hypothetical storage system could hence use something like this as the id for the corresponding series:
249 | ```
250 | id=mountpoint=/srv/node/dfs3,what=disk_space,host=dfs4,target_type=gauge,type=used,unit=B
251 | ```
252 |
253 | Note that if we switch to a different agent, the id will stay the same because meta tags are not used to generate the identifier for the storage system.
254 |
255 | [^Mbps]: Carbon-tagger and structured_metrics will set the unit to Mb/s whether the unit in the serialized key is Mbps or Mb/s. Also, Graph-Explorer does not support the Mbps form, but it does support the Mb/s form.
256 | [^target_type]: `target_type` was the old name for `mtype`, still used by tools such as structured_metrics and graph-explorer. They should be updated. They also still use 'server' instead of 'host', and lower/upper instead of min/max.
257 |
--------------------------------------------------------------------------------