├── .gitignore ├── .htaccess ├── CODEOWNERS ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── gen-identifier-registry ├── index.xml └── rfc2629.xsl /.gitignore: -------------------------------------------------------------------------------- 1 | *.sw[nop] 2 | *~ 3 | .project 4 | .settings 5 | TAGS 6 | -------------------------------------------------------------------------------- /.htaccess: -------------------------------------------------------------------------------- 1 | DirectoryIndex index.xml 2 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # These owners will be the default owners for everything in 2 | # the repo. Unless a later match takes precedence, 3 | # they will be requested for review when someone opens a 4 | # pull request. 5 | * @jbenet @msporny @gannan08 @rhiaro 6 | 7 | # See CODEOWNERS syntax here: https://help.github.com/articles/about-codeowners/#codeowners-syntax 8 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # W3C Credentials Community Group 2 | 3 | Contributions to this repository are intended to become part of 4 | Recommendation-track documents governed by the 5 | [W3C Patent Policy](https://www.w3.org/Consortium/Patent-Policy-20040205/) and 6 | [Software and Document License](https://www.w3.org/Consortium/Legal/copyright-software). 7 | To make substantive contributions to specifications, you must either participate 8 | in the relevant W3C Working Group or make a non-member patent licensing commitment. 9 | 10 | If you are not the sole contributor to a contribution (pull request), please 11 | identify all contributors in the pull request comment. 
12 | 13 | To add a contributor (other than yourself, that's automatic), mark them one 14 | per line as follows: 15 | 16 | ``` 17 | +@github_username 18 | ``` 19 | 20 | If you added a contributor by mistake, you can remove them in a comment with: 21 | 22 | ``` 23 | -@github_username 24 | ``` 25 | 26 | If you are making a pull request on behalf of someone else but you had no 27 | part in designing the feature, you can remove yourself with the above syntax. 28 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All Reports in this Repository are licensed by Contributors under the [W3C Software and Document 2 | License](https://www.w3.org/Consortium/Legal/2015/copyright-software-and-document). 3 | 4 | Contributions to Specifications are made under the 5 | [W3C CLA](https://www.w3.org/community/about/agreements/cla/). 6 | 7 | Contributions to Software, including sample implementations, are under the 8 | [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## The Multihash Data Format 2 | 3 | This specification describes a data model for expressing the output of 4 | cryptographic hash functions. 5 | 6 | Cryptographic hash functions often have multiple output sizes and encodings. 7 | This variability makes it difficult for applications to examine a series of 8 | bytes and determine which hash function produced them. Multihash is a universal 9 | data format for encoding outputs from hash functions. It is useful to write 10 | applications that can simultaneously support different hash function outputs as 11 | well as upgrade their use of hashes over time; Multihash is intended to 12 | address these needs. 
13 | 14 | You can view an HTML version of the specification here: 15 | 16 | [https://w3c-ccg.github.io/multihash/](https://w3c-ccg.github.io/multihash/) 17 | 18 | We encourage contributions meeting the [Contribution 19 | Guidelines](CONTRIBUTING.md). While we prefer the creation of issues 20 | and Pull Requests in the GitHub repository, discussions often occur 21 | on the 22 | [public-credentials](http://lists.w3.org/Archives/Public/public-credentials/) 23 | mailing list as well. 24 | 25 | ### Other useful links 26 | * [Multiformats Website](https://multiformats.io/) 27 | * [Multihash Website](https://multiformats.io/multihash/) 28 | * [Public group email archive](https://lists.w3.org/Archives/Public/public-credentials/) 29 |
--------------------------------------------------------------------------------
/gen-identifier-registry:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
#
# Script to generate the multihash identifier registry from the CSV file
import codecs
import csv
import urllib.request

url = 'https://raw.githubusercontent.com/multiformats/multicodec/master/table.csv'
stream = urllib.request.urlopen(url)
csvdata = csv.reader(codecs.iterdecode(stream, 'utf-8'))

print("""

Name
Identifier
Status
Specification
""")

# CSV columns used below: 0 = name, 1 = tag, 2 = code, 3 = status
for line in csvdata:
    # Only permanent entries tagged 'multihash' belong in the registry.
    if line[1].strip() != 'multihash':
        continue

    if line[3].strip() != 'permanent':
        continue

    codec = line[0].strip()
    identifier = line[2].strip()
    status = 'active'
    spec = 'Unknown'

    # Map each hash family to the specification that defines it.
    if codec.startswith('blake2'):
        spec = 'RFC 7693'
    elif codec.startswith('sha1'):
        spec = 'RFC 6234'
    elif codec.startswith('sha2'):
        spec = 'RFC 6234'
    elif codec.startswith('sha3'):
        spec = 'FIPS 202'
    elif codec.startswith('poseidon'):
        spec = 'POSEIDON'
    elif codec.startswith('md4'):
        status = 'deprecated'
        spec = 'RFC 6150'
    elif codec.startswith('md5'):
        status = 'deprecated'
        spec = 'RFC 6151'

    print(f' {codec}{identifier}{status}{spec}')

print(' ')
--------------------------------------------------------------------------------
/index.xml:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 12 | 13 | 14 | The Multihash Data Format 15 | 16 | 17 | 18 | Protocol Labs 19 |
20 | 21 | 548 Market Street, #51207 22 | San Francisco 23 | CA 24 | 94104 25 | US 26 | 27 | +1 619 957 7606 28 | juan@protocol.ai 29 | http://juan.benet.ai/ 30 |
31 |
32 | 33 | 34 | Digital Bazaar 35 |
36 | 37 | 203 Roanoke Street W. 38 | Blacksburg 39 | VA 40 | 24060 41 | US 42 | 43 | +1 540 961 4469 44 | msporny@digitalbazaar.com 45 | http://manu.sporny.org/ 46 |
47 |
48 | 49 | 50 | Security 51 | 52 | digest algorithm 53 | digital signature 54 | PKI 55 | SHA 56 | BLAKE 57 | poseidon 58 | 59 | 60 | 61 | Cryptographic hash functions often have multiple output sizes and encodings. 62 | This variability makes it difficult for applications to examine a series of 63 | bytes and determine which hash function produced them. Multihash is a universal 64 | data format for encoding outputs from hash functions. It is useful to write 65 | applications that can simultaneously support different hash function outputs as 66 | well as upgrade their use of hashes over time; Multihash is intended to 67 | address these needs. 68 | 69 | 70 | 71 | 72 | 73 | This specification is a joint work product of 74 | Protocol Labs and the 75 | W3C Credentials Community Group. 76 | Feedback related to this specification should be logged in the 77 | issue tracker 78 | or be sent to 79 | public-credentials@w3.org. 80 | 81 | 82 |
83 | 84 |
85 | 86 | Multihash is particularly important in systems which depend on 87 | cryptographically secure hash functions. Attacks may break the cryptographic 88 | properties of secure hash functions. These cryptographic breaks are 89 | particularly painful in large tool ecosystems, where tools may have made 90 | assumptions about hash values, such as function and digest size. Upgrading 91 | becomes a nightmare, as all tools which make those assumptions would have 92 | to be upgraded to use the new hash function and new hash digest length. 93 | Tools may face serious interoperability problems or error-prone special casing. 94 | 95 | 96 | How many programs out there assume a git hash is a SHA-1 hash? 97 | 98 | 99 | How many scripts assume the hash value digest is exactly 160 bits? 100 | 101 | 102 | How many tools will break when these values change? 103 | 104 | 105 | How many programs will fail silently when these values change? 106 | 107 | 108 | This is precisely why Multihash was created. It was designed for 109 | seamlessly upgrading systems that depend on cryptographic hashes. 110 | 111 | 112 | When using Multihash, a system warns the consumers of its hash values that 113 | these may have to be upgraded in case of a break. Even though the system 114 | may still only use a single hash function at a time, the use of multihash 115 | makes it clear to applications that hash values may use different hash 116 | functions or be longer in the future. Tooling, applications, and scripts 117 | can avoid making assumptions about the length, and read it from the 118 | multihash value instead. This way, the vast majority of tooling - which 119 | may not do any checking of hashes - would not have to be upgraded at all. 120 | This vastly simplifies the upgrade process, avoiding the waste of hundreds 121 | or thousands of software engineering hours, deep frustrations, and high 122 | blood pressure. 123 | 124 |
125 |
126 | 127 | A multihash follows the TLV (type-length-value) pattern and consists of 128 | several fields composed of a combination of unsigned variable length 129 | integers and byte information. 130 | 131 |
132 | 133 | The following section details the core data types used by the Multihash 134 | data format. 135 | 136 |
137 | 138 | A data type that enables one to express an unsigned integer of variable length. 139 | The format uses the Little Endian Base 128 (LEB128) encoding that is defined in 140 | Appendix C of the 141 | DWARF Debugging Information Format standard, 142 | initially released in 1993. 143 | 144 | 145 | As suggested by the name, this variable length encoding is only capable of 146 | representing unsigned integers. Further, while there is no theoretical maximum 147 | integer value that can be represented by the format, implementations MUST NOT 148 | encode more than nine (9) bytes, giving a practical range of integers from 149 | 0 to 2^63 - 1. 150 | 151 | 152 | When encoding an unsigned variable integer, the unsigned integer is serialized 153 | seven bits at a time, starting with the least significant bits. The most 154 | significant bit of each output byte is a continuation flag: it is 1 when another 155 | byte follows and 0 in the final byte. It is not possible to express a signed integer with this 156 | data type. 157 | 158 | 159 | Value 160 | Encoding (bits) 161 | Encoding (hexadecimal) 162 | 1 163 | 00000001 164 | 0x01 165 | 127 166 | 01111111 167 | 0x7F 168 | 128 169 | 10000000 00000001 170 | 0x8001 171 | 255 172 | 11111111 00000001 173 | 0xFF01 174 | 300 175 | 10101100 00000010 176 | 0xAC02 177 | 16384 178 | 10000000 10000000 00000001 179 | 0x808001 180 | 181 | 182 | Implementations MUST restrict the size of the varint to a maximum of nine bytes 183 | (63 bits of payload). This practical limit protects decoders against memory 184 | exhaustion attacks. The encoding itself has no theoretical limit, and future 185 | specifications may raise this limit if it is truly necessary to have code or 186 | length values equal to or larger than 2^63. 187 | 188 |
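The encoding rules above can be sketched in Python as follows. This is a minimal, non-normative illustration: the names `encode_varint`, `decode_varint`, and `MAX_VARINT_BYTES` are our own, and the decoder enforces the nine-byte limit by rejecting any varint whose ninth byte still has the continuation bit set. Running it prints the same hexadecimal encodings as the table above.

```python
MAX_VARINT_BYTES = 9  # practical limit from this section: 9 bytes = 63 payload bits


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative values")
    if n >= 1 << 63:
        raise ValueError("value does not fit in nine varint bytes")
    out = bytearray()
    while True:
        low7 = n & 0x7F  # take the least significant 7 bits first
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)  # final byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode the varint at buf[offset:]; return (value, offset just past it)."""
    value = 0
    for i in range(MAX_VARINT_BYTES):
        if offset + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, offset + i + 1
    raise ValueError("varint exceeds the nine-byte limit")


for value in (1, 127, 128, 255, 300, 16384):
    encoded = encode_varint(value)
    assert decode_varint(encoded) == (value, len(encoded))
    print(f"{value} -> 0x{encoded.hex().upper()}")
```

Note that this decoder accepts non-minimal encodings (for example 0x8100 also decodes to 1); an implementation that needs a single canonical form may additionally want to reject a trailing 0x00 byte.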
189 |
190 |
191 | 192 | A multihash follows the TLV (type-length-value) pattern. 193 | 194 |
195 | 196 | The hash function identifier is an 197 | unsigned variable integer 198 | identifying the hash 199 | function. The possible values for this field are provided in 200 | The Multihash Identifier Registry. 201 | 202 |
203 |
204 | 205 | The digest length is an 206 | unsigned variable integer 207 | counting the length of the digest in bytes. 208 | 209 |
210 |
211 | 212 | The digest value is the output of the hash function. Its length, in bytes, is 213 | exactly the value given by the digest length field. 214 | 215 |
216 |
217 |
218 | 219 | 220 | For example, the following is an expression of a SHA2-256 hash in hexadecimal 221 | notation (spaces added for readability purposes): 222 |
223 | 0x12 20 41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8 224 |
225 | The first byte (0x12) specifies the SHA2-256 hash function. The second byte 226 | (0x20) specifies the length of the digest, which is 32 bytes. The remaining 227 | 32 bytes are the digest value produced by the hash function. 228 |
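Reading the three fields back out of a multihash can be sketched as follows. The helper names `decode_varint` and `decode_multihash` are hypothetical, and this sketch treats its input as exactly one multihash, so any trailing bytes are reported as a length mismatch. It parses the SHA2-256 example above.

```python
def decode_varint(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint of at most nine bytes at buf[offset:]."""
    value = 0
    for i in range(9):
        if offset + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, offset + i + 1
    raise ValueError("varint exceeds the nine-byte limit")


def decode_multihash(mh: bytes) -> tuple[int, bytes]:
    """Split a multihash into (hash function identifier, digest value)."""
    code, pos = decode_varint(mh, 0)      # type: hash function identifier
    length, pos = decode_varint(mh, pos)  # length: digest size in bytes
    digest = mh[pos:]                     # value: the digest itself
    if len(digest) != length:
        raise ValueError(f"digest length field is {length}, but {len(digest)} bytes follow")
    return code, digest


example = bytes.fromhex(
    "12" "20" "41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8"
)
code, digest = decode_multihash(example)
print(hex(code), len(digest))  # 0x12 32
```

Because both fields are varints, the same function handles identifiers that need more than one byte, such as 0xb240.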
229 |
230 |
231 |
232 | 233 | 234 | 235 | 236 | 237 | DWARF Debugging Information Format, Version 3 238 | 239 | 240 | 241 | This document defines the format for the information generated by compilers, assemblers and linkage editors, that is necessary for symbolic, source-level debugging. 242 | 243 | 244 | 245 | 246 | 247 | 248 | US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF) 249 | 250 | 251 | 252 | 253 | Federal Information Processing Standard, FIPS 254 | 255 | 256 | 257 | 258 | 259 | 260 | 261 | 262 | SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions 263 | 264 | 265 | 266 | This Standard specifies the Secure Hash Algorithm-3 (SHA-3) family of functions on binary data. 267 | 268 | 269 | 270 | 271 | 272 | 273 | 274 | 275 | POSEIDON: A New Hash Function for Zero-Knowledge Proof Systems 276 | 277 | 278 | 279 | 280 | 281 | 282 | 283 | A modular framework and concrete instances of cryptographic hash functions which work natively with GF(p) objects. The POSEIDON hash function uses up to 8x fewer constraints per message bit than a Pedersen Hash. 284 | 285 | 286 | 287 | 288 | 289 | 290 | The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC) 291 | 292 | 293 | 294 | 295 | This document describes the cryptographic hash function BLAKE2 and makes the algorithm specification and C source code conveniently available to the Internet community. BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms and BLAKE2s for smaller architectures. BLAKE2 can be directly keyed, making it functionally equivalent to a Message Authentication Code (MAC). 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | MD4 to Historic Status 307 | 308 | 309 | 310 | 311 | This document retires RFC 1320, which documents the MD4 algorithm, and discusses the reasons for doing so. This document moves RFC 1320 to Historic status. This document is not an Internet Standards Track specification; it is published for informational purposes. 
312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms 320 | 321 | 322 | 323 | 324 | This document updates the security considerations for the MD5 message digest algorithm. It also updates the security considerations for HMAC-MD5. 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 |
333 | 334 | There are a number of security considerations to take into account when 335 | implementing or utilizing this specification. 336 | 337 | TBD 338 | 339 |
340 |
341 | 342 | 343 | The multihash examples are chosen to show different hash functions and 344 | different hash digest lengths at play. The input test data for all of the 345 | examples in this section is:
Merkle–Damgård
346 |
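The vectors in this section can be regenerated with Python's standard `hashlib` module, as in the sketch below. It assumes the test input is the UTF-8 encoding of the string shown, with no trailing newline, and that blake2b-N and blake2s-N denote BLAKE2 configured for an N-bit output (not a truncated longer digest). We have not checked its output byte-for-byte against every vector here. Identifiers are written as varints, so a two-byte identifier such as 0xb240 is emitted as the three bytes 0xc0e402.

```python
import hashlib

# Assumption: the test input is the UTF-8 encoding of this string.
DATA = "Merkle–Damgård".encode("utf-8")


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, least significant 7 bits first."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append((low7 | 0x80) if n else low7)
        if not n:
            return bytes(out)


def multihash(code: int, digest: bytes) -> bytes:
    """<varint identifier><varint digest length><digest>."""
    return varint(code) + varint(len(digest)) + digest


sha512 = hashlib.sha512(DATA).digest()
VECTORS = {
    "sha1": multihash(0x11, hashlib.sha1(DATA).digest()),
    "sha2-256": multihash(0x12, hashlib.sha256(DATA).digest()),
    "sha2-512, 32 bytes": multihash(0x13, sha512[:32]),  # truncated digest
    "sha2-512, 64 bytes": multihash(0x13, sha512),
    "blake2b-512": multihash(0xB240, hashlib.blake2b(DATA, digest_size=64).digest()),
    "blake2b-256": multihash(0xB220, hashlib.blake2b(DATA, digest_size=32).digest()),
    "blake2s-256": multihash(0xB260, hashlib.blake2s(DATA, digest_size=32).digest()),
    "blake2s-128": multihash(0xB250, hashlib.blake2s(DATA, digest_size=16).digest()),
}

for name, mh in VECTORS.items():
    print(f"{name}: 0x{mh.hex()}")
```

If a regenerated value differs from a listed vector, the input encoding assumption is the first thing worth checking.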
347 | 348 |
349 |
350 | 0x11148a173fd3e32c0fa78b90fe42d305f202244e2739 351 |
352 | 353 | The fields for this multihash are - hashing function: sha1 (0x11), 354 | length: 20 (0x14), digest: 0x8a173fd3e32c0fa78b90fe42d305f202244e2739 355 | 356 |
357 | 358 |
359 |
360 | 0x122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8 361 |
362 | 363 | The fields for this multihash are - hashing function: sha2-256 (0x12), 364 | length: 32 (0x20), digest: 0x41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8 365 | 366 |
367 | 368 |
369 |
370 | 0x132052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4 371 |
372 | 373 | The fields for this multihash are - hashing function: sha2-512 (0x13), 374 | length: 32 (0x20), meaning the 64-byte SHA2-512 digest is truncated to its first 32 bytes, 375 | digest: 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4 376 | 377 |
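Because a truncated digest is a prefix of the full output, a verifier can recompute the full hash and compare only as many bytes as the length field carries. The sketch below assumes the identifier and digest have already been decoded; `verify_digest`, the small `HASHES` table, and the `min_length` parameter are hypothetical, and refusing very short truncations is a policy choice of this sketch rather than a rule stated in this document.

```python
import hashlib
import hmac

# Hypothetical subset of identifiers mapped to hashlib constructors.
HASHES = {0x11: hashlib.sha1, 0x12: hashlib.sha256, 0x13: hashlib.sha512}


def verify_digest(data: bytes, code: int, digest: bytes, min_length: int = 16) -> bool:
    """Check `digest` against `data`, allowing a truncated digest."""
    if code not in HASHES:
        raise ValueError(f"unsupported hash function identifier 0x{code:x}")
    full = HASHES[code](data).digest()
    # Reject digests longer than the function's output or shorter than our policy allows.
    if not min_length <= len(digest) <= len(full):
        return False
    # Constant-time comparison of the prefix the multihash actually carries.
    return hmac.compare_digest(full[: len(digest)], digest)


data = b"example input"
truncated = hashlib.sha512(data).digest()[:32]  # like the 32-byte sha2-512 example
print(verify_digest(data, 0x13, truncated))            # True
print(verify_digest(b"other input", 0x13, truncated))  # False
```

A lower bound on the accepted length matters because an attacker who controls the multihash could otherwise shrink the digest until a match becomes easy to find.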
378 | 379 |
380 |
381 | 0x134052eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0 382 |
383 | 384 | The fields for this multihash are - hashing function: sha2-512 (0x13), 385 | length: 64 (0x40), 386 | digest: 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0 387 | 388 |
389 | 390 |
391 |
392 | 0xc0e40240d91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2 393 |
394 | 395 | The fields for this multihash are - hashing function: blake2b-512 (0xb240, 396 | encoded as the varint 0xc0e402), length: 64 (0x40), 397 | digest: 0xd91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2 398 | 399 |
400 | 401 |
402 |
403 | 0xa0e402207d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030
405 | 406 | The fields for this multihash are - hashing function: blake2b-256 (0xb220, 407 | encoded as the varint 0xa0e402), length: 32 (0x20), 408 | digest: 0x7d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030 409 | 410 |
411 | 412 |
413 |
414 | 0xe0e40220a96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d
416 | 417 | The fields for this multihash are - hashing function: blake2s-256 (0xb260, 418 | encoded as the varint 0xe0e402), length: 32 (0x20), 419 | digest: 0xa96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d 420 | 421 |
422 | 423 |
424 |
425 | 0xd0e402100a4ec6f1629e49262d7093e2f82a3278
427 | 428 | The fields for this multihash are - hashing function: blake2s-128 (0xb250, 429 | encoded as the varint 0xd0e402), length: 16 (0x10), digest: 0x0a4ec6f1629e49262d7093e2f82a3278 430 | 431 |
432 | 433 |
434 |
435 | 436 | The editors would like to thank the following individuals for feedback on and 437 | implementations of the specification (in alphabetical order). 438 | 439 |
440 |
441 |
442 |
443 | The Multihash Identifier Registry contains the hash functions supported by
444 | Multihash, each with its canonical name, its identifier in hexadecimal
445 | notation, its status, and its defining specification. The following initial
446 | entries should be added to the registry to be created and maintained at (the
447 | suggested URI) http://www.iana.org/assignments/multihash-identifiers:
448 |
449 |
450 |
451 | Name
452 | Identifier
453 | Status
454 | Specification
455 |
456 | identity                   0x00    active  Unknown
457 | sha1                       0x11    active  RFC 6234
458 | sha2-256                   0x12    active  RFC 6234
459 | sha2-512                   0x13    active  RFC 6234
460 | sha3-512                   0x14    active  FIPS 202
461 | sha3-384                   0x15    active  FIPS 202
462 | sha3-256                   0x16    active  FIPS 202
463 | sha3-224                   0x17    active  FIPS 202
464 | sha2-384                   0x20    active  RFC 6234
465 | sha2-256-trunc254-padded   0x1012  active  RFC 6234
466 | sha2-224                   0x1013  active  RFC 6234
467 | sha2-512-224               0x1014  active  RFC 6234
468 | sha2-512-256               0x1015  active  RFC 6234
469 | blake2b-256                0xb220  active  RFC 7693
470 | poseidon-bls12_381-a2-fc1  0xb401  active  POSEIDON
471 |
472 |
473 |
474 | NOTE: The most up-to-date source for the table above, including all
475 | multihash entries in "draft" status, is
476 | https://github.com/multiformats/multicodec/blob/master/table.csv.
477 |
478 |
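An implementation usually keeps a lookup table from registry identifiers to hash implementations. The sketch below covers only the entries above that Python's `hashlib` provides on every platform; `identity`, `sha2-256-trunc254-padded`, `sha2-512-224`, `sha2-512-256`, and the Poseidon entry are deliberately left out. `REGISTRY` and `hash_for` are hypothetical names.

```python
import hashlib
from typing import Callable

# Hypothetical subset of the registry: identifier -> (name, hashing function).
REGISTRY: dict[int, tuple[str, Callable[[bytes], bytes]]] = {
    0x11: ("sha1", lambda d: hashlib.sha1(d).digest()),
    0x12: ("sha2-256", lambda d: hashlib.sha256(d).digest()),
    0x13: ("sha2-512", lambda d: hashlib.sha512(d).digest()),
    0x14: ("sha3-512", lambda d: hashlib.sha3_512(d).digest()),
    0x15: ("sha3-384", lambda d: hashlib.sha3_384(d).digest()),
    0x16: ("sha3-256", lambda d: hashlib.sha3_256(d).digest()),
    0x17: ("sha3-224", lambda d: hashlib.sha3_224(d).digest()),
    0x20: ("sha2-384", lambda d: hashlib.sha384(d).digest()),
    0x1013: ("sha2-224", lambda d: hashlib.sha224(d).digest()),
    0xB220: ("blake2b-256", lambda d: hashlib.blake2b(d, digest_size=32).digest()),
}


def hash_for(code: int, data: bytes) -> tuple[str, bytes]:
    """Return (registry name, digest) for a supported identifier."""
    try:
        name, fn = REGISTRY[code]
    except KeyError:
        raise ValueError(f"unsupported multihash identifier 0x{code:x}") from None
    return name, fn(data)


name, digest = hash_for(0x16, b"abc")
print(name, len(digest))  # sha3-256 32
```

Failing loudly on an unknown identifier, rather than guessing a default, keeps a consumer from silently misreading a hash produced by a function it does not support.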
479 | 480 |
481 | 482 | This memo registers the "mh" digest-algorithm in the 483 | HTTP Digest Algorithm Values 484 | registry with the following values: 485 | 486 | 487 | Digest Algorithm: mh 488 | Description: The multibase-serialized value of a multihash-supported algorithm. 489 | References: this document 490 | Status: standard 491 | 492 |
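As a rough illustration of the "mh" digest-algorithm value, the sketch below builds a `Digest` header from a SHA2-256 multihash serialized with multibase. The base64url alphabet (multibase prefix `u`, no padding) is an assumption chosen because Python's standard library supports it; base58btc (prefix `z`) is another common multibase encoding but needs a third-party library. `digest_header` and `multibase_base64url` are hypothetical helpers.

```python
import base64
import hashlib


def multibase_base64url(data: bytes) -> str:
    """Multibase string: prefix 'u' followed by unpadded base64url."""
    return "u" + base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def digest_header(body: bytes) -> str:
    # 0x12 = sha2-256 and 0x20 = 32 bytes; both fit in a single varint byte.
    mh = bytes([0x12, 0x20]) + hashlib.sha256(body).digest()
    return "Digest: mh=" + multibase_base64url(mh)


print(digest_header(b"hello world"))
```

A receiver would strip the multibase prefix, decode the base64url text, and then parse the bytes as an ordinary multihash.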
493 | 494 |
495 | 496 | This memo registers the "mh" hash algorithm in the 497 | Named Information Hash Algorithm 498 | registry with the following values: 499 | 500 | 501 | ID: 49 502 | Hash Name String: mh 503 | Value Length: variable 504 | Reference: this document 505 | Status: current 506 | 507 |
508 |
509 |
510 |
--------------------------------------------------------------------------------