├── _config.yml ├── .github └── FUNDING.yml ├── sections ├── acknowledgements.md ├── asymmetric-keys.md ├── general-guidance.md ├── random-numbers.md ├── symmetric-keys.md ├── cryptographic-libraries.md └── hashing.md ├── Contents.md ├── LICENSE └── README.md /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman 2 | -------------------------------------------------------------------------------- /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: [samuel-lucas6] 4 | custom: ["https://www.kryptor.co.uk/#donate"] 5 | -------------------------------------------------------------------------------- /sections/acknowledgements.md: -------------------------------------------------------------------------------- 1 | # Acknowledgements 2 | These guidelines were inspired by: 3 | - Latacora's [Cryptographic Right Answers](https://latacora.singles/2018/04/03/cryptographic-right-answers.html) 4 | - Greg Rubin's [Crypto Gotchas](https://github.com/SalusaSecondus/CryptoGotchas) 5 | - Thomas Ptacek's [(Updated) Cryptographic Right Answers](https://gist.github.com/tqbf/be58d2d39690c3b366ad) gist 6 | - Aaron Toponce's forked [Cryptographic Best Practices](https://gist.github.com/atoponce/07d8d4c833873be2f68c34f9afc5a78a#file-gistfile1-md) gist 7 | 8 | The difference is that newer algorithms are mentioned, recommendations have been justified, and important notes about implementation are provided. -------------------------------------------------------------------------------- /Contents.md: -------------------------------------------------------------------------------- 1 | # Cryptography Guidelines 2 | Are you a developer in need of some crypto? If so, you've come to the right place! 
3 | 4 | These guidelines outline: 5 | - Cryptographic library recommendations 6 | - Cryptographic algorithm recommendations 7 | - Parameter recommendations 8 | - Important implementation details 9 | 10 | Parts are opinion-based, but most of this information is derived from expert recommendations alongside real-world protocols and applications designed by cryptographers and cryptography engineers. 11 | 12 | Importantly, unlike some other guidelines online, justification is provided for why certain libraries and algorithms are preferable. This helps with learning and enables fact checking, allowing you to ultimately come to your own conclusions. 13 | 14 | In general, **boring is better, whereas complexity risks catastrophe**. With more complicated designs, contacting a cryptography engineer is strongly recommended. 15 | 16 | Note that some knowledge of cryptography is required to understand the terminology used in these guidelines. For learning resources, check out [this](https://samuellucas.com/blog/how-to-learn-about-cryptography.html) and [this](https://soatok.blog/2020/06/10/how-to-learn-cryptography-as-a-programmer/) blog post. 17 | 18 | ## Contents 19 | 1. [General Guidance](sections/general-guidance.md) 20 | 2. [Cryptographic Libraries](sections/cryptographic-libraries.md) 21 | 3. [Symmetric Encryption](sections/symmetric-encryption.md) 22 | 4. [Message Authentication Codes](sections/message-authentication-codes.md) 23 | 5. [Symmetric Key Size](sections/symmetric-key-size.md) 24 | 6. [Random Numbers](sections/random-numbers.md) 25 | 7. [Hashing](sections/hashing.md) 26 | 8. [Password Hashing/Password-Based Key Derivation](sections/password-hashing-password-based-key-derivation.md) 27 | 9. [(Non-Password-Based) Key Derivation Functions](sections/non-password-based-key-derivation-functions.md) 28 | 10. [Key Exchange/Hybrid Encryption](sections/key-exchange-hybrid-encryption.md) 29 | 11. [Digital Signatures](sections/digital-signatures.md) 30 | 12. 
[Asymmetric Key Size](sections/asymmetric-key-size.md) 31 | 13. [Concluding Remarks](sections/concluding-remarks.md) 32 | 14. [Acknowledgements](sections/acknowledgements.md) 33 | 34 | ## Contribute 35 | If you find these guidelines helpful, please **star** this repository and **share** the link around. Doing so might just prevent someone from making a catastrophic mistake. 36 | 37 | If you have any **feedback or corrections**, please **contact me** privately [here](https://samuellucas.com/) or publicly [here](https://github.com/samuel-lucas6/Cryptography-Guidelines/discussions) to help improve these guidelines. [Pull requests](https://github.com/samuel-lucas6/Cryptography-Guidelines/pulls) are also welcome but please be prepared for things to be reworded. 38 | 39 | ## License 40 | ![Creative Commons License Icon](https://i.creativecommons.org/l/by-sa/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) because it took bloody ages to write. -------------------------------------------------------------------------------- /sections/asymmetric-keys.md: -------------------------------------------------------------------------------- 1 | # Asymmetric Keys 2 | ## Use 3 | ### 256-bit ECC keys 4 | This is the key size for [X25519](https://datatracker.ietf.org/doc/html/rfc7748), which provides a [~128-bit security level](https://crypto.stackexchange.com/questions/27771/does-curve25519-only-provide-112-bit-security). 5 | 6 | Why am I recommending this when I recommend 256-bit keys (a 256-bit security level) for symmetric encryption? Because 128-bit security means [something different](https://github.com/LoupVaillant/Monocypher/issues/127#issuecomment-536200435) in the [case](https://loup-vaillant.fr/tutorials/128-bits-of-security) of these asymmetric algorithms. 
7 | 8 | Furthermore, [X25519](https://datatracker.ietf.org/doc/html/rfc7748) is faster, [more common](https://ianix.com/pub/curve25519-deployment.html), and [more accessible](https://en.wikipedia.org/wiki/Comparison_of_TLS_implementations#Supported_elliptic_curves) than [X448](https://datatracker.ietf.org/doc/html/rfc7748). Finally, when quantum computers do come along, [ECC](https://en.wikipedia.org/wiki/Elliptic-curve_cryptography) and [RSA](https://en.wikipedia.org/wiki/RSA_(cryptosystem)) ([with usable key sizes](https://crypto.stackexchange.com/a/88303)) will be broken regardless of the key size anyway, so many people feel [less of a need](https://github.com/LoupVaillant/Monocypher/issues/127#issuecomment-536200435) to use a higher security level curve. 9 | 10 | ### 448-bit ECC keys 11 | If you insist on a higher security curve, this is the key size for [X448](https://datatracker.ietf.org/doc/html/rfc7748), which provides a [~224-bit security level](https://datatracker.ietf.org/doc/html/rfc7748#section-4.2). It's still [not](https://csrc.nist.gov/publications/detail/nistir/8105/final) post-quantum secure though. 12 | 13 | ### 3072-bit RSA keys 14 | **If you’re forced to use [RSA](https://en.wikipedia.org/wiki/RSA_(cryptosystem))**, then you should use 3072-bit keys, which is the key size [currently used by the NSA](https://www.keylength.com/en/6/) and recommended by NIST, ECRYPT, and BSI for [near-term protection](https://www.keylength.com/en/3/). 15 | 16 | The maximum should be 4096-bit because the performance is really bad after that. However, 4096-bit provides [little benefit](https://crypto.stackexchange.com/a/99235) over 3072-bit. 17 | 18 | **Just don’t use RSA.** 19 | 20 | ## Avoid 21 | ### 512-bit RSA keys 22 | This was [broken ages ago](https://crypto.stackexchange.com/a/3933). 
23 | 24 | ### 1024-bit RSA keys 25 | These should be considered [**no longer](https://crypto.stackexchange.com/a/1982) [secure**](https://crypto.stackexchange.com/questions/2612/difficulty-of-breaking-rsa-for-a-given-key-size). 2048-bit ([112-bit security](https://crypto.stackexchange.com/a/1980)) is the recommended [minimum](https://www.keylength.com/en/8/) and 3072-bit gets you up to the [128-bit security level](https://crypto.stackexchange.com/questions/8687/security-strength-of-rsa-in-relation-with-the-modulus-size?noredirect=1&lq=1). 26 | 27 | ### 8192-bit RSA keys 28 | These are [slow](https://www.javamex.com/tutorials/cryptography/rsa_key_length.shtml) to use and excessive to store. 29 | 30 | ### 2048-bit RSA keys 31 | These only provide a [112-bit security level](https://crypto.stackexchange.com/questions/8687/security-strength-of-rsa-in-relation-with-the-modulus-size?noredirect=1&lq=1), which is below the standard [128-bit security level](https://loup-vaillant.fr/tutorials/128-bits-of-security). 32 | 33 | Therefore, whilst still [commonly used](https://swiftsilentdeadly.com/protonmail-five-years-later-part-iii-security-features/) and safe as a **minimum** RSA key size, it makes sense to use 3072-bit keys instead. 34 | 35 | ### Post-quantum key sizes 36 | These algorithms are [still being researched](https://csrc.nist.gov/projects/post-quantum-cryptography), and the key sizes can be [very large](https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Brochure/quantum-safe-cryptography.html?nn=433196) compared to those for [X25519](https://ianix.com/pub/curve25519-deployment.html)/[Ed25519](https://ianix.com/pub/ed25519-deployment.html). 
-------------------------------------------------------------------------------- /sections/general-guidance.md: -------------------------------------------------------------------------------- 1 | # General Guidance 2 | It’s your responsibility to get things right the first time around to the best of your ability rather than relying on peer review that may never happen. 3 | 4 | ## Use existing, analysed cryptographic algorithms 5 | **Please don't create your own custom cryptographic algorithms (e.g. a custom cipher or hash function)**. 6 | 7 | This is like flying a Boeing 747 without a pilot license but worse because **even experienced cryptographers design [insecure](https://eprint.iacr.org/2022/214.pdf) algorithms**, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a [competition](https://competitions.cr.yp.to/index.html). By contrast, you rarely see experienced airline pilots crashing planes. 8 | 9 | The only *exception* to this rule is implementing something like Encrypt-then-MAC with secure, **existing** cryptographic algorithms **when you know what you're doing**. 10 | 11 | ## Use established cryptographic libraries 12 | **Please avoid coding existing cryptographic algorithms yourself (e.g. coding AES yourself)**. 13 | 14 | Cryptographic libraries provide access to these algorithms for you to prevent people from making mistakes that cause vulnerabilities and to offer good performance. 15 | 16 | Whilst a select few algorithms are relatively simple to implement, like [HKDF](https://datatracker.ietf.org/doc/html/rfc5869), [many aren't](https://loup-vaillant.fr/articles/implementing-elligator) and require a great deal of experience to implement correctly. 17 | 18 | Another reason to avoid doing this is that it's not much fun since academic papers and reference implementations can be very difficult to understand. 
19 | 20 | ## Read 21 | You often don’t need to know how cryptographic algorithms work under the hood to implement them correctly, just like how you don’t need to know how a car works to drive. 22 | 23 | However, you need to know enough about what you’re trying to do, which requires reading: 24 | 25 | - The [documentation](https://doc.libsodium.org/) for the cryptographic library you’re using 26 | - [RFC standards](https://datatracker.ietf.org/doc/html/rfc2104) for algorithms you're using, particularly the ['Security Considerations'](https://datatracker.ietf.org/doc/html/rfc7748#section-7) sections 27 | - Helpful [blog](https://neilmadden.blog/2018/11/14/public-key-authenticated-encryption-and-why-you-want-it-part-i/) [posts](https://soatok.blog/2021/11/17/understanding-hkdf/) and [Cryptography Stack Exchange](https://crypto.stackexchange.com/) answers 28 | - [Guidelines](https://gotchas.salusa.dev/) like this one 29 | - Relevant information in [books](https://www.manning.com/books/real-world-cryptography) 30 | 31 | Ideally, **consult multiple resources** to ensure the information is accurate. For instance, some information on Stack Exchange is misleading or outright wrong. Sources that have been peer reviewed are likely higher in quality. 32 | 33 | Furthermore, reading books about the subject in general will be beneficial, again like how knowing about cars can help if you break down. For a list of great resources, check out my [How to Learn About Cryptography](https://samuellucas.com/blog/how-to-learn-about-cryptography.html) blog post. Soatok also has a [great](https://soatok.blog/2020/06/10/how-to-learn-cryptography-as-a-programmer/) blog post. 34 | 35 | ## Double check 36 | To prevent code-related mistakes, you should: 37 | 38 | - Read security sensitive code twice 39 | - Test your code to ensure that it’s operating as expected (e.g. 
using test vectors, unit tests, debugging, etc.) 40 | 41 | ## Peer review 42 | Unless your project is popular, you have a bug bounty program with cash rewards, or what you’re developing is for an organisation, very few people, perhaps none, will look through the code to find and report vulnerabilities. 43 | 44 | Similarly, receiving funding for a code audit will probably be impossible. Organisations providing funding typically dish it out to large projects like [Tor](https://www.opentech.fund/results/supported-projects/). If you want to fund it yourself, it'll probably cost you $5,000-$10,000 for a small project, which isn't worth the money. -------------------------------------------------------------------------------- /sections/random-numbers.md: -------------------------------------------------------------------------------- 1 | # Random Numbers 2 | ## Use 3 | ### An operating system CSPRNG 4 | [CSPRNG](https://en.wikipedia.org/wiki/Cryptographically-secure_pseudorandom_number_generator) stands for **cryptographically secure** pseudorandom number generator. Your [programming language](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.randomnumbergenerator?view=net-6.0) or [cryptographic library](https://doc.libsodium.org/generating_random_data) should call the operating system CSPRNG for you.
5 | 6 | Here's a list of functions/classes for some major programming languages: 7 | - Python: [secrets](https://docs.python.org/3/library/secrets.html) 8 | - JavaScript: [Crypto.getRandomValues()](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getRandomValues) 9 | - C#: [RandomNumberGenerator()](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.randomnumbergenerator?view=net-6.0) 10 | - Java: [SecureRandom()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/security/SecureRandom.html) 11 | - Go: [crypto/rand](https://pkg.go.dev/crypto/rand) 12 | - Rust: [rand](https://crates.io/crates/rand) 13 | - PHP: [random](https://www.php.net/manual/en/ref.csprng.php) 14 | - Swift: [SystemRandomNumberGenerator](https://developer.apple.com/documentation/swift/systemrandomnumbergenerator) 15 | 16 | You may also find [this](https://paragonie.com/blog/2016/05/how-generate-secure-random-numbers-in-various-programming-languages) blog post useful, which mentions some other languages and goes into greater detail. 17 | 18 | Avoid calling the operating system CSPRNG yourself if you can help it. If you need to for some reason, here's a list: 19 | - Windows: [BCryptGenRandom](https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom) 20 | - Linux/macOS: [getrandom()](https://man7.org/linux/man-pages/man2/getrandom.2.html) if available or *[/dev/urandom](https://linux.die.net/man/4/urandom) otherwise 21 | - OpenBSD: [arc4random()](https://man.openbsd.org/arc4random.3) 22 | 23 | *[/dev/random](https://en.wikipedia.org/wiki//dev/random) 'creates more problems than it solves' - [Jean-Philippe Aumasson](https://www.aumasson.jp/) in [Serious Cryptography](https://nostarch.com/seriouscrypto). Specifically, it causes performance/denial-of-service issues. See [this](https://www.2uo.de/myths-about-urandom/) if you don't have a copy. 
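As a minimal sketch of the language-level route listed above, assuming Python 3.6 or later where the standard-library `secrets` module is available, the following draws a key, a text token, and a bounded integer from the operating system CSPRNG. The variable names and sizes are illustrative only.

```python
import secrets

# 32 random bytes (256 bits) from the operating system CSPRNG, e.g. for a symmetric key.
key = secrets.token_bytes(32)

# A URL-safe text token built from 32 random bytes, e.g. for a password reset link.
token = secrets.token_urlsafe(32)

# A uniformly distributed integer in the range [0, 6), without modulo bias.
roll = secrets.randbelow(6)

print(len(key), len(token), roll)
```

Letting the language wrapper pick the underlying system call is the point here: the same code runs on Windows, Linux, and macOS without touching the platform APIs directly.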
24 | 25 | On embedded devices, allow a library like [LibHydrogen](https://github.com/jedisct1/libhydrogen) to handle random number generation for you. 26 | 27 | ## Avoid 28 | ### A regular PRNG 29 | This means a **non-cryptographically secure** pseudorandom number generator. **These are not secure and should not be used for anything related to security**. 30 | 31 | For example, avoid the following: 32 | - Python: [random](https://docs.python.org/3/library/random.html) 33 | - JavaScript: [Math.random()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random) 34 | - C#: [Random.Next()](https://docs.microsoft.com/en-us/dotnet/api/system.random.next?view=net-6.0) 35 | - Java: [Random()](https://docs.oracle.com/javase/8/docs/api/java/util/Random.html) or [Math.random()](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#random--) 36 | - Go: [math/rand](https://pkg.go.dev/math/rand) 37 | - PHP: [rand()](https://www.php.net/manual/en/function.rand.php) 38 | 39 | ### A custom/userspace PRNG 40 | This is [**very](https://nakedsecurity.sophos.com/2013/07/09/anatomy-of-a-pseudorandom-number-generator-visualising-cryptocats-buggy-prng/) [likely**](https://www.cryptofails.com/post/72902772336/how-not-to-csprng) going to be [**insecure**](https://hdm.io/tools/debian-openssl/) because it’s [harder](https://monocypher.org/manual/#Random_number_generation) to do properly than you’d think. For instance: 41 | - You need to mix different entropy sources (e.g. mouse movements, temperature readings, RDRAND, etc) together to produce a seed 42 | - However, entropy is difficult to acquire at boot since some devices will result in the same noise 43 | - You want to ensure forward secrecy to prevent an attacker retrieving previously generated random numbers 44 | - You want to reseed periodically. 
Reinjecting new entropy provides backward/future secrecy to prevent a state compromise allowing future random numbers to be predicted 45 | - Program forks result in a child process that shares the state with the parent process, meaning identical output unless forks use different seeds 46 | 47 | **Just trust the operating system CSPRNG**. Chances are you'd seed a custom PRNG with it anyway. 48 | 49 | If you know what you're doing and you're forced to implement your own, make sure it's a [fast-key-erasure](https://blog.cr.yp.to/20170723-random.html) one. If you don't understand everything on that page, you probably shouldn't risk it as good randomness is the foundation for lots of other cryptography. 50 | 51 | ## Notes 52 | ### Output size 53 | Salts are [typically](https://www.rfc-editor.org/rfc/rfc9106.html#section-3.1) 128 bits long. This length ensures a collision is very [unlikely](https://crypto.stackexchange.com/a/56132). [Some](https://doc.libsodium.org/password_hashing/default_phf#key-derivation) cryptographic libraries won't let you change this for things like [password hashing](password-hashing-password-based-key-derivation.md). 54 | 55 | However, if you want to be conservative, then you can use 256-bit random values for IDs, salts, and so on. This reduces the chances of a collision into the realm of [never having anything to worry about](https://crypto.stackexchange.com/a/27828). 56 | 57 | ### Virtual machines 58 | If you generate random numbers inside a virtual machine (VM) and the VM state is saved and later restored, the same random numbers may be generated. 
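To put the output sizes from the Notes above into numbers, here is a rough sketch using the standard birthday-bound approximation, p ≈ n² / 2^(b+1), for n uniformly random b-bit values. The function name and the figure of one billion generated values are assumptions chosen purely for illustration.

```python
from fractions import Fraction


def collision_probability(num_values: int, bits: int) -> Fraction:
    """Approximate probability that any two of num_values uniformly random
    values of the given bit length collide (birthday bound).

    The approximation n^2 / 2^(bits + 1) is only meaningful while the
    result is much smaller than 1.
    """
    return Fraction(num_values ** 2, 2 ** (bits + 1))


billion = 10 ** 9
for bits in (128, 256):
    p = collision_probability(billion, bits)
    print(f"{bits}-bit values, 10^9 generated: p is roughly {float(p):.3e}")
```

Even with a billion 128-bit salts the estimate is vanishingly small, and moving to 256 bits pushes it down by dozens more orders of magnitude, which is the conservative option described above.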
-------------------------------------------------------------------------------- /sections/symmetric-keys.md: -------------------------------------------------------------------------------- 1 | # Symmetric Keys 2 | ## Use 3 | ### 256-bit keys 4 | There are two main arguments for 128-bit keys: 5 | - [Increased performance](https://github.com/jedisct1/zig-rocca#readme) for many algorithms 6 | - A 256-bit key is [excessive](https://security.stackexchange.com/questions/14068/why-most-people-use-256-bit-encryption-instead-of-128-bit) since a 128-bit key can't be bruteforced 7 | 8 | Whilst the first point is generally true, some algorithms only support 256-bit keys (e.g. [ChaCha20-Poly1305](https://datatracker.ietf.org/doc/html/rfc8439)) and newer algorithms with a 256-bit key are faster than current schemes with a 128-bit key (e.g. [AEGIS](https://competitions.cr.yp.to/round3/aegisv11.pdf) and [Rocca](https://tosc.iacr.org/index.php/ToSC/article/view/8904/8480)). 9 | 10 | Regarding the second argument, **a 128-bit key doesn't translate to 128-bit security** due to [multi-target attacks](https://blog.cr.yp.to/20151120-batchattacks.html), which involve attacking [many keys at once](https://crypto.stackexchange.com/questions/75880/what-is-a-multi-target-attack). 11 | 12 | Furthermore, it's [recommended](https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Brochure/quantum-safe-cryptography.html?nn=433196) that 256-bit keys are used to ensure post-quantum security since there's concern that future quantum computers will eventually be able to bruteforce 128-bit keys. 13 | 14 | Nobody can accurately predict how the quantum computer situation will play out, but it's clear that **256-bit keys should be used if security is a priority**. 15 | 16 | ## Avoid 17 | ### Smaller than 128-bit keys 18 | In [some cases](https://en.wikipedia.org/wiki/Data_Encryption_Standard#Brute-force_attack), such keys can already be bruteforced. 
For slightly larger keys, they very likely won’t stand the test of time. 19 | 20 | ### 128-bit keys 21 | Please see the 256-bit keys discussion. In sum, this is safe today and often leads to a performance increase, but **this doesn't offer a 128-bit security level** and won't provide long-term protection. 22 | 23 | Also, the [argument](https://security.stackexchange.com/a/14537) that AES-128 is more secure than AES-256 due to [certain attacks](https://crypto.stackexchange.com/a/91878) being more effective on AES-256 is misleading because such attacks are [not practical](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard#Known_attacks) in properly designed protocols. 24 | 25 | ### Large keys (e.g. 512-bit and 1024-bit) 26 | Some symmetric encryption algorithms support large key sizes (e.g. [Threefish](https://en.wikipedia.org/wiki/Threefish)). Also, with MACs like [HMAC](https://datatracker.ietf.org/doc/html/rfc2104), it’s [recommended](https://www.rfc-editor.org/rfc/rfc2104#section-3) to use a key size as large as the output length to avoid a reduction in security (e.g. a 512-bit key for HMAC-SHA-512). 27 | 28 | However, key sizes over 256-bit are widely regarded as [unnecessary](https://crypto.stackexchange.com/a/62553) because they provide [no practical](https://crypto.stackexchange.com/a/1160) security benefit. 29 | 30 | Furthermore, encryption algorithms supporting such key sizes are unpopular in practice, which is a good sign that they should be avoided. 31 | 32 | Plus, HMAC keys larger than the hash function block size (e.g. > 512 bits with HMAC-SHA-256 and > 1024 bits with HMAC-SHA-512) get hashed down to the output length of the hash function. 33 | 34 | ## Notes 35 | ### Symmetric keys must be kept secret 36 | Unlike with public-key cryptography, where you can share the public key safely, you must **not** share a symmetric key via an insecure (e.g. unencrypted) channel. 
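The point made under 'Large keys' about oversized HMAC keys can be checked with Python's standard library. This is a small sketch, assuming HMAC-SHA-256 (64-byte block size); the key and message values are arbitrary placeholders, and a real key would come from a CSPRNG.

```python
import hashlib
import hmac

message = b"example message"

# 100 bytes is longer than the 64-byte block size of SHA-256.
# A fixed pattern is used only so the demonstration is reproducible.
long_key = bytes(range(100))

# Tag computed directly with the oversized key.
tag_long = hmac.new(long_key, message, hashlib.sha256).digest()

# Tag computed with the SHA-256 hash of that key (32 bytes).
hashed_key = hashlib.sha256(long_key).digest()
tag_hashed = hmac.new(hashed_key, message, hashlib.sha256).digest()

# Per RFC 2104, an oversized key is first replaced by its hash,
# so both tags are identical.
print(hmac.compare_digest(tag_long, tag_hashed))
```

In other words, key material beyond the block size buys nothing: it is compressed to the hash output length before HMAC ever uses it.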
37 | 38 | Also, revealing an [unsalted/unkeyed hash of a key](https://keymaterial.net/2020/09/07/invisible-salamanders-in-aes-gcm-siv/) **leaks** the identity, violating indistinguishability. Thus, this should be avoided, although a MAC of the key solves this problem. 39 | 40 | ### Keys must be uniformly random 41 | They can be generated in three ways: 42 | - Randomly using a **cryptographically secure** pseudorandom number generator (please see the [Random Numbers](random-numbers.md) section) 43 | - Derived from a **high-entropy** key (e.g. a [shared secret](https://en.wikipedia.org/wiki/Shared_secret)) using a key derivation function (please see the [(Non-Password-Based) Key Derivation Functions](non-password-based-key-derivation-functions.md) section) 44 | - Derived from a password using a **password-based** key derivation function (please see the [Password Hashing/Password-Based Key Derivation](password-hashing-password-based-key-derivation.md) section) 45 | 46 | [String keys](https://littlemaninmyhead.wordpress.com/2021/09/15/if-you-copied-any-of-these-popular-stackoverflow-encryption-code-snippets-then-you-did-it-wrong/), passwords/passphrases, any other low-entropy information, and [shared secrets](https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange#General_overview) must **not** be used directly! 47 | 48 | ### Never use a key for more than one thing 49 | Keys should be used with a single algorithm, a single mode of operation (if applicable), and for one purpose. For instance, you should use two **distinct** keys when doing Encrypt-then-MAC. 50 | 51 | This is [recommended](https://crypto.stackexchange.com/questions/8081/using-the-same-secret-key-for-encryption-and-authentication-in-a-encrypt-then-ma) practice because it's safest and there can/may be overlap between algorithms. 52 | 53 | For Encrypt-then-MAC, there are two main ways of doing this: 54 | - With passwords, use a larger output length (e.g. 
512 bits) for the password-based KDF and split the output into two keys (e.g. 256-bit and 256-bit) 55 | - With **high-entropy** keys (e.g. a randomly generated key), you can use a regular KDF twice with the same input keying material but different context information for domain separation 56 | 57 | Please read the [Password Hashing/Password-Based Key Derivation](password-hashing-password-based-key-derivation.md) and [(Non-Password-Based) Key Derivation Functions](non-password-based-key-derivation-functions.md) sections for more important information. 58 | 59 | ### Use a new key each time 60 | Generally, it's sensible to use a unique key each time you encrypt or authenticate a different message. For example, with a key exchange, [ephemeral keys](https://en.wikipedia.org/wiki/Ephemeral_key) should be involved, and the associated public keys [should be used](https://www.rfc-editor.org/rfc/rfc7748#section-7) as context information with the KDF. 61 | 62 | The main exception is [stream encryption](https://www.imperialviolet.org/2014/06/27/streamingencryption.html), as explained in the [Symmetric Encryption](sections/symmetric-encryption.md) Notes section. 63 | 64 | This helps prevent compromise of one key affecting lots of data, [cryptographic wear-out](https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/) (using a single key to encrypt too much data), nonce reuse, and reusing keys with multiple algorithms. 65 | 66 | One common way of doing this for file encryption is to: 67 | 1. Randomly generate a unique data encryption key (DEK) for each message 68 | 2. Encrypt the DEK using a key encryption key (KEK), which can be reused for many DEKs, derived using a KDF 69 | 3. Prepend the encrypted DEK to the ciphertext 70 | 71 | For decryption: 72 | 1. Derive the KEK using the KDF 73 | 2. Use it to decrypt the encrypted DEK 74 | 3. 
Use the DEK to decrypt the ciphertext 75 | 76 | Alternatively, you can derive unique keys using a random salt with a KDF, although this is inefficient when using a password-based KDF since it means a delay for every message. 77 | 78 | ### Erase keys from memory 79 | Once you’ve finished using a key, an attempt should be made to erase it from memory to prevent an attacker with physical or remote access to a machine being able to retrieve it. 80 | 81 | Note that in garbage collected programming languages, such as [C#](https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/), [Go](https://go.dev/blog/ismmkeynote), and [JavaScript](https://javascript.info/garbage-collection), this is difficult to achieve because the garbage collector can copy keys around in memory. 82 | 83 | To ensure proper erasure, you should [pin](https://learn.microsoft.com/en-us/dotnet/api/system.gc.allocatearray) memory, which prevents the key from being copied. Then compiler optimisations whilst zeroing should be [disabled](https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.cryptographicoperations.zeromemory). 84 | 85 | [Locking memory](https://doc.libsodium.org/memory_management#locking-memory) via an external library can also help prevent keys being [written to disk](https://veracrypt.fr/en/Paging%20File.html). -------------------------------------------------------------------------------- /sections/cryptographic-libraries.md: -------------------------------------------------------------------------------- 1 | # Cryptographic Libraries 2 | As a rule of thumb, **if the library doesn't include many of the algorithms I recommend in these guidelines, it's probably bad**. 
3 | 4 | ## Use 5 | ### [Libsodium](https://doc.libsodium.org/) 6 | A modern, extremely fast, easy-to-use, well documented, and [audited](https://www.privateinternetaccess.com/blog/libsodium-v1-0-12-and-v1-0-13-security-assessment/) library that covers all common use cases, except for implementing [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security). It's frequently [recommended](https://crypto.stackexchange.com/questions/50760/how-can-a-non-crypto-expert-implement-crypto-libraries-in-a-programming-language/50762#50762) and used by [large companies](https://doc.libsodium.org/libsodium_users#companies-using-libsodium). 7 | 8 | However, it’s much bigger than Monocypher (see below), meaning it’s harder to audit and not suitable for constrained environments. It also annoyingly requires the [Visual C++ Redistributable](https://support.microsoft.com/sl-si/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0) to work on Windows. The latest vcruntime DLLs to bundle with portable programs can be found [here](https://github.com/abbodi1406/vcredist). 9 | 10 | ### [Monocypher](https://monocypher.org/) 11 | Another modern, easy-to-use, well documented, and [audited](https://monocypher.org/quality-assurance/audit) library. Assuming you like [Daniel J. Bernstein](https://en.wikipedia.org/wiki/Daniel_J._Bernstein), it covers typical and some rarer use cases (e.g. steganography support using [Elligator 2](https://elligator.cr.yp.to/)). It's also compatible with libsodium whilst being much smaller, portable, and fast for constrained environments (e.g. microcontrollers). 12 | 13 | However, it’s about [half](https://monocypher.org/speed) the speed of libsodium on desktops/servers, has no misuse resistant functions (e.g.
like libsodium’s [secretstream()](https://doc.libsodium.org/secret-key_cryptography/secretstream) and [secretbox()](https://doc.libsodium.org/secret-key_cryptography/secretbox)), only supports [Argon2i](https://www.rfc-editor.org/rfc/rfc9106.html#name-introduction) for password hashing, allowing for insecure parameters (please see the [Password Hashing/Password-Based Key Derivation](password-hashing-password-based-key-derivation.md) Notes section), and offers no memory locking, random number generation, or convenience functions (e.g. Base64/hex encoding, padding, etc). 14 | 15 | ### [Tink](https://developers.google.com/tink) 16 | A misuse resistant library by Google that prevents common pitfalls like [nonce reuse](https://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html). Unlike Monocypher, it supports [FIPS approved algorithms](https://developers.google.com/tink/FIPS) if that's a requirement. 17 | 18 | However, it doesn’t support hashing or password hashing, it’s not available in [as many programming languages](https://github.com/google/tink/blob/master/docs/PRIMITIVES.md#primitives-supported-by-language) as [libsodium](https://doc.libsodium.org/bindings_for_other_languages) and [Monocypher](https://monocypher.org/download/), the documentation is a bit [harder to navigate](https://github.com/google/tink/tree/master/docs), and it provides access to [some algorithms](https://github.com/google/tink/blob/master/docs/PRIMITIVES.md#primitive-implementations-supported-by-language) that you ideally shouldn’t use. 19 | 20 | ### [LibHydrogen](https://libhydrogen.org) 21 | A lightweight, easy-to-use, hard-to-misuse, and well documented library suitable for constrained environments. 22 | 23 | The downsides are that it's [not compatible](https://monocypher.org/why/) with libsodium whilst also running [slower](https://monocypher.org/speed) than Monocypher. 
However, it has some advantages over Monocypher, like support for [random number generation](https://github.com/jedisct1/libhydrogen/wiki/Random-numbers) and easy access to [key exchange patterns](https://github.com/jedisct1/libhydrogen/wiki/Key-exchange), among other things. 24 | 25 | ## Avoid 26 | ### A [random library](https://github.com/martijnat/crypturd) (e.g. with 0 stars) on GitHub 27 | Assuming it’s not been written by an [experienced](https://github.com/jedisct1) [professional](https://github.com/FiloSottile) and it’s not a libsodium or Monocypher [binding](https://github.com/ektrah/nsec) to another programming language, you should generally stay away from less popular, unaudited libraries. 28 | 29 | They are much more likely to suffer from vulnerabilities and be significantly slower than the more popular, audited libraries. Also, note that even [experienced professionals make mistakes](https://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html). 30 | 31 | ### [OpenSSL](https://www.openssl.org/) 32 | Very [difficult](https://blog.trailofbits.com/2020/05/29/detecting-bad-openssl-usage/) to use, let alone use correctly, offers access to algorithms and functions that you shouldn't use, the [documentation](https://www.openssl.org/docs/) is a mess, and lots of [vulnerabilities](https://www.openssl.org/news/vulnerabilities.html) have been found over the years. 33 | 34 | These issues have led to OpenSSL [forks](https://www.libressl.org/index.html) and new, non-forked [libraries](https://bearssl.org/goals.html) that aim to be better alternatives if you need to implement TLS. However, OpenSSL is sadly the [standard](https://news.ycombinator.com/item?id=25346355) and often [relied upon](https://github.com/dotnet/runtime/issues/52482). 35 | 36 | ### Your [programming language](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography?view=net-6.0) 37 | Most programming languages provide access to old algorithms (e.g. 
MD5 and SHA-1) that shouldn’t be used anymore instead of newer ones (e.g. BLAKE2, BLAKE3, and SHA-3). Alongside missing or unnoticeable [warnings](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.sha1?view=net-6.0#remarks), this can lead to poor algorithm choices. 38 | 39 | Furthermore, the APIs are typically easy to misuse, the documentation may fail to mention important security related information, and the implementations will typically be slower than libsodium. 40 | 41 | However, certain languages, such as [Go](https://golang.org/) and [Zig](https://ziglang.org/), have impressive modern cryptography support. 42 | 43 | ### Other popular but unmentioned libraries 44 | For example, [BouncyCastle](https://bouncycastle.org/) and [CryptoJS](https://cryptojs.gitbook.io/docs/). These again often provide or rely on dated algorithms and typically have bad documentation. For instance, CryptoJS uses an [insecure](https://www.npmjs.com/package/evp_bytestokey) KDF from OpenSSL called [EVP_BytesToKey()](https://www.openssl.org/docs/man1.1.1/man3/EVP_BytesToKey.html) when you pass a string password to [AES.encrypt()](https://cryptojs.gitbook.io/docs/#ciphers), and BouncyCastle has [no](https://github.com/bcgit/bc-csharp/wiki) C# documentation. 45 | 46 | However, this avoidance recommendation is admittedly too broad since there are *some* libraries that I haven't mentioned that are worth using, like [PASETO](https://github.com/paragonie/paseto) and various [RustCrypto](https://github.com/RustCrypto) libraries. Just do your research and assess the quality of the documentation. **There's no excuse for poor documentation**. 47 | 48 | ### [NaCl](https://nacl.cr.yp.to/) 49 | An unmaintained, less modern, and more confusing version of libsodium and Monocypher. For example, [crypto_sign()](https://nacl.cr.yp.to/sign.html) for digital signatures has been experimental for many years. 
It also doesn’t have password hashing support and is [difficult to install/package](https://monocypher.org/why). 50 | 51 | ### [TweetNaCl](https://tweetnacl.cr.yp.to/) 52 | Unmaintained, [slower](https://monocypher.org/speed) than Monocypher, doesn’t offer access to newer algorithms, doesn’t have password hashing, and [doesn’t zero out buffers](https://monocypher.org/why). 53 | 54 | ## Notes 55 | ### Older algorithms aren't necessarily better 56 | You can argue that older algorithms are more battle-tested and therefore proven to be a safe choice, but the reality is that most modern algorithms, like [ChaCha20-Poly1305](https://www.rfc-editor.org/rfc/rfc8439.html), [BLAKE2](https://www.blake2.net/), and [Argon2](https://www.rfc-editor.org/rfc/rfc9106.html), have been properly analysed at this point and shown to offer security and performance [benefits](https://eprint.iacr.org/2019/1492.pdf) over their older counterparts. 57 | 58 | Therefore, it doesn’t make sense to stick to this overly cautious mindset of avoiding newer algorithms unless they [lack](https://github.com/tscholl2/siec) analysis or are still candidates in a [competition](https://csrc.nist.gov/projects/post-quantum-cryptography) (e.g. new post-quantum algorithms), which do [need](https://csrc.nist.gov/CSRC/media/Projects/Post-Quantum-Cryptography/documents/round-1/official-comments/guess-again-official-comment.pdf) further analysis to be considered safe. 59 | 60 | ### Read the documentation 61 | Don’t immediately jump into coding something because that’s how mistakes are made. Good libraries have high quality documentation that will explain potential security pitfalls and how to avoid them. 62 | 63 | I also **strongly** recommend reading bits of [RFC standards](https://datatracker.ietf.org/doc/html/rfc2104) for algorithms you're using, particularly the ['Security Considerations'](https://datatracker.ietf.org/doc/html/rfc7748#section-7) sections. 
64 | 65 | ### Be aware of dodgy design 66 | If you're using a recommended library, then this probably won't be a problem. If you're not, some libraries do bad things like releasing unauthenticated plaintext, which shouldn't be touched, when using AEADs. For example, OpenSSL and BouncyCastle [apparently do](https://gotchas.salusa.dev/). 67 | 68 | ### Speed matters 69 | It can make a noticeable difference for the user. For instance, a [C# Argon2 library](https://github.com/kmaragon/Konscious.Security.Cryptography) is going to be significantly slower than Argon2 in [libsodium](https://doc.libsodium.org/), meaning unnecessary and unwanted extra delay during key derivation. 70 | 71 | Libsodium is the go-to for speed on desktops/servers, and Monocypher is the go-to for constrained environments (e.g. microcontrollers). -------------------------------------------------------------------------------- /sections/hashing.md: -------------------------------------------------------------------------------- 1 | # Hashing 2 | ## Use 3 | ### BLAKE2 4 | [Faster](https://www.blake2.net/) than MD5, SHA-1, SHA-2, and SHA-3, yet as real-world [secure](https://eprint.iacr.org/2019/1492.pdf) as SHA-3. It relies on essentially the same core algorithm (borrowed from ChaCha20) as BLAKE, which received a [significant amount of cryptanalysis](https://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.7896.pdf) as part of the [SHA-3 competition](https://competitions.cr.yp.to/sha3.html). 5 | 6 | It's available in [many](https://www.blake2.net/#us) cryptographic libraries and is used in [password hashing schemes](https://www.rfc-editor.org/rfc/rfc9106.html) and well-known software (e.g. [WireGuard](https://www.wireguard.com/protocol/) and the [Linux kernel](https://www.kernel.org/)). 7 | 8 | There are two main variants. In most cases, use BLAKE2b-256, -384, or -512. However, on 8- to 32-bit platforms, BLAKE2s-256 will be more performant. 
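
To make the variant choice above concrete, here is a minimal sketch using Python's standard `hashlib` module. Using `hashlib` is my assumption for illustration only (the libraries recommended in these guidelines expose equivalent functions), and the helper names `blake2b_256` and `blake2s_256` are hypothetical.

```python
import hashlib


def blake2b_256(message: bytes) -> bytes:
    """BLAKE2b with a 256-bit (32-byte) output, the usual choice on 64-bit platforms."""
    return hashlib.blake2b(message, digest_size=32).digest()


def blake2s_256(message: bytes) -> bytes:
    """BLAKE2s with a 256-bit (32-byte) output, intended for 8- to 32-bit platforms."""
    return hashlib.blake2s(message, digest_size=32).digest()


if __name__ == "__main__":
    data = b"example message"
    print(blake2b_256(data).hex())
    print(blake2s_256(data).hex())
```

The two variants are different functions, so their outputs for the same message are unrelated even when the output length matches. Pick one per use case and stick to it.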
9 | 10 | The biggest weakness is the [large](https://www.imperialviolet.org/2017/05/31/skipsha3.html) number of variants (e.g. BLAKE2x to get an XOF), but this issue is hidden from users by cryptographic libraries. 11 | 12 | ### SHAKE 13 | Still [part of](https://en.wikipedia.org/wiki/SHA-3#Instances) the SHA-3 standard but [faster](https://en.wikipedia.org/wiki/SHA-3#Speed) than SHA-3 (similar to SHA-2) and an XOF. 14 | 15 | SHAKE128 provides a 128-bit security level with at least a 256-bit output, and SHAKE256 provides a 256-bit security level with at least a 512-bit output. 16 | 17 | To get the same security level as SHA3-256/SHA-256, you should use SHAKE256 with a 256-bit output. This provides 128-bit collision resistance. 18 | 19 | ### SHA-3 20 | [Relatively slow](https://www.imperialviolet.org/2017/05/31/skipsha3.html) in software, but the [new standard](https://www.nist.gov/publications/sha-3-standard-permutation-based-hash-and-extendable-output-functions), [fast](https://keccak.team/2017/is_sha3_slow.html) in hardware, [well analysed](https://keccak.team/third_party.html), [very different](https://keccak.team/keccak.html) to SHA-2 due to the praised [sponge construction](https://keccak.team/sponge_duplex.html), and has a [higher security margin](https://eprint.iacr.org/2019/1492.pdf) than the other algorithms listed here. 21 | 22 | The main criticism aside from speed is that the security is [over the top](https://crypto.stackexchange.com/a/70582). Even the designers [agree](https://en.wikipedia.org/wiki/SHA-3#Weakening_controversy) as they've developed [reduced functions](https://keccak.team/kangarootwelve.html) based on it. However, those alternative functions haven't seen much use compared to BLAKE2/BLAKE3. 
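
As a rough illustration of the SHAKE and SHA-3 recommendations above, the sketch below produces a 256-bit digest with each using Python's standard `hashlib` module. The choice of `hashlib` and the helper names are assumptions made for this example, not something these guidelines prescribe.

```python
import hashlib


def sha3_256(message: bytes) -> bytes:
    """Fixed-length SHA3-256 digest (32 bytes)."""
    return hashlib.sha3_256(message).digest()


def shake256_256(message: bytes) -> bytes:
    """SHAKE256 with a 256-bit output.

    SHAKE is an XOF, so the caller chooses the output length in bytes.
    """
    return hashlib.shake_256(message).digest(32)


if __name__ == "__main__":
    data = b"example message"
    print(sha3_256(data).hex())
    print(shake256_256(data).hex())
```

Both digests are 32 bytes long, but they are not interchangeable: SHA3-256 and SHAKE256 use different parameters and padding, so the outputs differ for the same input.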
23 | 24 | ### BLAKE3 25 | The [fastest](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) cryptographic hash in software when accelerated at the cost of receiving less analysis, having a [lower security margin](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), and being limited to a [128-bit security level](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). It's also less accessible via cryptographic libraries than the other recommended algorithms. 26 | 27 | However, it has the BLAKE legacy behind it and improves on BLAKE2 in that there’s only one variant that covers all use cases (e.g. a KDF and XOF too). 28 | 29 | The speed is a huge bonus when using it as a MAC for Encrypt-then-MAC because it holds up against Poly1305 (from ChaCha20-Poly1305) and GHASH (from AES-GCM) when hashing enough data whilst providing stronger security guarantees (e.g. collision resistance for [committing security](https://youtu.be/dZqEtrLh9aM)). 30 | 31 | ### SHA-2 32 | The most popular hash function. It’s widely available in cryptographic libraries, still secure after many years of [cryptanalysis](https://en.wikipedia.org/wiki/SHA-2#Cryptanalysis_and_validation), and offers decent performance. 33 | 34 | However, unlike the other recommendations, it suffers from [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack) (please see the [Notes](#beware-of-length-extension-attacks) section) and uses the [Merkle-Damgård construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction), which [isn't used](https://crypto.stackexchange.com/a/83279) for new hash functions. It also [wasn't](https://keccak.team/keccak_strengths.html) the result of an open competition. 35 | 36 | Thus, to take advantage of security improvements, a newer hash function should be used if possible. 
37 | 38 | ## Avoid 39 | ### Non-cryptographic hash functions 40 | For example, [Meow Hash](https://peter.website/meow-hash-cryptanalysis) and error-detecting codes (e.g. [CRC](https://en.wikipedia.org/wiki/Cyclic_redundancy_check)). These are **not secure**. 41 | 42 | ### MD5 and SHA-1 43 | Both are very old and **no longer secure**. For instance, there’s an [attack](https://eprint.iacr.org/2013/170.pdf) that breaks MD5 collision resistance in 2^18 time, which takes less than a second to execute on an ordinary computer. 44 | 45 | Obviously, don't use their older counterparts either (e.g. MD4 and SHA-0) because they're **even less secure**. 46 | 47 | ### Streebog 48 | It has a [flawed](https://eprint.iacr.org/2016/071.pdf) [S-Box](https://eprint.iacr.org/2019/092.pdf), with no design rationale ever being made public, which is **likely a [backdoor](https://www.schneier.com/blog/archives/2019/05/cryptanalyzing_.html)**. It's somehow available in [VeraCrypt](https://www.veracrypt.fr/en/Streebog.html), but I've luckily not seen it used anywhere else. 49 | 50 | ### Non-finalist SHA-3 candidates 51 | For example, [Edon-R](https://eprint.iacr.org/2009/378.pdf), which is **broken**. Nobody uses these, and they've received less scrutiny than finalists. 52 | 53 | ### SipHash 54 | Despite the name, this is a [MAC](https://en.wikipedia.org/wiki/Message_authentication_code) (please see the [Message Authentication Codes](message-authentication-codes.md) section), meaning it requires a key. It's also [not collision resistant](https://crypto.stackexchange.com/questions/35086/siphashs-non-collision-resistance). 55 | 56 | ### Chaining hash functions 57 | For instance, `SHA-256(SHA-1(message))` or [SHA-256d](https://crypto.stackexchange.com/a/7896). This can be [**insecure**](https://crypto.stackexchange.com/a/44454) (an inner collision means an outer collision) and is obviously less efficient than hashing once. 
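
The point that an inner collision means an outer collision can be shown with a deliberately weak inner hash. The sketch below is a toy: `weak_inner` is a hypothetical stand-in for a broken inner function, built by truncating SHA-256 to 16 bits purely so that a collision can be brute-forced almost instantly. It is not an attack on any real hash function.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def weak_inner(message: bytes) -> bytes:
    """Toy inner hash: SHA-256 truncated to 16 bits, so collisions are trivial to find."""
    return hashlib.sha256(message).digest()[:2]


def chained(message: bytes) -> bytes:
    """Outer SHA-256 applied to the inner hash, mirroring Outer(Inner(message))."""
    return hashlib.sha256(weak_inner(message)).digest()


def find_inner_collision() -> Tuple[bytes, bytes]:
    """Brute-force two distinct messages that share the same inner hash."""
    seen: Dict[bytes, bytes] = {}
    for i in count():
        message = str(i).encode()
        digest = weak_inner(message)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message


if __name__ == "__main__":
    m1, m2 = find_inner_collision()
    # The messages differ, yet the full 256-bit chained output is identical.
    print(m1, m2, chained(m1) == chained(m2))
```

The strong outer hash only ever sees the inner digest, so it cannot tell the two messages apart. The chain is therefore no more collision resistant than its weakest inner link.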
58 | 59 | ### RIPEMD 60 | The original RIPEMD has [collisions](https://eprint.iacr.org/2004/199.pdf) and RIPEMD-128 has a small output size, meaning they're **insecure**. 61 | 62 | The longer variants (e.g. RIPEMD-160) are still old, not long enough, unpopular, and vulnerable to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack). They also perform worse and have received less analysis than the recommended algorithms. 63 | 64 | Fun fact, RIPEMD-256 uses RIPEMD-128 internally, and RIPEMD-320 uses RIPEMD-160 internally. This means the longer versions provide the same security level as their half-sized counterparts, which isn't what you want. 65 | 66 | ### Cryptographic hash functions nobody uses 67 | Whirlpool, Tiger, SHA-224, and so on. These are all worse in one way or another than the recommended algorithms. 68 | 69 | For instance, the attacks on Tiger are too close for comfort and there are [weird versions](https://crypto.stackexchange.com/questions/28986/what-is-tiger192-4-in-php), Whirlpool is [slower](https://www.cryptopp.com/benchmarks.html) than most other cryptographic hash functions and only produces a 512-bit output, and SHA-224 only provides [112-bit collision resistance](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions), which is below the recommended 128-bit security level. 70 | 71 | ### 128-bit hashes 72 | These only provide 64-bit collision resistance when you want a 128-bit security level, which requires using at least a 256-bit output. 73 | 74 | ### KangarooTwelve 75 | From the people behind Keccak/SHA-3, much [faster](https://keccak.team/2017/is_sha3_slow.html) than SHA-3 and SHAKE, has a safe security margin, and has no variants. However, it's not that accessible, rarely used, and only offers [128-bit security](https://crypto.stackexchange.com/a/46529) like SHAKE128/BLAKE3. 
76 | 77 | ## Notes 78 | ### These are not suitable for password hashing 79 | Regular hash functions are fast, whereas password hashing [needs to be slow](https://crypto.stackexchange.com/a/3198) to prevent [password cracking](https://en.wikipedia.org/wiki/Password_cracking). Furthermore, password hashing requires using a **random** salt for each password to derive unique hashes when given the same input and to protect against [precomputation attacks](https://en.wikipedia.org/wiki/Rainbow_table). 80 | 81 | ### These are not suitable for authentication 82 | These algorithms are unkeyed. You need to use a [MAC](https://en.wikipedia.org/wiki/Message_authentication_code) (please see the [Message Authentication Codes](message-authentication-codes.md) section), such as keyed BLAKE2b-256 or HMAC-SHA-256, for authentication because they provide the [appropriate security guarantees](https://en.wikipedia.org/wiki/Message_authentication_code#Security). 83 | 84 | ### Beware of length extension attacks 85 | MD4, MD5, SHA-1, RIPEMD, Whirlpool, SHA-256, and SHA-512 are susceptible to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack). 86 | 87 | An attacker can use `Hash(message1)` and the length of `message1` to calculate `Hash(message1 || message2)` for an attacker-controlled `message2`, without knowing `message1`. 88 | 89 | Therefore, **concatenating things (e.g. `Hash(secret || message)`) with these algorithms is a bad idea**. 90 | 91 | Use any of the non-SHA-2 recommendations instead because they're not susceptible. SHA-512/256, SHA-384, and HMAC-SHA-2 aren't either. 92 | 93 | ### Concatenation requires care 94 | When feeding multiple inputs into a hash function, you [need to be careful](https://soatok.blog/2021/07/30/canonicalization-attacks-against-macs-and-signatures/) to avoid canonicalization attacks. 
As this is mostly relevant for MACs, please read the [Message Authentication Codes](message-authentication-codes.md) Notes section for more details. 95 | 96 | ### Hash functions do not increase entropy 97 | If you hash a single ASCII character, there are still only 128 possible hash values. Therefore, [prehashing passwords](https://crypto.stackexchange.com/questions/66581/is-there-an-advantage-to-using-a-hash-in-combination-with-a-key-derivation-funct) before using a password-based KDF doesn't improve the entropy of the password. 98 | 99 | ### Truncation 100 | With a modern, fixed-length hash (e.g. SHA3-512), use the full standard output length if possible, meaning no truncation. This maximises the security level. 101 | 102 | If the hash function lets you specify an output length from a range (e.g. BLAKE2b), use that instead of manual truncation. 103 | 104 | The same is true for XOFs (e.g. SHAKE and BLAKE3) because that's what they're designed for. However, with XOFs, a longer output for the same input will begin with the same bytes as a shorter output. 105 | 106 | If you can use a fixed-length hash that does the truncation for you internally (e.g. SHA-512/256), use that instead of manual truncation. This provides [domain separation](https://crypto.stackexchange.com/questions/60966/which-attacks-are-prevented-by-the-different-initial-hash-values-for-sha-2-with). 107 | 108 | Otherwise, you can take the first n bits from the left to get an n-bit hash. This is [safe](https://crypto.stackexchange.com/a/3156), but you lose domain separation compared to the above. 
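
The truncation options above behave differently in practice. The following is a minimal sketch using Python's standard `hashlib` module (my choice for illustration, not a recommendation over the libraries discussed in these guidelines). The variable names are hypothetical.

```python
import hashlib

message = b"example message"

# BLAKE2b takes the output length as a parameter. That length is mixed into the
# hash, so the 256-bit digest is NOT a prefix of the 512-bit digest.
blake2b_256 = hashlib.blake2b(message, digest_size=32).digest()
blake2b_512 = hashlib.blake2b(message, digest_size=64).digest()

# With an XOF such as SHAKE256, a shorter output IS a prefix of a longer output
# for the same input.
shake_32 = hashlib.shake_256(message).digest(32)
shake_64 = hashlib.shake_256(message).digest(64)

# Manual truncation of a fixed-length hash: keep the leftmost 256 bits.
# This works, but there is no domain separation from full SHA-512.
sha512_truncated = hashlib.sha512(message).digest()[:32]

if __name__ == "__main__":
    print("BLAKE2b-256 is a prefix of BLAKE2b-512:", blake2b_512[:32] == blake2b_256)
    print("SHAKE256 32-byte output is a prefix of 64-byte output:", shake_64[:32] == shake_32)
    print("Truncated SHA-512 length in bytes:", len(sha512_truncated))
```

The XOF prefix behaviour matters if you derive several values from the same input: requesting different lengths does not give independent outputs, so use distinct inputs for distinct purposes.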
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Attribution-ShareAlike 4.0 International Public License 2 | 3 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 4 | 5 | Section 1 – Definitions. 6 | 7 | Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 8 | Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 9 | BY-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License. 
10 | Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 11 | Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 12 | Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 13 | License Elements means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike. 14 | Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 15 | Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 16 | Licensor means the individual(s) or entity(ies) granting rights under this Public License. 17 | Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 
18 | Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 19 | You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 20 | 21 | Section 2 – Scope. 22 | 23 | License grant. 24 | Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 25 | reproduce and Share the Licensed Material, in whole or in part; and 26 | produce, reproduce, and Share Adapted Material. 27 | Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 28 | Term. The term of this Public License is specified in Section 6(a). 29 | Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 30 | Downstream recipients. 31 | Offer from the Licensor – Licensed Material. 
Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 32 | Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. 33 | No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 34 | No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 35 | 36 | Other rights. 37 | Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 38 | Patent and trademark rights are not licensed under this Public License. 39 | To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 40 | 41 | Section 3 – License Conditions. 
42 | 43 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 44 | 45 | Attribution. 46 | 47 | If You Share the Licensed Material (including in modified form), You must: 48 | retain the following if it is supplied by the Licensor with the Licensed Material: 49 | identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 50 | a copyright notice; 51 | a notice that refers to this Public License; 52 | a notice that refers to the disclaimer of warranties; 53 | a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 54 | indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 55 | indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 56 | You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 57 | If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 58 | ShareAlike. 59 | 60 | In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply. 61 | The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License. 62 | You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 
63 | You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. 64 | 65 | Section 4 – Sui Generis Database Rights. 66 | 67 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 68 | 69 | for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 70 | if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and 71 | You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 72 | 73 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 74 | 75 | Section 5 – Disclaimer of Warranties and Limitation of Liability. 76 | 77 | Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. 
78 | To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. 79 | 80 | The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 81 | 82 | Section 6 – Term and Termination. 83 | 84 | This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 85 | 86 | Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 87 | automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 88 | upon express reinstatement by the Licensor. 89 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 90 | For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 91 | Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 92 | 93 | Section 7 – Other Terms and Conditions. 94 | 95 | The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 
96 | Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 97 | 98 | Section 8 – Interpretation. 99 | 100 | For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 101 | To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 102 | No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 103 | Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 104 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Cryptography Guidelines 2 | ![Creative Commons License Icon](https://i.creativecommons.org/l/by-sa/4.0/88x31.png) This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) because it took bloody ages to write. 3 | 4 | ## Background 5 | This document outlines recommendations for cryptographic algorithm choices and parameters as well as important implementation details based on what I have learnt from reading about the subject and the consensus I have observed online. 
Note that *some* knowledge of cryptography is required to understand the terminology used in these guidelines. 6 | 7 | My goal with these guidelines is to provide a resource that I wish I had access to when I first started writing programs related to cryptography. If this information helps prevent even just one vulnerability, then I consider it time well spent. 8 | 9 | > **Note** 10 | > 11 | > This document is slowly being rewritten and split into individual pages. Please view the sections folder for the latest information. 12 | 13 | ## Acknowledgements 14 | These guidelines were inspired by [this](https://gist.github.com/atoponce/07d8d4c833873be2f68c34f9afc5a78a#file-gistfile1-md) Cryptographic Best Practices gist, Latacora's [Cryptographic Right Answers](https://latacora.singles/2018/04/03/cryptographic-right-answers.html), and [Crypto Gotchas](https://github.com/SalusaSecondus/CryptoGotchas), which is licensed under the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/). The difference is that I mention newer algorithms and have tried to justify my algorithm recommendations whilst also offering important notes about using them correctly. 15 | 16 | ## Contribute 17 | If you find these guidelines helpful, please **star** this repository and **share** the link around. Doing so might just prevent someone from making a catastrophic mistake. 18 | 19 | If you have any **feedback**, please **contact me** privately [here](https://samuellucas.com/) or publicly [here](https://github.com/samuel-lucas6/Cryptography-Guidelines/discussions) to help improve these guidelines. [Pull requests](https://github.com/samuel-lucas6/Cryptography-Guidelines/pulls) are also welcome but please be prepared for things to be reworded. 20 | 21 | ## Disclaimer 22 | I’m a psychology undergraduate with an interest in applied cryptography, not an experienced cryptographer. 
I primarily have experience with the [libsodium](https://doc.libsodium.org/) library since that’s what I’ve used for my projects, but I've also reported some security vulnerabilities related to cryptography. 23 | 24 | Most experienced cryptographers don't have the time to write things like this, and the following information is freely available online or in books, so whilst more experience would be beneficial, I’m trying my best to provide accurate information that can be fact checked. **If I've made a mistake, please contact me to get it fixed**. 25 | 26 | Note that the rankings are based on my opinion, algorithm availability in cryptographic libraries, and which algorithms are typically used in modern protocols, such as [TLS 1.3](https://www.davidwong.fr/tls13/), [Noise Protocol Framework](https://noiseprotocol.org/noise.html), [WireGuard](https://www.wireguard.com/protocol/), and so on. Such protocols and recommended practices make for the best guidelines because they’ve been approved by experienced professionals. 27 | 28 | ## General Guidance 29 | 1. Research, research, research: you often don’t need to know how cryptographic algorithms work under the hood to implement them correctly, just like how you don’t need to know how a car works to drive. However, you need to know enough about what you’re trying to do, which requires looking up relevant information online or in books, reading the documentation for the cryptographic library you’re using, reading RFC standards, reading helpful blog posts, and reading guidelines like this one. Furthermore, reading books about the subject in general will be beneficial, again like how knowing about cars can help if you break down. For a list of great resources, check out my [How to Learn About Cryptography](https://samuellucas.com/blog/how-to-learn-about-cryptography.html) blog post. 30 | 31 | 2. 
Check and check again: it’s your responsibility to get things right the first time around to the best of your ability rather than relying on peer review. Therefore, I **strongly** recommend always reading over security sensitive code at least twice and testing it to ensure that it’s operating as expected (e.g. checking the value of variables line by line using a debugger, using test vectors, etc). 32 | 33 | 3. Peer review is great but often doesn’t happen: unless your project is popular, you have a bug bounty program with cash rewards, or what you’re developing is for an organisation, very few people, perhaps none, will look through the code to find and report vulnerabilities. Similarly, receiving funding for a code audit will probably be impossible. 34 | 35 | 4. **Please don't create your own custom cryptographic algorithms (e.g. a custom cipher or hash function)**: this is like flying a Boeing 747 without a pilot license but worse because even experienced cryptographers design [insecure](https://competitions.cr.yp.to/sha3.html) algorithms, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a [competition](https://competitions.cr.yp.to/index.html). By contrast, you rarely see experienced airline pilots crashing planes. The only *exception* to this rule is implementing something like Encrypt-then-MAC with secure, **existing** cryptographic algorithms **when you know what you're doing**. 36 | 37 | 5. **Please avoid coding existing cryptographic algorithms yourself (e.g. coding AES yourself)**: cryptographic libraries provide access to these algorithms for you to prevent people from making mistakes that cause vulnerabilities and to offer good performance. 
Whilst a select few algorithms are relatively simple to implement, like [HKDF](https://datatracker.ietf.org/doc/html/rfc5869), [many aren't](https://loup-vaillant.fr/articles/implementing-elligator) and require a great deal of experience to implement correctly. Lastly, another reason to avoid doing this is that it's not much fun since academic papers and reference implementations can be very difficult to understand. 38 | 39 | ## Cryptographic Libraries 40 | #### Use (in order): 41 | 1. [Libsodium](https://doc.libsodium.org/): a modern, extremely fast, easy-to-use, well documented, and [audited](https://www.privateinternetaccess.com/blog/libsodium-v1-0-12-and-v1-0-13-security-assessment/) library that covers all common use cases, except for implementing TLS. However, it’s much bigger than Monocypher, meaning it’s harder to audit and not suitable for constrained environments, and requires the [Visual C++ Redistributable](https://support.microsoft.com/sl-si/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0) to work on Windows. 42 | 43 | 2. [Monocypher](https://monocypher.org/): another modern, easy-to-use, well documented, and [audited](https://monocypher.org/quality-assurance/audit) library, but it’s about [half](https://monocypher.org/speed) the speed of libsodium on desktops/servers, has no misuse resistant functions (e.g. libsodium’s [secretstream()](https://doc.libsodium.org/secret-key_cryptography/secretstream) and [secretbox()](https://doc.libsodium.org/secret-key_cryptography/secretbox)), only supports Argon2i for password hashing, allowing for insecure parameters (please see the [Password Hashing/Password-Based Key Derivation Notes](#notes-6) section), and offers no memory locking, random number generation, or convenience functions (e.g. Base64/hex encoding, padding, etc). However, it’s compatible with libsodium whilst being much smaller, portable, and fast for constrained environments (e.g. microcontrollers). 44 | 45 | 3.
[Tink](https://developers.google.com/tink): a misuse resistant library that prevents common pitfalls, like nonce reuse. However, it doesn’t support hashing or password hashing, it’s not available in as many programming languages as libsodium and Monocypher, the documentation is a bit harder to navigate, and it provides access to some algorithms that you shouldn’t use. 46 | 47 | 4. [LibHydrogen](https://libhydrogen.org): a lightweight, easy-to-use, hard-to-misuse, and well documented library suitable for constrained environments. The downsides are that it's not compatible with libsodium whilst also running [slower](https://monocypher.org/speed) than Monocypher. However, it has some advantages over Monocypher, like support for random number generation, even on Arduino boards, and easy access to key exchange patterns, among other things. 48 | 49 | #### Avoid (in order): 50 | 1. A random library (e.g. with 0 stars) on GitHub: assuming it’s not been written by an experienced professional and it’s not a libsodium or Monocypher [binding](https://github.com/ektrah/nsec) to another programming language, you should generally stay away from less popular, unaudited libraries. They are much more likely to suffer from vulnerabilities and be significantly slower than the more popular, audited libraries. Also, note that even [experienced professionals make mistakes](https://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html). 51 | 52 | 2. [OpenSSL](https://www.openssl.org/): very [difficult](https://blog.trailofbits.com/2020/05/29/detecting-bad-openssl-usage/) to use, let alone use correctly, offers access to algorithms and functions that you shouldn't use, the [documentation](https://www.openssl.org/docs/) is a mess, and lots of [vulnerabilities](https://www.openssl.org/news/vulnerabilities.html) have been found over the years. 
These issues have led to OpenSSL [forks](https://www.libressl.org/index.html) and new, non-forked [libraries](https://bearssl.org/goals.html) that aim to be better alternatives if you need to implement TLS. 53 | 54 | 3. The library available in your [programming language](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography?view=net-6.0): most languages provide access to old algorithms (e.g. MD5 and SHA1) that shouldn’t be used anymore instead of newer ones (e.g. BLAKE2, BLAKE3, and SHA3), which can lead to poor algorithm choices. Furthermore, the APIs are typically easy to misuse, the documentation may fail to mention important security related information, and the implementations will be slower than libsodium. However, certain languages, such as [Go](https://golang.org/) and [Zig](https://ziglang.org/), have impressive modern cryptography support. 55 | 56 | 4. Other popular libraries I haven’t mentioned (e.g. [BouncyCastle](https://bouncycastle.org/), [CryptoJS](https://cryptojs.gitbook.io/docs/), etc): these again often provide or rely on dated algorithms and typically have bad documentation. For instance, CryptoJS uses an [insecure](https://www.npmjs.com/package/evp_bytestokey) KDF from OpenSSL called [EVP_BytesToKey()](https://www.openssl.org/docs/man1.1.1/man3/EVP_BytesToKey.html) when you pass a string password to [AES.encrypt()](https://cryptojs.gitbook.io/docs/#ciphers), and BouncyCastle has no C# documentation. However, this recommendation is admittedly too broad since there are *some* libraries that I haven't mentioned that are worth using, like [PASETO](https://github.com/paragonie/paseto). Therefore, as a rule of thumb, **if it doesn't include several of the algorithms I recommend in this document, then it's probably bad**. Just do your research and assess the quality of the documentation. There's no excuse for poor documentation. 57 | 58 | 5.
[NaCl](https://nacl.cr.yp.to/): an unmaintained, less modern, and more confusing version of libsodium and Monocypher. For example, [crypto_sign()](https://nacl.cr.yp.to/sign.html) for digital signatures has been [experimental](https://nacl.cr.yp.to/sign.html) for several years. It also doesn’t have password hashing support and is [difficult to install/package](https://monocypher.org/why). 59 | 60 | 6. [TweetNaCl](https://tweetnacl.cr.yp.to/): unmaintained, [slower](https://monocypher.org/speed) than Monocypher, doesn’t offer access to newer algorithms, doesn’t have password hashing, and [doesn’t zero out buffers](https://monocypher.org/why). 61 | 62 | #### Notes: 63 | 1. If the library you’re currently using/planning to use doesn’t support several of the algorithms I’m recommending, then it’s time to upgrade and take advantage of the improved security and performance benefits available to you if you switch. 64 | 65 | 2. Please read the documentation: don’t immediately jump into coding something because that’s how mistakes are made. Good libraries have high quality documentation that will explain potential security pitfalls and how to avoid them. 66 | 67 | 3. Some libraries release unauthenticated plaintext when using AEADs: for example, OpenSSL and BouncyCastle [apparently do](https://github.com/SalusaSecondus/CryptoGotchas). Firstly, don’t use these libraries for this reason and the reasons I’ve already listed. Secondly, **never do anything with unauthenticated plaintext; ignore it to be safe**. 68 | 69 | 4. Older doesn't mean better: you can argue that older algorithms are more battle tested and therefore proven to be a safe choice, but the reality is that most modern algorithms, like ChaCha20, BLAKE2, and Argon2, have been properly analysed at this point and shown to offer security and performance benefits over their older counterparts. 
Therefore, it doesn’t make sense to stick to this overly cautious mindset of avoiding newer algorithms, except for algorithms that are still candidates in a [competition](https://csrc.nist.gov/projects/post-quantum-cryptography) (e.g. new post-quantum algorithms), which do need further analysis to be considered safe. 70 | 71 | 5. You should prioritise speed: this can make a noticeable difference for the user. For example, a C# Argon2 library is going to be significantly slower than Argon2 in libsodium, meaning unnecessary and unwanted extra delay during key derivation. Libsodium is the go-to for speed on desktops/servers, and Monocypher is the go-to for constrained environments (e.g. microcontrollers). 72 | 73 | ## Symmetric Encryption 74 | #### Use (in order): 75 | 1. [XChaCha20](https://doc.libsodium.org/advanced/stream_ciphers/xchacha20)-then-[BLAKE2b](https://www.blake2.net/) (Encrypt-then-MAC): **if you know what you are doing**, then implementing Encrypt-then-MAC offers better security than an AEAD because it provides additional security properties, such as [key commitment](https://eprint.iacr.org/2020/1491.pdf), and allows for a longer authentication tag, making it more suitable for long-term storage. This combo is now being employed by [PASETO](https://github.com/paragonie/paseto/pull/127), an alternative to [JWT](https://jwt.io/), as well as my file encryption software called [Kryptor](https://www.kryptor.co.uk/). ChaCha20 has a [higher security margin](https://eprint.iacr.org/2019/1492.pdf) than AES whilst also being fast in software and running in [constant time](https://cr.yp.to/chacha/chacha-20080128.pdf), meaning it’s not vulnerable to timing attacks like AES [can be](https://cr.yp.to/antiforgery/cachetiming-20050414.pdf).
Moreover, [Salsa20](https://cr.yp.to/snuffle/salsafamily-20071225.pdf), the cipher ChaCha20 was based on, underwent rigorous analysis as part of the [eSTREAM competition](https://www.ecrypt.eu.org/stream/e2-salsa20.html), making it into the [final portfolio](https://competitions.cr.yp.to/estream.html). Salsa20 has also received [further analysis](https://en.wikipedia.org/wiki/Salsa20#Cryptanalysis_of_Salsa20) since then. 76 | 77 | 2. [XChaCha20-Poly1305](https://doc.libsodium.org/secret-key_cryptography/aead/chacha20-poly1305/xchacha20-poly1305_construction): this is the gold standard for when you don’t know how to implement Encrypt-then-MAC or need maximum performance on all devices. As mentioned above, ChaCha20 has a higher security margin than AES, always runs in constant time, and (X)ChaCha20-Poly1305 is faster than AES-GCM without AES-NI hardware support. Note that XChaCha20-Poly1305 should be favoured over regular ChaCha20-Poly1305 in many cases because it allows for random nonces, which helps prevent nonce reuse (please see point 1 of the [Notes](#notes-1) section). If you just need a counter nonce or intend to use a unique key for encryption each time, then ChaCha20-Poly1305 is fine. Unfortunately, there are two ChaCha20-Poly1305 constructions - the original [ChaCha20-Poly1305](https://doc.libsodium.org/secret-key_cryptography/aead/chacha20-poly1305/original_chacha20-poly1305_construction) and [ChaCha20-Poly1305-IETF](https://datatracker.ietf.org/doc/html/rfc7539). The original construction is arguably better because it has a smaller nonce, meaning it doesn’t encourage unsafe random nonces, and a larger internal counter, meaning it can encrypt more data using the same key and nonce pair (please see point 5 of the [Notes](#notes-1) section), but the IETF variant is more popular and should therefore almost always be used. 78 | 79 | 3. 
[XSalsa20-Poly1305](https://doc.libsodium.org/secret-key_cryptography/secretbox): although (X)ChaCha20 has slightly [better diffusion and performance](https://cr.yp.to/chacha/chacha-20080128.pdf) and has seen more adoption in recent years, (X)Salsa20 is, practically speaking, just as [secure](https://en.wikipedia.org/wiki/Salsa20#Cryptanalysis_of_Salsa20), with the same benefits as (X)ChaCha20 over AES (please see points 1 and 2). It has been the recipient of lots of [cryptanalysis](https://en.wikipedia.org/wiki/Salsa20#Cryptanalysis_of_Salsa20) (more than ChaCha20) and is still considered one of the best alternatives to AES. 80 | 81 | 4. [AES-CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)) (or [CBC](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC)))-then-[HMAC](https://datatracker.ietf.org/doc/html/rfc2104) (Encrypt-then-MAC): again, **if you know what you are doing**, this is superior to using an AEAD in terms of security for the reasons outlined in point 1 above. **AES-CTR should be preferred** because AES-CBC is less efficient, requires padding, doesn't support a counter nonce, and can't encrypt as many blocks before a collision occurs. However, both AES-CTR-then-HMAC and AES-CBC-then-HMAC can be faster than AES-GCM without AES-NI hardware support. With that said, generating an IV for CBC and CTR can be a source of trouble, with CBC requiring unpredictable (aka random) IVs and CTR implementations differing in terms of nonce size and whether a random/counter nonce is safe. 82 | 83 | 5. [AEGIS-256](https://competitions.cr.yp.to/round3/aegisv11.pdf): one of the [finalists](https://competitions.cr.yp.to/caesar-submissions.html) for the CAESAR competition. 
It's much [faster](https://github.com/ziglang/zig/pull/6442#issuecomment-699704030) than AES-GCM and (X)ChaCha20-Poly1305 with hardware support, expected to be [key committing](https://datatracker.ietf.org/doc/html/draft-denis-aegis-aead#section-1), and supports safe random nonces. In Zig, it even performs [better](https://github.com/ziglang/zig/pull/6442#issuecomment-699704293) than (X)ChaCha20-Poly1305 and AES-GCM without hardware support. However, it's [not](https://datatracker.ietf.org/doc/html/draft-denis-aegis-aead#section-6) [compactly committing](https://eprint.iacr.org/2017/664.pdf) because of the short 128-bit tag, so Encrypt-then-MAC is still preferable for security. It has also received little adoption at the time of writing and isn't available in [many](https://github.com/jedisct1/draft-aegis-aead#known-implementations) cryptographic libraries. With that said, it will be in the [next release](https://github.com/jedisct1/libsodium/issues/1028) of libsodium (1.0.0.19-stable) and hopefully [standardised](https://github.com/jedisct1/draft-aegis-aead) given the advantages over AES-GCM. 84 | 85 | 6. [AES-OCB](https://en.wikipedia.org/wiki/OCB_mode): another one of the [finalists](https://competitions.cr.yp.to/caesar-submissions.html) for the CAESAR competition. It performs [well](https://github.com/jedisct1/rust-aegis#other-implementations) compared to AES-GCM and (X)ChaCha20-Poly1305 with hardware support, supports random nonces, has been [researched](https://competitions.cr.yp.to/round3/ocbv11.pdf) for over a decade, the design is efficient and timing-attack resistant (assuming the block cipher implementation is), and it's available in [some](https://en.wikipedia.org/wiki/Comparison_of_cryptography_libraries#Cipher_modes) cryptographic libraries. However, it's [slower](https://github.com/jedisct1/zig-rocca#readme) than AEGIS and not key committing. 86 | 87 | 7. 
[AES-GCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode): the industry standard despite it [not being the best](https://doc.libsodium.org/secret-key_cryptography/aead#limitations) and receiving [various criticism](https://soatok.blog/2020/05/13/why-aes-gcm-sucks/). It’s easier to use correctly than Encrypt-then-MAC and faster than (X)ChaCha20-BLAKE2b, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, and AES-CTR-then-HMAC/AES-CBC-then-HMAC with AES-NI hardware support, but it's slow without hardware support, it has a weird nonce size (96 bits) that means you should use a counter nonce, [some](https://pycryptodome.readthedocs.io/en/latest/src/cipher/modern.html#gcm-mode) implementations incorrectly allow 128-bit nonces (**only use a 96-bit nonce** since longer nonces get [hashed](https://soatok.blog/2020/05/13/why-aes-gcm-sucks/), which could result in multiple nonces producing some of the same AES-CTR output), reusing a nonce is more [catastrophic](https://eprint.iacr.org/2016/475.pdf) than in AES-CBC for example, and there are [relatively small](https://doc.libsodium.org/secret-key_cryptography/aead#limitations) max encryption limits for a single key (e.g. ~350 GB when using 16 KB long messages). Furthermore, there can be [side-channels](https://eprint.iacr.org/2009/129.pdf) in software implementations and mitigating them [reduces the speed](https://doc.libsodium.org/secret-key_cryptography/aead#aes-256-gcm) of the algorithm. Therefore, AES-GCM should only be used when there’s hardware support, although I strongly recommend the above algorithms instead regardless. 88 | 89 | #### Avoid (not in order because they’re all bad): 90 | 1. 
Your own [custom](https://github.com/Serpent27/PARSEC) symmetric encryption algorithm: even experienced cryptographers design [insecure](https://competitions.cr.yp.to/sha3.html) algorithms, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a [competition](https://competitions.cr.yp.to/index.html). 91 | 92 | 2. [AES-ECB](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_(ECB)): identical plaintext blocks get encrypted into identical ciphertext blocks, which means the algorithm lacks diffusion and fails to hide data patterns. In other words, it’s **horribly insecure** in the vast majority of contexts. 93 | 94 | 3. [RC4](https://en.wikipedia.org/wiki/RC4): there are lots of [attacks](https://en.wikipedia.org/wiki/RC4#Security) against it, rendering it **horribly insecure**. 95 | 96 | 4. **Unauthenticated** [AES-CBC](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CBC), [AES-CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)), [ChaCha20](https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant), and other **unauthenticated ciphers without a MAC**: this allows an attacker to tamper with the ciphertext without detection and can sometimes allow for other attacks, like [padding oracle attacks](https://en.wikipedia.org/wiki/Padding_oracle_attack) in the case of AES-CBC. 97 | 98 | 5. [One-time pad](https://en.wikipedia.org/wiki/One-time_pad): completely impractical since the key needs to be the same size as the message, and a **true** random number generator (e.g. atmospheric noise) is required to generate the keystream for it to be impossible to decrypt. Furthermore, some people incorrectly assume an [XOR cipher](https://en.wikipedia.org/wiki/XOR_cipher) with a repeating key is equivalent to a one-time pad, but this is [**horribly insecure**](https://en.wikipedia.org/wiki/XOR_cipher#Use_and_security). **Never do this**. 99 | 100 | 6. 
[Kuznyechik](https://en.wikipedia.org/wiki/Kuznyechik): it has a [flawed](https://eprint.iacr.org/2016/071.pdf) [S-Box](https://eprint.iacr.org/2019/092), with no design rationale ever being made public, which is **likely a [backdoor](https://www.schneier.com/blog/archives/2019/05/cryptanalyzing_.html)**. This algorithm is available in [VeraCrypt](https://www.veracrypt.fr/en/Kuznyechik.html), but I've luckily not seen it used anywhere else. **Never use it or any program/protocol relying on it**. 101 | 102 | 7. [Blowfish](https://en.wikipedia.org/wiki/Blowfish_(cipher)), [CAST-128](https://en.wikipedia.org/wiki/CAST-128), [GOST](https://en.wikipedia.org/wiki/GOST_(block_cipher)), [IDEA](https://en.wikipedia.org/wiki/International_Data_Encryption_Algorithm), [3DES](https://en.wikipedia.org/wiki/Triple_DES), [DES](https://en.wikipedia.org/wiki/Data_Encryption_Standard), [RC2](https://en.wikipedia.org/wiki/RC2), and **any cipher with a 64-bit block size**: a 64-bit block size means [collision attacks](https://sweet32.info/) can be performed after encrypting a certain amount of data using the same key. **Don’t use any algorithm with a block size less than 128 bits**. Using algorithms with an even larger block size (e.g. ChaCha20 and Salsa20, which are stream ciphers that operate using 512-bit blocks) is preferable because a 128-bit block size can still lead to [collisions](https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/#cryptographic-limits-for-aes-cbc) eventually. Algorithms like DES and 3DES are also very old and have small key sizes that are **insecure** (please see the [Symmetric Key Size](#symmetric-key-size) section). 103 | 104 | 8.
[AES-CCM](https://en.wikipedia.org/wiki/CCM_mode), [AES-EAX](https://en.wikipedia.org/wiki/EAX_mode), [AES-CFB](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_feedback_(CFB)), [AES-OFB](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Output_feedback_(OFB)), [Serpent](https://en.wikipedia.org/wiki/Serpent_(cipher)), [Threefish](https://en.wikipedia.org/wiki/Threefish), [Twofish](https://en.wikipedia.org/wiki/Twofish), [Camellia](https://en.wikipedia.org/wiki/Camellia_(cipher)), [RC6](https://en.wikipedia.org/wiki/RC6), [ARIA](https://en.wikipedia.org/wiki/ARIA_(cipher)), [SEED](https://en.wikipedia.org/wiki/SEED), and other ciphers nobody uses: very few people use these because they’re worse in one way or another. For example, AES-CCM uses MAC-then-Encrypt and CBC-MAC, AES-EAX is slower than AES-GCM and uses OMAC, AES-OFB can be insecure since two messages can end up using the same keystream, some of them are unbalanced in terms of security to performance (e.g. Serpent is slower whilst having a high security margin), some have received limited cryptanalysis, and implementations of uncommon non-AES algorithms are very rare in mainstream cryptographic libraries, with random implementations found on GitHub being less likely to be secure because these types of algorithms can be hard to implement correctly. 105 | 106 | 9. 
[AES-XTS](https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS), [AES-XEX](https://en.wikipedia.org/wiki/Disk_encryption_theory#Xor%E2%80%93encrypt%E2%80%93xor_(XEX)), [AES-LRW](https://en.wikipedia.org/wiki/Disk_encryption_theory#Liskov,_Rivest,_and_Wagner_(LRW)), [AES-CMC](https://en.wikipedia.org/wiki/Disk_encryption_theory#CBC%E2%80%93mask%E2%80%93CBC_(CMC)_and_ECB%E2%80%93mask%E2%80%93ECB_(EME)), [AES-EME](https://en.wikipedia.org/wiki/Disk_encryption_theory#CBC%E2%80%93mask%E2%80%93CBC_(CMC)_and_ECB%E2%80%93mask%E2%80%93ECB_(EME)), and other wide block/disk encryption only modes: **these are not suitable for encrypting data in transit**. They should **only** be used for [disk encryption](https://en.wikipedia.org/wiki/Disk_encryption_theory), with **AES-XTS being preferred** since it’s popular, more secure than some other disk encryption modes, and less malleable than AES-CBC and AES-CTR (tampering causes random, unpredictable changes to the plaintext). Ordinary authentication using an AEAD or Encrypt-then-MAC cannot be used for disk encryption because it would require extra storage and slow down read/write speeds, among other things. 107 | 108 | 10. [MORUS](https://competitions.cr.yp.to/round3/morusv2.pdf), [Ascon](http://ascon.iaik.tugraz.at/), [ACORN](https://competitions.cr.yp.to/round3/acornv3.pdf), [Deoxys-II](https://competitions.cr.yp.to/round3/deoxysv141.pdf), [COLM](https://competitions.cr.yp.to/round2/elmdv21.pdf), and non-finalist [CAESAR competition](https://competitions.cr.yp.to/caesar-submissions.html) ciphers: MORUS [doesn't](https://eprint.iacr.org/2019/172.pdf) provide the expected security level, non-finalists should generally never be used, and these finalists are all essentially unavailable in cryptographic libraries. By contrast, AEGIS-256 and AES-OCB have gained some traction, which is why I'm now recommending them. 109 | 110 | 11.
[Rocca](https://tosc.iacr.org/index.php/ToSC/article/view/8904/8480): extremely [fast](https://github.com/jedisct1/zig-rocca), key committing, and supports safe random nonces, but it hasn't received proper analysis yet since it's a new scheme. 111 | 112 | 12. [AES-GCM-SIV](https://en.wikipedia.org/wiki/AES-GCM-SIV) and [AES-SIV](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Synthetic_initialization_vector_(SIV)): these [don't](https://www.imperialviolet.org/2017/05/14/aesgcmsiv.html) provide unlimited protection against nonce reuse like some people believe, they're slower than regular AES-GCM, they're rarely available in cryptographic libraries, they rely on MAC-then-Encrypt, and AES-SIV uses [CMAC](https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac) and takes a larger key. If you're concerned about nonces repeating, then you should use XChaCha20-then-MAC, XChaCha20-Poly1305, or AEGIS-256 with a randomly generated nonce or a nonce derived alongside a subkey for encryption using a KDF or MAC, as described [here](https://doc.libsodium.org/secret-key_cryptography/encrypted-messages#short-nonces). If that isn’t possible for some reason, then use AES-GCM-SIV. 113 | 114 | #### Notes: 115 | 1. **Never reuse a nonce/IV with the same key (e.g. never hardcode a nonce/IV)**: doing so is **catastrophic** to security. You must either use a counter nonce, a KDF generated nonce/IV, or a randomly generated nonce/IV, depending on the algorithm you’re using. For instance, you should use a counter nonce (e.g. starting with 12 bytes of zeroes) with ChaCha20-Poly1305 and AES-GCM because the small nonce size (64 or 96 bits) means random nonces are **not** safe unless you're encrypting a small amount of data per key. By contrast, you can use a random or counter nonce safely with XChaCha20-Poly1305 because it has a large nonce size (192 bits).
Then AES-CBC **requires** an unpredictable (aka random) 128-bit IV, and some implementations of AES-CTR need a random nonce too, although most involve using a 64- or 96-bit counter nonce for the reasons explained above. Note that if you always rotate the key before encrypting (**never** encrypting anything with the same key more than once), then you *can* get away with using a nonce full of zeroes (e.g. 12 bytes of zeroes for AES-GCM), but I generally wouldn’t recommend doing this, especially if you have to use a 128-bit key, which I again **don't** recommend (please see the [Symmetric Key Size](#symmetric-key-size) section), since this can lead to [multi-target attacks](https://blog.cr.yp.to/20151120-batchattacks.html). 116 | 117 | 2. Prepend the nonce/IV to the ciphertext: this is the recommended approach because it’s read before the ciphertext and doesn't need to be kept secret. However, if you're performing key wrapping (encrypting a key using another key), as described in point 6 below, then you could encrypt the nonce/IV too as an additional layer of protection. 118 | 119 | 3. **Never** use string variables for keys, nonces, IVs, and passwords: these parameters should **always** be byte arrays. **String keys are essentially just passwords, meaning they're not suitable for use as keys directly** (please see the [Password Hashing/Password-Based Key Derivation](#password-hashingpassword-based-key-derivation) section). Furthermore, strings are immutable (unchangeable) in many programming languages (e.g. C#, Java, JavaScript, Go, etc), meaning they can’t be zeroed out from memory (please see point 7 below). 120 | 121 | 4. **Avoid** encryption functions/APIs that include a password parameter: these often use dated or [insecure](https://www.npmjs.com/package/evp_bytestokey) password-based KDFs that shouldn’t be used. 
Instead, use one of the recommended password-based KDFs (please see the [Password Hashing/Password-Based Key Derivation](#password-hashingpassword-based-key-derivation) section) yourself to derive an encryption key for an AEAD or an encryption key and MAC key for Encrypt-then-MAC.

5. Ciphers have [limits](https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/) on the amount of data they can safely encrypt using a single key: for AES-GCM, you can encrypt ~64 GB using a key and nonce pair for one message (don't reuse the nonce, as explained in point 1 above) and ~350 GB (assuming 16 KB messages) with a single key. For ChaCha20-Poly1305-IETF, you can encrypt 256 GB using a key and nonce pair for one message, but there's no practical limit for a single key (2^64 bytes). XChaCha20-Poly1305 and non-IETF ChaCha20-Poly1305 have no practical limits (~2^64 bytes). Then with AES-CTR, you can encrypt ~2^64 bytes, and with AES-CBC, you can encrypt ~2^47 bytes. **Make sure you follow the recommendations below to ensure that these limits are never reached**.

6. Ideally, **use a new key for each message** (except when chunking the same message, as explained in point 8 below): this helps prevent [cryptographic wear-out](https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/) (using a single key to encrypt too much data), nonce reuse, and reusing keys with multiple algorithms whilst being beneficial for security in that a compromise of one key doesn’t compromise data encrypted under different keys. One common way of doing this is to randomly generate a unique data encryption key (DEK) for each message, encrypt the DEK using a key encryption key (KEK) derived using a key derivation function (KDF), and then prepend the encrypted DEK to the ciphertext. For decryption, you derive the KEK, use it to decrypt the encrypted DEK, and use the DEK to decrypt the ciphertext.
Alternatively, you can derive unique keys using a random salt with a KDF, although this is inefficient when using a password-based KDF since it means a delay for every message.

7. Erase secret keys from memory as soon as possible: once you’ve finished using a secret key, it should be zeroed out from memory to prevent an attacker with physical or remote access to a machine being able to retrieve it. Note that in garbage collected programming languages, such as C#, Go, and JavaScript, this is difficult to achieve because the garbage collector can copy secrets around in memory. [Locking memory](https://doc.libsodium.org/memory_management) via an external library can solve this problem. Also, always [disable compiler optimisations](https://github.com/samuel-lucas6/Kryptor/blob/2dada2ba7321a4284f6f9030ecb91c54c3e6291a/src/KryptorCLI/GeneralPurpose/Arrays.cs#L84) for the zero memory method. Even without locking memory, *attempting* to erase sensitive data from memory is better than doing nothing.

8. Encrypt large amounts of data in (16-64 KiB) chunks: this lowers memory usage, can be faster for Encrypt-then-MAC, allows for more encryptions under the same key with AEADs, reduces theoretical [attack boundaries](https://doc.libsodium.org/secret-key_cryptography/aead#limitations) for AEADs, means that a corruption in a ciphertext might only affect one chunk rather than rendering the entire message unrecoverable, and enables the detection of tampered chunks before an entire message is sent or read. However, this is tricky to get right because you need to add and remove padding in the last chunk (e.g. using an encrypted header to store the length of padding or a padding scheme, as explained in point 13 below) and prevent chunks from being truncated (e.g. using the total ciphertext length as additional data), reordered, duplicated, or removed (e.g.
using a counter nonce that's incremented for each chunk), so you should ideally use or replicate an existing API, like [secretstream()](https://doc.libsodium.org/secret-key_cryptography/secretstream) in libsodium.

9. **Don’t** *just* use a standardised AEAD (AES-GCM, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, AES-GCM-SIV, AES-OCB, etc) if you’re performing password-based encryption in an **online scenario**: most AEADs are **not** key committing, meaning they are susceptible to [partitioning oracle attacks](https://eprint.iacr.org/2020/1491.pdf). In summary, an attacker can generate a ciphertext that successfully decrypts under multiple different keys. By recursively submitting such a ciphertext to an oracle (a server that knows the key and returns an error), an attacker can guess a large number of passwords at once, [speeding up a password search](https://emilymstark.com/2021/02/01/padding-partitioning-oracles-and-another-hot-take-on-pakes.html). To solve this problem, you can either use Encrypt-then-MAC following the instructions later on in this [Notes](#notes-1) section or apply a fix for a non-committing AEAD. There are currently no standardised committing AEADs, and they would not be truly committing unless the tag was large enough to be collision-resistant (e.g. 256 bits), which is why Encrypt-then-MAC is still preferable. The simplest mitigation involves hashing the key and prepending the hash to the ciphertext, but [this leaks the identity of the key](https://keymaterial.net/2020/09/07/invisible-salamanders-in-aes-gcm-siv/) unless you include a salt. The fix I'd recommend involves deriving an encryption key and a MAC key using a KDF, encrypting the message using the AEAD with the encryption key, retrieving the authentication tag from the end of the ciphertext, and prepending a MAC of the encryption key, nonce, and AEAD authentication tag to the ciphertext (e.g. `HMAC(message: encryptionKey || nonce || tag, key: macKey)`).
For decryption, you derive the encryption key and MAC key again, read the AEAD authentication tag, and verify the MAC in constant time (see point 18 below) before decrypting the message using the AEAD. An example of this fix can be found [here](https://github.com/samuel-lucas6/Committing-ChaCha20-Poly1305/tree/fb6b0c5aada36e011a6174daf52470d7784e1061/src/Method%202%20-%20Separate%20Keys%20for%20ChaCha20%20and%20Robustness%20Tag/CommittingChaCha20Poly1305).

10. Standardised AEADs (AES-GCM, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, AES-GCM-SIV, AES-OCB, etc) **aren't** [key](https://doc.libsodium.org/secret-key_cryptography/aead#robustness) or [compactly](https://neilmadden.blog/2021/02/16/when-a-kem-is-not-enough/) committing: if an algorithm is key committing, then an attacker cannot generate a ciphertext that successfully decrypts using multiple keys. If an algorithm is compactly committing, then someone cannot find two different messages and two different encryption keys that lead to the same tag. These are properties often intuitively expected from AEADs, and the lack of these properties can cause *rare* problems such as [partitioning oracle attacks](https://www.youtube.com/watch?v=h-T1bQTt4_Y) for password-based encryption in *some* online scenarios, [deanonymisation](https://github.com/LoupVaillant/Monocypher/issues/218#issuecomment-886997371) when using hybrid encryption in *some* online scenarios, [invisible salamander attacks](https://www.youtube.com/watch?v=3M1jIO-jLHI) in *some* online scenarios, and decryption to [different but valid plaintexts](https://eprint.iacr.org/2020/1456.pdf). To fix this problem, you should use Encrypt-then-MAC as explained in these guidelines or apply the fix for AEADs outlined above in point 9.

11.
Make use of the additional data parameter in AEADs: this parameter is useful for binding context information to a ciphertext and preventing issues like [replay attacks](https://en.wikipedia.org/wiki/Replay_attack) and [confused deputy attacks](https://cloud.google.com/kms/docs/additional-authenticated-data#confused_deputy_attack_example). It’s often used to authenticate things like headers, version numbers, timestamps, and message counters. Note that additional data is not part of the ciphertext; it’s just information included in the computation of the authentication tag. You either need to store additional data securely in some sort of database (e.g. in the case of a user’s email address being used as additional data) or be able to reproduce the additional data when it’s time for decryption (e.g. using a file name as additional data).

12. If an attacker knows the encryption key, then they can still decrypt an AEAD encrypted message without knowing the additional data: for example, they can use AES-CTR with the key to decrypt an AES-GCM encrypted message, ignoring the authentication tag and additional data.

13. Pad messages before encryption if you want to hide their length: stream ciphers, such as ChaCha20 and AES-CTR (used in AES-GCM), don’t perform any padding, meaning the ciphertext is the same length as the plaintext. This generally isn’t a concern for most applications, but when it is, you should use [ISO/IEC 7816-4](https://en.wikipedia.org/wiki/Padding_(cryptography)#ISO/IEC_7816-4) padding on the message before encryption and remove the padding after decryption. This padding scheme is [more resistant to some types of attacks](https://doc.libsodium.org/padding#algorithm) than other padding algorithms and always reversible, unlike [zero padding](https://en.wikipedia.org/wiki/Padding_(cryptography)#Zero_padding).
Such padding can be [randomised](https://en.wikipedia.org/wiki/Padding_(cryptography)#Randomized_padding) or [deterministic](https://en.wikipedia.org/wiki/Padding_(cryptography)#Deterministic_padding), with both techniques having pros and cons. Randomised padding is typically better for obscuring the usage of cryptography (e.g. making an encrypted file look like random data). Encrypting data in chunks, as described in point 8 above, is an example of deterministic padding since the last chunk will always be padded to the size of a chunk. [PADMÉ](https://petsymposium.org/2019/files/papers/issue4/popets-2019-0056.pdf) padding is another type of deterministic padding with minimal storage overhead, but it doesn't pad small messages.

14. Encrypt-then-Compress is pointless and Compress-then-Encrypt *can* **leak information**: high-entropy (random) data can't be compressed, and Compress-then-Encrypt can leak the compression ratio of the plaintext, which led to the [CRIME and BREACH](https://en.wikipedia.org/wiki/Transport_Layer_Security#Security) attacks on TLS.

15. Stick to **Encrypt-then-MAC**: **don’t** MAC-then-Encrypt or Encrypt-and-MAC because both can be susceptible to attacks, whereas Encrypt-then-MAC is [always secure](https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac) when implemented correctly. Encrypt-then-MAC is the standard approach and is what’s used in non-SIV (aka most) AEADs. The only exception to this rule is when [implementing](https://github.com/samuel-lucas6/ChaCha20-BLAKE3#xchacha20-blake3-siv) an SIV AEAD to have nonce-misuse resistance, but you should ideally let a library do that for you.

16. **Always** use separate keys for authentication and encryption: this is considered good practice, even though reusing the same key *may* be *theoretically* fine. In the case of a password-based KDF, this can be done by using a larger output length (e.g.
96 bytes) and splitting the output into two keys (e.g. 256-bit and 512-bit). In the case of a non-password-based KDF, you can use the KDF twice with the same input keying material but different context information and output lengths for domain separation. Please see the [Symmetric Key Size](#symmetric-key-size) section for details on what key size you should use for encryption and MACs.

17. **Always** MAC the nonce/IV and everything in the message (e.g. file headers too): if you fail to authenticate the nonce/IV, then an attacker can tamper with it undetected. AEADs always authenticate the nonce for this reason.

18. **Always** compare secrets and MACs in constant time: if you don’t compare the authentication tags in constant time, then this can lead to timing attacks that allow an attacker to calculate a valid tag for a forged message. Libraries like libsodium have [constant time comparison functions](https://doc.libsodium.org/helpers#constant-time-test-for-equality) that you can use to prevent this.

19. Concatenating multiple variable length parameters when using a MAC (e.g. `HMAC(message: additionalData || ciphertext, key: macKey)`) can lead to **attacks**: please see point 5 of the [Message Authentication Codes Notes](#notes-2) section.

20. Cipher agility is [harmful](https://paragonie.com/blog/2019/10/against-agility-in-cryptography-protocols): less is more in the case of supporting multiple ciphers/algorithms because more choices means more can go wrong, which is one reason why [WireGuard](https://www.wireguard.com/) is regarded as superior to [OpenVPN](https://openvpn.net/) and TLS 1.3 supports [fewer algorithms](https://en.wikipedia.org/wiki/Transport_Layer_Security#Cipher) than TLS 1.2. Cipher agility has caused serious problems, like in the case of [JWTs](https://paragonie.com/blog/2017/03/jwt-json-web-tokens-is-bad-standard-that-everyone-should-avoid).
Also, in the case of programs like [GPG](https://gnupg.org/) and [VeraCrypt](https://www.veracrypt.fr/), customisation allows the user to worsen their security. Therefore, **choose one secure Encrypt-then-MAC combo or AEAD recommended above, and that’s it**. If the algorithm you chose gets broken, which is **extremely** unlikely if you’re following these guidelines, then you can just increment the protocol/format version number and switch to a different algorithm.

21. Cascade encryption is unnecessary: although I’ve written a cascade encryption library based on [TripleSec](https://keybase.io/triplesec/) called [DoubleSec](https://github.com/samuel-lucas6/DoubleSec), cascade encryption is significantly slower and solves a problem that pretty much [doesn’t exist](https://blog.cryptographyengineering.com/2012/02/02/multiple-encryption/) because algorithms like ChaCha20 and AES are [nowhere near broken](https://eprint.iacr.org/2019/1492.pdf) and other issues are more likely to cause problems. Furthermore, it’s a hassle to implement yourself compared to using a single algorithm, with more things that can go wrong. Therefore, unless you’re extremely paranoid (e.g. in an Edward Snowden type situation) and don’t care about speed at all, please don’t bother.

#### Discussion:
Not everyone will agree with my recommendation to use Encrypt-then-MAC over AEADs when possible for the following reasons:

1. It’s easier to implement an AEAD: you don’t need to worry about deriving separate keys, appending and removing the authentication tag, and comparing authentication tags in constant time. AEADs also make it easy to use additional data in the calculation of the authentication tag. This should mean fewer mistakes.

2. AEADs are typically faster: AES-GCM with AES-NI instruction set support is very fast, AES-OCB, AEGIS, and Rocca are even faster, and ChaCha20-Poly1305 is also fast without any reliance on hardware support.

3. It’s easier to chunk data with an AEAD: Encrypt-then-MAC normally involves encrypting all the data in one go and appending one authentication tag at the end, which requires loading the entire message into memory and means a corruption renders the entire message unrecoverable. Whilst you can also do this with AEADs, it's recommended to chunk messages, as explained in point 8 of the [Notes](#notes-1), meaning the ciphertext contains multiple authentication tags. This is trickier with Encrypt-then-MAC unless you're using a library that offers it as a function.

My response to these arguments is:

1. Yes, AEADs are simpler, which is exactly why we need [committing](https://eprint.iacr.org/2019/016.pdf) AEADs and Encrypt-then-MAC implementations to be standardised and included in cryptographic libraries. Unfortunately, this isn’t happening because everyone is busy promoting non-committing AEADs.

2. Whilst this is often true, except for AEADs like AES-GCM without AES-NI support, Encrypt-then-MAC, especially using MACs like BLAKE2b and BLAKE3, is not slow enough for this to be considered a serious problem, particularly in non-interactive/offline scenarios or when dealing with long-term storage. In fact, using BLAKE3 with a large enough amount of data can be faster than Poly1305 and GMAC. Moreover, I would argue that the additional security makes up for any loss in speed. AEADs are not designed for long-term storage, as indicated by the small nonces and tags, whereas Encrypt-then-MAC is.

3. This is another reason why Encrypt-then-MAC implementations like (X)ChaCha20-BLAKE2b should be included in cryptographic libraries. If they were, then you could call them like any other AEAD. For instance, I made [ChaCha20-BLAKE2b](https://github.com/samuel-lucas6/ChaCha20-BLAKE2b) and [ChaCha20-BLAKE3](https://github.com/samuel-lucas6/ChaCha20-BLAKE3) libraries to allow me to do this.
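
To make the trade-off above more concrete, here is a minimal Python sketch of only the MAC half of Encrypt-then-MAC, using keyed BLAKE2b from the standard library. It assumes the ciphertext has already been produced by an unauthenticated cipher (e.g. XChaCha20) under a separate encryption key and that the nonce has a fixed length. The function names, the 8-byte little-endian length encoding, and the 256-bit key and tag sizes are illustrative assumptions, not the API of any particular library.

```python
import hashlib
import hmac
import secrets

TAG_SIZE = 32  # 256-bit authentication tag (assumed size for this sketch)


def _encode_length(data: bytes) -> bytes:
    # Fixed-width, little-endian length so parameter boundaries are unambiguous.
    return len(data).to_bytes(8, "little")


def compute_tag(mac_key: bytes, nonce: bytes, additional_data: bytes, ciphertext: bytes) -> bytes:
    # Keyed BLAKE2b over nonce || additionalData || ciphertext || lengths.
    # The nonce is assumed to be fixed-length, so only the two variable-length
    # inputs have their lengths appended.
    mac = hashlib.blake2b(key=mac_key, digest_size=TAG_SIZE)
    mac.update(nonce)
    mac.update(additional_data)
    mac.update(ciphertext)
    mac.update(_encode_length(additional_data))
    mac.update(_encode_length(ciphertext))
    return mac.digest()


def append_tag(mac_key: bytes, nonce: bytes, additional_data: bytes, ciphertext: bytes) -> bytes:
    # Append the tag to the ciphertext, as AEADs do.
    return ciphertext + compute_tag(mac_key, nonce, additional_data, ciphertext)


def verify_and_strip_tag(mac_key: bytes, nonce: bytes, additional_data: bytes, tagged: bytes) -> bytes:
    # Verify the tag in constant time BEFORE any decryption takes place.
    if len(tagged) < TAG_SIZE:
        raise ValueError("message too short to contain a tag")
    ciphertext, tag = tagged[:-TAG_SIZE], tagged[-TAG_SIZE:]
    expected = compute_tag(mac_key, nonce, additional_data, ciphertext)
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext


if __name__ == "__main__":
    mac_key = secrets.token_bytes(32)  # separate from the encryption key
    nonce = secrets.token_bytes(24)
    tagged = append_tag(mac_key, nonce, b"header-v1", b"opaque ciphertext bytes")
    print(verify_and_strip_tag(mac_key, nonce, b"header-v1", tagged))
```

Only after `verify_and_strip_tag` returns would the ciphertext be passed to the cipher for decryption. Wrapping these steps together with the cipher call behind one encrypt/decrypt pair is what would let such a construction be called like an AEAD.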

So when should you use an AEAD? Exceptions to my Encrypt-then-MAC recommendation include when:

1. Maximum performance is necessary: for example, in online scenarios where there's a large key space (e.g. passwords aren't being used) and data is not being stored long-term, such as [TLS 1.3](https://www.davidwong.fr/tls13/) and [WireGuard](https://www.wireguard.com/protocol/). This is what AEADs are designed for. However, with non-committing AEADs and a small key space in an online scenario, things like [partitioning oracle attacks](https://eprint.iacr.org/2020/1491.pdf) and [deanonymisation](https://github.com/LoupVaillant/Monocypher/issues/218#issuecomment-886997371) may be possible.

2. You’re not comfortable implementing Encrypt-then-MAC: if there’s no decent library you can use (e.g. [Tink](https://developers.google.com/tink) isn’t available in your programming language) or copy code from (make sure you respect the code license!), then you’re more likely to implement an AEAD correctly. However, implementing the fix I recommend for partitioning oracle attacks (please see point 9 of the [Notes](#notes-1)), which affect online password-based encryption scenarios, requires knowing how to use a MAC, so at that point, you may as well use Encrypt-then-MAC, especially if you’re storing data long-term. With enough research and attention to detail, Encrypt-then-MAC can be implemented correctly by anyone.

## Message Authentication Codes
#### Use (in order):
1. [Keyed BLAKE2b-256](https://www.blake2.net/) or [keyed BLAKE2b-512](https://www.blake2.net/): [faster](https://www.blake2.net/) than HMAC and SHA3, yet as real-world [secure](https://eprint.iacr.org/2019/1492.pdf) as SHA3.
Furthermore, BLAKE2 relies on essentially the same core algorithm as BLAKE, which received a [significant amount of cryptanalysis](https://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.7896.pdf), even more than [Keccak](https://keccak.team/keccak.html) (the SHA3 winner), as part of the [SHA3 competition](https://competitions.cr.yp.to/sha3.html). Despite being one of the best candidates on paper, it didn't win the SHA3 competition because the design was [more similar](https://leastauthority.com/blog/BLAKE2-harder-better-faster-stronger-than-MD5/) to that of SHA2. However, this is arguably a good thing since SHA2 is still [secure](https://en.wikipedia.org/wiki/SHA-2#Cryptanalysis_and_validation) after many years of cryptanalysis. Lastly, it's available in [many](https://www.blake2.net/#us) cryptographic libraries and has become increasingly popular in software (e.g. it’s used in [Argon2](https://www.rfc-editor.org/rfc/rfc9106.html) and many [other](https://www.blake2.net/#us) password hashing schemes).

2. [HMAC-SHA256](https://doc.libsodium.org/advanced/hmac-sha2) or [HMAC-SHA512](https://doc.libsodium.org/advanced/hmac-sha2): slower and older than BLAKE2 but [well-studied](https://en.wikipedia.org/wiki/SHA-2#Cryptanalysis_and_validation). HMAC-SHA2 is also faster than SHA3, extremely popular in software, and available in almost every cryptographic library. However, unlike BLAKE, BLAKE2, BLAKE3, and SHA3, SHA2 was designed behind closed doors at the NSA rather than the result of an open competition, with [no design rationale](https://keccak.team/2017/open_source_crypto.html) in the standard.

3.
[SHAKE256](https://en.wikipedia.org/wiki/SHA-3#Instances): this is [faster](https://keccak.team/2017/is_sha3_slow.html) than regular SHA3 and similar in speed to SHA2, which is why it's being [recommended](https://pkg.go.dev/golang.org/x/crypto@v0.0.0-20210921155107-089bfa567519/sha3#hdr-Recommendations) for most applications over regular SHA3 by the [Keccak (SHA3) team](https://en.wikipedia.org/wiki/SHA-3#Speed) and in the [go/x/crypto](https://pkg.go.dev/golang.org/x/crypto@v0.0.0-20210921155107-089bfa567519) documentation. Using it as a MAC requires concatenating a fixed length key and the message to authenticate (`SHAKE256(key || message)`). Use a 256-bit output length to get an equivalent security level to SHA3-256 and SHA256. A 512-bit output length provides 256-bit collision resistance but only 256-bit preimage and second preimage resistance still, which is less than SHA3-512 and SHA512, not that this is a practical concern since 2^256 is impossible to reach.

4. [KMAC256](https://en.wikipedia.org/wiki/SHA-3#Additional_instances), [SHA3-256](https://en.wikipedia.org/wiki/SHA-3), or [SHA3-512](https://en.wikipedia.org/wiki/SHA-3): SHA3 is [slower](https://www.imperialviolet.org/2017/05/31/skipsha3.html) in software than BLAKE2, BLAKE3, SHA2, HMAC-SHA2, and SHAKE but has a [higher security margin](https://csrc.nist.gov/csrc/media/projects/hash-functions/documents/sha-3_selection_announcement.pdf) and is [fast](https://keccak.team/2017/is_sha3_slow.html) in hardware. If [KMAC256](https://en.wikipedia.org/wiki/SHA-3#Additional_instances) is available in your cryptographic library, then you should use it with a 256-bit or 512-bit output length (please see point 3 above since the same security level applies) because it's like HMAC for SHA3. Otherwise, you should perform concatenation of the fixed length key and message (`SHA3(key || message)`) to construct a SHA3 MAC because HMAC-SHA3 is needlessly inefficient since SHA3 with a prepended key is already a secure MAC.
The worse performance and less accessible variants make it hard to recommend over HMAC-SHA2.

5. [Keyed BLAKE3-256](https://github.com/BLAKE3-team/BLAKE3#readme): [faster](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) than BLAKE2, SHA2, SHAKE, and SHA3, but it has a [smaller security margin](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), only targets the [128-bit security level](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), and isn't available in many cryptographic libraries. Therefore, I'd only recommend this when speed is of utmost importance because it's not conservative.

#### Avoid (not in order because they’re all bad):
1. Regular, unencrypted hashes (e.g. `SHA256(ciphertext)`): this is **insecure** because unkeyed hashes don't provide authentication.

2. Regular, encrypted hashes (e.g. `AES-CTR(SHA256(ciphertext))`): this is **insecure**. For example, with a stream cipher, you could flip bits in the ciphertext hash.

3. `SHA2(key || message)`: this is **vulnerable** to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack), as discussed in point 3 of the [Hashing Notes](#notes-5) section. Technically speaking, `SHA2(message || key)` works as a MAC if the attacker doesn’t know the key, but it’s weaker than constructions like HMAC because it requires the hash function to be collision-resistant rather than a pseudorandom function and therefore shouldn’t be used. Newer hash functions, like BLAKE2, SHA3, and BLAKE3, are resistant to length extension attacks and could be used to perform `Hash(key || message)` safely, but you should still just use a keyed hash function when possible to do the work for you.

4. [Meow Hash](https://mollyrocket.com/meowhash): this is **insecure**, as explained by [this](https://peter.website/meow-hash-cryptanalysis) cryptanalysis blog post.

5.
[HMAC-MD5](https://en.wikipedia.org/wiki/HMAC#Security) and [HMAC-SHA1](https://en.wikipedia.org/wiki/HMAC): MD5 and SHA1 should no longer be used for anything.

6. [Poly1305](https://doc.libsodium.org/advanced/poly1305) and other polynomial MACs: these are easier to misuse than the recommended algorithms (e.g. Poly1305 requires a secret, unique, and unpredictable key each time that’s independent from the encryption key). They also produce small tags that are designed for online protocols and small messages.

7. [CBC-MAC](https://en.wikipedia.org/wiki/CBC-MAC): this is unpopular and often [implemented incorrectly](https://blog.cryptographyengineering.com/2013/02/15/why-i-hate-cbc-mac/) because it has [weird requirements](https://en.wikipedia.org/wiki/CBC-MAC#Security_with_fixed_and_variable-length_messages) that most people are probably completely unaware of, **allowing for attacks**. Even when implemented correctly, the recommended algorithms are better.

8. [CMAC/OMAC](https://en.wikipedia.org/wiki/One-key_MAC): almost nobody uses this, even though it improves on CBC-MAC in terms of preventing mistakes. Furthermore, it only produces a 128-bit tag.

9. 128-bit [keyed hashes](https://doc.libsodium.org/hashing/generic_hashing#usage) or [HMACs](https://en.wikipedia.org/wiki/HMAC): **you shouldn’t go below a 256-bit output length** with hash functions because a 128-bit security level should be the minimum, and 128-bit authentication tags only provide 64-bit collision resistance.

10. [SHAKE128](https://en.wikipedia.org/wiki/SHA-3#Instances) and [KMAC128](https://en.wikipedia.org/wiki/SHA-3#Additional_instances): these only provide at best 128-bit preimage and second preimage resistance regardless of the output length, which is lower than a typical 256-bit hash. Therefore, use SHAKE256/KMAC256 with a 256-bit or 512-bit output length to obtain 256-bit preimage and second preimage resistance.

11.
[Keyed BLAKE2s](https://www.blake2.net/): in most cases, you'll want to use BLAKE2b, which is faster on 64-bit platforms, does more rounds, and can produce larger digests. Only use BLAKE2s if you're hashing on 8- to 32-bit platforms since that's what it's designed for.

#### Notes:
1. **Please read** points 15-18 of the [Symmetric Encryption Notes](#notes-1) for guidance on implementing a MAC correctly.

2. **Please read** point 2 of the [Symmetric Key Size Use](#symmetric-key-size) section for guidance on what key size to use.

3. A 256-bit authentication tag is sufficient for most use cases: however, a 512-bit tag provides additional security if you’re concerned about quantum computing. I wouldn’t recommend bothering with an output length in-between (e.g. HMAC-SHA384) because that’s not common, and you may as well go all the way to get a 256-bit security level.

4. Append the authentication tag to the ciphertext: this is common practice and how AEADs operate.

5. Concatenating multiple variable length parameters (e.g. `HMAC(message: additionalData || ciphertext, key: macKey)`) can lead to **attacks**: if you fail to concatenate the lengths of the parameters (e.g. `HMAC(message: additionalData || ciphertext || additionalDataLength || ciphertextLength, key: macKey)`, with the lengths converted to a fixed number of bytes, such as 4 bytes to represent an integer, consistently in either big- or little-endian, regardless of the endianness of the machine), then your implementation will be susceptible to [canonicalization attacks](https://soatok.blog/2021/07/30/canonicalization-attacks-against-macs-and-signatures/) because an attacker can shift bytes in the different parameters whilst producing a valid authentication tag. AEADs do this length concatenation for you to prevent this.
Another potentially more efficient method of safely supporting multiple inputs is to iteratively MAC each input, using the previous tag as the key for the next MAC calculation and a different random key to MAC the final tag to prevent [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack). This alternative is outlined [here](https://neilmadden.blog/2021/10/27/multiple-input-macs/), and I have created a small library called [MultiMAC](https://github.com/samuel-lucas6/MultiMAC) that does this if you need further help.

## Symmetric Key Size
#### Use (not in order because they have different use cases):
1. 256-bit keys: there’s essentially no reason not to use 256-bit keys for symmetric encryption. With AES-128, a 128-bit key doesn't necessarily translate to 128-bit security due to [batch attacks](https://blog.cr.yp.to/20151120-batchattacks.html). Furthermore, 256 bits is the only available key size for most (X)ChaCha20 and (X)Salsa20 implementations, it’s the key size that’s used for [top secret material](https://www.keylength.com/en/6/) by intelligence agencies and governments, and it’s [now recommended](https://www.keylength.com/en/3/) for long-term storage due to concerns surrounding quantum computers being able to bruteforce 128-bit keys.

2. 512-bit keys: it’s [recommended](https://www.rfc-editor.org/rfc/rfc2104#section-3) to always use a key size as large as the output length for HMAC (e.g. a 512-bit key for HMAC-SHA512). This principle is a good rule to follow for MACs in general as it ensures that the key size doesn't decrease the security provided by the output length. However, you can use a larger key size (e.g. a 512-bit key with a 256-bit output length) for domain separation when deriving keys.

#### Avoid (in order):
1.
Smaller than 128-bit keys: this won’t stand the test of time and in [some cases](https://en.wikipedia.org/wiki/Data_Encryption_Standard#Brute-force_attack) can already be bruteforced.

2. Symmetric encryption algorithms with large key sizes (e.g. [Threefish](https://en.wikipedia.org/wiki/Threefish)): key sizes over 256-bit are widely regarded as unnecessary because they provide no practical security benefit. Furthermore, encryption algorithms supporting such key sizes are unpopular in practice.

3. 128-bit keys: **this is the minimum** and provides better performance, but please just use 256-bit keys because they provide a higher security margin. AES-128 is less secure than AES-256 because it's considerably faster to bruteforce, especially when you consider [batch attacks](https://blog.cr.yp.to/20151120-batchattacks.html). The [argument](https://blog.1password.com/why-we-moved-to-256-bit-aes-keys/) that AES-128 is more secure than AES-256 due to certain attacks being more effective on AES-256 is incorrect because such attacks are **not** practical in the real world.

4. HMAC keys larger than the hash function block size (e.g. > 512 bits with HMAC-SHA256 and > 1024 bits with HMAC-SHA512): this causes the key to get [hashed](https://www.rfc-editor.org/rfc/rfc2104#section-3) down to the output length of the hash function, which ironically reduces security compared to using a key as large as the block size.

#### Notes:
1. Symmetric keys **must** be kept secret: unlike with public-key cryptography, where you can share the public key safely, you must not share a symmetric key via an insecure (e.g. unencrypted) channel.

2.
Keys **must** be uniformly random: they can either be randomly generated using a **cryptographically secure** pseudorandom number generator (please see the [Random Numbers](#random-numbers) section) or derived using one of the recommended key derivation or password-based key derivation functions (please see the [(Non-Password-Based) Key Derivation Functions](#non-password-based-key-derivation-functions) and [Password Hashing/Password-Based Key Derivation](#password-hashingpassword-based-key-derivation) sections). 245 | 246 | ## Random Numbers 247 | #### Use (in order): 248 | 1. The **cryptographically secure** pseudorandom number generator ([CSPRNG](https://en.wikipedia.org/wiki/Cryptographically-secure_pseudorandom_number_generator)) in your programming language or cryptographic library: these should use the CSPRNG in your operating system. For example, [RandomNumberGenerator()](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.randomnumbergenerator?view=net-6.0) in C#, [SecureRandom()](https://docs.oracle.com/javase/8/docs/api/java/security/SecureRandom.html) in Java, [Crypto.getRandomValues()](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getRandomValues) in JavaScript, and so on. 249 | 250 | 2. A [fast-key-erasure](https://blog.cr.yp.to/20170723-random.html) userspace RNG: this should be **a last resort** because it’s hard to collect entropy properly. **A lot can go wrong if you don’t know what you’re doing**. On embedded devices, allow a library like [LibHydrogen](https://github.com/jedisct1/libhydrogen) to handle random number generation for you. 251 | 252 | #### Avoid (not in order because they’re both bad): 253 | 1. 
A **non-cryptographically secure** pseudorandom number generator: for example, [Math.random()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random) in JavaScript, [Random.Next()](https://docs.microsoft.com/en-us/dotnet/api/system.random.next?view=net-6.0) in C#, [Random()](https://docs.oracle.com/javase/8/docs/api/java/util/Random.html) in Java, and so on. **These are not secure and should not be used for anything related to security**. 254 | 255 | 2. A custom RNG: this is **likely** going to be **insecure** because it’s harder to do properly than you’d think. **Just trust the CSPRNG in your operating system**. 256 | 257 | #### Notes: 258 | 1. Ideally, generate 256-bit random values for IDs, salts, etc: this reduces the chance of a collision to a negligible level. By contrast, random 128-bit values are expected to collide after roughly 2^64 values have been generated due to the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem). 259 | 260 | ## Hashing 261 | #### Use (in order): 262 | 1. [BLAKE2b-512](https://www.blake2.net/) or [BLAKE2b-256](https://www.blake2.net/): [faster](https://www.blake2.net/) than MD5, SHA1, SHA2, and SHA3, yet as real-world [secure](https://eprint.iacr.org/2019/1492.pdf) as SHA3. Furthermore, BLAKE2 relies on essentially the same core algorithm as BLAKE, which received a [significant amount of cryptanalysis](https://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.7896.pdf), even more than [Keccak](https://keccak.team/keccak.html) (the SHA3 winner), as part of the [SHA3 competition](https://competitions.cr.yp.to/sha3.html). Despite being one of the best candidates on paper, it didn't win the SHA3 competition because the design was [more similar](https://leastauthority.com/blog/BLAKE2-harder-better-faster-stronger-than-MD5/) to that of SHA2.
However, this is arguably a good thing since SHA2 is still [secure](https://en.wikipedia.org/wiki/SHA-2#Cryptanalysis_and_validation) after many years of cryptanalysis. Lastly, it's available in [many](https://www.blake2.net/#us) cryptographic libraries and has become increasingly popular in software (e.g. it’s used in [Argon2](https://www.rfc-editor.org/rfc/rfc9106.html) and many [other](https://www.blake2.net/#us) password hashing schemes). 263 | 264 | 2. [SHA512](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions), [SHA512/256](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions), or [SHA256](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions): SHA2 is the most popular hash function, meaning it’s widely available in cryptographic libraries, it’s still secure after many years of [cryptanalysis](https://en.wikipedia.org/wiki/SHA-2#Cryptanalysis_and_validation) besides [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack) (please see point 3 of the [Notes](#notes-5) section), and it offers decent performance. 265 | 266 | 3. [SHAKE256](https://en.wikipedia.org/wiki/SHA-3#Instances): this is [faster](https://keccak.team/2017/is_sha3_slow.html) than regular SHA3 and similar in speed to SHA2, which is why it's being [recommended](https://pkg.go.dev/golang.org/x/crypto@v0.0.0-20210921155107-089bfa567519/sha3#hdr-Recommendations) for most applications over regular SHA3 by the [Keccak (SHA3) team](https://en.wikipedia.org/wiki/SHA-3#Speed) and in the [go/x/crypto](https://pkg.go.dev/golang.org/x/crypto@v0.0.0-20210921155107-089bfa567519) documentation. Use a 256-bit output length to get an equivalent security level to SHA3-256 and SHA256. A 512-bit output length provides 256-bit collision resistance but only 256-bit preimage and second preimage resistance still, which is less than SHA3-512 and SHA512, not that this is a practical concern since 2^256 is impossible to reach. 267 | 268 | 4. 
[SHA3-512](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions) or [SHA3-256](https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions): [slow](https://www.imperialviolet.org/2017/05/31/skipsha3.html) in software, but the [new standard](https://www.nist.gov/publications/sha-3-standard-permutation-based-hash-and-extendable-output-functions), [fast](https://keccak.team/2017/is_sha3_slow.html) in hardware, has a [flexible construction](https://keccak.team/sponge_duplex.html) that has been used to build [other algorithms](https://keccak.team/keyak.html), [well analysed](https://keccak.team/third_party.html), [very different](https://keccak.team/keccak.html) to SHA2, and has a [higher security margin](https://eprint.iacr.org/2019/1492.pdf) than the other algorithms listed here. 269 | 270 | 5. [BLAKE3-256](https://github.com/BLAKE3-team/BLAKE3#readme): the [fastest](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) cryptographic hash in software at the cost of having a [lower security margin](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf) and being limited to a [128-bit security level](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). It's also rarely available in cryptographic libraries. However, it improves on BLAKE2 in that there’s only one variant that covers all use cases (it’s a regular hash, PRF, MAC, KDF, and XOF), but depending on the cryptographic library you use, this probably isn't something you’ll notice when using BLAKE2b anyway. I'd only recommend this when speed is of utmost importance because it's not conservative. 271 | 272 | #### Avoid (not in order because they’re all bad): 273 | 1. **Non-cryptographic** hash functions and error-detecting codes (e.g. [CRC](https://en.wikipedia.org/wiki/Cyclic_redundancy_check)): the clue is in the name. These are **not secure**. 274 | 275 | 2. 
[MD5](https://en.wikipedia.org/wiki/MD5) and [SHA1](https://en.wikipedia.org/wiki/SHA-1): both are very old and **no longer secure**. For instance, there’s an [attack](https://eprint.iacr.org/2013/170.pdf) that breaks MD5 collision resistance in 2^18 time, which takes less than a second to execute on an ordinary computer. 276 | 277 | 3. [Streebog](https://en.wikipedia.org/wiki/Streebog): it has a [flawed](https://eprint.iacr.org/2016/071.pdf) [S-Box](https://eprint.iacr.org/2019/092), with no design rationale ever being made public, which is **likely a [backdoor](https://www.schneier.com/blog/archives/2019/05/cryptanalyzing_.html)**. This algorithm is available in [VeraCrypt](https://www.veracrypt.fr/en/Streebog.html), but I've luckily not seen it used anywhere else. **Never use it or any program/protocol relying on it**. 278 | 279 | 4. **Insecure** and non-finalist SHA3 competition candidates (e.g. [EDON-R](https://eprint.iacr.org/2009/378.pdf)): if you want to use something from the SHA3 competition, then you should use BLAKE2b (based on BLAKE, which was thoroughly analysed and deemed to have a [very high security margin](https://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.7896.pdf)), SHA3 (the winner, [very different](https://keccak.team/sponge_duplex.html) to SHA2 in design, and has a [very high security margin](http://nvlpubs.nist.gov/nistpubs/ir/2012/NIST.IR.7896.pdf)), or BLAKE3 (based on BLAKE2 but faster and with a [lower security margin](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf)). 280 | 281 | 5. Chaining hash functions (e.g. `SHA256(SHA1(message))`): this can be **insecure** (e.g. SHA1 has worse collision resistance than SHA256, meaning a collision for SHA1 results in a collision for `SHA256(SHA1(message))`) and is obviously less efficient than hashing once. **Just don’t do this**. 282 | 283 | 6.
[RIPEMD](https://en.wikipedia.org/wiki/RIPEMD), [RIPEMD-128](https://en.wikipedia.org/wiki/RIPEMD), [RIPEMD-160](https://en.wikipedia.org/wiki/RIPEMD), [RIPEMD-256](https://en.wikipedia.org/wiki/RIPEMD), and [RIPEMD-320](https://en.wikipedia.org/wiki/RIPEMD): the original RIPEMD has [collisions](https://eprint.iacr.org/2004/199.pdf) and RIPEMD-128 has a small output size, meaning they're **insecure**. The longer variants are old and unpopular, most implementations are limited to small output lengths (e.g. 160-bit is the most common), and they perform worse and have received less analysis than the recommended algorithms. 284 | 285 | 7. [Whirlpool](https://en.wikipedia.org/wiki/Whirlpool_(hash_function)), [SHA224](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions), [MD6](https://en.wikipedia.org/wiki/MD6), and other hashes nobody uses: these are all worse in one way or another than the recommended algorithms, which is why nobody uses them. For instance, Whirlpool is [slower](https://www.cryptopp.com/benchmarks.html) than most other cryptographic hash functions, SHA224 only provides [112-bit collision resistance](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions), which is below the recommended 128-bit security level, MD6 didn't make it to the [second round](https://competitions.cr.yp.to/sha3.html) of the SHA3 competition and has [speed issues](https://en.wikipedia.org/wiki/MD6), and so on. 286 | 287 | 8. 128-bit hashes: you shouldn’t go below a 256-bit output with hash functions to ensure 128-bit security. 128-bit hashes only provide a 64-bit security level. 288 | 289 | 9. [SHAKE128](https://en.wikipedia.org/wiki/SHA-3#Instances): this only provides at best 128-bit preimage and second preimage resistance regardless of the output length, which is lower than a typical 256-bit hash. Therefore, use SHAKE256 with a 256-bit or 512-bit output length to obtain 256-bit preimage and second preimage resistance. 290 | 291 | 10.
[KangarooTwelve](https://keccak.team/kangarootwelve.html): much [faster](https://keccak.team/2017/is_sha3_slow.html) than SHA3 and SHAKE, has a safe security margin, and has no variants, but it's rarely accessible and used, so you may as well just use SHAKE if you want something based on Keccak. 292 | 293 | 11. [BLAKE2s](https://www.blake2.net/): in most cases, you'll want to use BLAKE2b, which is faster on 64-bit platforms, does more rounds, and can produce larger digests. Only use BLAKE2s if you're hashing on 8- to 32-bit platforms since that's what it's designed for. 294 | 295 | #### Notes: 296 | 1. **These hash functions are not suitable for password hashing**: these algorithms are fast, whereas password hashing needs to be slow to prevent [bruteforce attacks](https://en.wikipedia.org/wiki/Password_cracking). Furthermore, password hashing requires using a **random** salt for each password to derive unique hashes when given the same input and to protect against attacks using [precomputed hashes](https://en.wikipedia.org/wiki/Rainbow_table). 297 | 298 | 2. **These unkeyed hash functions are not suitable for authentication**: you need to use [MACs](https://en.wikipedia.org/wiki/Message_authentication_code) (please see the [Message Authentication Codes](#message-authentication-codes) section), such as keyed BLAKE2b-512 and HMAC-SHA512, for authentication because they provide the [appropriate security guarantees](https://en.wikipedia.org/wiki/Message_authentication_code#Security). 299 | 300 | 3. 
**SHA2** (except for SHA512/256 – SHA224 and SHA384 [don’t provide the same level of protection](https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions)), MD5, SHA1, Whirlpool, RIPEMD-160, and MD4 are susceptible to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack): an attacker can use `Hash(message1)` and the length of `message1` to calculate `Hash(message1 || message2)` for an attacker-controlled `message2`, without knowing `message1`. Therefore, **concatenating things (e.g. `Hash(secret || message)`) with these algorithms is a bad idea**. Instead, BLAKE2b, SHA512/256, HMAC-SHA2, SHA3, HMAC-SHA3, or BLAKE3 should be used because none of these are susceptible to length extension attacks. Also, please read point 5 of the [Message Authentication Codes Notes](#notes-2) section because concatenating variable length parameters incorrectly can lead to [another](https://soatok.blog/2021/07/30/canonicalization-attacks-against-macs-and-signatures/) type of attack. 301 | 302 | 4. Hash functions do **not** increase entropy: if you hash a single ASCII character, then that means there are still only 128 possible values. Therefore, prehashing passwords before using a password-based KDF doesn't improve the entropy of the password. This is also why inputs to hash functions need to be high in entropy in some contexts (e.g. using the hash of a keyfile as an encryption key). 303 | 304 | ## Password Hashing/Password-Based Key Derivation 305 | #### Use (in order): 306 | 1. [Argon2id](https://en.wikipedia.org/wiki/Argon2) (64+ MiB of RAM, 3+ iterations, and 1+ parallelism): winner of the [Password Hashing Competition](https://www.password-hashing.net/) in 2015, widely used and recommended now, and very [easy to use](https://doc.libsodium.org/password_hashing/default_phf) in libraries like libsodium. Use as high of a memory size as possible and then as many iterations as possible to reach a suitable delay for your use case (e.g. 
a delay of 500 milliseconds for server authentication, 1 second for file encryption, 3-5 seconds for disk encryption, etc). The parallelism can't be adjusted in libraries like [libsodium](https://github.com/jedisct1/libsodium/issues/986) and [Monocypher](https://monocypher.org/manual/argon2i), but higher values based on your CPU core count (e.g. a parallelism of 4) should be used when possible on machines that aren't servers. 307 | 308 | 2. [scrypt](https://en.wikipedia.org/wiki/Scrypt) (N=32768, r=8, p=1 and higher): the parameters are more confusing and less scalable than Argon2, and it’s susceptible to [cache-timing attacks](https://crypto.stanford.edu/cs359c/17sp/projects/MarkAnderson.pdf). However, it’s still a [strong algorithm](https://www.tarsnap.com/scrypt/scrypt.pdf) when configured correctly. 309 | 310 | 3. [bcrypt](https://en.wikipedia.org/wiki/Bcrypt) (12+ work factor): note that **this is not a KDF** because the output length cannot be adjusted. **However, it's stronger than Argon2 and scrypt at shorter runtimes (e.g. a 100ms delay for password hashing on a server) since it's [minimally cache-hard](https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/)**. For longer runtimes (e.g. 1 second), Argon2 and scrypt are better choices because then memory-hardness becomes more important, whereas bcrypt uses a small, fixed amount of memory. Most importantly, it [blows PBKDF2 out of the water](https://tobtu.com/minimum-password-settings/) in terms of [resisting GPU/ASIC attacks](https://www.tarsnap.com/scrypt/scrypt.pdf). Unfortunately, it has a stupid password length limit of [72 bytes](https://en.wikipedia.org/wiki/Bcrypt#Maximum_password_length), meaning people often prehash the password using something like SHA-2 to support longer passwords. However, this can lead to [password shucking](https://www.youtube.com/watch?v=OQD3qDYMyYQ) when using an unsalted/unpeppered hash function (e.g.
MD5, SHA-1, SHA-256) and null bytes in the hash, allowing an attacker to find [collisions](https://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html) that speed up attacks. Therefore, you should use [hmac-bcrypt](https://github.com/epixoip/hmac-bcrypt), which addresses these issues. For example, it [Base64 encodes the prehash](https://paragonie.com/blog/2015/04/secure-authentication-php-with-long-term-persistence) to avoid the null bytes problem. 311 | 312 | 4. [PBKDF2-SHA-512](https://en.wikipedia.org/wiki/PBKDF2) (200,000+ iterations): **only use this when none of the better algorithms are available** or due to compatibility constraints because it can be [efficiently bruteforced](https://www.tarsnap.com/scrypt/scrypt.pdf) using GPUs and ASICs when not using a high iteration count. Note that it’s generally recommended not to ask for more than the output length of the underlying hash function because this can lead to [attacks](https://blog.1password.com/1password-hashcat-strong-master-passwords/). Instead, if that’s required, use PBKDF2 first to get the output length of the underlying hash function (64 bytes with PBKDF2-SHA-512) before calling a non-password-based KDF, like HKDF-Expand, with the PBKDF2 output as the input keying material (IKM) to derive more output. 313 | 314 | #### Avoid (not in order because they’re all bad): 315 | 1. Storing passwords in plaintext: **this is a recipe for disaster**. If your password database is ever compromised, all your users are screwed, and your reputation in terms of security will go down the drain as well. 316 | 317 | 2. Using a password as a key (e.g. `key = Encoding.UTF8.GetBytes(password)`): firstly, passwords are low in entropy, whereas **cryptographic keys need to be high in entropy**. Secondly, not using a password-based KDF with a random salt means **attackers can quickly bruteforce passwords** and users using the same password will end up using the same key. 318 | 319 | 3.
Using a regular/fast hash function (e.g. [MD5](https://en.wikipedia.org/wiki/MD5), [SHA-1](https://en.wikipedia.org/wiki/SHA-1), [SHA-2](https://en.wikipedia.org/wiki/SHA-2), etc): **these are not suitable for password hashing** because they’re not slow, which allows for **fast bruteforce attacks**. Password hashing also requires using a salt to protect against attacks using precomputed hashes and to prevent the same password always having the same hash. However, adding a salt to certain regular hash functions, such as SHA-2, can lead to [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack), as discussed in point 3 of the [Hashing Notes](#notes-5) section. 320 | 321 | 4. Encrypting passwords: **encryption is reversible, whereas hashing is not**. If an attacker compromises a password database and obtains a password hash, then they don’t know the password without computing the hash. By contrast, if an attacker compromises a password database and the relevant encryption key(s), then they can easily obtain the plaintext passwords. Encryption would also reveal the password length unless you padded the input. 322 | 323 | 5. [PBKDF1](https://en.wikipedia.org/wiki/PBKDF2): **never use this** as it was **superseded by PBKDF2** and can only derive keys up to 160 bits, which is basically not suitable for anything. Some implementations, such as [PasswordDeriveBytes()](https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.passwordderivebytes?view=net-6.0) in C#, are also [completely broken](https://crypto.stackexchange.com/questions/1842/how-does-pbkdf1-work). 324 | 325 | 6. [SHAcrypt](https://en.wikipedia.org/wiki/Crypt_(C)#SHA2-based_scheme): it's [weaker](https://www.akkadia.org/drepper/SHA-crypt.txt) than the recommended algorithms, nobody uses this, and I’ve never even seen it in a cryptographic library. 326 | 327 | 7. 
[PBKDF2-MD5](https://en.wikipedia.org/wiki/PBKDF2), [PBKDF2-SHA-1](https://en.wikipedia.org/wiki/PBKDF2), [PBKDF2-SHA-256](https://en.wikipedia.org/wiki/PBKDF2), and [PBKDF2-SHA-384](https://en.wikipedia.org/wiki/PBKDF2): **use SHA-512 if you must use PBKDF2**. MD5 and SHA-1 are old hash functions that **should not be used anymore**. Then PBKDF2-SHA-256 and PBKDF2-SHA-384 require [significantly more iterations](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#pbkdf2) than PBKDF2-SHA-512 to be secure and have a smaller block size, meaning long passwords may get prehashed. 328 | 329 | 8. [Argon2i](https://en.wikipedia.org/wiki/Argon2) with fewer than 3 iterations: unlike Argon2id and Argon2d, Argon2i has been [attacked](https://en.wikipedia.org/wiki/Argon2#Cryptanalysis), with [3+ iterations](https://mailarchive.ietf.org/arch/msg/cfrg/beOzPh41Hz3cjl5QD7MSRNTi3lA/) being required for the attack to not be efficient and [11+ iterations](https://eprint.iacr.org/2016/759.pdf) being required for the attack to completely fail. Argon2i is also [weaker](https://www.rfc-editor.org/rfc/rfc9106.html#name-security-against-time-space) than both Argon2id and Argon2d when it comes to resistance against GPU/ASIC cracking. Therefore, as per the [RFC](https://www.rfc-editor.org/rfc/rfc9106.html), Argon2id should be used if you do not know the difference between the types or you consider side-channel attacks to be a viable threat but want better GPU/ASIC resistance because Argon2id offers the benefits of both Argon2i (side-channel resistance, albeit to a lesser extent) and Argon2d (GPU/ASIC resistance). 330 | 331 | 9. Chaining password hashing functions (e.g. `scrypt(PBKDF2(password))`): **this just reduces the strength of the stronger algorithm** since it means having worse parameters to get the same total delay. 332 | 333 | 10.
[Balloon hashing](https://en.wikipedia.org/wiki/Balloon_hashing): arguably better than Argon2 since it's [similar in strength](https://eprint.iacr.org/2016/759.pdf) whilst having a [more impressive design](https://youtu.be/7vs47CYnDsQ) (e.g. no separate variants, resistance to cache attacks, easy to implement with standard cryptographic hash functions, and performant). Unfortunately, it has seen virtually no adoption. There seems to be no information on recommended parameters, the reference implementation is [no longer maintained](https://github.com/henrycg/balloon/issues/5#issuecomment-616425762), there are no official test vectors, there's no RFC draft, and only a handful of people have implemented the algorithm, with it not being in any popular libraries. Therefore, just use Argon2, which has now been [standardised](https://datatracker.ietf.org/doc/html/rfc9106) and widely adopted. 334 | 335 | #### Notes: 336 | 1. **Never hard-code passwords into source code**: these can be easily retrieved. 337 | 338 | 2. **Always use a random 128-bit or 256-bit salt**: salts ensure that each password hash is different, which prevents an attacker from identifying two identical passwords without cracking the hashes. Moreover, salting defends against [attacks](https://en.wikipedia.org/wiki/Password_cracking) that rely on [precomputed hashes](https://en.wikipedia.org/wiki/Rainbow_table). The typical salt size is 128 bits, but 256-bit is also fine for further reassurance that the salt won’t repeat. Anything above that is excessive, and short salts can lead to salt reuse and allow for precomputed attacks, which defeats the point of salting. 339 | 340 | 3. **Always use the highest parameters/delay you can afford**: ideally, use a delay of 250+ milliseconds. In many cases, that’s too small. For instance, PBKDF2 requires a high number of iterations because it’s not resistant to GPU/ASIC attacks, and if you’re performing a non-interactive operation (e.g. 
disk encryption), then you can afford longer delays, like 3-5 seconds. 341 | 342 | 4. **Avoid** string password variables: strings are [immutable](https://en.wikipedia.org/wiki/Immutable_object) (unchangeable) in many programming languages (e.g. C#, Java, JavaScript, Go, etc), meaning they can’t be zeroed out from memory. Instead, use a char array if possible and convert that into a byte array for password hashing/password-based key derivation. Then erase both arrays from memory after you’ve finished using them. Note that this is also difficult in many programming languages, as explained in point 7 of the [Symmetric Encryption Notes](#notes-1) section, but *attempting* to erase sensitive data from memory is better than doing nothing. 343 | 344 | 5. Compare passwords in **constant time**: if you ever need to compare passwords (e.g. for password re-entry in a console application), then you should use a constant time comparison function to prevent [timing attacks](https://en.wikipedia.org/wiki/Timing_attack). Sometimes these functions require both arrays to be equal in length to work correctly, in which case you can compare two MACs of the passwords calculated using the same random key; just erase the key from memory afterwards. 345 | 346 | 6. Use a 256-bit and above output length: for password storage, a 128-bit hash is normally fine, but a 256-bit output provides a better security level for high entropy passwords. For key derivation, you should derive at least a 256-bit output and perhaps more, depending on whether you need to derive multiple keys (e.g. a 256-bit encryption key and a 512-bit MAC key). 347 | 348 | 7. **Always** store the parameters (e.g. memory size, iterations, and parallelism for Argon2) with the password hash: these values don't need to be secret and are required to derive the correct hash. 
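As a rough sketch of this note and of the constant-time comparison in point 5, the following Python uses only the standard library. scrypt stands in here because Argon2 isn't available in `hashlib`, and the `scrypt$N$r$p$salt$hash` record layout, the function names, and the `maxmem` headroom are assumptions made for illustration rather than any standard format.

```python
import base64
import hashlib
import hmac
import secrets


def _scrypt(password: bytes, salt: bytes, n: int, r: int, p: int, length: int) -> bytes:
    # Assumption: allow twice the memory scrypt needs for these parameters,
    # since OpenSSL's default limit is too low for N=32768, r=8.
    maxmem = 2 * 128 * r * (n + p + 2)
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=length)


def hash_password(password: bytes, n: int = 2**15, r: int = 8, p: int = 1) -> str:
    """Hash a password and return a record that carries its own parameters."""
    salt = secrets.token_bytes(16)  # random 128-bit salt from the OS CSPRNG
    digest = _scrypt(password, salt, n, r, p, 32)
    salt_b64 = base64.b64encode(salt).decode("ascii")
    digest_b64 = base64.b64encode(digest).decode("ascii")
    # Hypothetical layout: the parameters and salt aren't secret, so they sit beside the hash.
    return f"scrypt${n}${r}${p}${salt_b64}${digest_b64}"


def verify_password(password: bytes, record: str) -> bool:
    """Recompute the hash with the stored parameters and compare in constant time."""
    name, n, r, p, salt_b64, digest_b64 = record.split("$")
    if name != "scrypt":
        raise ValueError("unsupported password hashing algorithm")
    expected = base64.b64decode(digest_b64)
    actual = _scrypt(password, base64.b64decode(salt_b64), int(n), int(r), int(p), len(expected))
    return hmac.compare_digest(actual, expected)
```

Because the parameters travel with each hash, a later change to the defaults in `hash_password` wouldn't stop older records from verifying.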
When storing passwords in a database, you should store these values for each user to verify the hashes and transition to stronger parameters over time as hardware improves. In [some](https://doc.libsodium.org/password_hashing/default_phf#password-storage) cryptographic libraries, this is done for you. By contrast, in a key derivation scenario, you can get away with using fixed parameters based on a version number stored as a header (e.g. file format v3 = 256 MiB of RAM and 12 iterations). Then if you want to change the parameters, you just increment the version number. 349 | 350 | 8. **Perform client-side password prehashing** for [server relief](https://doc.libsodium.org/password_hashing#server-relief) or to [hide the plaintext password from the server](https://bitwarden.com/help/article/what-encryption-is-used/#pbkdf2): when creating an account, the server can send a **random** salt to the client that’s used to perform password hashing on the client’s device. The server then performs server-side password hashing on the transmitted password hash using the same salt. Then the salt and final password hash are stored in the password database. When logging in, the server sends the stored salt to the client, the client performs client-side password hashing, the client transmits the password hash to the server, the server performs server-side password hashing using the stored salt, and then the server compares the result with the password hash stored in the database. In the event of a non-existent user, the salt that’s sent should always be the same for a given username, which involves using a MAC (e.g. keyed BLAKE2b-512), with the username as the message. 351 | 352 | 9. **Don’t** use padding to hide the length of a password when sending it to a server: instead, perform client-side password hashing if possible (please see point 8 above). If that’s not possible, then you should hash the password using a regular hash function, with the largest possible output length (e.g. 
BLAKE2b-512), on the client’s device, transmit the hash to the server, and perform server-side password hashing, using the transmitted hash as the password. Both techniques ensure that the amount of data transmitted is constant and prevent the server [effortlessly](https://about.fb.com/news/2019/03/keeping-passwords-secure/) obtaining a copy of the password, but client-side password prehashing should be preferred as it allows for more secure password hashing parameters and provides additional security compared to if the server leaks/stores the client-side regular/fast hash of the password. 353 | 354 | 10. **Use** [rate limiting](https://www.cloudflare.com/en-gb/learning/bots/what-is-rate-limiting/) to prevent denial of service (DOS) and bruteforce attacks: this involves blacklisting certain IP addresses and usernames from trying to log in temporarily to prevent the server being overwhelmed and to prevent attackers from bruteforcing passwords. 355 | 356 | 11. If a user can supply very long passwords, then this *can* lead to denial of service attacks: this happened to [Django](https://www.djangoproject.com/weblog/2013/sep/15/security/) in 2013. To fix this, either enforce a password length limit (e.g. 128 characters is the max) or prehash passwords using a regular/fast hashing algorithm, with the highest possible output length (e.g. BLAKE2b-512), before performing password hashing. 357 | 358 | 12. [Hash-then-Encrypt](https://github.com/paragonie/password_lock#readme) for additional security when storing passwords: you can use a **password hashing algorithm** on the password before encrypting the salt and password hash using an AEAD or Encrypt-then-MAC, with a secret key **stored separately from the password database**. This forces an attacker to decrypt the password hashes before trying to crack them. 
Furthermore, it means that if your secret key is ever compromised but the password hashes are not, then you can decrypt all the stored password hashes and re-encrypt them using a new key, which is easier than resetting every user’s password in the event of a pepper being compromised. 359 | 360 | 13. Use a [pepper](https://en.wikipedia.org/wiki/Pepper_(cryptography)) for additional security when deriving keys: a pepper is essentially a secret key that’s mixed with the password using a MAC (e.g. `HMAC-SHA512(message: password, key: pepper)`) before password hashing. In the case of password storage, using Hash-then-Encrypt makes more sense for the reason I explained above. By contrast, for key derivation, using a pepper is a great idea if possible because it means an additional secret is required, making a bruteforce more difficult. For instance, a keyfile in [file](https://www.kryptor.co.uk/tutorial#using-a-keyfile)/[disk](https://veracrypt.fr/en/Keyfiles%20in%20VeraCrypt.html) encryption software acts as a pepper, which improves the security of the key derivation assuming that the keyfile is stored correctly (e.g. on an encrypted memory stick away from the encrypted file/disk). 361 | 362 | ## (Non-Password-Based) Key Derivation Functions 363 | #### Use (in order): 364 | 1. [Salted BLAKE2b](https://doc.libsodium.org/key_derivation): [restricted](https://www.blake2.net/blake2.pdf) to a 128-bit `salt` and 128-bit (16 character) `personalisation` parameter for domain separation, which is annoying. However, you can feed more context information into the `message` parameter. Besides the weird context information size limit, this is easier to use than HKDF because there’s only one function rather than three, which can be confusing. Furthermore, please see the [Hashing](#hashing) section for why BLAKE2b should be preferred over other hash functions. 
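A minimal sketch of this kind of salted BLAKE2b KDF, using Python's standard `hashlib.blake2b`, which exposes `key`, `salt`, and `person` parameters with the same 16-byte salt and personalisation limits. The `derive_subkey` name, the counter-as-salt encoding, and the example context strings are assumptions for illustration, not part of any library's API.

```python
import hashlib
import secrets


def derive_subkey(master_key: bytes, counter: int, context: bytes, length: int = 32) -> bytes:
    """Derive one subkey from a high-entropy master key with keyed, salted BLAKE2b."""
    if not 0 < len(master_key) <= hashlib.blake2b.MAX_KEY_SIZE:
        raise ValueError("the master key must be 1 to 64 bytes long")
    if len(context) > hashlib.blake2b.PERSON_SIZE:
        raise ValueError("the personalisation parameter is limited to 16 bytes")
    # Assumption: the salt is a counter encoded as 16 little-endian bytes,
    # so the first subkey (counter 0) uses a salt of 16 zero bytes.
    salt = counter.to_bytes(hashlib.blake2b.SALT_SIZE, "little")
    return hashlib.blake2b(
        b"",  # empty message; longer context information could be passed here instead
        digest_size=length,
        key=master_key,
        salt=salt,
        person=context,  # shorter values are zero-padded by hashlib
    ).digest()


# The master key must be uniformly random; never pass a password here.
master_key = secrets.token_bytes(32)
encryption_key = derive_subkey(master_key, 0, b"example.enc.v1")
mac_key = derive_subkey(master_key, 1, b"example.mac.v1", length=64)
```

Changing either the counter or the context gives an unrelated subkey, which is the domain separation these parameters are meant to provide.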
If there's no KDF variant of BLAKE2b available in your library, then you can construct a BLAKE2b KDF using `BLAKE2b(message: salt || info || saltLength || infoLength, key: inputKeyingMaterial)`, with the `saltLength` and `infoLength` parameters being encoded as specified in point 5 of the [Message Authentication Codes Notes](#notes-2) section. Like HKDF, this custom approach allows for salt and info parameters of practically any length. 365 | 366 | 2. [HKDF-SHA512](https://en.wikipedia.org/wiki/HKDF) or [HKDF-SHA3-512](https://en.wikipedia.org/wiki/HKDF): the most popular KDF with support for a larger salt and lots of context information. However, people get confused about the difference between the `Expand` and `Extract` functions, the `salt` parameter ironically [shouldn't](https://soatok.blog/2021/11/17/understanding-hkdf/) be used to pass in the salt (please see point 5 of the [Notes](#notes-7) below), it doesn’t require a salt despite it being [recommended and beneficial for security](https://datatracker.ietf.org/doc/html/rfc5869#section-3.1), and it’s slower than salted BLAKE2b. Please see the [Hashing](#hashing) and [Message Authentication Codes](#message-authentication-codes) sections for a comparison between SHA2/SHA3 and HMAC-SHA2/HMAC-SHA3. 367 | 368 | 3. [BLAKE3](https://github.com/BLAKE3-team/BLAKE3/#the-blake3-crate-): as mentioned before, BLAKE3 has a [lower security margin](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), but it also doesn’t have a salt parameter. With that said, [very good guidance](https://github.com/BLAKE3-team/BLAKE3#the-blake3-crate-) is given on how to produce globally unique and application specific context strings in the [official GitHub repo](https://github.com/BLAKE3-team/BLAKE3). If you'd like to use a salt, then you can construct a custom KDF implementation as explained in point 1 above. 369 | 370 | #### Avoid (not in order because they’re all bad): 371 | 1. 
Regular (salted or unsalted) hash functions: whilst this *can* be fine for deriving an encryption key from a Diffie-Hellman shared secret for example, it’s typically **not recommended**. Just use an actual KDF when possible as there’s less that can go wrong (e.g. there's no risk of [length extension attacks](https://en.wikipedia.org/wiki/Length_extension_attack)). 372 | 373 | 2. Password-based KDFs (e.g. [PBKDF2](https://en.wikipedia.org/wiki/PBKDF2)): if you’re not using a password, then you shouldn’t be using a password-based KDF. Password-based KDFs are designed to be slow to prevent [bruteforce attacks](https://en.wikipedia.org/wiki/Brute-force_attack), whereas non-password-based KDFs are fast because they're designed for high-entropy keys. Even with a small delay (e.g. 1 iteration of PBKDF2), this is likely slower and makes the code more confusing because an inappropriate function is being used. 374 | 375 | 3. [HChaCha20](https://doc.libsodium.org/key_derivation#nonce-extension) and [HSalsa20](https://cr.yp.to/snuffle/xsalsa-20110204.pdf): **these are not general-purpose cryptographic hash functions**, can only take a 256-bit key as input and output a 256-bit key, and are very rarely used, except in the case of implementing XChaCha20 and XSalsa20. If you want something based on ChaCha20, then use BLAKE2b or BLAKE3. 376 | 377 | #### Notes: 378 | 1. **These KDFs are not suitable for hashing passwords**: they should be used for deriving multiple subkeys from a **high-entropy** master key or converting a shared secret concatenated with the public keys used to calculate the shared secret into a cryptographically strong secret key. 379 | 380 | 2. Using the same parameters besides changing the output length can result in related outputs (e.g. for HKDF and BLAKE3): this is exactly why you shouldn’t reuse the same parameters for different keys. 381 | 382 | 3. 
**Use different contexts for different keys**: a good format is `[application] [date and time] [purpose]` because this means the context information is application-specific and unique, which provides domain separation. 383 | 384 | 4. Salted BLAKE2b can use a **counter** salt: if you’re deriving multiple subkeys from a master key, then you can use a counter salt starting at 0 (16 bytes of 0s) that gets incremented for each subkey. However, if you’re deriving a single key, then you may want to use a random salt. 385 | 386 | 5. Counterintuitively, the `info` parameter should be used to provide the salt for HKDF: the `salt` parameter should be [left null](https://soatok.blog/2021/11/17/understanding-hkdf/) to get the [standard security definition](https://github.com/paseto-standard/paseto-spec/blob/dfd1115170724b056b3c1ac722239cf7084755a8/docs/Rationale-V3-V4.md#better-use-of-hkdf-salts-change) for HKDF. The `info` parameter should contain the unique context information for that subkey concatenated with a [randomly generated](https://datatracker.ietf.org/doc/html/rfc5869#section-3.1) 128-bit or 256-bit salt that's used for **all** subkeys. If these parameters are not fixed in length, then follow the guidance in point 5 of the [Message Authentication Codes Notes](#notes-2) section. Using a secret salt, which is a bit like a [pepper](https://en.wikipedia.org/wiki/Pepper_(cryptography)), further improves the security guarantees. 387 | 388 | ## Key Exchange/Hybrid Encryption 389 | #### Use (in order): 390 | 1. [Curve25519/X25519](https://en.wikipedia.org/wiki/Curve25519): popular, fast, easy to implement, fixes some issues with NIST curves, not designed by NIST, and offers ~128-bit security. 391 | 392 | 2. [Curve448/X448](https://en.wikipedia.org/wiki/Curve448): less popular and slower than X25519 but provides a 224-bit security level and is also not made by NIST. 
Generally, there's not much reason to use this as a 128-bit security level is deemed enough for key exchange and quantum computers will break both X25519 and X448. 393 | 394 | 3. [Pre-shared symmetric keys](https://en.wikipedia.org/wiki/Pre-shared_key): this approach allows for [post-quantum security](https://media.defense.gov/2021/Aug/04/2002821837/-1/-1/1/Quantum_FAQs_20210804.PDF) and can be combined [alongside](https://www.wireguard.com/protocol/#key-exchange-and-data-packets) an asymmetric key exchange. However, using pre-shared keys can be difficult since the key must be kept secret, whereas public keys are meant to be public and can therefore be easily shared. 395 | 396 | 4. [X25519/X448 plus a post-quantum KEM](https://soatok.blog/2022/01/27/the-controversy-surrounding-hybrid-cryptography/): considering some post-quantum algorithms have been found to be considerably [easier](https://eprint.iacr.org/2020/1343) to attack than originally thought, it would be reckless to recommend switching to a post-quantum KEM alone when these algorithms need further analysis. Therefore, if you can't use a pre-shared key but want to aim for post-quantum security, then you can concatenate the classical and post-quantum key exchange outputs and pass them through a secure KDF. 397 | 398 | #### Avoid (not in order because they’re all bad): 399 | 1. 
[Plain RSA](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Attacks_against_plain_RSA), [RSA PKCS#1 v1.5](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding_schemes), [RSA-KEM](https://en.wikipedia.org/wiki/Key_encapsulation), and [RSA-OAEP](https://en.wikipedia.org/wiki/Optimal_asymmetric_encryption_padding): plain/textbook RSA is **insecure** for [several reasons](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Attacks_against_plain_RSA), RSA PKCS#1 v1.5 is also **vulnerable** to some [attacks](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding_schemes), and RSA-KEM and RSA-OAEP, whilst both secure *when* [implemented correctly](https://paragonie.com/blog/2018/04/protecting-rsa-based-protocols-against-adaptive-chosen-ciphertext-attacks), are still **worse than using hybrid encryption** because asymmetric encryption is slower, designed for small messages, doesn’t provide sender authentication without signatures, and requires larger keys. RSA-KEM is also never used and very rarely available in cryptographic libraries. 400 | 401 | 2. [ElGamal](https://en.wikipedia.org/wiki/ElGamal_encryption): old, very rarely used, can only be used on small messages, produces a ciphertext that’s larger than the plaintext, the design is [malleable](https://en.wikipedia.org/wiki/Malleability_(cryptography)), it's slower than hybrid encryption, and it doesn’t provide sender authentication without signatures. 402 | 403 | 3. Unknown/unavailable curves (e.g. [SIEC](https://github.com/tscholl2/siec) and [Curve41417](https://eprint.iacr.org/2014/526.pdf)): some, such as [SIEC](https://github.com/tscholl2/siec), are completely unknown and have not received sufficient cryptanalysis, so **they should be avoided at all costs**. Many other curves are [rarely used/available](https://en.wikipedia.org/wiki/Comparison_of_TLS_implementations#Supported_elliptic_curves) compared to Curve25519/X25519, P-256, P-384, and P-521.
Please see the [SafeCurves](https://safecurves.cr.yp.to/index.html) tables for a security comparison of most curves. 404 | 405 | 4. [NIST curves](https://en.wikipedia.org/wiki/Elliptic-curve_cryptography#Fast_reduction_(NIST_curves)) (e.g. P-256, P-384, and P-521): although P-256 is probably the most popular curve, the seeds for these curves [haven’t been explained](https://datatracker.ietf.org/doc/html/rfc8031#section-4), which is [not a good look](https://safecurves.cr.yp.to/rigid.html) considering that [Dual_EC_DRBG](https://en.wikipedia.org/wiki/Dual_EC_DRBG) was a NIST standard despite containing an [NSA backdoor](https://en.wikipedia.org/wiki/Dual_EC_DRBG#Weakness:_a_potential_backdoor). Furthermore, these curves require [point validation](https://crypto.stackexchange.com/questions/51320/ecdh-check-points), are [harder to write implementations for](https://safecurves.cr.yp.to/), meaning libraries are more likely to contain vulnerabilities, and are slower than Curve25519/X25519, which has become increasingly popular over recent years (e.g. it's used in [TLS 1.3](https://en.wikipedia.org/wiki/Transport_Layer_Security#TLS_1.3)). **These should only be used for interoperability reasons**. 406 | 407 | 5. [SRP](https://en.wikipedia.org/wiki/Secure_Remote_Password_protocol), [J-PAKE](https://en.wikipedia.org/wiki/Password_Authenticated_Key_Exchange_by_Juggling), and other [PAKE](https://en.wikipedia.org/wiki/Password-authenticated_key_agreement) protocols: note that these are only for password-based authenticated key exchange. SRP has an [odd design](https://blog.cryptographyengineering.com/should-you-use-srp/), no meaningful security proof, cannot be instantiated on elliptic curves so is less efficient, is incompatible with TLS 1.3, and there have been many versions with vulnerabilities. Some PAKEs can allow for [pre-computation attacks](https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/).
Furthermore, very few cryptographic libraries include PAKEs, which makes good ones, like [OPAQUE](https://palant.info/2018/10/25/should-your-next-web-based-login-form-avoid-sending-passwords-in-clear-text/), difficult to recommend [until they receive more adoption](https://blog.cloudflare.com/opaque-oblivious-passwords/). Some people have argued PAKEs will [not](https://emilymstark.com/2020/07/30/should-web-apps-use-pakes.html) see widespread adoption, and I wouldn't be surprised if that turns out to be the case. 408 | 409 | 6. [Post-quantum algorithms](https://csrc.nist.gov/projects/post-quantum-cryptography): these are still being researched, aren’t implemented in mainstream libraries, are much slower than existing algorithms, and typically have very large key sizes. However, it will eventually make sense to switch to one in the future. For now, if post-quantum security is a goal, then use a pre-shared symmetric key if possible. 410 | 411 | #### Notes: 412 | 1. Public keys should be shared, and **private keys must be kept secret**: **never** share private keys. Please see point 9 below for details about secure storage of private keys. 413 | 414 | 2. **Never hard-code private keys into source code**: these can be easily retrieved. 415 | 416 | 3. Use one of the recommended [(non-password-based) KDFs](#non-password-based-key-derivation-functions) on the shared secret with the public keys used to calculate the shared secret as part of the context information (e.g. `BLAKE2b-256(context: constant || publicKey1 || publicKey2, inputKeyingMaterial: sharedSecret)`): **shared secrets are not suitable for use as secret keys directly** because they’re not uniformly random. Moreover, you should **include the public keys in the key derivation** because multiple public keys can result in the same shared secret. 
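To illustrate the derivation in this point, the sketch below keys BLAKE2b with the shared secret and hashes a constant followed by both public keys, using Python's standard-library `hashlib.blake2b`. The function name `derive_shared_key`, the constant, and the random placeholder bytes standing in for real X25519 values are assumptions for illustration only. Both parties must concatenate the public keys in the same agreed order, otherwise they will derive different keys.

```python
import hashlib
import secrets

# Hypothetical application-specific constant used for domain separation.
CONTEXT_CONSTANT = b"ExampleApp 2024-01-01 key exchange"


def derive_shared_key(shared_secret: bytes, public_key_1: bytes,
                      public_key_2: bytes) -> bytes:
    """Turn a raw shared secret into a 256-bit key bound to both public keys."""
    # The constant and the X25519 public keys are fixed in length, so plain
    # concatenation is unambiguous in this sketch.
    message = CONTEXT_CONSTANT + public_key_1 + public_key_2
    return hashlib.blake2b(message, key=shared_secret, digest_size=32).digest()


# Placeholders only: real values would come from an X25519 implementation.
shared_secret = secrets.token_bytes(32)
sender_public_key = secrets.token_bytes(32)
recipient_public_key = secrets.token_bytes(32)

key = derive_shared_key(shared_secret, sender_public_key, recipient_public_key)
```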
By including the public keys, you [improve the entropy](https://github.com/jedisct1/libsodium/issues/586#issuecomment-325182090) of the derived key and ensure [sender authentication](https://doc.libsodium.org/advanced/scalar_multiplication#usage). The libsodium [key exchange API](https://doc.libsodium.org/key_exchange#algorithm-details) includes the public keys for you, but many libraries, like Monocypher, do [not](https://monocypher.org/manual/key_exchange#STANDARDS). Also, remember to derive unique keys each time by using the salt and context parameters, as explained in the [(Non-Password-Based) Key Derivation Functions](#non-password-based-key-derivation-functions) section. 417 | 418 | 4. For hybrid encryption, use one of the recommended key exchange algorithms above with one of the recommended [symmetric encryption algorithms](#symmetric-encryption): for example, use X25519 with (X)ChaCha20-Poly1305. 419 | 420 | 5. When using counter nonces for encryption, use different keys for different directions in a client-server scenario: after computing the shared secret, you can use a non-password-based KDF to derive two 256-bit keys as follows: `HKDF-SHA512(inputKeyingMaterial: sharedSecret, outputLength: 64, salt: null, info: clientPublicKey || serverPublicKey)`, splitting the output in two. One key should be used by the client for sending data to the server, and the other should be used by the server for sending data to the client. Both keys need to be calculated by the client and server. This approach allows counter nonces to be used [safely](https://doc.libsodium.org/key_exchange#notes) for encryption without having to wait for an acknowledgement after every message. 421 | 422 | 6. X25519 and X448 public keys **are** distinguishable from random data: if you need to obfuscate public keys so they’re indistinguishable from random, then you need to use [Elligator2](https://elligator.cr.yp.to/) with ['dirty' keys](https://monocypher.org/manual/advanced/x25519_dirty). 
**You [cannot](https://github.com/covert-encryption/covert/issues/55#issue-1074806752) use vanilla/standard keys**. The easiest way to do this involves using X25519 and [Monocypher](https://monocypher.org/manual/advanced/elligator) since libsodium [doesn't and probably never will](https://github.com/jedisct1/libsodium/issues/924) support Elligator2 fully. Note that other metadata (e.g. the number of bytes in a packet) can reveal the use of cryptography too, so you should pad such information using [randomised padding](https://en.wikipedia.org/wiki/Padding_(cryptography)#Randomized_padding) or a deterministic scheme, like [PADMÉ](https://bford.info/pub/sec/purb.pdf). 423 | 424 | 7. Use an **authenticated key exchange** in most non-interactive/offline protocols: the Noise protocol framework K and X one-way handshake [patterns](http://noiseprotocol.org/noise.html#one-way-handshake-patterns), as explained [here](https://neilmadden.blog/2018/11/26/public-key-authenticated-encryption-and-why-you-want-it-part-ii/), are perfect for non-interactive/offline protocols. These achieve sender and recipient authentication whilst preventing a compromise of the sender’s private key leading to an attacker being able to decrypt the ciphertext. 425 | 426 | 8. Opt for **forward secrecy** when possible in interactive/online protocols: this prevents a compromise of a long-term private key leading to a compromise of a session key, which is the strongest security guarantee you can achieve. This can be implemented using the Noise KK or IK interactive [handshakes](https://noiseprotocol.org/noise.html#interactive-handshake-patterns-fundamental). 427 | 428 | 9. **Store private keys encrypted**: when storing a private key in a file, you should always encrypt it with a strong password for protection at rest. 
Things become more complicated for interactive/online scenarios, with physical or virtual [hardware security modules (HSMs)](https://en.wikipedia.org/wiki/Hardware_security_module) and key vaults, such as [AWS Key Management Service (KMS)](https://aws.amazon.com/kms/), sometimes being used. These types of solutions are generally regarded as more secure than storing keys in encrypted configuration files and allow for easy key rotation but using a KMS requires trusting a third party. 429 | 430 | 10. Key pairs should be rotated: if a private key has or may have been compromised, then a new key pair should be generated. Similarly, you should consider rotating your keys after a set period of time (a [cryptoperiod](https://en.wikipedia.org/wiki/Cryptoperiod)) has elapsed. 431 | 432 | ## Digital Signatures 433 | #### Use (in order): 434 | 1. [Ed25519](https://en.wikipedia.org/wiki/EdDSA): very popular, accessible, fast, uses small keys, produces small signatures, deterministic, and offers ~128-bit security. 435 | 436 | 2. [Ed448](https://en.wikipedia.org/wiki/EdDSA#Ed448): less popular and slower than Ed25519 but uses SHAKE256 (a SHA3 variant) instead of SHA512 for hashing and edwards448 instead of edwards25519 for the curve, meaning a 224-bit security level. 437 | 438 | #### Avoid (not in order because they’re all bad): 439 | 1. 
[Plain RSA](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Attacks_against_plain_RSA), [RSA-PKCS#1 v1.5](https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding_schemes), and [RSA-PSS](https://en.wikipedia.org/wiki/Probabilistic_signature_scheme): plain RSA is [insecure](https://crypto.stackexchange.com/questions/20085/which-attacks-are-possible-against-raw-textbook-rsa), RSA-PKCS#1 v1.5 has [no security proof](https://en.wikipedia.org/wiki/Probabilistic_signature_scheme) and is [no longer recommended in the RFC](https://tools.ietf.org/html/rfc8017#section-8), and RSA-PSS is slow for signing and generating keys, produces larger signatures, and requires larger keys than [ECC](https://en.wikipedia.org/wiki/Elliptic-curve_cryptography) based signing algorithms. Moreover, RSA has [implementation traps](https://crypto.stanford.edu/~dabo/papers/RSA-survey.pdf). 440 | 441 | 2. [ElGamal](https://en.wikipedia.org/wiki/ElGamal_signature_scheme): old, even slower than RSA, not included in cryptographic libraries, basically not used in any software, not standardised, produces large signatures, and if the message is used directly rather than hashed, as specified in the [original paper](https://caislab.kaist.ac.kr/lecture/2010/spring/cs548/basic/B02.pdf), then that allows for [existential forgery](https://en.wikipedia.org/wiki/ElGamal_signature_scheme#Security). 442 | 443 | 3. [DSA](https://en.wikipedia.org/wiki/Digital_Signature_Algorithm): very old, becoming less and less supported, typically used with an [insecure key size](https://buttondown.email/cryptography-dispatches/archive/cryptography-dispatches-dsa-is-past-its-prime/), slower than Ed25519, requires larger keys than [ECC](https://en.wikipedia.org/wiki/Elliptic-curve_cryptography), and it's not deterministic, which has led to [serious vulnerabilities](https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm#Security) (please see below). 444 | 445 | 4. 
[ECDSA](https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm): slower than Ed25519 and not deterministic, which has led to [serious vulnerabilities](https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm#Security) that affected Sony’s PS3 and Bitcoin, allowing attackers to recover private keys. This issue can be prevented by properly generating a random nonce, which requires having a good [CSPRNG](https://en.wikipedia.org/wiki/Cryptographically-secure_pseudorandom_number_generator), or by deriving the nonce deterministically using [something like HMAC](https://datatracker.ietf.org/doc/html/rfc6979#section-3). However, there’s been a shift to Ed25519 because it prevents this issue from happening as well as being better in other respects. Furthermore, there’s also the concern mentioned in the [Key Exchange/Hybrid Encryption](#key-exchangehybrid-encryption) Avoid section that the NIST curves use [unexplained seeds](https://datatracker.ietf.org/doc/html/rfc8031#section-4), which is [not a good look](https://safecurves.cr.yp.to/rigid.html) considering that [Dual_EC_DRBG](https://en.wikipedia.org/wiki/Dual_EC_DRBG) was a NIST standard despite containing an [NSA backdoor](https://en.wikipedia.org/wiki/Dual_EC_DRBG#Weakness:_a_potential_backdoor). 446 | 447 | 5. [Post-quantum algorithms](https://csrc.nist.gov/projects/post-quantum-cryptography): these are still being researched, aren’t implemented in mainstream libraries, are much slower, and typically have very large key sizes. However, it will eventually make sense to switch to one in the future. 448 | 449 | #### Notes: 450 | 1. Please read points 1, 2, 9, and 10 of the [Key Exchange/Hybrid Encryption Notes](#notes-8) section because all these points about key pairs/private keys apply for signature algorithms as well. 451 | 452 | 2. 
Use authenticated hybrid encryption (an authenticated key exchange with authenticated encryption) instead of encryption with signatures: this is easier to get right and more efficient. 453 | 454 | 3. Use Sign-then-Encrypt **if you must use signatures with encryption** to provide sender authentication: Encrypt-then-Sign can allow an attacker to strip off the original signature and replace it with their own. For symmetric encryption, Sign-then-Encrypt-then-MAC, which involves signing the message, appending the signature to the message, and using either Encrypt-then-MAC or an AEAD, prevents this problem. Similarly, if you’re forced to use asymmetric encryption, then you can still use Sign-then-Encrypt but should include the [recipient’s name or the sender and recipient's names in the message](https://theworld.com/~dtd/sign_encrypt/sign_encrypt7.html) because the recipient needs proof that the same person signed and encrypted the message. Once the signature and encryption layers are bound together, an attacker can't remove and replace the outer layer because the reference in the inner layer will reveal the tampering. Alternatively, you can Encrypt-then-Sign-then-Encrypt or Sign-then-Encrypt-then-Sign, which are both slower. 455 | 456 | 4. **Don’t** use the same key pair for signatures (e.g. Ed25519) and key exchange (e.g. X25519): it’s [recommended](https://github.com/jedisct1/libsodium/issues/632#issuecomment-345272065) to **never use the same key for more than one thing** in cryptography. **The security of using the same key pair for these two algorithms has not been (sufficiently) studied**, signing key pairs and encryption key pairs often have different life cycles, and using different key pairs limits the damage done if one key pair is compromised. Since the keys are so small, using different key pairs produces barely any overhead as well. 
The only time you should really convert an Ed25519 key pair to an X25519 key pair is if you’re [heavily resource constrained](https://monocypher.org/manual/advanced/from_eddsa) or when you’re forced to use Ed25519 keys (e.g. SSH public keys off GitHub could be used for hybrid encryption). 457 | 458 | 5. Prehash large messages: signing a message normally requires loading the entire message into memory, but this can be problematic for very large (e.g. 1+ GiB) messages. To solve this problem, you can use [Ed25519ph](https://doc.libsodium.org/public-key_cryptography/public-key_signatures#multi-part-messages) or Ed448ph (which probably isn't available) to perform the prehashing for you with some additional domain separation, or you can prehash the message yourself using a strong, modern hash function, like BLAKE2b or SHA3, with a 512-bit output length and sign the hash instead of the message. However, note that not prehashing means that Ed25519 is [resistant to collisions in the hash function](https://ed25519.cr.yp.to/eddsa-20150704.pdf). Therefore, when possible, ordinary signing should arguably be preferred for additional protection, although this isn't realistically a problem if you use a secure hash function. 459 | 460 | 6. Be aware of [fault attacks](https://eprint.iacr.org/2017/1014.pdf) against deterministic signatures: techniques like causing [voltage glitches](https://cybermashup.files.wordpress.com/2017/10/practical-fault-attack-against-eddsa_fdtc-2017.pdf) on a chip (e.g. on an Arduino) can be used to recover either the entire secret key or part of the secret key, depending on the signature algorithm, and create valid signatures with algorithms like Ed25519, Ed448, and deterministic ECDSA. However, this is primarily a concern on embedded devices and requires [physical or remote access](https://eprint.iacr.org/2017/1014.pdf) to a device. 
Four [countermeasures](https://crypto.stackexchange.com/questions/50228/can-deterministic-ecdsa-be-protected-against-fault-attacks?rq=1) are: signing the same data twice and comparing the outputs, which is obviously slower than signing once; verifying the signature after signing, which is [slower](https://eprint.iacr.org/2017/1014.pdf) than signing twice for small messages but [faster](https://eprint.iacr.org/2017/1014.pdf) for large messages; [calculating a checksum over input values](https://eprint.iacr.org/2017/1014.pdf) before and after signature generation; and using a cryptographic library that implements the algorithm with some random data in the calculation of the nonce, which is the technique used by [Signal](https://signal.org/docs/specifications/xeddsa/#security-considerations). However, these countermeasures are [not guaranteed to be effective](https://eprint.iacr.org/2017/985.pdf), and there can be other [side-channel attacks](https://en.wikipedia.org/wiki/Side-channel_attack) as well. 461 | 462 | ## Asymmetric Key Size 463 | #### Use (in order): 464 | 1. 256-bit keys: the key size for X25519, which provides a ~128-bit security level. Why am I recommending this when I recommend 256-bit keys (a 256-bit security level) for symmetric encryption? Because 128-bit security means [something different](https://github.com/LoupVaillant/Monocypher/issues/127#issuecomment-536200435) in the case of these asymmetric algorithms. Furthermore, X25519 is faster, more common, and [more accessible](https://en.wikipedia.org/wiki/Comparison_of_TLS_implementations#Supported_elliptic_curves) than X448. Finally, when quantum computers do come along, [ECC](https://en.wikipedia.org/wiki/Elliptic-curve_cryptography) and RSA will be broken regardless of the key size anyway, so many people feel less of a need to use a higher security level curve. 465 | 466 | 2. 448-bit keys: the key size for X448, which provides a 224-bit security level. 467 | 468 | 3.
3072-bit or 4096-bit keys: **if you’re forced to use RSA**, then the minimum key size should be 3072-bit, which is the key size [currently used by the NSA](https://www.keylength.com/en/6/) and recommended by ECRYPT for [near term protection](https://www.keylength.com/en/3/). The maximum should be 4096-bit because the performance is really bad after that. However, **seriously don’t use RSA!** 469 | 470 | #### Avoid (not in order because they’re all bad): 471 | 1. 1024-bit keys: these are **no longer secure**. 472 | 473 | 2. 2048-bit keys: these only provide a 112-bit security level, which is below the standard 128-bit security level. Therefore, whilst commonly used and still safe as a **minimum** RSA key size, it makes sense to use 3072-bit keys instead. 474 | 475 | 3. 8192-bit keys: these are slow to generate and excessive to store. 476 | 477 | 4. Post-quantum algorithm key sizes: these algorithms are still being researched, and the key sizes are very large compared to those for [ECDH](https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman). 478 | 479 | ## Concluding Remarks 480 | I believe there are three main areas of improvement when it comes to individuals with experience in cryptography helping developers: 481 | 482 | 1. Cryptographic libraries **should** be better: most don’t make it easy to use cryptography safely (e.g. they support insecure algorithms, require nonces, etc) and have [horrible documentation](https://www.openssl.org/docs/). This shouldn’t be the case, and there really should be greater uproar about this. Things need to be secure by default (e.g. insecure algorithms should never be implemented or get removed), and the documentation needs to be readable, as in concise, helpful to people of all skill levels, presented on a modern looking website rather than using [basic HTML](https://nacl.cr.yp.to/index.html) or [a bunch of files on GitHub](https://github.com/google/tink/tree/master/docs), and easy to navigate (e.g. 
supporting search functionality like [GitBook](https://doc.libsodium.org/) does). 483 | 484 | 2. People **should** stop saying 'don't roll your own crypto': **repeating this phrase doesn't help anyone**. Instead, educate developers about how to do things properly, whether that be by answering questions on [Cryptography Stack Exchange](https://crypto.stackexchange.com/) in an understandable manner, writing a [blog](https://soatok.blog/), or replying to emails from people asking for help. It's not a crime to implement Encrypt-then-MAC, and even when someone writes a custom cipher, **you should explain why that's not a good idea** (e.g. 'professional cryptographers still design insecure algorithms'). 485 | 486 | 3. There **should** be more peer review: it's often difficult to receive peer review, impossible to fund a bug bounty program with cash rewards, and extremely unlikely for projects to get funding for a code audit. Whilst developers who fail to do any reading related to cryptography obviously deserve criticism, [even experienced professionals make mistakes](https://github.com/agl/ed25519/issues/27). Simple peer review (e.g. using the search on GitHub for things like 'HMAC' and 'ECB') helps catch things that are easy to spot, and more thorough peer review helps catch things that even someone experienced [might have missed](https://github.com/str4d/rage/issues/195). If something seems dodgy, then you should investigate if possible. 487 | --------------------------------------------------------------------------------