├── README.md
├── meta.md
├── ipcipher.md.html
├── meta.md.html
└── ipcipher.md

/README.md:
--------------------------------------------------------------------------------
ipcipher.md
--------------------------------------------------------------------------------
/meta.md:
--------------------------------------------------------------------------------
meta.md.html
--------------------------------------------------------------------------------
/ipcipher.md.html:
--------------------------------------------------------------------------------
ipcipher.md
--------------------------------------------------------------------------------
/meta.md.html:
--------------------------------------------------------------------------------
**ipcipher: discussion and guidance**

ipcipher
========
`ipcipher` is a simple way to encrypt IPv4 and IPv6 addresses such
that any address encrypts to a valid address. This enables existing tools
to be used on encrypted IPv4 and IPv6 addresses.

The protocol is described [here](ipcipher.md.html).

This page is about how and when to use the protocol, and what guarantees it
does and does not offer.

Applicability
=============
`ipcipher` is meant to enable the analysis of traces, PCAPs, log files, etc.
containing customer IP addresses, without revealing those actual (customer)
IP addresses. Given privacy trends around the world, and specifically the
advent of the [EU GDPR](https://www.eugdpr.org/), it is increasingly important
to protect personally identifying information. Worth noting: the GDPR
specifically addresses how pseudonymization is part of [privacy by
design](https://iapp.org/news/a/top-10-operational-impacts-of-the-gdpr-part-8-pseudonymization/)
and how it can protect data if it leaks.

`ipcipher` is not in any way meant as an encryption algorithm that enables
the public dissemination of traces without impacting user privacy. It should
not be compared to, say, AES or Salsa20.

In other words, `ipcipher`-encrypted IP addresses must still be protected.
This is because of the inherent limitations of
'[pseudonymisation](https://en.wikipedia.org/wiki/Pseudonymization)'.

Limitations of pseudonymity
===========================

From [Wikipedia](https://en.wikipedia.org/wiki/Pseudonymization):

> Pseudonymization is a procedure by which the most identifying fields
> within a data record are replaced by one or more artificial identifiers,
> or pseudonyms. (...) The purpose is to render the data record less
> identifying and therefore lower customer or patient objections to its use.
> Data in this form is suitable for extensive analytics and processing.

Pseudonymous data can be analysed just like regular data. For example, a
PCAP of a user-originated denial of service attack can be studied using
regular tools to identify the pseudonymous IP addresses performing the
attack.

These IP addresses can subsequently be decrypted to identify the actual
culprits.

If, however, the PCAP file contains more information, it may be possible to
deanonymize the encrypted IP addresses. For example, HTTP requests may
contain the actual user IP address in a referrer. This ties the encrypted
address to the original, and breaks pseudonymization.

Another famous example was the [AOL search data
leak](https://en.wikipedia.org/wiki/AOL_search_data_leak), in which AOL
released search queries for AOL user-ids without revealing user names.
However, it rapidly proved possible to identify specific users based on their
search traffic.

Chosen plaintext attack
=======================
Since there are only around 4.3 billion IPv4 addresses, if an attacker can
both control which IP addresses appear in an `ipcipher`-encrypted log and
read that log, the algorithm fails, at least for IPv4. This is because the
attacker can enumerate each and every IPv4 address to see how it ends up in
the log.

The IPv6 address space is far too large to enumerate, but a chosen plaintext
attack could still be used to verify a candidate for an actual IP address.

Suggested doctrine
==================
It is suggested that a new passphrase is used whenever new data is encrypted
for analysis. This minimizes the opportunity for attackers to benefit from
previous re-identification efforts.

The procedure is then as follows:

* Collect the data to be analysed
* Create a random passphrase, possibly using `pwgen`
* Store the passphrase securely
* Encrypt the IP addresses in the trace/log/PCAP
* Send the data off for analysis
* If analysts have found interesting pseudonymized IP addresses, decrypt
  them using the stored key
* The analysis team destroys the encrypted trace data
* The passphrase can then be destroyed

Specific things to watch out for
================================
IP addresses should be encrypted wherever they appear in a trace or a log.
This is not always easy. For example, ICMP error messages quote the header
of the packet that triggered them, and so contain further copies of its
source and destination IP addresses. So when encrypting PCAPs, make sure to
drop any traffic known to contain further copies of actual IP addresses that
you aren't also encrypting.

In particular, tunnelled but unencrypted traffic (GRE, VXLAN, IPIP, SIT)
is guaranteed to carry further IP addresses 'inside'.

It is advisable to restrict PCAP captures to the intended protocol only (say,
DNS).
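The advice above can be made concrete with a small pre-filter that runs before any addresses are encrypted. The following is a minimal C sketch, not part of `ipcipher` itself, that decides whether a raw IPv4 packet belongs in a DNS-only trace. It assumes packets begin at the IPv4 header (no link-layer framing) and does not handle IPv6; the function name `keep_for_dns_trace` is hypothetical.

```c
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: keep only TCP or UDP packets with source or
 * destination port 53. ICMP (which quotes other packets' headers),
 * GRE, IPIP, SIT and everything else are dropped, as are non-first
 * fragments, which carry no port numbers. IPv6 is not handled here. */
bool keep_for_dns_trace(const uint8_t *pkt, size_t len)
{
    if (len < 20 || (pkt[0] >> 4) != 4)
        return false;                          /* not an IPv4 header */

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20 || len < ihl + 4)
        return false;                          /* truncated: no port fields */

    if ((((pkt[6] & 0x1f) << 8) | pkt[7]) != 0)
        return false;                          /* fragment offset != 0 */

    uint8_t proto = pkt[9];
    if (proto != IPPROTO_TCP && proto != IPPROTO_UDP)
        return false;                          /* ICMP, GRE, IPIP, SIT, ... */

    const uint8_t *l4 = pkt + ihl;             /* start of TCP/UDP header */
    unsigned sport = (unsigned)((l4[0] << 8) | l4[1]);
    unsigned dport = (unsigned)((l4[2] << 8) | l4[3]);
    return sport == 53 || dport == 53;
}
```

Dropping, rather than trying to rewrite, ICMP and tunnelled packets errs on the safe side. Where possible, the same restriction can also be applied at capture time, for example with a pcap filter such as `port 53`.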

When encrypting text-based log files, be sure to encrypt not only `1.2.3.4`
but also `1.2.3.4:25`. Similarly, `::1` should be encrypted, but so should
`[::1]:25`, `::1#25` and `fe80::1%eth0`.

Additionally, when stamping out IPv4 or IPv6 addresses in data structures
with checksums (like IPv4 headers, or TCP and UDP headers, whose checksums
cover both addresses), be sure to also update (or zero) those checksums, as
these may provide a weak or even strong indication of the original IP address.

Protection level that should be accorded to encrypted IP addresses
==================================================================
In general, the more data is stored, the higher the protection level should
be, even with encrypted IP addresses.

As an example, a trace of 100 DNS packets from 100 different IP addresses
does not offer a lot of scope for deanonymization. However, a full day's
recording of millions of IP addresses should not be shared as if it can't be
deanonymized.

The risk can already be reduced by, for example, encrypting 24 hours of IP
addresses with 24 different keys.

In general, however, should data leak, the damage will be significantly less
if IP addresses were encrypted than if they had not been.

Another way to look at it is that encrypting IP addresses is always a win,
unless the traces are then shared more widely than before.

--------------------------------------------------------------------------------
/ipcipher.md:
--------------------------------------------------------------------------------
**ipcipher: encrypting IP addresses**

STATUS: This standard is open for discussion. We hope to finalize it
quickly. Contact: bert.hubert@powerdns.com /
[@PowerDNS_Bert](https://twitter.com/PowerDNS_Bert).

ipcipher
========
This page documents a simple way to encrypt IPv4 and IPv6 addresses such
that any address encrypts to a valid address. This enables existing tools
to be used on encrypted IPv4 and IPv6 addresses.

There are many ways to do this, especially for IPv6, but the method
described here is simple and interoperable. This page:

* Describes the algorithms used to encrypt/decrypt IP addresses
* Specifies how to derive the key from a password
* Links to reference implementations in various languages
* Provides a set of published test vectors to test interoperability

In order to enhance interoperability, implementations that want to encrypt
IP addresses are encouraged to do so using this `ipcipher` standard.

Known implementations:

* [In Go, by Silke Hofstra](https://github.com/silkeh/ipcipher)
* PowerDNS

Discussion on how and when to use `ipcipher` can be found in the
[meta](meta.md.html) document.

Acknowledgements
================
Silke Hofstra built the first interoperable implementation and found many
mistakes in the specification and test vectors. Jean-Philippe Aumasson
supplied the `ipcrypt` algorithm and guidance on key derivation. Further
thanks go to Frank Denis for providing the C implementation of `ipcrypt` and
general advice, and to Edwin van Vliet for noting the risk of checksums
providing a hint of the original IP address.


Why encrypt IP addresses?
=========================
Frequently, privacy concerns and regulations get in the way of security
analysis. Privacy is important, but so is security. Compromised systems
eventually also harm privacy.

Per-customer/subscriber traces are extremely useful for researching the
security of networks. However, privacy officers rightly object to the
unbridled sharing of which IP address did what.

One potential solution is to encrypt IP addresses in log files or PCAPs with
a secret key. Crucially, this can be done in such a way that the IP addresses
still look like IP addresses, and can be stored 'in place'.

The encryption key is held by the privacy officer, or their department, and
if something interesting is found based on the encrypted IP addresses, the
address can be decrypted for further action.

The needs and merits of IP encryption are further explored in '[On IP address encryption: security analysis with respect for
privacy](https://medium.com/@bert.hubert/on-ip-address-encryption-security-analysis-with-respect-for-privacy-dabe1201b476)'.
Importantly, this also touches on the inherent limitations of encrypting IP
addresses for privacy.

Guidance on how to use `ipcipher` can be found [here](meta.md.html).

Key derivation
==============
Both IPv4 and IPv6 encryption use a 128-bit key. To derive this key from the
passphrase, use PBKDF2 as follows:

```
DK = PBKDF2(SHA1, Password, "ipcipheripcipher", 50000, 16)
```

Or in words: PBKDF2 as specified in RFC 2898, with HMAC-SHA1 as the
pseudorandom function, `ipcipheripcipher` as salt, 50000 iterations, and 16
bytes of output key `DK`. In OpenSSL this corresponds to:

```
#include <string.h>
#include <openssl/evp.h>
/* Illustrative wrapper: derive 16-byte DK (salt excludes its NUL); 1 = success */
int ipcipher_derive_key(const char *password, unsigned char out[16]) {
  static const char salt[] = "ipcipheripcipher";
  return PKCS5_PBKDF2_HMAC_SHA1(password, (int)strlen(password),
      (const unsigned char*)salt, sizeof(salt)-1, 50000, 16, out);
}
```

The key derivation step is not optional. The `ipcrypt` algorithm used for
IPv4 requires a fully randomized key and is not secure without it. In
addition, PBKDF2 makes brute forcing the passphrase considerably more
expensive.

Some test vectors for key derivation, where the first entry is an empty string:

* "" -> bb 8d cd 7b e9 a6 f4 3b 33 04 c6 40 d7 d7 10 3c
* "3.141592653589793" -> 37 05 bd 6c 0e 26 a1 a8 39 89 8f 1f a0 16 a3 74
* "crypto is not a coin" -> 06 c4 ba d2 3a 38 b9 e0 ad 9d 05 90 b0 a3 d9 3a

Take care not to include a terminating NUL (0) byte in the password (or salt).

Note: it is of course also possible to use a fully random 128-bit key that
is not derived from a passphrase. This offers some security advantages too,
as the full 128-bit keyspace is used. Implementations are encouraged to make
it possible to provide either a passphrase or a 128-bit key, but be careful:
the two cannot reliably be told apart automatically, so the user should have
to state explicitly which one is being supplied.

IPv4 algorithm
==============
An IPv4 address is a 32-bit value, and to encrypt it to another IPv4 address
we need a block cipher with a native 32-bit block size. A modern and suitable
algorithm is '[ipcrypt](https://github.com/veorq/ipcrypt)' by [Jean-Philippe
Aumasson](https://aumasson.jp/). ipcrypt was inspired by
[SipHash](https://en.wikipedia.org/wiki/SipHash) (which was invented by
Aumasson and Dan J. Bernstein).

ipcrypt uses a 128-bit key; there is no padding, no cipher mode, nor anything
else.

Implementations:

* [C](https://github.com/jedisct1/c-ipcrypt) by Frank Denis
* [Go](https://github.com/veorq/ipcrypt) by Jean-Philippe Aumasson
* [Python](https://github.com/veorq/ipcrypt) by Jean-Philippe Aumasson
* [Rust](https://github.com/stbuehler/rust-ipcrypt) by Stefan Bühler

Note that the (combined) Python and Go repository also includes command line
tools.
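To show how small the algorithm is, here is a minimal C sketch of ipcrypt as we read it from the implementations listed above: the 4 address bytes are XORed with successive 4-byte slices of the 16-byte key, with a byte-oriented add-rotate-XOR permutation applied between the key XORs (three permutations, four key XORs). The function names are hypothetical; the linked implementations and the test vectors remain the reference.

```c
#include <stdint.h>

/* Hypothetical sketch of ipcrypt: 4-byte block, 16-byte key. */

static uint8_t rotl8(uint8_t x, int r) /* rotate left, 1 <= r <= 7 */
{
    return (uint8_t)((x << r) | (x >> (8 - r)));
}

/* Forward add-rotate-XOR permutation of the 4-byte state. */
static void permute_fwd(uint8_t s[4])
{
    s[0] += s[1]; s[2] += s[3];
    s[1] = rotl8(s[1], 2); s[3] = rotl8(s[3], 5);
    s[1] ^= s[0]; s[3] ^= s[2];
    s[0] = rotl8(s[0], 4);
    s[0] += s[3]; s[2] += s[1];
    s[1] = rotl8(s[1], 3); s[3] = rotl8(s[3], 7);
    s[1] ^= s[2]; s[3] ^= s[0];
    s[2] = rotl8(s[2], 4);
}

/* Inverse permutation: undoes each forward step in reverse order. */
static void permute_bwd(uint8_t s[4])
{
    s[2] = rotl8(s[2], 4);
    s[1] ^= s[2]; s[3] ^= s[0];
    s[1] = rotl8(s[1], 5); s[3] = rotl8(s[3], 1);
    s[0] -= s[3]; s[2] -= s[1];
    s[0] = rotl8(s[0], 4);
    s[1] ^= s[0]; s[3] ^= s[2];
    s[1] = rotl8(s[1], 6); s[3] = rotl8(s[3], 3);
    s[0] -= s[1]; s[2] -= s[3];
}

static void xor_key(uint8_t s[4], const uint8_t *k)
{
    for (int i = 0; i < 4; i++)
        s[i] ^= k[i];
}

/* Encrypt 4 address bytes (network order) in place. */
void ipcrypt_encrypt(uint8_t ip[4], const uint8_t key[16])
{
    xor_key(ip, key);
    for (int r = 1; r < 4; r++) {
        permute_fwd(ip);
        xor_key(ip, key + 4 * r);
    }
}

/* Decrypt in place: the same steps, inverted and in reverse order. */
void ipcrypt_decrypt(uint8_t ip[4], const uint8_t key[16])
{
    xor_key(ip, key + 12);
    for (int r = 2; r >= 0; r--) {
        permute_bwd(ip);
        xor_key(ip, key + 4 * r);
    }
}
```

With the 16 ASCII bytes of "some 16-byte key" passed directly as `key`, encrypting the bytes `{127, 0, 0, 1}` with this sketch yields `{114, 62, 227, 59}`, matching the first test vector below.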

Test vectors using the key "some 16-byte key" (minus the quotes), whose 16
ASCII bytes are used directly as the key, without key derivation:

* 127.0.0.1 -> 114.62.227.59
* 8.8.8.8 -> 46.48.51.50
* 1.2.3.4 -> 171.238.15.199

Using the following key in hex: 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10

* Start with IP address 192.168.69.42 and encrypt it 100 million times,
  feeding each output back in as the next input -> 93.155.197.186

Using the password "crypto is not a coin":

* 198.41.0.4 -> 139.111.117.167
* 130.161.180.1 -> 66.235.221.231
* 0.0.0.0 -> 203.253.152.187

Note that this password needs to be used to derive the actual key first.

IPv6 algorithm
==============
IPv6 addresses are 128 bits, and there is a wealth of suitable algorithms
available. AES-128 is robust, widely available, and more than good enough.

AES is typically deployed in a mode like Cipher Block Chaining, but no such
mode is required to encrypt IP addresses: an IPv6 address is exactly one
128-bit AES block. A straight AES operation is used, with no further XORing,
as in Electronic Code Book "mode".

AES is almost always already available. To get a raw AES-128 encryption
operation out of OpenSSL or its variants:

```
#include <netinet/in.h>
#include <openssl/aes.h>
/* Illustrative wrapper: encrypt one IPv6 address with the 16-byte key */
void ipcipher_encrypt6(const unsigned char key[16], const struct in6_addr *in, struct in6_addr *out) {
  AES_KEY wctx;
  AES_set_encrypt_key(key, 128, &wctx);
  AES_encrypt(in->s6_addr, out->s6_addr, &wctx); /* one raw AES block, no mode */
}
```

Decryption works the same way, with `AES_set_decrypt_key()` and
`AES_decrypt()` substituted for their `encrypt` counterparts.

There is as yet no command line tool that performs these operations,
although PowerDNS `pdnsutil` will feature this in the 4.2 release.

Test vectors using the key "some 16-byte key" (again used directly, without
key derivation):

* ::1 -> 3718:8853:1723:6c88:7e5f:2e60:c79a:2bf
* 2001:503:ba3e::2:30 -> 64d2:883d:ffb5:dd79:24b:943c:22aa:4ae7
* 2001:DB8:: -> ce7e:7e39:d282:e7b1:1d6d:5ca1:d4de:246f

Using the password "crypto is not a coin":

* ::1 -> a551:9cb0:c9b:f6e1:6112:58a:af29:3a6c
* 2001:503:ba3e::2:30 -> 6e60:2674:2fac:d383:f9d5:dcfe:fc53:328e
* 2001:DB8:: -> a8f5:16c8:e2ea:23b9:748d:67a2:4107:9d2e

Note that this password needs to be used to derive the key first.

--------------------------------------------------------------------------------