├── README.md
├── meta.md
├── ipcipher.md.html
├── meta.md.html
└── ipcipher.md
/README.md:
--------------------------------------------------------------------------------
1 | ipcipher.md
--------------------------------------------------------------------------------
/meta.md:
--------------------------------------------------------------------------------
1 | meta.md.html
--------------------------------------------------------------------------------
/ipcipher.md.html:
--------------------------------------------------------------------------------
1 | ipcipher.md
--------------------------------------------------------------------------------
/meta.md.html:
--------------------------------------------------------------------------------
1 |
2 | **ipcipher: discussion and guidance**
3 |
4 | ipcipher
5 | ========
6 | `ipcipher` is a simple way to encrypt IPv4 and IPv6 addresses such
7 | that any address encrypts to a valid address. This enables existing tools
8 | to be used on encrypted IPv4 and IPv6 addresses.
9 |
10 | The protocol is described [here](ipcipher.md.html).
11 |
12 | This page is about how and when to use the protocol, and what guarantees it
13 | does and does not offer.
14 |
15 | Applicability
16 | =============
17 | `ipcipher` is meant to enable the analysis of traces, pcaps, logfiles etc.
18 | containing customer IP addresses, without revealing those actual (customer)
19 | IP addresses. Given privacy trends around the world and specifically the
20 | advent of [EU GDPR](https://www.eugdpr.org/), it is more and more important
21 | to protect personally identifying information. Worth noting, the GDPR
22 | touches specifically on how pseudonymization is part of [privacy by
23 | design](https://iapp.org/news/a/top-10-operational-impacts-of-the-gdpr-part-8-pseudonymization/)
24 | and how it can protect data if it leaks.
25 |
26 | `ipcipher` is not in any way meant as an encryption algorithm that enables
27 | the public dissemination of traces without impacting user privacy. It should
28 | not be compared to, say, AES or Salsa20.
29 |
30 | In other words, `ipcipher` encrypted IP addresses must still be protected.
31 | This is because of inherent limitations of
32 | '[pseudonymisation](https://en.wikipedia.org/wiki/Pseudonymization)'.
33 |
34 | Limitations of pseudonymity
35 | ===========================
36 |
37 | From [Wikipedia](https://en.wikipedia.org/wiki/Pseudonymization):
38 |
39 | > Pseudonymization is a procedure by which the most identifying fields
40 | > within a data record are replaced by one or more artificial identifiers,
41 | > or pseudonyms. (...) The purpose is to render the data record less
42 | > identifying and therefore lower customer or patient objections to its use.
43 | > Data in this form is suitable for extensive analytics and processing.
44 |
45 | Pseudonymous data can be analysed just like regular data. So for example, a
46 | PCAP of a user-originated denial of service attack can be studied using
47 | regular tools to identify the pseudonymous IP addresses performing the
48 | attack.
49 |
50 | These IP addresses can subsequently be decrypted to identify the actual
51 | culprits.
52 |
53 | If however the PCAP file contains more information, it may be possible to
54 | deanonymize the encrypted IP addresses. For example, HTTP requests may
55 | contain the actual user IP address as a referrer. This ties the encrypted
56 | address to the original, and breaks pseudonymization.
57 |
58 | Another famous example was the [AOL search data
59 | leak](https://en.wikipedia.org/wiki/AOL_search_data_leak) in which AOL
60 | released search queries for AOL user-ids, without revealing user names. It
61 | rapidly proved possible however to identify specific users based on their
62 | search traffic.
63 |
64 | Chosen plaintext attack
65 | =======================
66 | Since there are only around 4.3 billion IPv4 addresses, if an attacker can
67 | both influence which IP addresses appear in an `ipcipher` encrypted log, and
68 | have access to that log, the algorithm fails, at least for IPv4. This is
69 | because the attacker can enumerate each and every IPv4 address to see how it
70 | ends up in the log.
71 |
72 | IPv6 is not suitable for enumeration, but a chosen plaintext attack could
73 | still be used to verify a potential actual IP address candidate.
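The enumeration described above can be sketched in Python. This is a toy illustration only: `toy_oracle` is a hypothetical stand-in for a system that encrypts attacker-chosen addresses into a log (it is a seeded random shuffle, not `ipcipher`), and the /24 network is an arbitrary small example. The attacker never learns the key, only the mapping.

```python
import ipaddress
import random


def toy_oracle(network: str, seed: int):
    """Build a secret one-to-one mapping over a small network (toy stand-in, not ipcipher)."""
    hosts = [str(a) for a in ipaddress.ip_network(network)]
    shuffled = hosts[:]
    random.Random(seed).shuffle(shuffled)
    table = dict(zip(hosts, shuffled))
    return table.__getitem__


def build_reverse_table(encrypt, network: str) -> dict:
    """Push every address in `network` through the oracle and invert the result."""
    return {encrypt(str(a)): str(a) for a in ipaddress.ip_network(network)}


if __name__ == "__main__":
    oracle = toy_oracle("192.0.2.0/24", seed=1)
    reverse = build_reverse_table(oracle, "192.0.2.0/24")
    seen_in_log = oracle("192.0.2.77")
    print(seen_in_log, "was originally", reverse[seen_in_log])
```

The same loop over all of IPv4 needs about 4.3 billion oracle queries, which is why control over the logged addresses plus access to the log is enough to defeat the scheme for IPv4.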
74 |
75 | Suggested doctrine
76 | ==================
77 | It is suggested that a new passphrase is used whenever new data is encrypted
78 | for analysis. This minimizes the opportunity for attackers to benefit from
79 | previous re-identification efforts.
80 |
81 | The procedure is then as follows:
82 |
83 | * Collect data to be analysed
84 | * Create a random passphrase, possibly using `pwgen`
85 | * Store passphrase securely
86 | * Encrypt IP addresses in trace/log/pcap
87 | * Send data off for analysis
88 | * If analysts have found interesting pseudonymized IP addresses, decrypt
89 | using stored key
90 | * Analysis team destroys encrypted trace data
91 | * Passphrase can be destroyed
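As a minimal sketch of the passphrase step, Python's standard `secrets` module can be used in place of `pwgen`. The function name `new_passphrase` and the 32-byte default are illustrative assumptions, not part of the procedure above.

```python
import secrets


def new_passphrase(num_bytes: int = 32) -> str:
    """Return a fresh random passphrase as URL-safe text built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # One passphrase per data set; store it securely and never alongside the data.
    print(new_passphrase())
```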
92 |
93 | Specific things to watch out for
94 | ================================
95 | IP addresses should be encrypted wherever they appear in a trace or a log.
96 | This is not always easy. Of specific note, ICMP messages may for example
97 | contain another copy of the source or destination IP address of the packet.
98 | So when encrypting PCAPs, make sure to drop any traffic known to contain
99 | further copies of actual IP addresses that you aren't also encrypting.
100 |
101 | Of specific note, tunnelled but unencrypted traffic (GRE, VXLAN, IPIP, SIT)
102 | is guaranteed to carry further IP addresses 'inside'.
103 |
104 | It is advised to restrict PCAP captures to only the intended protocol (say,
105 | DNS).
106 |
107 | When encrypting text-based log files, be sure to encrypt not only 1.2.3.4
108 | but also 1.2.3.4:25. Similarly, ::1 should be encrypted, but also [::1]:25
109 | and ::1#25, as well as fe80::1%eth0.
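The address notations listed above can be handled along these lines. This is a sketch under stated assumptions: `split_token`, `rewrite_token` and `rewrite_line` are hypothetical helper names, tokens are assumed to be separated by whitespace, and `placeholder_encrypt` merely flips bits so the example runs without a cipher. It is not encryption and must be replaced by a real `ipcipher` implementation.

```python
import ipaddress
import re


def split_token(token: str):
    """Split a log token into (prefix, address text, suffix)."""
    if token.startswith("[") and "]" in token:  # [::1]:25
        addr, _, rest = token[1:].partition("]")
        return "[", addr, "]" + rest
    for sep in ("#", "%"):  # ::1#25, fe80::1%eth0
        if sep in token:
            addr, _, rest = token.partition(sep)
            return "", addr, sep + rest
    if token.count(":") == 1:  # 1.2.3.4:25
        addr, _, port = token.partition(":")
        return "", addr, ":" + port
    return "", token, ""


def rewrite_token(token: str, encrypt) -> str:
    """Replace the IP address in one token, keeping port and zone decorations."""
    prefix, addr_text, suffix = split_token(token)
    try:
        address = ipaddress.ip_address(addr_text)
    except ValueError:
        return token  # not an IP address, leave untouched
    return f"{prefix}{encrypt(address)}{suffix}"


def rewrite_line(line: str, encrypt) -> str:
    """Rewrite every whitespace-separated token in a log line."""
    return re.sub(r"\S+", lambda m: rewrite_token(m.group(0), encrypt), line)


def placeholder_encrypt(address):
    """Demo stand-in that flips every bit. NOT encryption."""
    return type(address)(int(address) ^ (2 ** address.max_prefixlen - 1))


if __name__ == "__main__":
    for sample in ("1.2.3.4", "1.2.3.4:25", "::1", "[::1]:25", "::1#25", "fe80::1%eth0"):
        print(sample, "->", rewrite_token(sample, placeholder_encrypt))
```

Tokens with trailing punctuation (for example `1.2.3.4,`) are not recognised by this sketch; real log formats usually need format-specific parsing.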
110 |
111 | Additionally, when stamping out IPv4 or IPv6 addresses in data structures
112 | with checksums (like IP headers), be sure to also update (or zero) those
113 | checksums, as these may provide a weak or even strong indication of the
114 | original IP address.
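For the IPv4 header specifically, updating the checksum after rewriting the addresses can be sketched as follows. The helper names are illustrative assumptions; the checksum itself is the standard 16-bit one's complement sum over the header (RFC 791).

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the IPv4 header checksum, treating the checksum field as zero."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("IPv4 header must be at least 20 bytes and of even length")
    total = 0
    for offset in range(0, len(header), 2):
        if offset == 10:  # skip the existing checksum field
            continue
        (word,) = struct.unpack_from("!H", header, offset)
        total += word
    while total >> 16:  # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def replace_addresses(header: bytes, new_src: bytes, new_dst: bytes) -> bytes:
    """Return a copy of the header with new addresses and a matching checksum."""
    patched = bytearray(header)
    patched[12:16] = new_src  # source address
    patched[16:20] = new_dst  # destination address
    struct.pack_into("!H", patched, 10, ipv4_header_checksum(bytes(patched)))
    return bytes(patched)
```

TCP and UDP checksums also cover the IP addresses through the pseudo-header, so they need the same treatment (or zeroing) when present.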
115 |
116 | Protection level that should be accorded to encrypted IP addresses
117 | ==================================================================
118 | In general, the more data is stored, the higher the protection level should
119 | be, even with encrypted IP addresses.
120 |
121 | As an example, a trace of 100 DNS packets from 100 different IP addresses
122 | does not offer a lot of scope for deanonymization. However, a full-day
123 | recording of millions of IP addresses should not be shared as if it can't be
124 | deanonymized.
125 |
126 | A lesser risk can already be achieved by encrypting 24 hours of IP
127 | addresses with 24 different keys, for example.
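One way to arrange such per-hour keys is sketched below. Using independent random 128-bit keys and bucketing by UTC hour are assumptions made for illustration, and the function names are hypothetical.

```python
import secrets
from datetime import datetime, timezone


def make_hourly_keys() -> list:
    """Create 24 independent random 128-bit keys, one per hour of the day."""
    return [secrets.token_bytes(16) for _ in range(24)]


def key_for(timestamp: float, hourly_keys: list) -> bytes:
    """Pick the key for the UTC hour in which a packet or log line was recorded."""
    hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
    return hourly_keys[hour]
```

With this arrangement the same address encrypts differently in each hour, so a successful re-identification in one hour does not carry over to the others.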
128 |
129 | In general however, should data leak, damage will be significantly less if
130 | IP addresses were encrypted than if they had not been encrypted.
131 |
132 | Another way to look at it is that encrypting IP addresses is always a win,
133 | unless the traces are suddenly shared more widely than before.
134 |
135 |
137 |
138 |
--------------------------------------------------------------------------------
/ipcipher.md:
--------------------------------------------------------------------------------
1 |
2 | **ipcipher: encrypting IP addresses**
3 |
4 | STATUS: This standard is open for discussion. We hope to finalize it
5 | quickly - bert.hubert@powerdns.com /
6 | [@PowerDNS_Bert](https://twitter.com/PowerDNS_Bert).
7 |
8 | ipcipher
9 | ========
10 | This page documents a simple way to encrypt IPv4 and IPv6 addresses such
11 | that any address encrypts to a valid address. This enables existing tools
12 | to be used on encrypted IPv4 and IPv6 addresses.
13 |
14 | There are many ways to do this, especially for IPv6, but the method
15 | described here is simple and interoperable. This page:
16 |
17 | * Describes the algorithms used to encrypt/decrypt IP addresses
18 | * Specifies how to derive the key from a password
19 | * Links to reference implementations in various languages
20 | * Provides a set of published test vectors to test interoperability
21 |
22 | In order to enhance interoperability, implementations that want to encrypt
23 | IP addresses are encouraged to do so using this 'ipcipher' standard.
24 |
25 | Known implementations:
26 |
27 | * [In Go, by Silke Hofstra](https://github.com/silkeh/ipcipher)
28 | * PowerDNS
29 |
30 | Discussion on how and when to use `ipcipher` can be found in the
31 | [meta](meta.md.html) document.
32 |
33 | Acknowledgements
34 | ================
35 | Silke Hofstra built the first interoperable implementation and found many
36 | mistakes in the specification and test vectors. Jean-Philippe Aumasson
37 | supplied the `ipcrypt` algorithm & guidance on key derivation. Further thanks to:
38 | Frank Denis for providing the C implementation of `ipcrypt` and general
39 | advice, Edwin van Vliet for noting the risk of checksums providing a hint of
40 | the original IP address.
41 |
42 |
43 | Why encrypt IP addresses?
44 | =========================
45 | Frequently, privacy concerns and regulations get in the way of security
46 | analysis. Privacy is important, but so is security. Compromised systems
47 | eventually also harm privacy.
48 |
49 | Per-customer/subscriber traces are extremely useful for researching the
50 | security of networks. However, privacy officers rightly object to the
51 | unbridled sharing of which IP address did what.
52 |
53 | One potential solution is to encrypt IP addresses in log files or PCAPs with
54 | a secret key. Crucially, this can be done in a way that the IP addresses
55 | still look like IP addresses, and can be stored 'in place'.
56 |
57 | The encryption key is held by the privacy officer, or their department, and
58 | if analysis of the encrypted IP addresses turns up something interesting, the
59 | address can be decrypted for further action.
60 |
61 | The needs and merits of IP encryption are further explored in '[On IP address encryption: security analysis with respect for
62 | privacy](https://medium.com/@bert.hubert/on-ip-address-encryption-security-analysis-with-respect-for-privacy-dabe1201b476)'.
63 | Importantly, this also touches on inherent limitations of encrypting IP
64 | addresses for privacy.
65 |
66 | Guidance on how to use `ipcipher` can be found [here](meta.md.html).
67 |
68 | Key derivation
69 | ==============
70 | Both IPv4 and IPv6 encryption use a 128-bit key. To derive this key from the
71 | passphrase, use PBKDF2 as follows:
72 |
73 | ```
74 | DK = PBKDF2(SHA1, Password, "ipcipheripcipher", 50000, 16)
75 | ```
76 |
77 | Or in words, RFC 2898 with SHA1 as hashing function, `ipcipheripcipher` as
78 | salt, 50000 iterations, 16 bytes of key `DK`. In OpenSSL this
79 | corresponds to:
80 |
81 | ```
82 | static const char salt[]="ipcipheripcipher";
83 | unsigned char out[16];
84 | PKCS5_PBKDF2_HMAC_SHA1(passwordptr, passwordlen, (const unsigned char*)salt, sizeof(salt)-1, 50000, sizeof(out), out);
85 |
86 | ```
87 |
88 | The key derivation step is not optional. The `ipcrypt` algorithm used for
89 | IPv4 requires a fully randomized key and is not secure without it. In
90 | addition, PBKDF2 protects against brute forcing of the passphrase.
91 |
92 | Some test vectors for key derivation, where the first entry is the empty string:
93 |
94 | * "" -> bb 8d cd 7b e9 a6 f4 3b 33 04 c6 40 d7 d7 10 3c
95 | * "3.141592653589793" -> 37 05 bd 6c 0e 26 a1 a8 39 89 8f 1f a0 16 a3 74
96 | * "crypto is not a coin" -> 06 c4 ba d2 3a 38 b9 e0 ad 9d 05 90 b0 a3 d9 3a
97 |
98 | Take care not to process a possible trailing NUL byte (the C string terminator) in the password (or salt).
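As a minimal sketch, the derivation can be reproduced with Python's standard library, assuming `hashlib.pbkdf2_hmac` as the PBKDF2 implementation. The function name `derive_key` is illustrative and not part of the specification, and encoding the passphrase as UTF-8 is an assumption, since the text above only shows ASCII passphrases.

```python
import hashlib

SALT = b"ipcipheripcipher"  # 16 bytes, no trailing NUL
ITERATIONS = 50000
KEY_LENGTH = 16  # bytes, i.e. 128 bits


def derive_key(passphrase: str) -> bytes:
    """Derive the 128-bit key from a passphrase (illustrative helper)."""
    # UTF-8 encoding is an assumption; it matches ASCII for the vectors above.
    password = passphrase.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha1", password, SALT, ITERATIONS, KEY_LENGTH)


if __name__ == "__main__":
    # Compare the printed bytes against the test vectors listed above.
    for passphrase in ("", "3.141592653589793", "crypto is not a coin"):
        print(repr(passphrase), "->", derive_key(passphrase).hex(" "))
```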
99 |
100 | Note: it is of course also possible to use a fully random 128-bit key that
101 | is not derived from a passphrase. This offers some security advantages too,
102 | as the full 128-bit keyspace is used. Implementations are encouraged to make
103 | it possible to either provide a passphrase or a 128-bit string, but be
104 | aware that it is not possible to distinguish between these two
105 | automatically!
106 |
107 | IPv4 algorithm
108 | ==============
109 | An IPv4 address is a 32 bit value, and to encrypt it to another IPv4 address
110 | we need a block cipher that is 32 bit native. A modern and suitable
111 | algorithm is '[ipcrypt](https://github.com/veorq/ipcrypt)' by [Jean-Philippe
112 | Aumasson](https://aumasson.jp/). ipcrypt was inspired by
113 | [SipHash](https://en.wikipedia.org/wiki/SipHash) (which was invented by
114 | Aumasson and Dan J. Bernstein).
115 |
116 | ipcrypt uses a 128 bit key. There is no padding, no cipher mode or anything
117 | else.
118 |
119 | Implementations:
120 |
121 | * [C](https://github.com/jedisct1/c-ipcrypt) by Frank Denis
122 | * [Go](https://github.com/veorq/ipcrypt) by Jean-Philippe Aumasson
123 | * [Python](https://github.com/veorq/ipcrypt) by Jean-Philippe Aumasson
124 | * [Rust](https://github.com/stbuehler/rust-ipcrypt) by Stefan Bühler
125 |
126 | Note that the (combined) Python and Go repository also includes command line
127 | tools.
128 |
129 | Test vectors using the derived key "some 16-byte key" (minus the quotes):
130 |
131 | * 127.0.0.1 -> 114.62.227.59
132 | * 8.8.8.8 -> 46.48.51.50
133 | * 1.2.3.4 -> 171.238.15.199
134 |
135 | Using the following key in hex: 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
136 | 10
137 |
138 | * Start with IP address 192.168.69.42 and encrypt it 100 million times ->
139 | 93.155.197.186 (so keep on encrypting the encrypted address)
140 |
141 | Using the password "crypto is not a coin":
142 |
143 | * 198.41.0.4 -> 139.111.117.167
144 | * 130.161.180.1 -> 66.235.221.231
145 | * 0.0.0.0 -> 203.253.152.187
146 |
147 | Note that this password needs to be used to derive the actual key first.
148 |
149 | IPv6 algorithm
150 | ==============
151 | IPv6 addresses are 128 bits, and there is a wealth of suitable algorithms
152 | available. AES-128 is robust and widely available, and more than good
153 | enough.
154 |
155 | AES is typically deployed in a mode like Cipher Block Chaining, but no such
156 | mode is required to encrypt IP addresses. A straight AES operation is used,
157 | with no further XORing, as in Electronic Code Book "mode".
158 |
159 | AES is almost always already available. To get a raw AES-128 encryption
160 | operation out of OpenSSL or its variants:
161 |
162 | ```
163 | AES_KEY wctx;
164 | AES_set_encrypt_key(key, 128, &wctx);
165 | AES_encrypt((const unsigned char*)&ca.sin6.sin6_addr.s6_addr,
166 | (unsigned char*)&ret.sin6.sin6_addr.s6_addr, &wctx);
167 | ```
168 |
169 | Decryption is the same, with the obvious s/encrypt/decrypt/ change.
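The conversion around the block operation can be sketched in Python as follows. The standard library has no AES, so the single-block operation is passed in as a callable: `encrypt_block` is a hypothetical parameter that a real implementation would bind to raw AES-128 with the derived key. The byte reversal in the demo is a stand-in, not encryption.

```python
import ipaddress
from typing import Callable


def encrypt_ipv6(address: str, encrypt_block: Callable[[bytes], bytes]) -> str:
    """Encrypt one IPv6 address with a caller-supplied 16-byte block operation."""
    plain = ipaddress.IPv6Address(address).packed  # 16 bytes, network byte order
    cipher = encrypt_block(plain)
    if len(cipher) != 16:
        raise ValueError("block operation must return exactly 16 bytes")
    # Any 16-byte value is a valid IPv6 address, so no padding or mode is needed.
    return str(ipaddress.IPv6Address(cipher))


if __name__ == "__main__":
    # Stand-in block operation for demonstration only: reverse the bytes.
    print(encrypt_ipv6("::1", lambda block: block[::-1]))
```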
170 |
171 | There is as yet no command line tool that performs these operations,
172 | although PowerDNS `pdnsutil` will feature this in the 4.2 release.
173 |
174 | Test vectors using the key "some 16-byte key":
175 |
176 | * ::1 -> 3718:8853:1723:6c88:7e5f:2e60:c79a:2bf
177 | * 2001:503:ba3e::2:30 -> 64d2:883d:ffb5:dd79:24b:943c:22aa:4ae7
178 | * 2001:DB8:: -> ce7e:7e39:d282:e7b1:1d6d:5ca1:d4de:246f
179 |
180 | Using the password "crypto is not a coin":
181 |
182 | * ::1 -> a551:9cb0:c9b:f6e1:6112:58a:af29:3a6c
183 | * 2001:503:ba3e::2:30 -> 6e60:2674:2fac:d383:f9d5:dcfe:fc53:328e
184 | * 2001:DB8:: -> a8f5:16c8:e2ea:23b9:748d:67a2:4107:9d2e
185 |
186 | Note that this password needs to be used to derive the key first.
187 |
188 |
190 |
191 |
--------------------------------------------------------------------------------