├── requirements.txt
├── .gitignore
├── LICENSE.md
├── util.py
├── README.md
└── collide.py
/requirements.txt:
--------------------------------------------------------------------------------
1 | Pillow
2 | numpy
3 | onnx
4 | onnx_tf
5 | scipy
6 | tensorflow
7 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | NeuralHashv3b-current.espresso.*
2 | model.onnx
3 | neuralhash_128x96_seed1.dat
4 | *.png
5 | *.jpg
6 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | =====================
3 |
4 | **Copyright (c) Anish Athalye (me@anishathalye.com)**
5 |
6 | Permission is hereby granted, free of charge, to any person obtaining a copy of
7 | this software and associated documentation files (the "Software"), to deal in
8 | the Software without restriction, including without limitation the rights to
9 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
10 | of the Software, and to permit persons to whom the Software is furnished to do
11 | so, subject to the following conditions:
12 |
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 |
--------------------------------------------------------------------------------
/util.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Anish Athalye. Released under the MIT license.
2 |
3 | import numpy as np
4 | import onnx
5 | from onnx_tf.backend import prepare
6 | from PIL import Image
7 |
8 |
9 | def load_model(path):
10 |     onnx_model = onnx.load(path)
11 |     model = prepare(onnx_model, training_mode=True)
12 |     return model
13 |
14 |
15 | def load_seed(path):
16 |     seed = open(path, 'rb').read()[128:]
17 |     seed = np.frombuffer(seed, dtype=np.float32)
18 |     seed = seed.reshape([96, 128])
19 |     return seed
20 |
21 |
22 | def load_image(path):
23 |     im = Image.open(path).convert('RGB')
24 |     im = im.resize([360, 360])
25 |     arr = np.array(im).astype(np.float32) / 255.0
26 |     arr = arr * 2.0 - 1.0
27 |     arr = arr.transpose(2, 0, 1).reshape([1, 3, 360, 360])
28 |     return arr
29 |
30 |
31 | def save_image(arr, path):
32 |     arr = arr.reshape([3, 360, 360]).transpose(1, 2, 0)
33 |     arr = (arr + 1.0) * (255.0 / 2.0)
34 |     arr = arr.astype(np.uint8)
35 |     im = Image.fromarray(arr)
36 |     im.save(path)
37 |
38 |
39 | def hash_from_hex(hex_repr):
40 |     n = int(hex_repr, 16)
41 |     h = np.zeros(96)
42 |     for i in range(96):
43 |         h[i] = (n >> (95 - i)) & 1
44 |     return h
45 |
46 |
47 | def hash_to_hex(h):
48 |     bits = ''.join(['1' if i >= 0.5 else '0' for i in h])
49 |     return '{:0{}x}'.format(int(bits, 2), len(bits) // 4)
50 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # NeuralHash Collider
2 |
3 | Find target [hash collisions] for Apple's [NeuralHash] perceptual hash function.
4 |
5 | For example, starting from a picture of [this
6 | cat](https://github.com/anishathalye/neural-hash-collider/raw/assets/cat.jpg),
7 | we can find an adversarial image that has the same hash as the
8 | [picture](https://user-images.githubusercontent.com/1328/129860794-e7eb0132-d929-4c9d-b92e-4e4faba9e849.png)
9 | of the dog in [this post][hash collisions]:
10 |
11 | ```bash
12 | python collide.py --image cat.jpg --target 59a34eabe31910abfb06f308
13 | ```
14 |
15 |  
16 |
17 | We can confirm the hash collision using `nnhash.py` from
18 | [AsuharietYgvar/AppleNeuralHash2ONNX]:
19 |
20 | ```console
21 | $ python nnhash.py dog.png
22 | 59a34eabe31910abfb06f308
23 | $ python nnhash.py adv.png
24 | 59a34eabe31910abfb06f308
25 | ```
26 |
27 | [hash collisions]: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1
28 | [NeuralHash]: https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf
29 |
30 | ## How it works
31 |
32 | NeuralHash is a [perceptual hash
33 | function](https://en.wikipedia.org/wiki/Perceptual_hashing) that uses a neural
34 | network. Images are resized to 360x360 and passed through a neural network to
35 | produce a 128-dimensional feature vector. Then, the vector is projected onto
36 | R^96 using a 128x96 "seed" matrix. Finally, to produce a 96-bit hash, the
37 | 96-dimensional vector is thresholded: negative entries turn into a `0` bit, and
38 | non-negative entries turn into a `1` bit.
39 |
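The projection and thresholding steps described above can be sketched in a few lines of NumPy. This is only an illustration: the feature vector and seed matrix below are random stand-ins (not the real model output or Apple's seed file), and `neuralhash_bits` is a hypothetical helper name. The seed is shaped 96x128 here, matching how `load_seed` in `util.py` reshapes it.

```python
import numpy as np


def neuralhash_bits(features, seed):
    """Project a 128-d feature vector to R^96 and threshold at zero."""
    proj = seed @ features  # shape (96,)
    # negative entries -> bit 0, non-negative entries -> bit 1
    return (proj >= 0).astype(np.uint8)


# Stand-in data (assumption): random values instead of real model features
# and the real seed matrix.
rng = np.random.default_rng(0)
features = rng.standard_normal(128).astype(np.float32)
seed = rng.standard_normal((96, 128)).astype(np.float32)

bits = neuralhash_bits(features, seed)
bit_string = ''.join(str(b) for b in bits.tolist())
hex_hash = '{:024x}'.format(int(bit_string, 2))
print(hex_hash)  # 24 hex characters encode the 96 bits
```
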
40 | This entire process, except for the thresholding, is differentiable, so we can
41 | use gradient descent to find hash collisions. Neural networks are well known
42 | to be vulnerable to [adversarial
43 | examples](https://arxiv.org/abs/1312.6199).
44 |
45 | We can define a loss that captures how close an image is to a given target
46 | hash: this loss is basically just the NeuralHash algorithm as described above,
47 | but with the final "hard" thresholding step tweaked so that it is "soft" (in
48 | particular, differentiable). Exactly how this is done (choices of activation
49 | functions, parameters, etc.) can affect convergence, so it can require some
50 | experimentation. After choosing the loss function, we can follow the standard
51 | method to find adversarial examples for neural networks: gradient descent.
52 |
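As a rough NumPy sketch of one such "soft" thresholding, the functions below mirror the steps `collide.py` performs in TensorFlow: normalize the projection, apply a sigmoid scaled by `k`, clip and rescale, then take the squared difference from the target bits. The names `soft_hash` and `hash_loss` are illustrative, and the defaults are the script's default `--k` and `--clip-range` values; other choices may work as well or better.

```python
import numpy as np


def soft_hash(proj, k=10.0, clip_range=0.1):
    """Differentiable stand-in for thresholding proj at zero."""
    normalized = proj / np.linalg.norm(proj)
    soft = 1.0 / (1.0 + np.exp(-k * normalized))  # sigmoid, values in (0, 1)
    # clip "strong" bits, then rescale so the clipped range maps back onto [0, 1]
    soft = np.clip(soft, clip_range, 1.0 - clip_range) - 0.5
    return soft * (0.5 / (0.5 - clip_range)) + 0.5


def hash_loss(proj, target_bits, k=10.0, clip_range=0.1):
    """Sum of squared differences between the soft hash and the target bits."""
    return float(np.sum((soft_hash(proj, k, clip_range) - target_bits) ** 2))


proj = np.array([3.0, -4.0])  # toy 2-entry projection instead of 96 entries
print(soft_hash(proj))
print(hash_loss(proj, np.array([1.0, 0.0])))
```

Entries with the right sign and a large enough magnitude saturate at exactly `0` or `1` after clipping and rescaling, so they contribute nothing to the loss and the gradient concentrates on the bits that still need to change.
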
53 | ### Details
54 |
55 | The implementation currently does an alternating projections style attack to
56 | find an adversarial example that has the intended hash and also looks similar
57 | to the original. See `collide.py` for the full details. The implementation uses
58 | two different loss functions: one measures the distance to the target hash, and
59 | the other measures the quality of the perturbation (l2 norm + total variation).
60 | We first optimize for a collision, focusing only on matching the target hash.
61 | Once we find a collision, we alternate between minimizing the perturbation and
62 | ensuring that the hash value does not change. The attack has a number of
63 | parameters; run `python collide.py --help` or refer to the code for a full
64 | list. Tweaking these parameters can make a big difference in convergence time
65 | and the quality of the output.
66 |
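The per-iteration choice between the losses can be summarized with the small sketch below. It follows the branching in `collide.py`, but `bit_distance` and `choose_loss` are hypothetical helper names introduced here for illustration; the script inlines this logic in its main loop.

```python
def bit_distance(soft_hash, target_bits):
    """Number of bits where the thresholded soft hash differs from the target."""
    return sum((s >= 0.5) != (t >= 0.5) for s, t in zip(soft_hash, target_bits))


def choose_loss(dist, best_dist, combined_threshold=2):
    """Pick which loss to follow on this step.

    dist: current bit distance to the target hash
    best_dist: smallest bit distance seen so far
    """
    if dist == 0:
        return 'image'      # already colliding: shrink the perturbation
    if best_dist == 0 and dist <= combined_threshold:
        return 'combined'   # collided before and still close: do both at once
    return 'hash'           # otherwise: only chase the target hash


print(choose_loss(dist=5, best_dist=5))
print(choose_loss(dist=0, best_dist=0))
print(choose_loss(dist=1, best_dist=0))
```
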
67 | The implementation also supports a flag `--blur [sigma]` that blurs the
68 | perturbation on every step of the search. This can slow down or break
69 | convergence, but on some examples, it can be helpful for getting results that
70 | look more natural and less like glitch art.
71 |
72 | ## Examples
73 |
74 | Reproducing the [Lena](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/lena.png)/[Barbara](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/barbara.png) result from [this post](https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-903094036):
75 |
76 |
77 |
78 | The first image above is the original Lena image. The second was produced with `--target a426dae78cc63799d01adc32` to collide with Barbara. The third was produced with the additional argument `--blur 1.0`. The fourth is the original Barbara image. Checking their hashes:
79 |
80 | ```console
81 | $ python nnhash.py lena.png
82 | 32dac883f7b91bbf45a48296
83 | $ python nnhash.py lena-adv.png
84 | a426dae78cc63799d01adc32
85 | $ python nnhash.py lena-adv-blur-1.0.png
86 | a426dae78cc63799d01adc32
87 | $ python nnhash.py barbara.png
88 | a426dae78cc63799d01adc32
89 | ```
90 |
91 | Reproducing the [Picard](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/picard.png)/[Sidious](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/sidious.png) result from [this post](https://github.com/anishathalye/neural-hash-collider/issues/4):
92 |
93 |
94 |
95 | The first image above is the original Picard image. The second was produced with `--target e34b2da852103c3c0828fbd1 --tv-weight 3e-4` to collide with Sidious. The third was produced with the additional argument `--blur 0.5`. The fourth is the original Sidious image. Checking their hashes:
96 |
97 | ```console
98 | $ python nnhash.py picard.png
99 | 73fae120ad3191075efd5580
100 | $ python nnhash.py picard-adv.png
101 | e34b2da852103c3c0828fbd1
102 | $ python nnhash.py picard-adv-blur-0.5.png
103 | e34b2da852103c3c0828fbd1
104 | $ python nnhash.py sidious.png
105 | e34b2da852103c3c0828fbd1
106 | ```
107 |
108 | ## Prerequisites
109 |
110 | - Get Apple's NeuralHash model following the instructions in
111 | [AsuharietYgvar/AppleNeuralHash2ONNX] and either put all the
112 | files in this directory or supply the `--model` / `--seed` arguments
113 | - Install Python dependencies: `pip install -r requirements.txt`
114 |
115 | [AsuharietYgvar/AppleNeuralHash2ONNX]: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX
116 |
117 | ## Usage
118 |
119 | Run `python collide.py --image [path to image] --target [target hash]` to
120 | generate a hash collision. Run `python collide.py --help` to see all the
121 | options, including some knobs you can tweak, like the learning rate and some
122 | other parameters.
123 |
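The `--target` value is a 24-character hex string encoding 96 bits, most significant bit first. The pure-Python sketch below shows that encoding and a simple bit-distance check between two hashes. The function names are illustrative; `util.py` has NumPy-based `hash_from_hex` and `hash_to_hex` that do the same conversion.

```python
def hex_to_bits(hex_repr, n_bits=96):
    """Expand a hex hash into a list of bits, most significant bit first."""
    n = int(hex_repr, 16)
    return [(n >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]


def bits_to_hex(bits):
    """Pack a list of bits back into a zero-padded hex string."""
    return '{:0{}x}'.format(int(''.join(str(b) for b in bits), 2), len(bits) // 4)


def hamming(hex_a, hex_b):
    """Number of differing bits between two hex hashes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count('1')


target = '59a34eabe31910abfb06f308'
bits = hex_to_bits(target)
print(len(bits), bits[:4])
print(bits_to_hex(bits) == target)
print(hamming(target, target))
```
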
124 | ## Limitations
125 |
126 | The code in this repository is intended to be a demonstration, and perhaps a
127 | starting point for other exploration. Tweaking the implementation (choice of
128 | loss function, choice of parameters, etc.) might produce much better results
129 | than this code currently achieves.
130 |
131 | ## Citation
132 |
133 | ```bibtex
134 | @misc{athalye2021neuralhashcollider,
135 | author = {Anish Athalye},
136 | title = {NeuralHash Collider},
137 | year = {2021},
138 | howpublished = {\url{https://github.com/anishathalye/neural-hash-collider}},
139 | }
140 | ```
141 |
--------------------------------------------------------------------------------
/collide.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Anish Athalye. Released under the MIT license.
2 |
3 | import numpy as np
4 | import tensorflow as tf
5 | from scipy.ndimage import gaussian_filter
6 | import argparse
7 | import os
8 |
9 | from util import *
10 |
11 |
12 | DEFAULT_MODEL_PATH = 'model.onnx'
13 | DEFAULT_SEED_PATH = 'neuralhash_128x96_seed1.dat'
14 | DEFAULT_TARGET_HASH = '59a34eabe31910abfb06f308'
15 | DEFAULT_ITERATIONS = 10000
16 | DEFAULT_SAVE_ITERATIONS = 0
17 | DEFAULT_LR = 2.0
18 | DEFAULT_COMBINED_THRESHOLD = 2
19 | DEFAULT_K = 10.0
20 | DEFAULT_CLIP_RANGE = 0.1
21 | DEFAULT_W_L2 = 2e-3
22 | DEFAULT_W_TV = 1e-4
23 | DEFAULT_W_HASH = 0.8
24 | DEFAULT_BLUR = 0
25 |
26 |
27 | def main():
28 |     tf.compat.v1.disable_eager_execution()
29 |     options = get_options()
30 |
31 |     model = load_model(options.model)
32 |     image = model.tensor_dict['image']
33 |     logits = model.tensor_dict['leaf/logits']
34 |     seed = load_seed(options.seed)
35 |
36 |     target = hash_from_hex(options.target)
37 |
38 |     original = load_image(options.image)
39 |     h = hash_from_hex(options.target)
40 |
41 |     with model.graph.as_default():
42 |         with tf.compat.v1.Session() as sess:
43 |             sess.run(tf.compat.v1.global_variables_initializer())
44 |
45 |             proj = tf.reshape(tf.linalg.matmul(seed, tf.reshape(logits, (128, 1))), (96,))
46 |             # proj is in R^96; it's interpreted as a 96-bit hash by mapping
47 |             # entries < 0 to the bit '0', and entries >= 0 to the bit '1'
48 |             normalized, _ = tf.linalg.normalize(proj)
49 |             hash_output = tf.sigmoid(normalized * options.k)
50 |             # now, hash_output has entries in (0, 1); it's interpreted by
51 |             # mapping entries < 0.5 to the bit '0' and entries >= 0.5 to the
52 |             # bit '1'
53 |
54 |             # we clip hash_output to (clip_range, 1-clip_range); this seems to
55 |             # improve the search (we don't "waste" perturbation tweaking
56 |             # "strong" bits); the sigmoid already does this to some degree, but
57 |             # this seems to help
58 |             hash_output = tf.clip_by_value(hash_output, options.clip_range, 1.0 - options.clip_range) - 0.5
59 |             hash_output = hash_output * (0.5 / (0.5 - options.clip_range))
60 |             hash_output = hash_output + 0.5
61 |
62 |             # hash loss: how far away we are from the target hash
63 |             hash_loss = tf.math.reduce_sum(tf.math.squared_difference(hash_output, h))
64 |
65 |             perturbation = image - original
66 |             # image loss: how big / noticeable is the perturbation?
67 |             img_loss = options.l2_weight * tf.nn.l2_loss(perturbation) + options.tv_weight * tf.image.total_variation(perturbation)[0]
68 |
69 |             # combined loss: try to minimize both at once
70 |             combined_loss = options.hash_weight * hash_loss + (1 - options.hash_weight) * img_loss
71 |
72 |             # gradients of all the losses
73 |             g_hash_loss, = tf.gradients(hash_loss, image)
74 |             g_img_loss, = tf.gradients(img_loss, image)
75 |             g_combined_loss, = tf.gradients(combined_loss, image)
76 |
77 |             # perform attack
78 |
79 |             x = original
80 |             best = (float('inf'), 0) # (distance, image quality loss)
81 |             dist = float('inf')
82 |
83 |             for i in range(options.iterations):
84 |                 # we do an alternating projections style attack here; if we
85 |                 # haven't found a colliding image yet, only optimize for that;
86 |                 # if we have a colliding image, then minimize the size of the
87 |                 # perturbation; if we're close, then do both at once
88 |                 if dist == 0:
89 |                     loss_name, loss, g = 'image', img_loss, g_img_loss
90 |                 elif best[0] == 0 and dist <= options.combined_threshold:
91 |                     loss_name, loss, g = 'combined', combined_loss, g_combined_loss
92 |                 else:
93 |                     loss_name, loss, g = 'hash', hash_loss, g_hash_loss
94 |
95 |                 # compute loss values and gradient
96 |                 xq = quantize(x) # take derivatives wrt the quantized version of the image
97 |                 hash_output_v, img_loss_v, loss_v, g_v = sess.run([hash_output, img_loss, loss, g], feed_dict={image: xq})
98 |                 dist = np.sum((hash_output_v >= 0.5) != (h >= 0.5))
99 |
100 |                 # if it's better than any image found so far, save it
101 |                 score = (dist, img_loss_v)
102 |                 if score < best or (options.save_iterations > 0 and (i+1) % options.save_iterations == 0):
103 |                     save_image(x, os.path.join(options.save_directory, 'out_iter={:05d}_dist={:02d}_q={:.3f}.png'.format(i+1, dist, img_loss_v)))
104 |                 if score < best:
105 |                     best = score
106 |
107 |                 # gradient descent step
108 |                 g_v_norm = g_v / np.linalg.norm(g_v)
109 |                 x = x - options.learning_rate * g_v_norm
110 |                 if options.blur > 0:
111 |                     x = blur_perturbation(original, x, options.blur)
112 |                 x = x.clip(-1, 1)
113 |                 print('iteration: {}/{}, best: ({}, {:.3f}), hash: {}, distance: {}, loss: {:.3f} ({})'.format(
114 |                     i+1,
115 |                     options.iterations,
116 |                     best[0],
117 |                     best[1],
118 |                     hash_to_hex(hash_output_v),
119 |                     dist,
120 |                     loss_v,
121 |                     loss_name
122 |                 ))
123 |
124 |
125 | def quantize(x):
126 |     x = (x + 1.0) * (255.0 / 2.0)
127 |     x = x.astype(np.uint8).astype(np.float32)
128 |     x = x / (255.0 / 2.0) - 1.0
129 |     return x
130 |
131 |
132 | def blur_perturbation(original, x, sigma):
133 |     perturbation = x - original
134 |     perturbation = gaussian_filter_by_channel(perturbation, sigma=sigma)
135 |     return original + perturbation
136 |
137 |
138 | def gaussian_filter_by_channel(x, sigma):
139 |     return np.stack([gaussian_filter(x[0,ch,:,:], sigma) for ch in range(x.shape[1])])[np.newaxis]
140 |
141 |
142 | def get_options():
143 |     parser = argparse.ArgumentParser()
144 |     parser.add_argument('--image', type=str, help='path to starting image', required=True)
145 |     parser.add_argument('--model', type=str, help='path to model', default=DEFAULT_MODEL_PATH)
146 |     parser.add_argument('--seed', type=str, help='path to seed', default=DEFAULT_SEED_PATH)
147 |     parser.add_argument('--target', type=str, help='target hash', default=DEFAULT_TARGET_HASH)
148 |     parser.add_argument('--learning-rate', type=float, help='learning rate', default=DEFAULT_LR)
149 |     parser.add_argument('--combined-threshold', type=int, help='threshold to start using combined loss', default=DEFAULT_COMBINED_THRESHOLD)
150 |     parser.add_argument('--k', type=float, help='k parameter', default=DEFAULT_K)
151 |     parser.add_argument('--l2-weight', type=float, help='perturbation l2 loss weight', default=DEFAULT_W_L2)
152 |     parser.add_argument('--tv-weight', type=float, help='perturbation total variation loss weight', default=DEFAULT_W_TV)
153 |     parser.add_argument('--hash-weight', type=float, help='relative weight (0.0 to 1.0) of hash in combined loss', default=DEFAULT_W_HASH)
154 |     parser.add_argument('--clip-range', type=float, help='clip range parameter', default=DEFAULT_CLIP_RANGE)
155 |     parser.add_argument('--iterations', type=int, help='max number of iterations', default=DEFAULT_ITERATIONS)
156 |     parser.add_argument('--save-directory', type=str, help='directory to save output images', default='.')
157 |     parser.add_argument('--save-iterations', type=int, help='save this frequently, regardless of improvement', default=DEFAULT_SAVE_ITERATIONS)
158 |     parser.add_argument('--blur', type=float, help='apply Gaussian blur with this sigma on every step', default=DEFAULT_BLUR)
159 |     return parser.parse_args()
160 |
161 |
162 | if __name__ == '__main__':
163 |     main()
164 |
--------------------------------------------------------------------------------