├── requirements.txt
├── .gitignore
├── LICENSE.md
├── util.py
├── README.md
└── collide.py

/requirements.txt:
--------------------------------------------------------------------------------
Pillow
numpy
onnx
onnx_tf
scipy
tensorflow

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
NeuralHashv3b-current.espresso.*
model.onnx
neuralhash_128x96_seed1.dat
*.png
*.jpg

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
The MIT License (MIT)
=====================

**Copyright (c) Anish Athalye (me@anishathalye.com)**

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/util.py:
--------------------------------------------------------------------------------
# Copyright (c) Anish Athalye. Released under the MIT license.
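#
# Helpers shared with collide.py: ONNX model loading, seed matrix parsing,
# image I/O in the model's [-1, 1] NCHW format, and conversion between
# 96-bit hash vectors and hex strings.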

import numpy as np
import onnx
from onnx_tf.backend import prepare
from PIL import Image


def load_model(path):
    onnx_model = onnx.load(path)
    model = prepare(onnx_model, training_mode=True)
    return model


def load_seed(path):
    # the seed file stores the 96x128 projection matrix as float32 values,
    # preceded by a 128-byte header that we skip
    with open(path, 'rb') as f:
        seed = f.read()[128:]
    seed = np.frombuffer(seed, dtype=np.float32)
    seed = seed.reshape([96, 128])
    return seed


def load_image(path):
    # resize to the model's 360x360 input size and rescale to [-1, 1], NCHW
    im = Image.open(path).convert('RGB')
    im = im.resize([360, 360])
    arr = np.array(im).astype(np.float32) / 255.0
    arr = arr * 2.0 - 1.0
    arr = arr.transpose(2, 0, 1).reshape([1, 3, 360, 360])
    return arr


def save_image(arr, path):
    arr = arr.reshape([3, 360, 360]).transpose(1, 2, 0)
    arr = (arr + 1.0) * (255.0 / 2.0)
    arr = arr.astype(np.uint8)
    im = Image.fromarray(arr)
    im.save(path)


def hash_from_hex(hex_repr):
    # hex string -> vector of 96 bits (most significant bit first)
    n = int(hex_repr, 16)
    h = np.zeros(96)
    for i in range(96):
        h[i] = (n >> (95 - i)) & 1
    return h


def hash_to_hex(h):
    # vector of 96 (soft) bits -> 24-character hex string
    bits = ''.join(['1' if i >= 0.5 else '0' for i in h])
    return '{:0{}x}'.format(int(bits, 2), len(bits) // 4)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# NeuralHash Collider

Find target [hash collisions] for Apple's [NeuralHash] perceptual hash function.

For example, starting from a picture of [this
cat](https://github.com/anishathalye/neural-hash-collider/raw/assets/cat.jpg),
we can find an adversarial image that has the same hash as the
[picture](https://user-images.githubusercontent.com/1328/129860794-e7eb0132-d929-4c9d-b92e-4e4faba9e849.png)
of the dog in [this post][hash collisions]:

```bash
python collide.py --image cat.jpg --target 59a34eabe31910abfb06f308
```

![Cat image with NeuralHash 59a34eabe31910abfb06f308](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/cat-adv.png) ![Dog image with NeuralHash 59a34eabe31910abfb06f308](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/dog.png)

We can confirm the hash collision using `nnhash.py` from
[AsuharietYgvar/AppleNeuralHash2ONNX]:

```console
$ python nnhash.py dog.png
59a34eabe31910abfb06f308
$ python nnhash.py adv.png
59a34eabe31910abfb06f308
```

[hash collisions]: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1
[NeuralHash]: https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf

## How it works

NeuralHash is a [perceptual hash
function](https://en.wikipedia.org/wiki/Perceptual_hashing) that uses a neural
network. Images are resized to 360x360 and passed through the network to
produce a 128-dimensional feature vector. Then, the vector is projected onto
R^96 using a 96x128 "seed" matrix. Finally, to produce a 96-bit hash, the
96-dimensional vector is thresholded: negative entries turn into a `0` bit, and
non-negative entries turn into a `1` bit.

This entire process, except for the thresholding, is differentiable, so we can
use gradient descent to find hash collisions.
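Concretely, the projection and thresholding steps look like the following
minimal sketch, which uses the helpers from `util.py` (the random `logits`
vector here is just a stand-in for the network's actual output on an image):

```python
import numpy as np

from util import load_seed, hash_to_hex

seed = load_seed('neuralhash_128x96_seed1.dat')   # 96x128 projection matrix
logits = np.random.randn(128).astype(np.float32)  # stand-in for the network's features
proj = seed @ logits                              # project onto R^96
bits = (proj >= 0).astype(np.float32)             # threshold: >= 0 -> 1, < 0 -> 0
print(hash_to_hex(bits))                          # 96-bit hash as 24 hex digits
```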
Neural networks are well known to be vulnerable to [adversarial
examples](https://arxiv.org/abs/1312.6199), and a hash collision is essentially
an adversarial example for the hash function.

We can define a loss that captures how close an image is to a given target
hash: this loss is basically just the NeuralHash algorithm as described above,
but with the final "hard" thresholding step tweaked so that it is "soft" (in
particular, differentiable). Exactly how this is done (choices of activation
functions, parameters, etc.) can affect convergence, so it can require some
experimentation. After choosing the loss function, we can follow the standard
method for finding adversarial examples for neural networks: gradient descent.

### Details

The implementation currently does an alternating-projections-style attack to
find an adversarial example that has the intended hash and also looks similar
to the original. See `collide.py` for the full details. The implementation uses
two different loss functions: one measures the distance to the target hash, and
the other measures the quality of the perturbation (l2 norm + total variation).
We first optimize for a collision, focusing only on matching the target hash.
Once we find a collision, we alternate between minimizing the perturbation and
ensuring that the hash value does not change. The attack has a number of
parameters; run `python collide.py --help` or refer to the code for a full
list. Tweaking these parameters can make a big difference in convergence time
and the quality of the output.

The implementation also supports a flag `--blur [sigma]` that blurs the
perturbation on every step of the search. This can slow down or break
convergence, but on some examples, it can be helpful for getting results that
look more natural and less like glitch art.

## Examples

Reproducing the [Lena](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/lena.png)/[Barbara](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/barbara.png) result from [this post](https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issues/1#issuecomment-903094036):

The first image above is the original Lena image. The second was produced with `--target a426dae78cc63799d01adc32` to collide with Barbara. The third was produced with the additional argument `--blur 1.0`. The fourth is the original Barbara image. Checking their hashes:

```console
$ python nnhash.py lena.png
32dac883f7b91bbf45a48296
$ python nnhash.py lena-adv.png
a426dae78cc63799d01adc32
$ python nnhash.py lena-adv-blur-1.0.png
a426dae78cc63799d01adc32
$ python nnhash.py barbara.png
a426dae78cc63799d01adc32
```

Reproducing the [Picard](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/picard.png)/[Sidious](https://raw.githubusercontent.com/anishathalye/assets/master/neural-hash-collider/sidious.png) result from [this post](https://github.com/anishathalye/neural-hash-collider/issues/4):

The first image above is the original Picard image. The second was produced with `--target e34b2da852103c3c0828fbd1 --tv-weight 3e-4` to collide with Sidious. The third was produced with the additional argument `--blur 0.5`. The fourth is the original Sidious image.
Checking their hashes:

```console
$ python nnhash.py picard.png
73fae120ad3191075efd5580
$ python nnhash.py picard-adv.png
e34b2da852103c3c0828fbd1
$ python nnhash.py picard-adv-blur-0.5.png
e34b2da852103c3c0828fbd1
$ python nnhash.py sidious.png
e34b2da852103c3c0828fbd1
```

## Prerequisites

- Get Apple's NeuralHash model following the instructions in
  [AsuharietYgvar/AppleNeuralHash2ONNX] and either put all the
  files in this directory or supply the `--model` / `--seed` arguments
- Install Python dependencies: `pip install -r requirements.txt`

[AsuharietYgvar/AppleNeuralHash2ONNX]: https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX

## Usage

Run `python collide.py --image [path to image] --target [target hash]` to
generate a hash collision. Run `python collide.py --help` to see all the
options, including knobs you can tweak, like the learning rate and the loss
weights.

## Limitations

The code in this repository is intended to be a demonstration, and perhaps a
starting point for further exploration. Tweaking the implementation (choice of
loss function, choice of parameters, etc.) might produce much better results
than this code currently achieves.

## Citation

```bibtex
@misc{athalye2021neuralhashcollider,
  author = {Anish Athalye},
  title = {NeuralHash Collider},
  year = {2021},
  howpublished = {\url{https://github.com/anishathalye/neural-hash-collider}},
}
```

--------------------------------------------------------------------------------
/collide.py:
--------------------------------------------------------------------------------
# Copyright (c) Anish Athalye. Released under the MIT license.
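#
# Finds an adversarial image whose NeuralHash matches a target hash: gradient
# descent on a softened version of the hash, alternating between matching the
# target bits and shrinking the perturbation (see the README for details).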

import argparse
import os

import numpy as np
import tensorflow as tf
from scipy.ndimage import gaussian_filter

from util import *


DEFAULT_MODEL_PATH = 'model.onnx'
DEFAULT_SEED_PATH = 'neuralhash_128x96_seed1.dat'
DEFAULT_TARGET_HASH = '59a34eabe31910abfb06f308'
DEFAULT_ITERATIONS = 10000
DEFAULT_SAVE_ITERATIONS = 0
DEFAULT_LR = 2.0
DEFAULT_COMBINED_THRESHOLD = 2
DEFAULT_K = 10.0
DEFAULT_CLIP_RANGE = 0.1
DEFAULT_W_L2 = 2e-3
DEFAULT_W_TV = 1e-4
DEFAULT_W_HASH = 0.8
DEFAULT_BLUR = 0


def main():
    tf.compat.v1.disable_eager_execution()
    options = get_options()

    model = load_model(options.model)
    image = model.tensor_dict['image']
    logits = model.tensor_dict['leaf/logits']
    seed = load_seed(options.seed)

    original = load_image(options.image)
    h = hash_from_hex(options.target)

    with model.graph.as_default():
        with tf.compat.v1.Session() as sess:
            sess.run(tf.compat.v1.global_variables_initializer())

            proj = tf.reshape(tf.linalg.matmul(seed, tf.reshape(logits, (128, 1))), (96,))
            # proj is in R^96; it's interpreted as a 96-bit hash by mapping
            # entries < 0 to the bit '0', and entries >= 0 to the bit '1'
            normalized, _ = tf.linalg.normalize(proj)
            hash_output = tf.sigmoid(normalized * options.k)
            # now, hash_output has entries in (0, 1); it's interpreted by
            # mapping entries < 0.5 to the bit '0' and entries >= 0.5 to the
            # bit '1'

            # we clip hash_output to (clip_range, 1-clip_range); this seems to
            # improve the search (we don't "waste" perturbation tweaking
            # "strong" bits); the sigmoid already does this to some degree, but
            # this seems to help
            hash_output = tf.clip_by_value(hash_output, options.clip_range, 1.0 - options.clip_range) - 0.5
            hash_output = hash_output * (0.5 / (0.5 - options.clip_range))
            hash_output = hash_output + 0.5

            # hash loss: how far away we are from the target hash
            hash_loss = tf.math.reduce_sum(tf.math.squared_difference(hash_output, h))

            perturbation = image - original
            # image loss: how big / noticeable is the perturbation?
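            # (the l2 term penalizes the perturbation's overall energy, while
            # the total variation term penalizes high-frequency noise, nudging
            # the search toward smoother, less conspicuous changes)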
            img_loss = options.l2_weight * tf.nn.l2_loss(perturbation) + options.tv_weight * tf.image.total_variation(perturbation)[0]

            # combined loss: try to minimize both at once
            combined_loss = options.hash_weight * hash_loss + (1 - options.hash_weight) * img_loss

            # gradients of all the losses
            g_hash_loss, = tf.gradients(hash_loss, image)
            g_img_loss, = tf.gradients(img_loss, image)
            g_combined_loss, = tf.gradients(combined_loss, image)

            # perform attack

            x = original
            best = (float('inf'), 0)  # (hash distance, image quality loss)
            dist = float('inf')

            for i in range(options.iterations):
                # we do an alternating projections style attack here; if we
                # haven't found a colliding image yet, only optimize for that;
                # if we have a colliding image, then minimize the size of the
                # perturbation; if we're close, then do both at once
                if dist == 0:
                    loss_name, loss, g = 'image', img_loss, g_img_loss
                elif best[0] == 0 and dist <= options.combined_threshold:
                    loss_name, loss, g = 'combined', combined_loss, g_combined_loss
                else:
                    loss_name, loss, g = 'hash', hash_loss, g_hash_loss

                # compute loss values and gradient
                xq = quantize(x)  # take derivatives wrt the quantized version of the image
                hash_output_v, img_loss_v, loss_v, g_v = sess.run([hash_output, img_loss, loss, g], feed_dict={image: xq})
                dist = np.sum((hash_output_v >= 0.5) != (h >= 0.5))

                # if it's better than any image found so far, save it
                score = (dist, img_loss_v)
                if score < best or (options.save_iterations > 0 and (i+1) % options.save_iterations == 0):
                    save_image(x, os.path.join(options.save_directory, 'out_iter={:05d}_dist={:02d}_q={:.3f}.png'.format(i+1, dist, img_loss_v)))
                if score < best:
                    best = score

                # gradient descent step
                g_v_norm = g_v / np.linalg.norm(g_v)
                x = x - options.learning_rate * g_v_norm
                if options.blur > 0:
                    x = blur_perturbation(original, x, options.blur)
                x = x.clip(-1, 1)
                print('iteration: {}/{}, best: ({}, {:.3f}), hash: {}, distance: {}, loss: {:.3f} ({})'.format(
                    i+1,
                    options.iterations,
                    best[0],
                    best[1],
                    hash_to_hex(hash_output_v),
                    dist,
                    loss_v,
                    loss_name
                ))


def quantize(x):
    # snap pixel values to the 8-bit grid (what happens when the image is
    # saved as a PNG), staying in the model's [-1, 1] range
    x = (x + 1.0) * (255.0 / 2.0)
    x = x.astype(np.uint8).astype(np.float32)
    x = x / (255.0 / 2.0) - 1.0
    return x


def blur_perturbation(original, x, sigma):
    # blur the perturbation (not the image itself) with a Gaussian filter
    perturbation = x - original
    perturbation = gaussian_filter_by_channel(perturbation, sigma=sigma)
    return original + perturbation


def gaussian_filter_by_channel(x, sigma):
    # filter each channel independently so colors don't bleed into each other
    return np.stack([gaussian_filter(x[0, ch, :, :], sigma) for ch in range(x.shape[1])])[np.newaxis]


def get_options():
    parser = argparse.ArgumentParser()
    parser.add_argument('--image', type=str, help='path to starting image', required=True)
    parser.add_argument('--model', type=str, help='path to model', default=DEFAULT_MODEL_PATH)
    parser.add_argument('--seed', type=str, help='path to seed', default=DEFAULT_SEED_PATH)
    parser.add_argument('--target', type=str, help='target hash', default=DEFAULT_TARGET_HASH)
    parser.add_argument('--learning-rate', type=float, help='learning rate', default=DEFAULT_LR)
    parser.add_argument('--combined-threshold', type=int, help='threshold to start using combined loss', default=DEFAULT_COMBINED_THRESHOLD)
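    # less obvious knobs: --k sets the steepness of the sigmoid used for the
    # soft threshold; --combined-threshold is the bit distance at or below
    # which the combined loss kicks in; --clip-range controls the rescaling
    # applied to near-saturated hash outputs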
    parser.add_argument('--k', type=float, help='k parameter', default=DEFAULT_K)
    parser.add_argument('--l2-weight', type=float, help='perturbation l2 loss weight', default=DEFAULT_W_L2)
    parser.add_argument('--tv-weight', type=float, help='perturbation total variation loss weight', default=DEFAULT_W_TV)
    parser.add_argument('--hash-weight', type=float, help='relative weight (0.0 to 1.0) of hash in combined loss', default=DEFAULT_W_HASH)
    parser.add_argument('--clip-range', type=float, help='clip range parameter', default=DEFAULT_CLIP_RANGE)
    parser.add_argument('--iterations', type=int, help='max number of iterations', default=DEFAULT_ITERATIONS)
    parser.add_argument('--save-directory', type=str, help='directory to save output images', default='.')
    parser.add_argument('--save-iterations', type=int, help='save this frequently, regardless of improvement', default=DEFAULT_SAVE_ITERATIONS)
    parser.add_argument('--blur', type=float, help='apply Gaussian blur with this sigma on every step', default=DEFAULT_BLUR)
    return parser.parse_args()


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------