├── nnhash.py
├── README.md
└── LICENSE

/nnhash.py:
--------------------------------------------------------------------------------
# Copyright 2021 Asuhariet Ygvar
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.

import sys
import onnxruntime
import numpy as np
from PIL import Image

# Usage: python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg

# Load ONNX model
session = onnxruntime.InferenceSession(sys.argv[1])

# Load output hash matrix: skip the first 128 bytes of the seed file,
# then read the rest as a 96x128 float32 matrix
with open(sys.argv[2], 'rb') as f:
    seed1 = f.read()[128:]
seed1 = np.frombuffer(seed1, dtype=np.float32)
seed1 = seed1.reshape([96, 128])

# Preprocess image: RGB, 360x360, values scaled to [-1, 1], NCHW layout
image = Image.open(sys.argv[3]).convert('RGB')
image = image.resize((360, 360))
arr = np.array(image).astype(np.float32) / 255.0
arr = arr * 2.0 - 1.0
arr = arr.transpose(2, 0, 1).reshape([1, 3, 360, 360])

# Run model
inputs = {session.get_inputs()[0].name: arr}
outs = session.run(None, inputs)

# Convert model output to hex hash: project the 128 floats to 96 values,
# map each value to a bit by its sign, then format the 96 bits as 24 hex digits
hash_output = seed1.dot(outs[0].flatten())
hash_bits = ''.join(['1' if it >= 0 else '0' for it in hash_output])
hash_hex = '{:0{}x}'.format(int(hash_bits, 2), len(hash_bits) // 4)

print(hash_hex)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# AppleNeuralHash2ONNX

Convert Apple
NeuralHash model for [CSAM Detection](https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf) to [ONNX](https://github.com/onnx/onnx).

## Intro

Apple NeuralHash is a [perceptual hashing](https://en.wikipedia.org/wiki/Perceptual_hashing) method for images based on neural networks. It tolerates image resizing and compression. The hashing steps are as follows:
1. Convert image to RGB.
2. Resize image to `360x360`.
3. Normalize RGB values to `[-1, 1]` range.
4. Perform inference on the NeuralHash model.
5. Calculate dot product of a `96x128` matrix with the resulting vector of 128 floats.
6. Apply [binary step](https://en.wikipedia.org/wiki/Heaviside_step_function) to the resulting 96 float vector.
7. Convert the vector of 1.0 and 0.0 to bits, resulting in 96-bit binary data.

In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.

## Prerequisites

### OS

Both macOS and Linux will work. The Linux examples in the following sections use Debian.

### LZFSE decoder

- macOS: Install by running `brew install lzfse`.
- Linux: Build and install from [lzfse](https://github.com/lzfse/lzfse) source.

### Python

Python 3.6 and above should work.
Install the following dependencies:
```bash
pip install onnx coremltools
```

## Conversion Guide

### Step 1: Get NeuralHash model

You will need 4 files from a recent macOS or iOS build:
- neuralhash_128x96_seed1.dat
- NeuralHashv3b-current.espresso.net
- NeuralHashv3b-current.espresso.shape
- NeuralHashv3b-current.espresso.weights

**Option 1: From macOS or jailbroken iOS device (Recommended)**

If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from `/System/Library/Frameworks/Vision.framework/Resources/` (on macOS) or `/System/Library/Frameworks/Vision.framework/` (on iOS).

**Option 2: From iOS IPSW**

1. Download any `.ipsw` of a recent iOS build (14.7+) from [ipsw.me](https://ipsw.me/).
2. Unpack the file:
```bash
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
```
3. Locate the system image:
```bash
ls -lh
```
What you need is the largest `.dmg` file, for example `018-63036-003.dmg`.

4. Mount the system image. On macOS, simply open the file in Finder. On Linux, run the following commands:
```bash
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image (run from the directory that contains the .dmg file)
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs
```
The required files are under `/System/Library/Frameworks/Vision.framework/` in the mounted path.

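Whichever option you use, it can help to confirm that all four files are present in the source directory before copying them. The following is a minimal sketch, not part of this repository: the helper name `missing_neuralhash_files` is hypothetical, and the file names are taken from the list in Step 1.

```python
from pathlib import Path

# File names as listed in Step 1 of this guide.
REQUIRED_FILES = [
    "neuralhash_128x96_seed1.dat",
    "NeuralHashv3b-current.espresso.net",
    "NeuralHashv3b-current.espresso.shape",
    "NeuralHashv3b-current.espresso.weights",
]


def missing_neuralhash_files(directory):
    """Return the required file names that are not regular files in `directory`.

    Hypothetical helper for illustration only; an empty list means
    every file named in Step 1 was found.
    """
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_neuralhash_files("/System/Library/Frameworks/Vision.framework/Resources/")` checks the macOS location from Option 1. Any names it returns still need to be located.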

Put all four files in the same directory:
```bash
mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .
```

### Step 2: Decode model structure and shapes

Normally, compiled Core ML models store their structure in `model.espresso.net` and their shapes in `model.espresso.shape`, both as JSON. The NeuralHash model does the same, but both files are compressed with [LZFSE](https://en.wikipedia.org/wiki/LZFSE). The commands below skip the first 28 bytes of each file (`bs=4 skip=7`) and decode the rest.

```bash
dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights
```

### Step 3: Convert model to ONNX

```bash
cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash
```

The resulting model is `NeuralHash/model.onnx`.

## Usage

### Inspect model

[Netron](https://github.com/lutzroeder/netron) is well suited to this purpose.

### Calculate neural hash with [onnxruntime](https://github.com/microsoft/onnxruntime)

1. Install required libraries:
```bash
pip install onnxruntime pillow
```
2. Run `nnhash.py` on an image:
```bash
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg
```

Example output:
```
ab14febaa837b6c1484c35e6
```

**Note:** The neural hash generated here might be a few bits off from one generated on an iOS device. This is expected, since different iOS devices generate slightly different hashes anyway.
The reason is that neural networks rely on floating-point calculations, whose accuracy depends heavily on the hardware. For smaller networks this makes no difference, but NeuralHash has 200+ layers, so the errors accumulate significantly.

|Device|Hash|
|---|---|
|iPad Pro 10.5-inch|`2b186faa6b36ffcc4c4635e1`|
|M1 Mac|`2b5c6faa6bb7bdcc4c4731a1`|
|iOS Simulator|`2b5c6faa6bb6bdcc4c4731a1`|
|ONNX Runtime|`2b5c6faa6bb6bdcc4c4735a1`|

## Credits

- [nhcalc](https://github.com/KhaosT/nhcalc) for uncovering the NeuralHash private API.
- [TNN](https://github.com/Tencent/TNN) for the compiled Core ML to ONNX conversion script.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and
distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the
copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other
entities that control, are controlled by, or are under common control with
that entity. For the purposes of this definition, "control" means (i) the
power, direct or indirect, to cause the direction or management of such
entity, whether by contract or otherwise, or (ii) ownership of
fifty percent (50%) or more of the outstanding shares, or (iii) beneficial
ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising
permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation source,
and configuration files.

"Object" form shall mean any form resulting from mechanical transformation
or translation of a Source form, including but not limited to compiled
object code, generated documentation, and conversions to
other media types.

"Work" shall mean the work of authorship, whether in Source or Object
form, made available under the License, as indicated by a copyright notice
that is included in or attached to the work (an example is provided in the
Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form,
that is based on (or derived from) the Work and for which the editorial
revisions, annotations, elaborations, or other modifications represent,
as a whole, an original work of authorship. For the purposes of this
License, Derivative Works shall not include works that remain separable
from, or merely link (or bind by name) to the interfaces of, the Work and
Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original
version of the Work and any modifications or additions to that Work or
Derivative Works thereof, that is intentionally submitted to Licensor for
inclusion in the Work by the copyright owner or by an individual or
Legal Entity authorized to submit on behalf of the copyright owner.
For the purposes of this definition, "submitted" means any form of
electronic, verbal, or written communication sent to the Licensor or its
representatives, including but not limited to communication on electronic
mailing lists, source code control systems, and issue tracking systems
that are managed by, or on behalf of, the Licensor for the purpose of
discussing and improving the Work, but excluding communication that is
conspicuously marked or otherwise designated in writing by the copyright
owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on
behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License.

Subject to the terms and conditions of this License, each Contributor
hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable copyright license to reproduce, prepare
Derivative Works of, publicly display, publicly perform, sublicense,
and distribute the Work and such Derivative Works in
Source or Object form.

3. Grant of Patent License.

Subject to the terms and conditions of this License, each Contributor
hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent
license to make, have made, use, offer to sell, sell, import, and
otherwise transfer the Work, where such license applies only to those
patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Work to which such Contribution(s) was submitted.
If You institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work or a
Contribution incorporated within the Work constitutes direct or
contributory patent infringement, then any patent licenses granted to
You under this License for that Work shall terminate as of the date such
litigation is filed.

4. Redistribution.

You may reproduce and distribute copies of the Work or Derivative Works
thereof in any medium, with or without modifications, and in Source or
Object form, provided that You meet the following conditions:

1. You must give any other recipients of the Work or Derivative Works a
copy of this License; and

2. You must cause any modified files to carry prominent notices stating
that You changed the files; and

3. You must retain, in the Source form of any Derivative Works that You
distribute, all copyright, patent, trademark, and attribution notices from
the Source form of the Work, excluding those notices that do not pertain
to any part of the Derivative Works; and

4. If the Work includes a "NOTICE" text file as part of its distribution,
then any Derivative Works that You distribute must include a readable copy
of the attribution notices contained within such NOTICE file, excluding
those notices that do not pertain to any part of the Derivative Works,
in at least one of the following places: within a NOTICE text file
distributed as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or, within a
display generated by the Derivative Works, if and wherever such
third-party notices normally appear. The contents of the NOTICE file are
for informational purposes only and do not modify the License.
You may add Your own attribution notices within Derivative Works that You
distribute, alongside or as an addendum to the NOTICE text from the Work,
provided that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and may
provide additional or different license terms and conditions for use,
reproduction, or distribution of Your modifications, or for any such
Derivative Works as a whole, provided Your use, reproduction, and
distribution of the Work otherwise complies with the conditions
stated in this License.

5. Submission of Contributions.

Unless You explicitly state otherwise, any Contribution intentionally
submitted for inclusion in the Work by You to the Licensor shall be under
the terms and conditions of this License, without any additional
terms or conditions. Notwithstanding the above, nothing herein shall
supersede or modify the terms of any separate license agreement you may
have executed with Licensor regarding such Contributions.

6. Trademarks.

This License does not grant permission to use the trade names, trademarks,
service marks, or product names of the Licensor, except as required for
reasonable and customary use in describing the origin of the Work and
reproducing the content of the NOTICE file.

7. Disclaimer of Warranty.

Unless required by applicable law or agreed to in writing, Licensor
provides the Work (and each Contributor provides its Contributions)
on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied, including, without limitation, any warranties
or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS
FOR A PARTICULAR PURPOSE.
You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any risks
associated with Your exercise of permissions under this License.

8. Limitation of Liability.

In no event and under no legal theory, whether in tort
(including negligence), contract, or otherwise, unless required by
applicable law (such as deliberate and grossly negligent acts) or agreed
to in writing, shall any Contributor be liable to You for damages,
including any direct, indirect, special, incidental, or consequential
damages of any character arising as a result of this License or out of
the use or inability to use the Work (including but not limited to damages
for loss of goodwill, work stoppage, computer failure or malfunction,
or any and all other commercial damages or losses), even if such
Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability.

While redistributing the Work or Derivative Works thereof, You may choose
to offer, and charge a fee for, acceptance of support, warranty,
indemnity, or other liability obligations and/or rights consistent with
this License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf of any
other Contributor, and only if You agree to indemnify, defend, and hold
each Contributor harmless for any liability incurred by, or claims
asserted against, such Contributor by reason of your accepting any such
warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate
notice, with the fields enclosed by brackets "[]" replaced with your own
identifying information. (Don't include the brackets!) The text should be
enclosed in the appropriate comment syntax for the file format. We also
recommend that a file or class name and description of purpose be included
on the same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright {{ year }} {{ organization }}

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
or implied. See the License for the specific language governing
permissions and limitations under the License.

--------------------------------------------------------------------------------