├── var └── .gitignore ├── test ├── sample │ ├── orig.jpg │ ├── solid.png │ ├── control.jpg │ ├── orig-copy.jpg │ ├── attacked-gamma.jpg │ ├── attacked-noise.jpg │ ├── attacked-padded.jpg │ ├── attacked-scaled.jpg │ ├── attacked-contrast.jpg │ ├── attacked-rotated.jpg │ ├── attacked-sheared.jpg │ ├── attacked-compressed.jpg │ ├── attacked-grayscale.jpg │ ├── attacked-new-feature.jpg │ ├── attacked-color-curved.jpg │ ├── attacked-crop-centered.jpg │ └── attacked-gaussian-blur.jpg └── index.js ├── Makefile ├── package.json ├── .gitignore ├── hash_profile.js ├── index.js └── README.md /var/.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | 3 | !.gitignore -------------------------------------------------------------------------------- /test/sample/orig.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/orig.jpg -------------------------------------------------------------------------------- /test/sample/solid.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/solid.png -------------------------------------------------------------------------------- /test/sample/control.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/control.jpg -------------------------------------------------------------------------------- /test/sample/orig-copy.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/orig-copy.jpg -------------------------------------------------------------------------------- /test/sample/attacked-gamma.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-gamma.jpg -------------------------------------------------------------------------------- /test/sample/attacked-noise.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-noise.jpg -------------------------------------------------------------------------------- /test/sample/attacked-padded.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-padded.jpg -------------------------------------------------------------------------------- /test/sample/attacked-scaled.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-scaled.jpg -------------------------------------------------------------------------------- /test/sample/attacked-contrast.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-contrast.jpg -------------------------------------------------------------------------------- /test/sample/attacked-rotated.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-rotated.jpg -------------------------------------------------------------------------------- /test/sample/attacked-sheared.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-sheared.jpg -------------------------------------------------------------------------------- /test/sample/attacked-compressed.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-compressed.jpg -------------------------------------------------------------------------------- /test/sample/attacked-grayscale.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-grayscale.jpg -------------------------------------------------------------------------------- /test/sample/attacked-new-feature.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-new-feature.jpg -------------------------------------------------------------------------------- /test/sample/attacked-color-curved.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-color-curved.jpg -------------------------------------------------------------------------------- /test/sample/attacked-crop-centered.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-crop-centered.jpg -------------------------------------------------------------------------------- /test/sample/attacked-gaussian-blur.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/skedastik/ghash/HEAD/test/sample/attacked-gaussian-blur.jpg -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | REPORTER = spec 2 | 3 | all: jshint test 4 | 5 | test: 6 | @NODE_ENV=test ./node_modules/.bin/mocha --recursive --reporter $(REPORTER) --timeout 3000 7 | 8 | jshint: 9 | jshint test index.js 10 | 11 | tests: test 12 | 13 | clean: 14 | rm -f var/* 15 | 16 | 
.PHONY: clean test jshint -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "ghash", 3 | "version": "0.1.0", 4 | "description": "Generate fuzzy hashes for images", 5 | "main": "index.js", 6 | "scripts": { 7 | "test": "mocha" 8 | }, 9 | "repository": { 10 | "type": "git", 11 | "url": "https://github.com/skedastik/ghash.git" 12 | }, 13 | "keywords": [ 14 | "fuzzy", 15 | "hash", 16 | "image", 17 | "search" 18 | ], 19 | "author": "Alaric Holloway", 20 | "license": "ISC", 21 | "bugs": { 22 | "url": "https://github.com/skedastik/ghash.git" 23 | }, 24 | "homepage": "https://github.com/skedastik/ghash", 25 | "devDependencies": { 26 | "chai": "^3.5.0", 27 | "chai-as-promised": "^5.3.0", 28 | "hamming-distance": "^1.0.0", 29 | "mocha": "^2.5.3", 30 | "sprintf-js": "^1.0.3" 31 | }, 32 | "dependencies": { 33 | "bluebird": "^3.4.1", 34 | "sharp": "^0.15.0" 35 | } 36 | } 37 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # node.js # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 2 | 3 | # Logs 4 | logs 5 | *.log 6 | 7 | # Runtime data 8 | pids 9 | *.pid 10 | *.seed 11 | 12 | # Directory for instrumented libs generated by jscoverage/JSCover 13 | lib-cov 14 | 15 | # Coverage directory used by tools like istanbul 16 | coverage 17 | 18 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 19 | .grunt 20 | 21 | # node-waf configuration 22 | .lock-wscript 23 | 24 | # Compiled binary addons (http://nodejs.org/api/addons.html) 25 | build/Release 26 | 27 | # Dependency directory 28 | # https://www.npmjs.org/doc/misc/npm-faq.html#should-i-check-my-node_modules-folder-into-git 29 | node_modules 30 | 31 | # OSX # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# # # # # # # # # # 32 | 33 | .DS_Store 34 | .AppleDouble 35 | .LSOverride 36 | # Icon must end with two \r 37 | Icon 38 | # Thumbnails 39 | ._* 40 | # Files that might appear in the root of a volume 41 | .DocumentRevisions-V100 42 | .fseventsd 43 | .Spotlight-V100 44 | .TemporaryItems 45 | .Trashes 46 | .VolumeIcon.icns 47 | # Directories potentially created on remote AFP share 48 | .AppleDB 49 | .AppleDesktop 50 | Network Trash Folder 51 | Temporary Items 52 | .apdisk 53 | 54 | # Textmate 55 | .tm_properties -------------------------------------------------------------------------------- /test/index.js: -------------------------------------------------------------------------------- 1 | // test 2 | 3 | var chai = require('chai'); 4 | var chaiAsPromised = require('chai-as-promised'); 5 | var Promise = require('bluebird'); 6 | var ghash = require('../index'); 7 | var hammingDistance = require('hamming-distance'); 8 | 9 | chai.use(chaiAsPromised); 10 | chai.should(); 11 | 12 | describe('ghash', function() { 13 | it('should fail when given a bad input image', function() { 14 | return ghash('test/sample/nonexistent.png').calculate().should.eventually.be.rejected; 15 | }); 16 | 17 | it('should succeed when given a valid input image', function() { 18 | return ghash('test/sample/solid.png').calculate().should.eventually.be.fulfilled; 19 | }); 20 | 21 | it('should return a zeroed-out hash for a solid input image', function() { 22 | var zeroBuffer = new Buffer([0,0,0,0,0,0,0,0]); 23 | return ghash('test/sample/solid.png') 24 | .calculate() 25 | .call('compare', zeroBuffer).should.eventually.equal(0); 26 | }); 27 | 28 | it('should generate identical hashes for identical images', function() { 29 | return Promise.all([ 30 | ghash('test/sample/orig.jpg').calculate(), 31 | ghash('test/sample/orig-copy.jpg').calculate() 32 | ]).then(function(hashes) { 33 | return Buffer.compare(hashes[0], hashes[1]); 34 | }).should.eventually.equal(0); 35 | }); 36 | 37 | it('should generate 
very different hashes for very different images', function() { 38 | return Promise.all([ 39 | ghash('test/sample/orig.jpg').calculate(), 40 | ghash('test/sample/control.jpg').calculate() 41 | ]).then(function(hashes) { 42 | return hammingDistance(hashes[0].toString('hex'), hashes[1].toString('hex')); 43 | }).should.eventually.be.above(16); 44 | }); 45 | 46 | it('should generate similar hashes for similar images', function() { 47 | return Promise.all([ 48 | ghash('test/sample/orig.jpg').calculate(), 49 | ghash('test/sample/attacked-compressed.jpg').calculate() 50 | ]).then(function(hashes) { 51 | return hammingDistance(hashes[0].toString('hex'), hashes[1].toString('hex')); 52 | }).should.eventually.be.below(8); 53 | }); 54 | }); -------------------------------------------------------------------------------- /hash_profile.js: -------------------------------------------------------------------------------- 1 | // compare hashing results under various attacks and resolutions 2 | 3 | var hd = require('hamming-distance'); 4 | var sprintf = require('sprintf-js').sprintf; 5 | var Promise = require('bluebird'); 6 | var ghash = require('./index'); 7 | 8 | var BASE_PATH = 'test/sample/'; 9 | var EXTENSION = '.jpg'; 10 | 11 | var fuzzinesses = [0, 5, 10]; 12 | var resolutions = [8, 4, 3]; 13 | var files = [ 14 | 'orig', 15 | 'attacked-compressed', 16 | 'attacked-color-curved', 17 | 'attacked-grayscale', 18 | 'attacked-contrast', 19 | 'attacked-gamma', 20 | 'attacked-noise', 21 | 'attacked-gaussian-blur', 22 | 'attacked-scaled', 23 | 'attacked-new-feature', 24 | 'attacked-padded', 25 | 'attacked-sheared', 26 | 'attacked-crop-centered', 27 | 'attacked-rotated', 28 | 'control' 29 | ]; 30 | 31 | var hashes = Promise.map(fuzzinesses, function(fuzziness) { 32 | return Promise.map(resolutions, function(resolution) { 33 | return Promise.map(files, function(filename) { 34 | return ghash(BASE_PATH + filename + EXTENSION) 35 | .resolution(resolution) 36 | // .debugOut('var/' + filename) 
37 | .fuzziness(fuzziness) 38 | .calculate(); 39 | }); 40 | }); 41 | }); 42 | 43 | Promise.all(hashes).then(function(hashSets) { 44 | console.log('(Hashes / Hamming distance) at...\n'); 45 | for (var j = 0; j < fuzzinesses.length; j++) { 46 | console.log('fuzziness = ' + fuzzinesses[j] + '\n'); 47 | console.log(sprintf(' %-32s %-24s %-24s %s', 'Input', 'resolution = 8', 'resolution = 4', 'resolution = 3')); 48 | console.log(sprintf(' -------------------------------------------------------------------------------------------------------')); 49 | for (var i = 0; i < files.length; i++) { 50 | console.log(sprintf( 51 | ' %-32s %s / %-5d %s / %-5d %s / %d', 52 | files[i], 53 | hashSets[j][0][i].toString('hex'), hd(hashSets[j][0][0], hashSets[j][0][i]), 54 | hashSets[j][1][i].toString('hex'), hd(hashSets[j][1][0], hashSets[j][1][i]), 55 | hashSets[j][2][i].toString('hex'), hd(hashSets[j][2][0], hashSets[j][2][i]) 56 | )); 57 | } 58 | console.log(''); 59 | } 60 | }); -------------------------------------------------------------------------------- /index.js: -------------------------------------------------------------------------------- 1 | var sharp = require('sharp'); 2 | var Promise = require('bluebird'); 3 | 4 | module.exports = GHash; 5 | 6 | var MIN_RESOLUTION = 2; 7 | var MAX_RESOLUTION = 8; 8 | var OUTPUT_BUF_SIZE = MAX_RESOLUTION * MAX_RESOLUTION / 8; 9 | var MIN_FUZZINESS = 0; 10 | var MAX_FUZZINESS = 255; 11 | 12 | function GHash(input) { 13 | if (!(this instanceof GHash)) { 14 | return new GHash(input); 15 | } 16 | this.input = input; 17 | this.options = { 18 | resolution: MAX_RESOLUTION, 19 | fuzziness: 0 20 | }; 21 | return this; 22 | } 23 | 24 | GHash.prototype.resolution = function(resolution) { 25 | if (resolution > MAX_RESOLUTION || resolution < MIN_RESOLUTION) { 26 | throw 'Invalid resolution (' + resolution + ') passed to ghash'; 27 | } 28 | this.options.resolution = resolution; 29 | return this; 30 | }; 31 | 32 | GHash.prototype.debugOut = 
function(pathAndPrefix) { 33 | this.options.debugOut = pathAndPrefix; 34 | return this; 35 | }; 36 | 37 | GHash.prototype.fuzziness = function(fuzziness) { 38 | if (fuzziness > MAX_FUZZINESS || fuzziness < MIN_FUZZINESS) { 39 | throw 'Invalid fuzziness (' + fuzziness + ') passed to ghash'; 40 | } 41 | this.options.fuzziness = fuzziness; 42 | return this; 43 | }; 44 | 45 | GHash.prototype.calculate = function(callback) { 46 | var that = this; 47 | var image = sharp(this.input) 48 | // note: choice of interpolator affects hash value 49 | .resize(this.options.resolution, this.options.resolution, { interpolator: sharp.interpolator.bilinear }) 50 | .flatten() 51 | .grayscale(); 52 | 53 | // TODO: An additional dynamic-range normalization pass (essentially zero-mean + feature-scaling) would probably be helpful here, as a static `fuzziness` value will have disproportionate effect for low vs. high contrast images. 54 | 55 | if (this.options.debugOut) { 56 | image.toFile(this.options.debugOut + '.png'); 57 | } 58 | 59 | var hash = image 60 | .raw() 61 | .toBuffer() 62 | .then(function(buf) { 63 | return calculateHash(buf, that.options.fuzziness); 64 | }); 65 | 66 | if (callback) { 67 | return hash.then(function(hash) { 68 | callback(null, hash); 69 | }) 70 | .error(callback); 71 | } 72 | 73 | return hash; 74 | }; 75 | 76 | function calculateHash(inputBuf, fuzziness) { 77 | var outputBuf = new Buffer(OUTPUT_BUF_SIZE); 78 | var octet = 0; 79 | var bit = 0; 80 | var iters = inputBuf.length - 1; 81 | outputBuf.fill(0); 82 | // calculate hash from luminance gradient 83 | for (var i = 0; i < iters; i++) { 84 | if (inputBuf[i + 1] - inputBuf[i] > fuzziness) { 85 | outputBuf[octet] |= 1 << bit; 86 | } 87 | if (++bit == 8) { 88 | octet++; 89 | bit = 0; 90 | } 91 | } 92 | // wrap to first pixel 93 | if (inputBuf[0] - inputBuf[i] > fuzziness) { 94 | outputBuf[octet] |= 1 << bit; 95 | } 96 | return outputBuf; 97 | } 98 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ghash 2 | 3 | ghash generates fuzzy hashes from images. That is, it generates similar hashes for perceptually similar images. 4 | 5 | ## Usage 6 | 7 | ghash makes use of features from [sharp][sharp-url] requiring [libvips][libvips-url] v7.42.0+. 8 | 9 | ### Overview 10 | 11 | Generate a 64-bit hash (Buffer) from a local file in the form of a Promise: 12 | 13 | ```javascript 14 | ghash('path/to/image.jpg') 15 | .calculate() 16 | .then(function (hash) { 17 | console.log(hash.toString('hex')); 18 | }); 19 | ``` 20 | 21 | Generate a hash from an image stored in a Buffer, using a callback: 22 | 23 | ```javascript 24 | ghash(imageBuffer).calculate(function (err, hash) { 25 | if (!err) console.log(hash.toString('hex')); 26 | }); 27 | ``` 28 | 29 | Specify resolution and fuzziness: 30 | 31 | ```javascript 32 | ghash('path/to/image.jpg') 33 | .resolution(4) 34 | .fuzziness(10) 35 | .calculate(); 36 | ``` 37 | 38 | ### Options 39 | 40 | #### Resolution 41 | 42 | Valid values are in the range [2-8], inclusive. ghash always generates a 64-bit Buffer, but depending on resolution, not all bits are utilized. The default resolution is 8, utilizing all 64 bits. Smaller resolutions increase fuzziness (see Supplemental). 43 | 44 | #### Fuzziness 45 | 46 | Valid values are in the range [0-255], inclusive. The default fuzziness value is 0. Increasing this value increases fuzziness (see Supplemental). 47 | 48 | ## Supplemental 49 | 50 | ### Why ghash? 51 | 52 | * ghash is simple (see Internals). 53 | * ghash works well for tiny images. 54 | - [pHash](http://www.phash.org/), which is fantastic, doesn't work well for tiny images in my experience. Even for 4x4-pixel images that are nearly perceptually identical, pHash generates vastly different hashes. Theoretically, ghash could synergize nicely with pHash. 
55 | * ghash's "fuzziness" can be tuned (see Hash Characteristics). 56 | - ~~For instance, similar input images can be reduced to _identical_ "archetypes" (potentially useful for similarity search space reduction~~). In its current form, ghash generates too few hash collisions to reliably "bucket" images into archetypal hashes. This makes it unsuitable for similarity search space reduction. See [this study][study-url] for details. 57 | * ghash is fairly resilient to various attacks (see Resilience). 58 | 59 | ### Hash Characteristics 60 | 61 | Following are the results of running ghash (via included hash_profile.js) against various attacks on an input image (see test/sample) at various resolution and fuzziness values. 62 | 63 | Numbered columns indicate Hamming distance from the original image's hash value at respective resolutions (8, 4, 3). 64 | 65 | ``` 66 | fuzziness = 0 67 | 68 | Input 8 4 3 69 | -------------------------------------------------- 70 | orig 0 0 0 71 | attacked-compressed 0 1 0 72 | attacked-color-curved 0 0 0 73 | attacked-grayscale 1 0 0 74 | attacked-contrast 1 1 0 75 | attacked-gamma 1 1 0 76 | attacked-noise 0 1 0 77 | attacked-gaussian-blur 1 1 0 78 | attacked-scaled 1 1 0 79 | attacked-new-feature 2 2 0 80 | attacked-padded 33 8 3 81 | attacked-sheared 10 2 0 82 | attacked-crop-centered 17 10 2 83 | attacked-rotated 27 7 2 84 | control 38 13 6 85 | 86 | fuzziness = 5 87 | 88 | Input 8 4 3 89 | -------------------------------------------------- 90 | orig 0 0 0 91 | attacked-compressed 0 0 0 92 | attacked-color-curved 2 1 0 93 | attacked-grayscale 0 0 0 94 | attacked-contrast 2 2 0 95 | attacked-gamma 3 0 1 96 | attacked-noise 0 1 0 97 | attacked-gaussian-blur 0 0 0 98 | attacked-scaled 0 0 0 99 | attacked-new-feature 2 1 0 100 | attacked-padded 34 8 3 101 | attacked-sheared 12 1 0 102 | attacked-crop-centered 17 11 3 103 | attacked-rotated 21 4 3 104 | control 35 9 5 105 | 106 | fuzziness = 10 107 | 108 | Input 8 4 3 109 | 
-------------------------------------------------- 110 | orig 0 0 0 111 | attacked-compressed 0 0 0 112 | attacked-color-curved 2 0 0 113 | attacked-grayscale 1 0 0 114 | attacked-contrast 0 0 0 115 | attacked-gamma 2 0 0 116 | attacked-noise 0 0 0 117 | attacked-gaussian-blur 0 0 0 118 | attacked-scaled 0 0 0 119 | attacked-new-feature 1 0 0 120 | attacked-padded 29 6 3 121 | attacked-sheared 7 1 0 122 | attacked-crop-centered 14 9 2 123 | attacked-rotated 13 4 4 124 | control 27 8 5 125 | ``` 126 | 127 | As you can see, _decreasing_ the `resolution` or _increasing_ the `fuzziness` value tends to generate more hash collisions--that is, Hamming distances of 0 (though not monotonically). This is important for eliminating false negatives in a similarity search, for instance. 128 | 129 | #### Resilience 130 | 131 | Going by the above results... 132 | 133 | * ghash is resilient to the following attacks... 134 | - Compression 135 | - Color curving 136 | - Contrast/gamma adjustment 137 | - Noise 138 | - Gaussian blurring 139 | - Grayscale conversion 140 | - Scaling 141 | * ghash has some resilience to... 142 | - Small feature changes 143 | - Slight shearing 144 | * ghash has virtually no resilience to... 145 | - Padding 146 | - Cropping 147 | - Rotation 148 | 149 | ### Internals 150 | 151 | ghash works in two stages. 152 | 153 | #### 1. Image preprocessing 154 | 155 | This involves two steps: conversion to grayscale, and down-scaling. 156 | 157 | ghash converts to grayscale not only to simplify the algorithm, but because human sight relies almost entirely on luminance to recognize images. 158 | 159 | Downscaling is necessary to compress the hash to 64 bits or less. The `resolution` value determines the final size of the downsized image. 160 | 161 | #### 2. Hash calculation via luminance gradient 162 | 163 | Now that ghash has a preprocessed image, it can begin calculating the hash value. 
It does so by calculating the difference in luminance between each pixel and the next pixel in the raw, row-major pixel buffer; the last pixel is compared against the first. If the next pixel is brighter by more than the threshold (the `fuzziness` value), the corresponding bit in the output buffer is set to 1; otherwise it stays 0. 164 | 165 | By examining these "deltas", the algorithm is essentially performing edge-detection, the basis of image recognition. A higher `fuzziness` value equates to a higher edge-detection threshold. 166 | 167 | ### Caveats 168 | 169 | * The above results are not scientific. The sample size used to explore ghash is too small to draw serious conclusions. See [this study][study-url] for more comprehensive coverage. 170 | 171 | * If ghash is used for search-space reduction, even a single false negative is bad. False positives are okay. This means you should tune aggressively to minimize false negatives, even at the expense of increasing false positives. Elimination of all false negatives cannot be guaranteed. :( 172 | 173 | * Because ghash uses [sharp][sharp-url]/[libvips][libvips-url] to preprocess images, changes to either library may result in tiny changes to the hashes produced. 174 | 175 | * Because of ghash's simplicity, it has the potential to be lightning-fast. This implementation is slow. Ideally, ghash would be a native, multithreaded library. 176 | 177 | [study-url]: https://github.com/skedastik/ghash-profile/blob/master/README.md 178 | [sharp-url]: https://github.com/lovell/sharp 179 | [libvips-url]: https://github.com/jcupitt/libvips --------------------------------------------------------------------------------
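The hash calculation described under Internals in the README can be tried out without sharp. The following is a minimal sketch, not the library's own code: `gradientHash` and `hammingDistance` are hypothetical helper names, the input is assumed to be a plain array of 8-bit grayscale values in row-major order (roughly what the preprocessing stage would hand over), and `Buffer.alloc` assumes a reasonably recent Node.js. It shows how a nonzero `fuzziness` can absorb small luminance changes.

```javascript
// Illustrative sketch only: these helpers are not part of the ghash API.

// Build a 64-bit hash from the luminance gradient of a grayscale pixel array.
function gradientHash(pixels, fuzziness) {
    if (pixels.length > 64) {
        throw new Error('at most 64 pixels fit into a 64-bit hash');
    }
    var out = Buffer.alloc(8); // zero-filled 64-bit output
    for (var i = 0; i < pixels.length; i++) {
        // Compare each pixel with the next one; the last wraps to the first.
        var next = pixels[(i + 1) % pixels.length];
        if (next - pixels[i] > fuzziness) {
            out[i >> 3] |= 1 << (i & 7); // set bit i of the output
        }
    }
    return out;
}

// Count the bits that differ between two equal-length Buffers.
function hammingDistance(a, b) {
    var distance = 0;
    for (var i = 0; i < a.length; i++) {
        var diff = a[i] ^ b[i];
        while (diff) {
            diff &= diff - 1; // clear the lowest set bit
            distance++;
        }
    }
    return distance;
}

// A made-up 2x2 image and a slightly perturbed copy of it.
var original = [10, 200, 200, 10];
var perturbed = [12, 205, 198, 11];

console.log(gradientHash(original, 0).toString('hex'));  // 0100000000000000
console.log(gradientHash(perturbed, 0).toString('hex')); // 0900000000000000

// Distance between the two hashes at fuzziness 0 and at fuzziness 5.
console.log(hammingDistance(gradientHash(original, 0), gradientHash(perturbed, 0))); // 1
console.log(hammingDistance(gradientHash(original, 5), gradientHash(perturbed, 5))); // 0
```

At `fuzziness = 0`, the one-level brightness step between the last and first pixels of the perturbed image sets an extra bit. At `fuzziness = 5` that step falls below the threshold and the two hashes collide.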