├── README.md ├── examples ├── colours.png ├── face.png ├── landmarks.png ├── shibuya.png ├── text.png └── what.png └── vision.coffee /README.md: -------------------------------------------------------------------------------- 1 | # vision-bot 2 | 3 | ![shibuya query](https://github.com/ryanbateman/vision-bot/raw/master/examples/shibuya.png) 4 | 5 | A Hubot script that allows it to query Google's Vision API using an image at some public URL and the display the results. 6 | 7 | ### Requirements 8 | 9 | At the moment this script requires a Google Vision API key, an S3 instance (to hold the results of facial detection), and, if you're using Heroku, an instance with a multiple-buildpack setup. In future, I'd prefer not to rely on S3. 10 | 11 | This was intended as a quick weekend project and has since turned into something with a few potentially interesting applications. 12 | 13 | #### Buildpacks 14 | 15 | When running on Heroku, this script relies on [Cairo](http://cairographics.org/), a 2d graphics library that can added to Heroku with [this handy buildpack](https://github.com/mojodna/heroku-buildpack-cairo). You'll need to have this buildpack set up using Heroku's native [multi-buildpack](https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app) support. Running `heroku buildpacks` should show you the following buildpacks. 16 | 17 | ``` 18 | 1. https://github.com/mojodna/heroku-buildpack-cairo.git 19 | 2. https://github.com/heroku/heroku-buildpack-nodejs 20 | ``` 21 | 22 | ### Commands 23 | 24 | 25 | `whats`/`vi ` - Do a basic LABEL query, returning the top 10 results 26 | `text ` - Perform a TEXT query, returning all text found in the image 27 | `face ` - Perform a FACE query, returning all the faces found, an image showing where, and the Vision API's assessment of their emotions 28 | `properties ` - Perform a IMAGE_PROPERTIES query, returning the dominant colours in the image 29 | `landmark ` - Perform a LANDMARK query, returning the landmarks found in the image and Google Maps links to their locations 30 | 31 | 32 | #### whats 33 | ![what](https://github.com/ryanbateman/vision-bot/raw/master/examples/what.png) 34 | 35 | #### text 36 | ![text](https://github.com/ryanbateman/vision-bot/raw/master/examples/text.png) 37 | 38 | #### face 39 | ![face](https://github.com/ryanbateman/vision-bot/raw/master/examples/face.png) 40 | 41 | #### properties 42 | ![colours](https://github.com/ryanbateman/vision-bot/raw/master/examples/colours.png) 43 | 44 | #### landmark 45 | ![landmark](https://github.com/ryanbateman/vision-bot/raw/master/examples/landmarks.png) 46 | 47 | ### Notes 48 | 49 | There's minimal error handling and my Coffeescript knowledge is basic, so there's a lot that can be improved. PRs welcomed. 50 | 51 | ### Future updates 52 | 53 | Assuming I do any further work on this, it'd be nice to add googly eyes to detected faces, support message attachments rather than URLs (for quick photo queries straight from Slack on your phone) and (as ever) a lot nicer error handling/sytax. 54 | -------------------------------------------------------------------------------- /examples/colours.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/colours.png -------------------------------------------------------------------------------- /examples/face.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/face.png -------------------------------------------------------------------------------- /examples/landmarks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/landmarks.png -------------------------------------------------------------------------------- /examples/shibuya.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/shibuya.png -------------------------------------------------------------------------------- /examples/text.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/text.png -------------------------------------------------------------------------------- /examples/what.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/what.png -------------------------------------------------------------------------------- /vision.coffee: -------------------------------------------------------------------------------- 1 | # Description: 2 | # Query the Google Vision API from Hubot. 3 | # 4 | # Dependencies: 5 | # This script currently requires an s3 setup, an environment that supports Cairo/Cavnas (Heroku with the relevant multi-pack works) 6 | # and the following node packages: node-base64-image, canvas, node-gyp, aws-sdk. 7 | # 8 | # Configuration: 9 | # AWS_ACCESS_KEY_ID - AWS Access Key ID with S3 permissions 10 | # AWS_SECRET_ACCESS_KEY - AWS Secret Access Key for ID 11 | # S3_REGION - S3 region 12 | # AWS_BUCKET - Bucket to store temporary images for facial recognition 13 | # 14 | # Commands: 15 | # hubot vi/what's/the hell is - Replies with a list of things it knows about the image 16 | # hubot text - Replies with some text it OCR'ed from the image 17 | # hubot face - Replies with some details about the faces it's detected in the image 18 | # hubot landmarks - Replies with any landmarks it finds and a link to the location on Google Maps 19 | # hubot properties - Replies with the dominant colours in an image 20 | # 21 | # Author: 22 | # Ryan Bateman (@rynbtmn) 23 | 24 | base64 = require 'node-base64-image' 25 | util = require 'util' 26 | Canvas = require 'canvas' 27 | 28 | vision_key = process.env.VISION_API_KEY 29 | 30 | aws = require 'aws-sdk' 31 | bucket = process.env.S3_BUCKET_NAME 32 | aws.config.update 33 | accessKeyId: process.env.AWS_ACCESS_KEY_ID 34 | secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY 35 | region: process.env.S3_REGION 36 | s3 = new aws.S3() 37 | 38 | # 39 | # Listen for the queries related to the Vision API 40 | # 41 | 42 | module.exports = (robot) -> 43 | 44 | robot.respond /(vi|the hell is|what's|what is|whats) (.*)/i, (msg) -> 45 | msg.send "Okay, taking a look. Bear with me - this may take a few moments..." 46 | url = msg.match[2] 47 | type = "LABEL_DETECTION" 48 | response = setupCall(robot, msg, url, type) 49 | 50 | robot.respond /(landmark|landmarks) (.*)/i, (msg) -> 51 | msg.send "Okay, looking for landmarks. Bear with me - this may take a few moments..." 52 | url = msg.match[2] 53 | type = "LANDMARK_DETECTION" 54 | response = setupCall(robot, msg, url, type) 55 | 56 | robot.respond /properties (.*)/i, (msg) -> 57 | msg.send "Okay, looking for image properties. Bear with me - this may take a few moments..." 58 | url = msg.match[1] 59 | type = "IMAGE_PROPERTIES" 60 | response = setupCall(robot, msg, url, type) 61 | 62 | robot.respond /text (.*)/i, (msg) -> 63 | msg.send "Okay, looking for text. Bear with me - this may take a few moments..." 64 | url = msg.match[1] 65 | type = "TEXT_DETECTION" 66 | response = setupCall(robot, msg, url, type) 67 | 68 | robot.respond /face (.*)/i, (msg) -> 69 | msg.send "Okay, looking for faces. Bear with me - this may take a few moments..." 70 | url = msg.match[1] 71 | type = "FACE_DETECTION" 72 | response = setupCall(robot, msg, url, type) 73 | 74 | # 75 | # Make the call to the Vision API 76 | # 77 | 78 | setupCall = (robot, msg, url, type) -> 79 | options = { string: true } 80 | base64.base64encoder url, options, (err, image) -> 81 | if err 82 | console.log err 83 | else 84 | requestJson = getJson(image, type) 85 | robot.http("https://vision.googleapis.com/v1/images:annotate?key=#{vision_key}") 86 | .header('Content-type', 'application/json') 87 | .post(JSON.stringify requestJson) (err, res, body) -> 88 | jsonBody = JSON.parse body 89 | parseResponseAndDisplayData(msg, jsonBody, image) 90 | 91 | # 92 | # Check the response and format it for display to the user 93 | # 94 | 95 | parseResponseAndDisplayData = (msg, body, image) -> 96 | console.log JSON.stringify body 97 | responseText = "" 98 | apiResponse = body.responses[0] 99 | if Object.keys( apiResponse ).length == 0 100 | responseText += "Sorry, I couldn't find anything" 101 | if apiResponse.labelAnnotations? 102 | responseText += "Here's what I see: " 103 | for labelAnnotation, index in apiResponse.labelAnnotations 104 | responseText += "*#{labelAnnotation.description}* [" + Math.round(labelAnnotation.score * 100) + "%]" 105 | if index != apiResponse.labelAnnotations.length - 1 106 | responseText += ", " 107 | if apiResponse.imagePropertiesAnnotation? 108 | responseText += "\nDominant colours: " 109 | colors = apiResponse.imagePropertiesAnnotation.dominantColors.colors 110 | for dominantColor, index in colors 111 | rgb = [dominantColor.color.red, dominantColor.color.green, dominantColor.color.blue] 112 | hexColour = hexify rgb 113 | responseText += "#{hexColour}" 114 | if index != colors.length - 1 115 | responseText += ", " 116 | if apiResponse.textAnnotations? 117 | responseText += "Here's the text I can make out: " 118 | responseText += "```" 119 | for textAnnotation in apiResponse.textAnnotations 120 | responseText += "" + textAnnotation.description 121 | responseText += "```" 122 | if apiResponse.landmarkAnnotations? 123 | responseText += "Here are the landmarks I can make out: \n" 124 | for landmark, index in apiResponse.landmarkAnnotations 125 | if landmark.description 126 | responseText += "*#{landmark.description}* " 127 | else 128 | responseText += "A place I think is here: " 129 | if landmark.locations 130 | responseText += "http://maps.google.com/maps?z=12&t=m&q=loc:#{landmark.locations[0].latLng.latitude}+#{landmark.locations[0].latLng.longitude}\n" 131 | if apiResponse.faceAnnotations? 132 | responseText += "Here are some faces I can make out: \n" 133 | img = new Canvas.Image 134 | img.onload = -> 135 | canvas = new Canvas(img.width, img.height) 136 | context = canvas.getContext("2d") 137 | context.drawImage(img, 0, 0, img.width, img.height) 138 | for face, index in apiResponse.faceAnnotations 139 | drawFace face, index, context 140 | responseText += "Face #{index}\n\tJoy: #{face.joyLikelihood}\n\tSorrow: #{face.sorrowLikelihood}\n\tAnger: #{face.angerLikelihood}\n\tSurprise: #{face.surpriseLikelihood}\n\n" 141 | s3.upload { Key: Date.now() + ".jpg", Bucket: bucket, ACL: "public-read", Body: canvas.toBuffer()}, (err, output) -> 142 | if err 143 | msg.send "Seems there was an error detecting any faces" 144 | console.log "Location " + output.Location 145 | msg.send "Here's what they look like #{output.Location}" 146 | img.src = new Buffer image, "base64" 147 | console.log "setting source" 148 | msg.send responseText 149 | 150 | # 151 | # Draw the faces 152 | # 153 | 154 | drawFace = (face, index, context) -> 155 | poly = face.fdBoundingPoly.vertices 156 | context.lineWidth = 5 157 | context.strokeStyle = "rgba(255,0,0,1)" 158 | context.beginPath() 159 | context.lineTo(poly[0].x, poly[0].y) 160 | context.lineTo(poly[1].x, poly[1].y) 161 | context.lineTo(poly[2].x, poly[2].y) 162 | context.lineTo(poly[3].x, poly[3].y) 163 | context.lineTo(poly[0].x, poly[0].y) 164 | context.stroke() 165 | context.lineWidth = 1 166 | context.font = 'bold 50px Impact, serif' 167 | context.fillStyle = "#f00" 168 | context.fillText "#{index}", poly[3].x + 15, poly[3].y - 15 169 | context.strokeStyle = "#fff" 170 | context.strokeText "#{index}", poly[3].x + 15, poly[3].y - 15 171 | 172 | # 173 | # Hexify colours 174 | # 175 | hexify = (rgb) -> 176 | colour = '#' 177 | colour += pad Math.floor(rgb[0]).toString(16) 178 | colour += pad Math.floor(rgb[1]).toString(16) 179 | colour += pad Math.floor(rgb[2]).toString(16) 180 | colour 181 | 182 | pad = (str) -> 183 | if str.length < 2 184 | return "0" + str 185 | else 186 | return str 187 | 188 | # 189 | # Format some JSON for the Vision API call 190 | # 191 | 192 | getJson = (image, type) -> 193 | return requestJson = { 194 | requests: [ 195 | { 196 | image: 197 | content: image 198 | features: [ 199 | { 200 | type: type 201 | maxResults: 10 202 | } 203 | ] 204 | } 205 | ] 206 | } --------------------------------------------------------------------------------