├── README.md
├── examples
    ├── colours.png
    ├── face.png
    ├── landmarks.png
    ├── shibuya.png
    ├── text.png
    └── what.png
└── vision.coffee


/README.md:
--------------------------------------------------------------------------------
 1 | # vision-bot
 2 | 
 3 | ![shibuya query](https://github.com/ryanbateman/vision-bot/raw/master/examples/shibuya.png)  
 4 |   
 5 | A Hubot script that allows it to query Google's Vision API using an image at some public URL and the display the results.
 6 | 
 7 | ### Requirements
 8 | 
 9 | At the moment this script requires a Google Vision API key, an S3 instance (to hold the results of facial detection), and, if you're using Heroku, an instance with a multiple-buildpack setup. In future, I'd prefer not to rely on S3. 
10 | 
11 | This was intended as a quick weekend project and has since turned into something with a few potentially interesting applications.  
12 | 
13 | #### Buildpacks
14 | 
15 | When running on Heroku, this script relies on [Cairo](http://cairographics.org/), a 2d graphics library that can added to Heroku with [this handy buildpack](https://github.com/mojodna/heroku-buildpack-cairo). You'll need to have this buildpack set up using Heroku's native [multi-buildpack](https://devcenter.heroku.com/articles/using-multiple-buildpacks-for-an-app) support. Running `heroku buildpacks` should show you the following buildpacks. 
16 | 
17 | ```
18 | 1. https://github.com/mojodna/heroku-buildpack-cairo.git
19 | 2. https://github.com/heroku/heroku-buildpack-nodejs
20 | ```
21 | 
22 | ### Commands
23 | 
24 | 
25 | `whats`/`vi <url>` - Do a basic LABEL query, returning the top 10 results  
26 | `text <url>` - Perform a TEXT query, returning all text found in the image  
27 | `face <url>` - Perform a FACE query, returning all the faces found, an image showing where, and the Vision API's assessment of their emotions  
28 | `properties <url>` - Perform a IMAGE_PROPERTIES query, returning the dominant colours in the image  
29 | `landmark <url>` - Perform a LANDMARK query, returning the landmarks found in the image and Google Maps links to their locations  
30 | 
31 | 
32 | #### whats
33 | ![what](https://github.com/ryanbateman/vision-bot/raw/master/examples/what.png)
34 | 
35 | #### text
36 | ![text](https://github.com/ryanbateman/vision-bot/raw/master/examples/text.png)
37 | 
38 | #### face
39 | ![face](https://github.com/ryanbateman/vision-bot/raw/master/examples/face.png)
40 | 
41 | #### properties
42 | ![colours](https://github.com/ryanbateman/vision-bot/raw/master/examples/colours.png)
43 | 
44 | #### landmark
45 | ![landmark](https://github.com/ryanbateman/vision-bot/raw/master/examples/landmarks.png)
46 | 
47 | ### Notes
48 | 
49 | There's minimal error handling and my Coffeescript knowledge is basic, so there's a lot that can be improved. PRs welcomed.  
50 | 
51 | ### Future updates
52 | 
53 | Assuming I do any further work on this, it'd be nice to add googly eyes to detected faces, support message attachments rather than URLs (for quick photo queries straight from Slack on your phone) and (as ever) a lot nicer error handling/sytax. 
54 | 


--------------------------------------------------------------------------------
/examples/colours.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/colours.png


--------------------------------------------------------------------------------
/examples/face.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/face.png


--------------------------------------------------------------------------------
/examples/landmarks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/landmarks.png


--------------------------------------------------------------------------------
/examples/shibuya.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/shibuya.png


--------------------------------------------------------------------------------
/examples/text.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/text.png


--------------------------------------------------------------------------------
/examples/what.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ryanbateman/vision-bot/008f78c9c8f590e66c6b064ebfe4a39debe9fe32/examples/what.png


--------------------------------------------------------------------------------
/vision.coffee:
--------------------------------------------------------------------------------
  1 | # Description:
  2 | #   Query the Google Vision API from Hubot.
  3 | #
  4 | # Dependencies:
  5 | #   This script currently requires an s3 setup, an environment that supports Cairo/Cavnas (Heroku with the relevant multi-pack works)
  6 | #   and the following node packages: node-base64-image, canvas, node-gyp, aws-sdk. 
  7 | #
  8 | # Configuration:
  9 | #   AWS_ACCESS_KEY_ID               - AWS Access Key ID with S3 permissions
 10 | #   AWS_SECRET_ACCESS_KEY      - AWS Secret Access Key for ID
 11 | #   S3_REGION                                 - S3 region
 12 | #   AWS_BUCKET                             - Bucket to store temporary images for facial recognition
 13 | #
 14 | # Commands:
 15 | #   hubot vi/what's/the hell is <url> - Replies with a list of things it knows about the image
 16 | #   hubot text <url> - Replies with some text it OCR'ed from the image
 17 | #   hubot face <url> - Replies with some details about the faces it's detected in the image
 18 | #   hubot landmarks <url> - Replies with any landmarks it finds and a link to the location on Google Maps
 19 | #   hubot properties <url> - Replies with the dominant colours in an image
 20 | #
 21 | # Author:
 22 | #   Ryan Bateman (@rynbtmn)
 23 | 
 24 | base64 = require 'node-base64-image'
 25 | util  = require 'util'
 26 | Canvas = require 'canvas'
 27 | 
 28 | vision_key = process.env.VISION_API_KEY
 29 | 
 30 | aws   = require 'aws-sdk'
 31 | bucket = process.env.S3_BUCKET_NAME
 32 | aws.config.update
 33 |   accessKeyId:     process.env.AWS_ACCESS_KEY_ID
 34 |   secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
 35 |   region:          process.env.S3_REGION
 36 | s3 = new aws.S3()
 37 | 
 38 | #
 39 | # Listen for the queries related to the Vision API
 40 | #
 41 | 
 42 | module.exports = (robot) ->
 43 | 
 44 |   robot.respond /(vi|the hell is|what's|what is|whats) (.*)/i, (msg) ->
 45 |     msg.send "Okay, taking a look. Bear with me - this may take a few moments..."
 46 |     url = msg.match[2]
 47 |     type = "LABEL_DETECTION"
 48 |     response = setupCall(robot, msg, url, type)
 49 | 
 50 |   robot.respond /(landmark|landmarks) (.*)/i, (msg) ->
 51 |     msg.send "Okay, looking for landmarks. Bear with me - this may take a few moments..."
 52 |     url = msg.match[2]
 53 |     type = "LANDMARK_DETECTION"
 54 |     response = setupCall(robot, msg, url, type)    
 55 | 
 56 |   robot.respond /properties (.*)/i, (msg) ->
 57 |     msg.send "Okay, looking for image properties. Bear with me - this may take a few moments..."
 58 |     url = msg.match[1]
 59 |     type = "IMAGE_PROPERTIES"
 60 |     response = setupCall(robot, msg, url, type)    
 61 | 
 62 |   robot.respond /text (.*)/i, (msg) ->
 63 |     msg.send "Okay, looking for text. Bear with me - this may take a few moments..."
 64 |     url = msg.match[1]
 65 |     type = "TEXT_DETECTION"
 66 |     response = setupCall(robot, msg, url, type)    
 67 | 
 68 |   robot.respond /face (.*)/i, (msg) ->
 69 |     msg.send "Okay, looking for faces. Bear with me - this may take a few moments..."
 70 |     url = msg.match[1]
 71 |     type = "FACE_DETECTION"
 72 |     response = setupCall(robot, msg, url, type)    
 73 | 
 74 | #
 75 | # Make the call to the Vision API
 76 | #
 77 | 
 78 | setupCall = (robot, msg, url, type) ->
 79 |   options = { string: true }
 80 |   base64.base64encoder url, options, (err, image) ->
 81 |     if err
 82 |       console.log err
 83 |     else 
 84 |       requestJson = getJson(image, type)
 85 |       robot.http("https://vision.googleapis.com/v1/images:annotate?key=#{vision_key}")
 86 |         .header('Content-type', 'application/json')
 87 |         .post(JSON.stringify requestJson) (err, res, body) ->    
 88 |           jsonBody = JSON.parse body
 89 |           parseResponseAndDisplayData(msg, jsonBody, image)
 90 | 
 91 | #
 92 | # Check the response and format it for display to the user
 93 | #
 94 | 
 95 | parseResponseAndDisplayData = (msg, body, image) ->
 96 |   console.log JSON.stringify body
 97 |   responseText = ""
 98 |   apiResponse = body.responses[0]
 99 |   if Object.keys( apiResponse ).length == 0
100 |     responseText += "Sorry, I couldn't find anything"
101 |   if apiResponse.labelAnnotations?
102 |     responseText += "Here's what I see: "
103 |     for labelAnnotation, index in apiResponse.labelAnnotations
104 |       responseText += "*#{labelAnnotation.description}* [" + Math.round(labelAnnotation.score * 100) + "%]"
105 |       if index != apiResponse.labelAnnotations.length - 1
106 |         responseText += ", "
107 |   if apiResponse.imagePropertiesAnnotation?
108 |     responseText += "\nDominant colours: "
109 |     colors = apiResponse.imagePropertiesAnnotation.dominantColors.colors
110 |     for dominantColor, index in colors
111 |       rgb = [dominantColor.color.red, dominantColor.color.green, dominantColor.color.blue] 
112 |       hexColour = hexify rgb
113 |       responseText += "#{hexColour}"
114 |       if index != colors.length - 1
115 |         responseText += ", "
116 |   if apiResponse.textAnnotations?
117 |     responseText += "Here's the text I can make out: "
118 |     responseText += "```"
119 |     for textAnnotation in apiResponse.textAnnotations
120 |       responseText += "" + textAnnotation.description  
121 |     responseText += "```"
122 |   if apiResponse.landmarkAnnotations?
123 |     responseText += "Here are the landmarks I can make out: \n"
124 |     for landmark, index in apiResponse.landmarkAnnotations
125 |       if landmark.description
126 |         responseText += "*#{landmark.description}* "
127 |       else 
128 |         responseText += "A place I think is here: "
129 |       if landmark.locations
130 |         responseText += "http://maps.google.com/maps?z=12&t=m&q=loc:#{landmark.locations[0].latLng.latitude}+#{landmark.locations[0].latLng.longitude}\n"  
131 |   if apiResponse.faceAnnotations?
132 |     responseText += "Here are some faces I can make out: \n"
133 |     img = new Canvas.Image
134 |     img.onload = ->
135 |       canvas = new Canvas(img.width, img.height)
136 |       context = canvas.getContext("2d")
137 |       context.drawImage(img, 0, 0, img.width, img.height)
138 |       for face, index in apiResponse.faceAnnotations
139 |         drawFace face, index, context
140 |         responseText += "Face #{index}\n\tJoy: #{face.joyLikelihood}\n\tSorrow: #{face.sorrowLikelihood}\n\tAnger: #{face.angerLikelihood}\n\tSurprise: #{face.surpriseLikelihood}\n\n"
141 |       s3.upload { Key: Date.now() + ".jpg", Bucket: bucket, ACL: "public-read", Body: canvas.toBuffer()}, (err, output) ->
142 |         if err
143 |           msg.send "Seems there was an error detecting any faces"
144 |         console.log "Location " + output.Location
145 |         msg.send "Here's what they look like #{output.Location}"
146 |     img.src = new Buffer image, "base64"
147 |     console.log "setting source"
148 |   msg.send responseText
149 | 
150 | #
151 | # Draw the faces
152 | #
153 | 
154 | drawFace = (face, index, context) ->
155 |   poly = face.fdBoundingPoly.vertices
156 |   context.lineWidth = 5
157 |   context.strokeStyle = "rgba(255,0,0,1)"
158 |   context.beginPath()
159 |   context.lineTo(poly[0].x, poly[0].y)
160 |   context.lineTo(poly[1].x, poly[1].y)
161 |   context.lineTo(poly[2].x, poly[2].y)
162 |   context.lineTo(poly[3].x, poly[3].y)
163 |   context.lineTo(poly[0].x, poly[0].y)
164 |   context.stroke()
165 |   context.lineWidth = 1
166 |   context.font = 'bold 50px Impact, serif'
167 |   context.fillStyle = "#f00"
168 |   context.fillText "#{index}", poly[3].x + 15, poly[3].y - 15
169 |   context.strokeStyle = "#fff"
170 |   context.strokeText "#{index}", poly[3].x + 15, poly[3].y - 15
171 | 
172 | #
173 | # Hexify colours
174 | #
175 | hexify = (rgb) ->
176 |     colour = '#'
177 |     colour += pad Math.floor(rgb[0]).toString(16)
178 |     colour += pad Math.floor(rgb[1]).toString(16)
179 |     colour += pad Math.floor(rgb[2]).toString(16)
180 |     colour
181 | 
182 | pad = (str) ->
183 |   if str.length < 2 
184 |     return "0" + str
185 |   else
186 |     return str
187 | 
188 | #
189 | # Format some JSON for the Vision API call 
190 | #
191 | 
192 | getJson = (image, type) ->
193 |   return requestJson = {
194 |         requests: [
195 |           {
196 |             image:
197 |               content: image
198 |             features: [ 
199 |               {
200 |                 type: type
201 |                 maxResults: 10
202 |               }
203 |             ]
204 |           }
205 |         ]
206 |       }


--------------------------------------------------------------------------------