├── .gitignore
├── Face Detection
│   ├── Facial Recognition in R.Rmd
│   ├── Facial Recognition in R._pub.html
│   ├── Facial Recognition in R.md
│   ├── Facial Recognition.Rmd
│   ├── Facial_Recognition_in_R.html
│   ├── LICENSE
│   ├── ML-Image-Processing-R.Rproj
│   ├── facialRecognition.py
│   ├── haarcascade_eye.xml
│   ├── haarcascade_frontalface_default.xml
│   ├── imageFunctions.R
│   ├── main.R
│   ├── modifiedWebcamShot.png
│   └── originalWebcamShot.png
├── Google Vision API
│   ├── Google Vision API in R.Rmd
│   ├── Google Vision API in R._pub.html
│   ├── Google Vision API in R.md
│   ├── Google Vision API.Rproj
│   ├── Google_Vision_API_in_R.html
│   ├── dog_mountain.jpg
│   ├── figure
│   │   ├── unnamed-chunk-10-1.png
│   │   ├── unnamed-chunk-11-1.png
│   │   ├── unnamed-chunk-14-1.png
│   │   ├── unnamed-chunk-15-1.png
│   │   ├── unnamed-chunk-17-1.png
│   │   ├── unnamed-chunk-19-1.png
│   │   ├── unnamed-chunk-199-1.png
│   │   ├── unnamed-chunk-2-1.png
│   │   ├── unnamed-chunk-4-1.png
│   │   ├── unnamed-chunk-6-1.png
│   │   └── unnamed-chunk-8-1.png
│   ├── my_face.jpg
│   ├── originalWebcamShot.jpg
│   ├── snacks_logos.JPG
│   ├── us_castle.jpg
│   ├── us_castle_2.jpg
│   ├── us_dog_mountain.jpg
│   ├── us_hats.jpg
│   └── wrigley_text.jpg
├── Microsoft Vision API
│   ├── Microsoft Vision API.Rproj
│   ├── R - Microsoft Vision API.Rmd
│   ├── R_-_Microsoft_Vision_API.html
│   ├── SnoozeGenius.jpg
│   ├── df.rds
│   └── sandbox.R
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md
.Rproj.user

# Files with credentials
*.csv
*.json
*.httr-oauth
*.txt
--------------------------------------------------------------------------------
/Face Detection/Facial Recognition in R.Rmd:
--------------------------------------------------------------------------------
---
title: "Facial Recognition in R"
author: "Scott Stoltzman"
date: "6/22/2017"
output: html_document
---

### Facial Recognition in R

![Original](originalWebcamShot.png) ![FaceDetection](modifiedWebcamShot.png)

OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven't done much more than search Google, but it seems as if "imager" and "videoplayR" provide a lot of the functionality, though not all of it.

I had never actually called Python functions from R before. Initially, I tried the "rPython" library - it has a lot of advantages, but it was completely unnecessary for me, so system() worked absolutely fine. While this example is extremely simple, it should help to illustrate how easy it is to utilize the power of Python from within R.

Using videoplayR, I created a function which takes a picture with my webcam and saves it as "originalWebcamShot.png".

**Note:** saving images and then loading them isn't very efficient, but it works in this case and is extremely easy to implement.
It saves us from passing variables, functions, objects, and/or methods between R and Python in this case.

I'll trace my steps backward through this post (I think it's easier to understand what's going on in this case).

#### The main.R file:

1. Calls my user-defined function
    * Turns on the camera
    * Takes a picture
    * Saves it as "originalWebcamShot.png"
2. Runs the Python script
    * Loads the previously saved image
    * Loads the Haar Cascade algorithms
    * Detects faces and eyes
    * Draws colored rectangles around them
    * Saves the new image as "modifiedWebcamShot.png"
3. Reads the new image into R
4. Displays both images


```{r mainCode,warning=FALSE,message=FALSE,eval=FALSE}
source('imageFunctions.R')
library("videoplayR")

# Take a picture and save it
img = webcamImage(rollFrames = 10,
                  showImage = FALSE,
                  saveImageToWD = 'originalWebcamShot.png')

# Run Python script to detect faces, draw rectangles, return new image
system('python3 facialRecognition.py')

# Read in new image
img.face = readImg("modifiedWebcamShot.png")

# Display images
imshow(img)
imshow(img.face)
```
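
One practical note: system() returns the exit status of the command it runs, so you can catch a failing Python script before trying to read an image that was never written. A minimal sketch (the chunk name and error message are my own):

```{r checkStatus, eval=FALSE}
# system() returns 0 when the command succeeds
status = system('python3 facialRecognition.py')
if (status != 0) {
  stop('facialRecognition.py failed - check that OpenCV and the cascade XML files are available')
}
```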

The user-defined function:

1. Function inputs
    * rollFrames is the number of pictures to take (allows the camera to adjust)
    * showImage gives the option to display the image
    * saveImageToWD saves the generated image to the current working directory
2. Turns the webcam on
3. Takes pictures (the number of rollFrames)
4. Uses basic logic to determine whether to show and/or save the images
5. Returns the image


```{r imageFunctions, eval=FALSE}
library("videoplayR")

webcamImage = function(rollFrames = 4, showImage = FALSE, saveImageToWD = NA){

  # rollFrames runs through multiple pictures - allows camera to adjust
  # showImage allows opportunity to display image within function

  # Turn on webcam
  stream = readStream(0)

  # Take pictures
  print("Video stream initiated.")
  for(i in seq(rollFrames)){
    img = nextFrame(stream)
  }

  # Turn off camera
  release(stream)

  # Display image if requested
  if(showImage == TRUE){
    imshow(img)
  }

  if(!is.na(saveImageToWD)){
    fileName = paste(getwd(),"/",saveImageToWD,sep='')
    print(paste("Saving Image To: ",fileName, sep=''))
    writeImg(fileName,img)
  }

  return(img)

}
```


The Python script:

1. Loads the algorithms from the xml files
2. Loads the image from "originalWebcamShot.png"
3. Converts the image to grayscale
4. Runs the facial detection algorithm (the 1.3 and 5 passed to detectMultiScale are OpenCV's scaleFactor and minNeighbors parameters)
5. Runs the eye detection algorithm (within the face)
6. Draws rectangles around the face and eyes (different colors)
7. Saves the new image as "modifiedWebcamShot.png"


```{python PythonScript, eval=FALSE}
import numpy as np
import cv2

def main():

    # I followed Harrison Kingsley's work for this
    # Much of the source code is found at https://pythonprogramming.net/haar-cascade-face-eye-detection-python-opencv-tutorial/

    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

    img = cv2.imread('originalWebcamShot.png')

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    for (x,y,w,h) in faces:
        cv2.rectangle(img,(x,y),(x+w,y+h),(0,0,255),2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]

        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex,ey,ew,eh) in eyes:
            cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

    cv2.imwrite('modifiedWebcamShot.png',img)

if __name__ == '__main__':
    main()
```

The Python code was entirely based on Harrison Kingsley's work:

* @sentdex [Twitter](https://twitter.com/Sentdex) | [YouTube](https://www.youtube.com/sentdex)
* Website: [PythonProgramming.net](https://pythonprogramming.net/haar-cascade-face-eye-detection-python-opencv-tutorial/)
--------------------------------------------------------------------------------
/Face Detection/Facial Recognition in R.md:
--------------------------------------------------------------------------------
---
title: "Facial Recognition in R"
author: "Scott Stoltzman"
date: "6/22/2017"
output: html_document
---

### Facial Recognition in R

![Original](originalWebcamShot.png) ![FaceDetection](modifiedWebcamShot.png)

OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven't done much more than search Google, but it seems as if "imager" and "videoplayR" provide a lot of the functionality, though not all of it.

I had never actually called Python functions from R before. Initially, I tried the "rPython" library - it has a lot of advantages, but it was completely unnecessary for me, so system() worked absolutely fine. While this example is extremely simple, it should help to illustrate how easy it is to utilize the power of Python from within R.

Using videoplayR, I created a function which takes a picture with my webcam and saves it as "originalWebcamShot.png".

**Note:** saving images and then loading them isn't very efficient, but it works in this case and is extremely easy to implement. It saves us from passing variables, functions, objects, and/or methods between R and Python in this case.

I'll trace my steps backward through this post (I think it's easier to understand what's going on in this case).

#### The main.R file:

1. Calls my user-defined function
    * Turns on the camera
    * Takes a picture
    * Saves it as "originalWebcamShot.png"
2. Runs the Python script
    * Loads the previously saved image
    * Loads the Haar Cascade algorithms
    * Detects faces and eyes
    * Draws colored rectangles around them
    * Saves the new image as "modifiedWebcamShot.png"
3. Reads the new image into R
4. Displays both images


```r
source('imageFunctions.R')
library("videoplayR")

# Take a picture and save it
img = webcamImage(rollFrames = 10,
                  showImage = FALSE,
                  saveImageToWD = 'originalWebcamShot.png')

# Run Python script to detect faces, draw rectangles, return new image
system('python3 facialRecognition.py')

# Read in new image
img.face = readImg("modifiedWebcamShot.png")

# Display images
imshow(img)
imshow(img.face)
```


The user-defined function:

1. Function inputs
    * rollFrames is the number of pictures to take (allows the camera to adjust)
    * showImage gives the option to display the image
    * saveImageToWD saves the generated image to the current working directory
2. Turns the webcam on
3. Takes pictures (the number of rollFrames)
4. Uses basic logic to determine whether to show and/or save the images
5. Returns the image


```r
library("videoplayR")

webcamImage = function(rollFrames = 4, showImage = FALSE, saveImageToWD = NA){

  # rollFrames runs through multiple pictures - allows camera to adjust
  # showImage allows opportunity to display image within function

  # Turn on webcam
  stream = readStream(0)

  # Take pictures
  print("Video stream initiated.")
  for(i in seq(rollFrames)){
    img = nextFrame(stream)
  }

  # Turn off camera
  release(stream)

  # Display image if requested
  if(showImage == TRUE){
    imshow(img)
  }

  if(!is.na(saveImageToWD)){
    fileName = paste(getwd(),"/",saveImageToWD,sep='')
    print(paste("Saving Image To: ",fileName, sep=''))
    writeImg(fileName,img)
  }

  return(img)

}
```


The Python script:

1. Loads the algorithms from the xml files
2. Loads the image from "originalWebcamShot.png"
3. Converts the image to grayscale
4. Runs the facial detection algorithm (the 1.3 and 5 passed to detectMultiScale are OpenCV's scaleFactor and minNeighbors parameters)
5. Runs the eye detection algorithm (within the face)
6. Draws rectangles around the face and eyes (different colors)
7. Saves the new image as "modifiedWebcamShot.png"


```python
import numpy as np
import cv2

def main():

    # I followed Harrison Kingsley's work for this
    # Much of the source code is found at https://pythonprogramming.net/haar-cascade-face-eye-detection-python-opencv-tutorial/

    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

    img = cv2.imread('originalWebcamShot.png')

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    for (x,y,w,h) in faces:
        cv2.rectangle(img,(x,y),(x+w,y+h),(0,0,255),2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]

        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex,ey,ew,eh) in eyes:
            cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

    cv2.imwrite('modifiedWebcamShot.png',img)

if __name__ == '__main__':
    main()
```

The Python code was entirely based on Harrison Kingsley's work:

* @sentdex [Twitter](https://twitter.com/Sentdex) | [YouTube](https://www.youtube.com/sentdex)
* Website: [PythonProgramming.net](https://pythonprogramming.net/haar-cascade-face-eye-detection-python-opencv-tutorial/)
--------------------------------------------------------------------------------
/Face Detection/Facial Recognition.Rmd:
--------------------------------------------------------------------------------
---
title: "Facial Recognition in R"
author: "Scott Stoltzman"
date: "6/22/2017"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
--------------------------------------------------------------------------------
/Face Detection/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Scott Stoltzman

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/Face Detection/ML-Image-Processing-R.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
--------------------------------------------------------------------------------
/Face Detection/facialRecognition.py:
--------------------------------------------------------------------------------
import numpy as np
import cv2

def main():

    # I followed Harrison Kingsley's work for this
    # Much of the source code is found at https://pythonprogramming.net/haar-cascade-face-eye-detection-python-opencv-tutorial/

    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

    img = cv2.imread('originalWebcamShot.png')

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    for (x,y,w,h) in faces:
        cv2.rectangle(img,(x,y),(x+w,y+h),(0,0,255),2)
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]

        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex,ey,ew,eh) in eyes:
            cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

    cv2.imwrite('modifiedWebcamShot.png',img)

if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/Face Detection/imageFunctions.R:
--------------------------------------------------------------------------------
library("videoplayR")

webcamImage = function(rollFrames = 4, showImage = FALSE, saveImageToWD = NA){

  # rollFrames runs through multiple pictures - allows camera to adjust
  # showImage allows opportunity to display image within function

  # Turn on webcam
  stream = readStream(0)

  # Take pictures
  print("Video stream initiated.")
  for(i in seq(rollFrames)){
    img = nextFrame(stream)
  }

  # Turn off camera
  release(stream)

  # Display image if requested
  if(showImage == TRUE){
    imshow(img)
  }

  if(!is.na(saveImageToWD)){
    fileName = paste(getwd(),"/",saveImageToWD,sep='')
    print(paste("Saving Image To: ",fileName, sep=''))
    writeImg(fileName,img)
  }

  return(img)

}
--------------------------------------------------------------------------------
/Face Detection/main.R:
--------------------------------------------------------------------------------
source('imageFunctions.R')
library("videoplayR")

# Take a picture and save it
img = webcamImage(rollFrames = 10,
                  showImage = FALSE,
                  saveImageToWD = 'originalWebcamShot.png')

# Run Python script to detect faces, draw rectangles, return new image
system('python3 facialRecognition.py')

# Read in new image
img.face = readImg("modifiedWebcamShot.png")

# Display images
imshow(img)
imshow(img.face)
--------------------------------------------------------------------------------
/Face Detection/modifiedWebcamShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Face Detection/modifiedWebcamShot.png
--------------------------------------------------------------------------------
/Face Detection/originalWebcamShot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Face Detection/originalWebcamShot.png
--------------------------------------------------------------------------------
/Google Vision API/Google Vision API in R.Rmd:
--------------------------------------------------------------------------------
---
title: "Google Vision API in R"
author: "Scott Stoltzman"
date: "7/29/2017"
output: html_document
---

## Using the Google Vision API in R

### Utilizing RoogleVision

After doing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. The amount of information it can spit back to you by simply sending it a picture is absolutely incredible.

Also, it's 100% free! I believe that includes 1000 images per month. Amazing!

In this post I'm going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R.

As always, we'll start off by loading some libraries. I wrote some extra notation within the code around where you can install them.

```{r setup, message=FALSE, warning=FALSE}
# Normal Libraries
library(tidyverse)

# devtools::install_github("flovv/RoogleVision")
library(RoogleVision)
library(jsonlite) # to import credentials

# For image processing
# source("http://bioconductor.org/biocLite.R")
# biocLite("EBImage")
library(EBImage)

# For Latitude Longitude Map
library(leaflet)
```

#### Google Authentication

In order to use the API, you have to authenticate. There is plenty of documentation out there about how to set up an account, create a project, download credentials, etc. Head over to the [Google Cloud Console](https://console.cloud.google.com) if you don't have an account already.

```{r}
# Credentials file I downloaded from the cloud console
creds = fromJSON('credentials.json')

# Google Authentication - Use Your Credentials
# options("googleAuthR.client_id" = "xxx.apps.googleusercontent.com")
# options("googleAuthR.client_secret" = "")

options("googleAuthR.client_id" = creds$installed$client_id)
options("googleAuthR.client_secret" = creds$installed$client_secret)
options("googleAuthR.scopes.selected" = c("https://www.googleapis.com/auth/cloud-platform"))
googleAuthR::gar_auth()
```


### Now You're Ready to Go

The function getGoogleVisionResponse takes three arguments:

1. imagePath
2. feature
3. numResults

Numbers 1 and 3 are self-explanatory; "feature" has 5 options:

* LABEL_DETECTION
* LANDMARK_DETECTION
* FACE_DETECTION
* LOGO_DETECTION
* TEXT_DETECTION

These are self-explanatory, but it's nice to see each one in action.
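
Before diving into each feature, here's what a call looks like with all three arguments spelled out (a quick sketch - I'm using the argument names from the list above, and the numResults value is arbitrary):

```{r eval=FALSE}
# Ask for up to 10 labels from a single image
labels = getGoogleVisionResponse(imagePath = 'dog_mountain.jpg',
                                 feature = 'LABEL_DETECTION',
                                 numResults = 10)
```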

As a side note: there are also other features the API has which aren't included (yet) in the RoogleVision package, such as "Safe Search," which identifies inappropriate content, and "Properties," which identifies dominant colors and aspect ratios; a few others can be found at the [Cloud Vision website](https://cloud.google.com/vision/)

----

#### Label Detection

This is used to help determine content within the photo. It can basically add a level of metadata around the image.

Here is a photo of our dog when we hiked up to Audubon Peak in Colorado:

```{r echo=FALSE}
dog_mountain_image <- readImage('dog_mountain.jpg')
plot(dog_mountain_image)
```



```{r}
dog_mountain_label = getGoogleVisionResponse('dog_mountain.jpg',
                                             feature = 'LABEL_DETECTION')
head(dog_mountain_label)
```


All 5 responses were incredibly accurate! The "score" that is returned is how confident the Google Vision algorithms are, so there's a 91.9% chance a mountain is prominent in this photo. I like "dog hiking" the best - considering that's what we were doing at the time. Maybe a little bit too accurate...
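
Since the response comes back as a plain data frame, the tidyverse verbs loaded earlier apply directly. A minimal sketch, filtering to the high-confidence labels using the score column shown above:

```{r eval=FALSE}
dog_mountain_label %>%
  filter(score > 0.9) %>%   # keep only the labels the API is most confident about
  arrange(desc(score))
```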

----

#### Landmark Detection

This is a feature designed specifically to pick out a recognizable landmark! It provides the position in the image along with the geolocation of the landmark (in longitude and latitude).

My wife and I took this selfie at Linderhof Castle in Bavaria, Germany.

```{r}
us_castle <- readImage('us_castle_2.jpg')
plot(us_castle)
```


The response from the Google Vision API was spot on. It returned "Linderhof Palace" as the description. It also provided a score (I reduced the resolution of the image, which hurt the score), a boundingPoly field and locations.

* Bounding Poly - gives x,y coordinates for a polygon around the landmark in the image
* Locations - provides longitude,latitude coordinates

```{r}
us_landmark = getGoogleVisionResponse('us_castle_2.jpg',
                                      feature = 'LANDMARK_DETECTION')
head(us_landmark)
```

I plotted the polygon over the image using the coordinates returned. It does a great job (certainly not perfect) of getting the castle identified. It's a bit tough to say what the actual "landmark" would be in this case, given that the fountains, stairs and grounds are certainly important and are a key part of the castle.

```{r}
us_castle <- readImage('us_castle_2.jpg')
plot(us_castle)
xs = us_landmark$boundingPoly$vertices[[1]][1][[1]]
ys = us_landmark$boundingPoly$vertices[[1]][2][[1]]
polygon(x=xs,y=ys,border='red',lwd=4)
```


Turning to the locations - I plotted this using the leaflet library. If you haven't used leaflet, start doing so immediately. I'm a huge fan of it due to its speed and simplicity. There are a lot of customization options available as well that you can check out.

The location = spot on! While it isn't a shock to me that Google could provide the location of "Linderhof Castle" - it is amazing to me that I don't have to write a web crawler search function to find it myself! That's just one of many little luxuries they have built into this API.

```{r}
latt = us_landmark$locations[[1]][[1]][[1]]
lon = us_landmark$locations[[1]][[1]][[2]]
m = leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = lon, lat = latt, zoom = 5) %>%
  addMarkers(lng = lon, lat = latt)
m
```


----

#### Face Detection

My last blog post showed the OpenCV package utilizing the haar cascade algorithm in action. I didn't dig into Google's algorithms to figure out what is under the hood, but it provides similar results. However, rather than layering in each subsequent "find the eyes," "find the mouth," and so on, it returns more than you ever needed to know:

* Bounding Poly = highest level polygon
* FD Bounding Poly = polygon surrounding each face
* Landmarks = (funny name) includes each feature of the face (left eye, right eye, etc.)
* Roll Angle, Pan Angle, Tilt Angle = all of the different angles you'd need per face
* Confidence (detection and landmarking) = how certain the algorithm is that it's accurate
* Joy, sorrow, anger, surprise, under exposed, blurred, headwear likelihoods = how likely it is that each face contains that emotion or characteristic

The likelihoods are another amazing piece of information returned! I have run about 20 images through this API and every single one has been accurate - very impressive!

I wanted to showcase the face detection and headwear first. Here's a picture of my wife and me at "The Bean" in Chicago (side note: it's awesome! I thought it was going to be really silly, but you can really have a lot of fun with all of the angles and reflections):

```{r}
us_hats_pic <- readImage('us_hats.jpg')
plot(us_hats_pic)
```


```{r}
us_hats = getGoogleVisionResponse('us_hats.jpg',
                                  feature = 'FACE_DETECTION')
head(us_hats)
```
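
Each row of the response corresponds to one detected face, so the likelihood and confidence fields can be pulled straight out of the data frame. A minimal sketch, assuming the column names follow the pattern above (e.g. joyLikelihood):

```{r eval=FALSE}
# One row per face: were they happy, and were they wearing hats?
us_hats %>%
  select(joyLikelihood, headwearLikelihood, detectionConfidence)
```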

```{r}
us_hats_pic <- readImage('us_hats.jpg')
plot(us_hats_pic)

xs1 = us_hats$fdBoundingPoly$vertices[[1]][1][[1]]
ys1 = us_hats$fdBoundingPoly$vertices[[1]][2][[1]]

xs2 = us_hats$fdBoundingPoly$vertices[[2]][1][[1]]
ys2 = us_hats$fdBoundingPoly$vertices[[2]][2][[1]]

polygon(x=xs1,y=ys1,border='red',lwd=4)
polygon(x=xs2,y=ys2,border='green',lwd=4)
```


Here's a shot that should be familiar (copied directly from my last blog) - and I wanted to highlight the different features that can be detected. Look at how many points are perfectly placed:

```{r}
my_face_pic <- readImage('my_face.jpg')
plot(my_face_pic)
```



```{r}
my_face = getGoogleVisionResponse('my_face.jpg',
                                  feature = 'FACE_DETECTION')
head(my_face)
```



```{r}
head(my_face$landmarks)
```



```{r}
my_face_pic <- readImage('my_face.jpg')
plot(my_face_pic)

xs1 = my_face$fdBoundingPoly$vertices[[1]][1][[1]]
ys1 = my_face$fdBoundingPoly$vertices[[1]][2][[1]]

xs2 = my_face$landmarks[[1]][[2]][[1]]
ys2 = my_face$landmarks[[1]][[2]][[2]]

polygon(x=xs1,y=ys1,border='red',lwd=4)
points(x=xs2,y=ys2,lwd=2, col='lightblue')
```

----

#### Logo Detection

To continue along the Chicago trip, we drove by Wrigley Field, and I took a really bad photo of the sign from a moving car while it was under construction. It's a nice test image because it has a lot of different lines and writing, and the Toyota logo isn't incredibly prominent or necessarily fit to brand colors.

This call returns:

* Description = Brand name of the logo detected
* Score = Confidence of prediction accuracy
* Bounding Poly = (Again) coordinates of the logo


```{r}
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)
```



```{r}
wrigley_logo = getGoogleVisionResponse('wrigley_text.jpg',
                                       feature = 'LOGO_DETECTION')
head(wrigley_logo)
```


```{r}
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)
xs = wrigley_logo$boundingPoly$vertices[[1]][[1]]
ys = wrigley_logo$boundingPoly$vertices[[1]][[2]]
polygon(x=xs,y=ys,border='green',lwd=4)
```

----

#### Text Detection

I'll continue using the Wrigley Field picture. There is text all over the place and it's fun to see what is captured and what isn't. It appears as if the curved text at the top ("FIELD") isn't easily interpreted as text. However, the rest is caught and the words are captured.

The response sent back is a bit more difficult to interpret than the rest of the API calls - it breaks things apart by word but also returns everything as one line. Here's what comes back:

* Locale = language, returned as source
* Description = the text (the first row is everything, and then the rest are individual words)
* Bounding Poly = I'm sure you can guess by now

```{r}
wrigley_text = getGoogleVisionResponse('wrigley_text.jpg',
                                       feature = 'TEXT_DETECTION')
head(wrigley_text)
```
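
That first-row-is-everything layout means you can separate the full block of text from the word-level results with simple indexing (a minimal sketch, assuming the description structure described above):

```{r eval=FALSE}
full_text = wrigley_text$description[1]   # the entire detected text as one string
words = wrigley_text$description[-1]      # one detected word per row
```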

```{r}
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)

for(i in 1:length(wrigley_text$boundingPoly$vertices)){
  xs = wrigley_text$boundingPoly$vertices[[i]]$x
  ys = wrigley_text$boundingPoly$vertices[[i]]$y
  polygon(x=xs,y=ys,border='green',lwd=2)
}
```

----

That's about it for the basics of using the Google Vision API with the RoogleVision library. I highly recommend tinkering around with it a bit, especially because it won't cost you a dime.

While I do enjoy the math under the hood and the thinking required to understand algorithms, I do think these sorts of APIs will become the way of the future for data science. Outside of specific use cases or special industries, it seems hard to imagine wanting to try to create algorithms better than the ones already built for mass consumption. As long as they're fast, free and accurate, I'm all about making my life easier! From the hiring perspective, I much prefer someone who can get the job done over someone who can slightly improve performance (as always, there are many cases where this doesn't apply).

Please comment if you are utilizing any of the Google APIs for business purposes - I would love to hear about it!

As always, you can find this on my [GitHub](https://github.com/stoltzmaniac/ML-Image-Processing-R)

Using the Google Vision API in R

2 | 3 |

Utilizing RoogleVision

4 | 5 |

After doing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. It's absolutely incredible the amount of information it can spit back to you by simply sending it a picture.

6 | 7 |

Also, it's 100% free! I believe that includes 1000 images per month. Amazing!

8 | 9 |

In this post I'm going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R.

10 | 11 |

As always, we'll start off loading some libraries. I wrote some extra notation around where you can install them within the code.

12 | 13 |
# Normal Libraries
 14 | library(tidyverse)
 15 | 
 16 | # devtools::install_github("flovv/RoogleVision")
 17 | library(RoogleVision)
 18 | library(jsonlite) # to import credentials
 19 | 
 20 | # For image processing
 21 | # source("http://bioconductor.org/biocLite.R")
 22 | # biocLite("EBImage")
 23 | library(EBImage)
 24 | 
 25 | # For Latitude Longitude Map
 26 | library(leaflet)
 27 | 
28 | 29 |

Google Authentication

30 | 31 |

In order to use the API, you have to authenticate. There is plenty of documentation out there about how to setup an account, create a project, download credentials, etc. Head over to Google Cloud Console if you don't have an account already.

32 | 33 |
# Credentials file I downloaded from the cloud console
 34 | creds = fromJSON('credentials.json')
 35 | 
 36 | # Google Authentication - Use Your Credentials
 37 | # options("googleAuthR.client_id" = "xxx.apps.googleusercontent.com")
 38 | # options("googleAuthR.client_secret" = "")
 39 | 
 40 | options("googleAuthR.client_id" = creds$installed$client_id)
 41 | options("googleAuthR.client_secret" = creds$installed$client_secret)
 42 | options("googleAuthR.scopes.selected" = c("https://www.googleapis.com/auth/cloud-platform"))
 43 | googleAuthR::gar_auth()
 44 | 
45 | 46 |
## 2017-07-31 11:30:34> Token cache file: .httr-oauth
 47 | 
48 | 49 |
## 2017-07-31 11:30:34> Scopes: https://www.googleapis.com/auth/cloud-platform
 50 | 
51 | 52 |

Now You're Ready to Go

53 | 54 |

The function getGoogleVisionResponse takes three arguments:

55 | 56 |
    57 |
  1. imagePath
  2. 58 |
  3. feature
  4. 59 |
  5. numResults
  6. 60 |
61 | 62 |

Numbers 1 and 3 are self-explanatory, “feature” has 5 options:

63 | 64 | 71 | 72 |

These are self-explanatory but it's nice to see each one in action.

73 | 74 |

As a side note: there are also other features that the API has which aren't included (yet) in the RoogleVision package such as “Safe Search” which identifies inappropriate content, “Properties” which identifies dominant colors and aspect ratios and a few others can be found at the Cloud Vision website

75 | 76 |
77 | 78 |

Label Detection

79 | 80 |

This is used to help determine content within the photo. It can basically add a level of metadata around the image.

81 | 82 |

Here is a photo of our dog when we hiked up to Audubon Peak in Colorado:

83 | 84 |

plot of chunk unnamed-chunk-2

85 | 86 |
dog_mountain_label = getGoogleVisionResponse('dog_mountain.jpg',
 87 |                                               feature = 'LABEL_DETECTION')
 88 | head(dog_mountain_label)
 89 | 
90 | 91 |
##            mid           description     score
 92 | ## 1     /m/09d_r              mountain 0.9188690
 93 | ## 2 /g/11jxkqbpp mountainous landforms 0.9009549
 94 | ## 3    /m/023bbt            wilderness 0.8733696
 95 | ## 4     /m/0kpmf             dog breed 0.8398435
 96 | ## 5    /m/0d4djn            dog hiking 0.8352048
 97 | 
98 | 99 |

All 5 responses were incredibly accurate! The “score” that is returned is how confident the Google Vision algorithms are, so there's a 91.9% chance a mountain is prominent in this photo. I like “dog hiking” the best - considering that's what we were doing at the time. Kind of a little bit too accurate…

100 | 101 |
102 | 103 |

Landmark Detection

104 | 105 |

This is a feature designed to specifically pick out a recognizable landmark! It provides the position in the image along with the geolocation of the landmark (in longitude and latitude).

106 | 107 |

My wife and I took this selfie in at the Linderhof Castle in Bavaria, Germany.

108 | 109 |
us_castle <- readImage('us_castle_2.jpg')
110 | plot(us_castle)
111 | 
112 | 113 |

plot of chunk unnamed-chunk-4

114 | 115 |

The response from the Google Vision API was spot on. It returned “Linderhof Palace” as the description. It also provided a score (I reduced the resolution of the image which hurt the score), a boundingPoly field and locations.

116 | 117 | 121 | 122 |
us_landmark = getGoogleVisionResponse('us_castle_2.jpg',
123 |                                       feature = 'LANDMARK_DETECTION')
124 | head(us_landmark)
125 | 
126 | 127 |
##         mid      description     score
128 | ## 1 /m/066h19 Linderhof Palace 0.4665011
129 | ##                               vertices          locations
130 | ## 1 25, 382, 382, 25, 178, 178, 659, 659 47.57127, 10.96072
131 | 
132 | 133 |

I plotted the polygon over the image using the coordinates returned. It does a great job (certainly not perfect) of getting the castle identified. It's a bit tough to say what the actual “landmark” would be in this case due to the fact the fountains, stairs and grounds are certainly important and are a key part of the castle.

134 | 135 |
us_castle <- readImage('us_castle_2.jpg')
136 | plot(us_castle)
137 | xs = us_landmark$boundingPoly$vertices[[1]][1][[1]]
138 | ys = us_landmark$boundingPoly$vertices[[1]][2][[1]]
139 | polygon(x=xs,y=ys,border='red',lwd=4)
140 | 
141 | 142 |

plot of chunk unnamed-chunk-6

143 | 144 |

Turning to the locations - I plotted this using the leaflet library. If you haven't used leaflet, start doing so immediately. I'm a huge fan of it due to speed and simplicity. There are a lot of customization options available as well that you can check out.

145 | 146 |

The location = spot on! While it isn't a shock to me that Google could provide the location of “Linderhof Castle” - it is amazing to me that I don't have to write a web crawler search function to find it myself! That's just one of many little luxuries they have built into this API.

147 | 148 |
latt = us_landmark$locations[[1]][[1]][[1]]
149 | lon = us_landmark$locations[[1]][[1]][[2]]
150 | m = leaflet() %>%
151 |   addProviderTiles(providers$CartoDB.Positron) %>%
152 |   setView(lng = lon, lat = latt, zoom = 5) %>%
153 |   addMarkers(lng = lon, lat = latt)
154 | m
155 | 
156 | 157 |
## Error in loadNamespace(name): there is no package called 'webshot'
158 | 
159 | 160 |
161 | 162 |

Face Detection

163 | 164 |

My last blog post showed the OpenCV package utilizing the haar cascade algorithm in action. I didn't dig into Google's algorithms to figure out what is under the hood, but it provides similar results. However, rather than layering in each subsequent “find the eyes” and “find the mouth” and …etc… it returns more than you ever needed to know.

165 | 166 | 174 | 175 |

The likelihoods is another amazing piece of information returned! I have run about 20 images through this API and every single one has been accurate - very impressive!

176 | 177 |

I wanted to showcase the face detection and headwear first. Here's a picture of my wife and I at “The Bean” in Chicago (side note: it's awesome! I thought it was going to be really silly, but you can really have a lot of fun with all of the angles and reflections):

178 | 179 |
us_hats_pic <- readImage('us_hats.jpg')
180 | plot(us_hats_pic)
181 | 
182 | 183 |

plot of chunk unnamed-chunk-8

184 | 185 |
us_hats = getGoogleVisionResponse('us_hats.jpg',
186 |                                       feature = 'FACE_DETECTION')
187 | head(us_hats)
188 | 
189 | 190 |
##                                 vertices
191 | ## 1 295, 410, 410, 295, 164, 164, 297, 297
192 | ## 2 353, 455, 455, 353, 261, 261, 381, 381
193 | ##                                 vertices
194 | ## 1 327, 402, 402, 327, 206, 206, 280, 280
195 | ## 2 368, 439, 439, 368, 298, 298, 370, 370
196 | ##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           landmarks
197 | ## 1 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 352.00974, 380.68124, 340.27664, 363.16348, 378.64938, 393.6553, 370.78906, 371.99802, 366.30664, 364.23642, 349.47012, 377.17905, 364.7603, 375.62842, 357.7237, 367.20822, 352.4306, 358.9425, 351.23474, 343.64124, 351.10004, 384.32953, 388.21667, 382.08743, 375.90262, 383.87732, 353.08627, 387.7416, 312.06622, 384.56946, 371.5381, 360.62714, 318.48486, 383.87354, 225.86505, 229.50423, 216.51169, 219.28635, 220.92139, 222.02762, 227.35771, 248.94884, 259.51044, 272.9798, 261.36096, 263.89874, 265.76828, 251.38408, 248.46135, 253.8837, 223.93387, 227.20102, 228.33765, 225.31805, 226.13412, 227.22661, 229.89023, 231.8548, 229.34843, 229.54358, 213.85588, 217.43123, 236.95158, 244.45172, 219.76247, 287.00592, 260.99124, 267.68896, -0.0009269835, 12.904515, -2.3585303, -3.3569832, 3.4166863, 20.891703, -0.10083569, -8.568332, -0.32282636, 2.7426949, 3.1502135, 14.79839, 2.401884, 8.115268, 0.19641992, -0.7506992, -2.5084567, 2.8466656, -0.38294473, -0.05908208, -1.2792722, 11.411656, 19.373985, 13.421982, 10.900102, 12.992137, -5.2635217, 9.859322, 31.33588, 62.9466, -1.213793, 7.9232774, 20.887934, 49.40408
198 | ## 2 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 389.67215, 419.01474, 378.68497, 397.29074, 411.57373, 430.68024, 404.34882, 402.9257, 402.77734, 402.28552, 388.3598, 418.2969, 402.50266, 411.8417, 394.88547, 403.11188, 388.78043, 395.5202, 389.06342, 382.62268, 388.21332, 419.86707, 425.98645, 419.21088, 413.45447, 420.1578, 387.80508, 421.56183, 369.29388, 439.9703, 404.44498, 401.90457, 371.50647, 435.39258, 319.11594, 320.14157, 310.8753, 313.32437, 313.59402, 313.3107, 318.78964, 337.8581, 347.607, 360.56134, 350.5411, 351.10315, 354.41702, 339.4301, 339.11786, 342.46072, 317.141, 320.19537, 321.22644, 318.473, 319.0922, 318.58655, 320.64886, 322.44992, 320.4701, 320.58286, 308.29236, 309.85825, 328.25885, 331.54816, 312.85385, 372.5355, 349.82388, 352.82462, 0.00018637085, -0.63476753, 2.1497552, -7.1008844, -7.460493, 1.0756116, -7.027477, -13.670173, -5.1229305, -1.0671108, 4.793461, 4.4603314, -1.8998832, -1.677745, -1.2933732, -5.9320116, -2.3477247, 0.03832738, 0.0054741018, 2.9924247, -0.7714207, -2.9816942, 2.079318, -0.6419869, -0.3527427, -1.4552351, -5.0709085, -5.7559977, 40.608036, 39.10855, -8.547456, 4.8426514, 30.500828, 29.191824
199 | ##   rollAngle panAngle tiltAngle detectionConfidence landmarkingConfidence
200 | ## 1  7.103324 23.46835 -2.816312           0.9877176             0.7072066
201 | ## 2  2.510939 -1.17956 -7.393063           0.9997375             0.7268016
202 | ##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
203 | ## 1   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
204 | ## 2   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
205 | ##   underExposedLikelihood blurredLikelihood headwearLikelihood
206 | ## 1          VERY_UNLIKELY     VERY_UNLIKELY        VERY_LIKELY
207 | ## 2          VERY_UNLIKELY     VERY_UNLIKELY        VERY_LIKELY
208 | 
209 | 210 |
us_hats_pic <- readImage('us_hats.jpg')
211 | plot(us_hats_pic)
212 | 
213 | xs1 = us_hats$fdBoundingPoly$vertices[[1]][1][[1]]
214 | ys1 = us_hats$fdBoundingPoly$vertices[[1]][2][[1]]
215 | 
216 | xs2 = us_hats$fdBoundingPoly$vertices[[2]][1][[1]]
217 | ys2 = us_hats$fdBoundingPoly$vertices[[2]][2][[1]]
218 | 
219 | polygon(x=xs1,y=ys1,border='red',lwd=4)
220 | polygon(x=xs2,y=ys2,border='green',lwd=4)
221 | 
222 | 223 |

plot of chunk unnamed-chunk-10

224 | 225 |

Here's a shot that should be familiar (copied directly from my last blog) - and I wanted to highlight the different features that can be detected. Look at how many points are perfectly placed:

226 | 227 |
my_face_pic <- readImage('my_face.jpg')
228 | plot(my_face_pic)
229 | 
230 | 231 |

plot of chunk unnamed-chunk-11

232 | 233 |
my_face = getGoogleVisionResponse('my_face.jpg',
234 |                                       feature = 'FACE_DETECTION')
235 | head(my_face)
236 | 
237 | 238 |
##                               vertices
239 | ## 1 456, 877, 877, 456, NA, NA, 473, 473
240 | ##                               vertices
241 | ## 1 515, 813, 813, 515, 98, 98, 395, 395
242 | ##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             landmarks
243 | ## 1 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 598.7636, 723.16125, 556.1954, 628.8224, 693.0257, 767.7514, 661.2344, 661.9072, 662.7698, 662.2978, 603.21814, 722.5995, 662.66486, 700.5242, 626.14417, 663.0441, 597.7986, 624.5084, 597.13776, 572.32404, 596.0174, 725.61145, 751.531, 725.60315, 701.6699, 727.3262, 591.74457, 730.4487, 525.0554, 814.0723, 660.71075, 664.25146, 536.8293, 798.8593, 192.19489, 192.49554, 165.28363, 159.90292, 160.66797, 164.28062, 185.05746, 260.90063, 310.77585, 348.6693, 322.57773, 317.35153, 325.9983, 274.28345, 275.32834, 284.49515, 185.37177, 194.59952, 203.61258, 197.56845, 194.79561, 183.56104, 195.62381, 203.60477, 194.94687, 193.0094, 147.70262, 145.74747, 276.10037, 270.00323, 158.95798, 409.86185, 350.27, 346.58624, -0.0018592946, -4.8054757, 15.825399, -23.345352, -25.614508, 7.637372, -29.068363, -74.15371, -48.44018, -43.53211, -10.572805, -14.504428, -40.966953, -26.340576, -23.933197, -46.457916, -8.027897, -0.8318569, -2.181139, 12.514983, -3.5412567, -12.764345, 5.530805, -7.038474, -3.6184528, -8.517615, -10.674338, -15.85011, 152.71716, 142.93324, -29.311995, -31.410963, 93.14353, 83.41843
244 | ##    rollAngle  panAngle tiltAngle detectionConfidence landmarkingConfidence
245 | ## 1 -0.6375801 -2.120439  5.706552            0.996818             0.8222974
246 | ##   joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood
247 | ## 1   VERY_LIKELY    VERY_UNLIKELY   VERY_UNLIKELY      VERY_UNLIKELY
248 | ##   underExposedLikelihood blurredLikelihood headwearLikelihood
249 | ## 1          VERY_UNLIKELY     VERY_UNLIKELY      VERY_UNLIKELY
250 | 
251 | 252 |
head(my_face$landmarks)
253 | 
254 | 255 |
## [[1]]
256 | ##                            type position.x position.y    position.z
257 | ## 1                      LEFT_EYE   598.7636   192.1949  -0.001859295
258 | ## 2                     RIGHT_EYE   723.1612   192.4955  -4.805475700
259 | ## 3          LEFT_OF_LEFT_EYEBROW   556.1954   165.2836  15.825399000
260 | ## 4         RIGHT_OF_LEFT_EYEBROW   628.8224   159.9029 -23.345352000
261 | ## 5         LEFT_OF_RIGHT_EYEBROW   693.0257   160.6680 -25.614508000
262 | ## 6        RIGHT_OF_RIGHT_EYEBROW   767.7514   164.2806   7.637372000
263 | ## 7         MIDPOINT_BETWEEN_EYES   661.2344   185.0575 -29.068363000
264 | ## 8                      NOSE_TIP   661.9072   260.9006 -74.153710000
265 | ## 9                     UPPER_LIP   662.7698   310.7758 -48.440180000
266 | ## 10                    LOWER_LIP   662.2978   348.6693 -43.532110000
267 | ## 11                   MOUTH_LEFT   603.2181   322.5777 -10.572805000
268 | ## 12                  MOUTH_RIGHT   722.5995   317.3515 -14.504428000
269 | ## 13                 MOUTH_CENTER   662.6649   325.9983 -40.966953000
270 | ## 14            NOSE_BOTTOM_RIGHT   700.5242   274.2835 -26.340576000
271 | ## 15             NOSE_BOTTOM_LEFT   626.1442   275.3283 -23.933197000
272 | ## 16           NOSE_BOTTOM_CENTER   663.0441   284.4952 -46.457916000
273 | ## 17        LEFT_EYE_TOP_BOUNDARY   597.7986   185.3718  -8.027897000
274 | ## 18        LEFT_EYE_RIGHT_CORNER   624.5084   194.5995  -0.831856900
275 | ## 19     LEFT_EYE_BOTTOM_BOUNDARY   597.1378   203.6126  -2.181139000
276 | ## 20         LEFT_EYE_LEFT_CORNER   572.3240   197.5685  12.514983000
277 | ## 21               LEFT_EYE_PUPIL   596.0174   194.7956  -3.541256700
278 | ## 22       RIGHT_EYE_TOP_BOUNDARY   725.6114   183.5610 -12.764345000
279 | ## 23       RIGHT_EYE_RIGHT_CORNER   751.5310   195.6238   5.530805000
280 | ## 24    RIGHT_EYE_BOTTOM_BOUNDARY   725.6032   203.6048  -7.038474000
281 | ## 25        RIGHT_EYE_LEFT_CORNER   701.6699   194.9469  -3.618452800
282 | ## 26              RIGHT_EYE_PUPIL   727.3262   193.0094  -8.517615000
283 | ## 27  LEFT_EYEBROW_UPPER_MIDPOINT   591.7446   147.7026 -10.674338000
284 | ## 28 RIGHT_EYEBROW_UPPER_MIDPOINT   730.4487   145.7475 -15.850110000
285 | ## 29             LEFT_EAR_TRAGION   525.0554   276.1004 152.717160000
286 | ## 30            RIGHT_EAR_TRAGION   814.0723   270.0032 142.933240000
287 | ## 31            FOREHEAD_GLABELLA   660.7107   158.9580 -29.311995000
288 | ## 32                CHIN_GNATHION   664.2515   409.8619 -31.410963000
289 | ## 33             CHIN_LEFT_GONION   536.8293   350.2700  93.143530000
290 | ## 34            CHIN_RIGHT_GONION   798.8593   346.5862  83.418430000
291 | 
292 | 293 |
my_face_pic <- readImage('my_face.jpg')
294 | plot(my_face_pic)
295 | 
296 | xs1 = my_face$fdBoundingPoly$vertices[[1]][1][[1]]
297 | ys1 = my_face$fdBoundingPoly$vertices[[1]][2][[1]]
298 | 
299 | xs2 = my_face$landmarks[[1]][[2]][[1]]
300 | ys2 = my_face$landmarks[[1]][[2]][[2]]
301 | 
302 | polygon(x=xs1,y=ys1,border='red',lwd=4)
303 | points(x=xs2,y=ys2,lwd=2, col='lightblue')
304 | 
305 | 306 |

plot of chunk unnamed-chunk-14

307 | 308 |
309 | 310 |

Logo Detection

311 | 312 |

To continue along the Chicago trip, we drove by Wrigley field and I took a really bad photo of the sign from a moving car as it was under construction. It's nice because it has a lot of different lines and writing the Toyota logo isn't incredibly prominent or necessarily fit to brand colors.

313 | 314 |

This call returns:

315 | 316 | 321 | 322 |
wrigley_image <- readImage('wrigley_text.jpg')
323 | plot(wrigley_image)
324 | 
325 | 326 |

plot of chunk unnamed-chunk-15

327 | 328 |
wrigley_logo = getGoogleVisionResponse('wrigley_text.jpg',
329 |                                    feature = 'LOGO_DETECTION')
330 | head(wrigley_logo)
331 | 
332 | 333 |
```
##           mid description     score                               vertices
## 1 /g/1tk6469q      Toyota 0.3126611 435, 551, 551, 435, 449, 449, 476, 476
```
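
A score of 0.31 is a fairly low-confidence match, which seems fair given how rough the photo is. Since `getGoogleVisionResponse()` also takes a `numResults` argument, one option - a sketch of mine, not from the original post - is to ask for several candidates and keep only the stronger ones:

```r
# Sketch (my addition): request up to 5 logo candidates,
# then filter on the score column.
wrigley_logos = getGoogleVisionResponse('wrigley_text.jpg',
                                        feature = 'LOGO_DETECTION',
                                        numResults = 5)
wrigley_logos[wrigley_logos$score > 0.25, c('description', 'score')]
```
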
```r
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)
xs = wrigley_logo$boundingPoly$vertices[[1]][[1]]
ys = wrigley_logo$boundingPoly$vertices[[1]][[2]]
polygon(x=xs,y=ys,border='green',lwd=4)
```

plot of chunk unnamed-chunk-17
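
Because EBImage images are plain arrays indexed as `[x, y, channel]`, the same bounding box can be used to crop the logo out of the picture. A minimal sketch (my addition), assuming a three-channel color image and the vertex ordering shown above:

```r
# Sketch (my addition): crop the detected logo region out of the image.
xs = wrigley_logo$boundingPoly$vertices[[1]][[1]]
ys = wrigley_logo$boundingPoly$vertices[[1]][[2]]
logo_crop = wrigley_image[min(xs):max(xs), min(ys):max(ys), ]
plot(logo_crop)
```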

----

#### Text Detection


I'll continue using the Wrigley Field picture. There is text all over the place, and it's fun to see what is captured and what isn't. It appears as if the curved text at the top ("field") isn't easily interpreted as text; however, the rest is caught and broken out into individual words.


The response sent back is a bit more difficult to interpret than the rest of the API calls - it breaks things apart by word but also returns everything as one line. Here's what comes back:

* Locale = language, returned as source
* Description = the text (the first line is everything, and then the rest are individual words)
* Bounding Poly = I'm sure you can guess by now

```r
wrigley_text = getGoogleVisionResponse('wrigley_text.jpg',
                                       feature = 'TEXT_DETECTION')
head(wrigley_text)
```

```
##   locale
## 1     en
## 2   <NA>
## 3   <NA>
## 4   <NA>
## 5   <NA>
## 6   <NA>
##                                                                                                        description
## 1 RIGLEY F\nICHICAGO CUBS\nORDER ONLINE AT GIORDANOS.COM\nTOYOTA\nMIDWEST\nFENCE\n773-722-6616\nCAUTION\nCAUTION\n
## 2                                                                                                           RIGLEY
## 3                                                                                                                F
## 4                                                                                                         ICHICAGO
## 5                                                                                                             CUBS
## 6                                                                                                            ORDER
##                                 vertices
## 1   55, 657, 657, 55, 210, 210, 852, 852
## 2 343, 482, 484, 345, 217, 211, 260, 266
## 3 501, 524, 526, 503, 211, 210, 259, 260
## 4 222, 503, 501, 220, 295, 307, 348, 336
## 5 527, 627, 625, 525, 308, 312, 353, 349
## 6 310, 384, 384, 310, 374, 374, 391, 391
```
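
Since the first row holds the entire text blob, dropping it leaves one row per detected word - a quick sketch (my addition):

```r
# Sketch (my addition): row 1 is the full text; the rest are individual words.
full_text = wrigley_text$description[1]
words = wrigley_text$description[-1]
head(words)  # "RIGLEY" "F" "ICHICAGO" "CUBS" "ORDER" ...
```
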
```r
wrigley_image <- readImage('wrigley_text.jpg')
plot(wrigley_image)

for(i in 1:length(wrigley_text$boundingPoly$vertices)){
  xs = wrigley_text$boundingPoly$vertices[[i]]$x
  ys = wrigley_text$boundingPoly$vertices[[i]]$y
  polygon(x=xs,y=ys,border='green',lwd=2)
}
```

plot of chunk unnamed-chunk-19
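
Notice the one big rectangle covering nearly the whole image - that's the box from row 1, which wraps all of the detected text (see the 55, 657 / 210, 852 vertices above). If you only want the word-level boxes, start the loop at 2:

```r
# Sketch (my addition): skip element 1 (the whole-text box)
# and draw only the word-level boxes.
plot(wrigley_image)
for(i in 2:length(wrigley_text$boundingPoly$vertices)){
  xs = wrigley_text$boundingPoly$vertices[[i]]$x
  ys = wrigley_text$boundingPoly$vertices[[i]]$y
  polygon(x=xs,y=ys,border='green',lwd=2)
}
```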

----

That's about it for the basics of using the Google Vision API with the RoogleVision library. I highly recommend tinkering around with it a bit, especially because it won't cost you a dime.


While I do enjoy the math under the hood and the thinking required to understand algorithms, I do think these sorts of APIs will become the way of the future for data science. Outside of specific use cases or special industries, it seems hard to imagine wanting to create algorithms better than the ones built for mass consumption. As long as they're fast, free, and accurate, I'm all about making my life easier! From the hiring perspective, I much prefer someone who can get the job done over someone who can slightly improve performance (as always, there are many cases where this doesn't apply).


Please comment if you are utilizing any of the Google APIs for business purposes - I would love to hear about it!


As always, you can find this on my [GitHub](https://github.com/stoltzmaniac/ML-Image-Processing-R)

409 | 410 | -------------------------------------------------------------------------------- /Google Vision API/Google Vision API in R.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Google Vision API in R" 3 | author: "Scott Stoltzman" 4 | date: "7/29/2017" 5 | output: html_document 6 | --- 7 | 8 | ## Using the Google Vision API in R 9 | 10 | ### Utilizing RoogleVision 11 | 12 | After doing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. It's absolutely incredible the amount of information it can spit back to you by simply sending it a picture. 13 | 14 | Also, it's 100% free! I believe that includes 1000 images per month. Amazing! 15 | 16 | In this post I'm going to walk you through the absolute basics of accessing the power of the Google Vision API using the RoogleVision package in R. 17 | 18 | As always, we'll start off loading some libraries. I wrote some extra notation around where you can install them within the code. 19 | 20 | 21 | ```r 22 | # Normal Libraries 23 | library(tidyverse) 24 | 25 | # devtools::install_github("flovv/RoogleVision") 26 | library(RoogleVision) 27 | library(jsonlite) # to import credentials 28 | 29 | # For image processing 30 | # source("http://bioconductor.org/biocLite.R") 31 | # biocLite("EBImage") 32 | library(EBImage) 33 | 34 | # For Latitude Longitude Map 35 | library(leaflet) 36 | ``` 37 | 38 | #### Google Authentication 39 | 40 | In order to use the API, you have to authenticate. There is plenty of documentation out there about how to setup an account, create a project, download credentials, etc. Head over to [Google Cloud Console](https://console.cloud.google.com) if you don't have an account already. 41 | 42 | 43 | ```r 44 | # Credentials file I downloaded from the cloud console 45 | creds = fromJSON('credentials.json') 46 | 47 | # Google Authentication - Use Your Credentials 48 | # options("googleAuthR.client_id" = "xxx.apps.googleusercontent.com") 49 | # options("googleAuthR.client_secret" = "") 50 | 51 | options("googleAuthR.client_id" = creds$installed$client_id) 52 | options("googleAuthR.client_secret" = creds$installed$client_secret) 53 | options("googleAuthR.scopes.selected" = c("https://www.googleapis.com/auth/cloud-platform")) 54 | googleAuthR::gar_auth() 55 | ``` 56 | 57 | ``` 58 | ## 2017-07-31 11:30:34> Token cache file: .httr-oauth 59 | ``` 60 | 61 | ``` 62 | ## 2017-07-31 11:30:34> Scopes: https://www.googleapis.com/auth/cloud-platform 63 | ``` 64 | 65 | 66 | ### Now You're Ready to Go 67 | 68 | The function getGoogleVisionResponse takes three arguments: 69 | 70 | 1. imagePath 71 | 2. feature 72 | 3. numResults 73 | 74 | Numbers 1 and 3 are self-explanatory, "feature" has 5 options: 75 | 76 | * LABEL_DETECTION 77 | * LANDMARK_DETECTION 78 | * FACE_DETECTION 79 | * LOGO_DETECTION 80 | * TEXT_DETECTION 81 | 82 | These are self-explanatory but it's nice to see each one in action. 83 | 84 | As a side note: there are also other features that the API has which aren't included (yet) in the RoogleVision package such as "Safe Search" which identifies inappropriate content, "Properties" which identifies dominant colors and aspect ratios and a few others can be found at the [Cloud Vision website](https://cloud.google.com/vision/) 85 | 86 | ---- 87 | 88 | #### Label Detection 89 | 90 | This is used to help determine content within the photo. 
It can basically add a level of metadata around the image. 91 | 92 | Here is a photo of our dog when we hiked up to Audubon Peak in Colorado: 93 | 94 | ![plot of chunk unnamed-chunk-2](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-2-1.png) 95 | 96 | 97 | 98 | 99 | ```r 100 | dog_mountain_label = getGoogleVisionResponse('dog_mountain.jpg', 101 | feature = 'LABEL_DETECTION') 102 | head(dog_mountain_label) 103 | ``` 104 | 105 | ``` 106 | ## mid description score 107 | ## 1 /m/09d_r mountain 0.9188690 108 | ## 2 /g/11jxkqbpp mountainous landforms 0.9009549 109 | ## 3 /m/023bbt wilderness 0.8733696 110 | ## 4 /m/0kpmf dog breed 0.8398435 111 | ## 5 /m/0d4djn dog hiking 0.8352048 112 | ``` 113 | 114 | 115 | All 5 responses were incredibly accurate! The "score" that is returned is how confident the Google Vision algorithms are, so there's a 91.9% chance a mountain is prominent in this photo. I like "dog hiking" the best - considering that's what we were doing at the time. Kind of a little bit too accurate... 116 | 117 | ---- 118 | 119 | #### Landmark Detection 120 | 121 | This is a feature designed to specifically pick out a recognizable landmark! It provides the position in the image along with the geolocation of the landmark (in longitude and latitude). 122 | 123 | My wife and I took this selfie in at the Linderhof Castle in Bavaria, Germany. 124 | 125 | 126 | ```r 127 | us_castle <- readImage('us_castle_2.jpg') 128 | plot(us_castle) 129 | ``` 130 | 131 | ![plot of chunk unnamed-chunk-4](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-4-1.png) 132 | 133 | 134 | The response from the Google Vision API was spot on. It returned "Linderhof Palace" as the description. It also provided a score (I reduced the resolution of the image which hurt the score), a boundingPoly field and locations. 135 | 136 | * Bounding Poly - gives x,y coordinates for a polygon around the landmark in the image 137 | * Locations - provides longitude,latitude coordinates 138 | 139 | 140 | ```r 141 | us_landmark = getGoogleVisionResponse('us_castle_2.jpg', 142 | feature = 'LANDMARK_DETECTION') 143 | head(us_landmark) 144 | ``` 145 | 146 | ``` 147 | ## mid description score 148 | ## 1 /m/066h19 Linderhof Palace 0.4665011 149 | ## vertices locations 150 | ## 1 25, 382, 382, 25, 178, 178, 659, 659 47.57127, 10.96072 151 | ``` 152 | 153 | I plotted the polygon over the image using the coordinates returned. It does a great job (certainly not perfect) of getting the castle identified. It's a bit tough to say what the actual "landmark" would be in this case due to the fact the fountains, stairs and grounds are certainly important and are a key part of the castle. 154 | 155 | 156 | ```r 157 | us_castle <- readImage('us_castle_2.jpg') 158 | plot(us_castle) 159 | xs = us_landmark$boundingPoly$vertices[[1]][1][[1]] 160 | ys = us_landmark$boundingPoly$vertices[[1]][2][[1]] 161 | polygon(x=xs,y=ys,border='red',lwd=4) 162 | ``` 163 | 164 | ![plot of chunk unnamed-chunk-6](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-6-1.png) 165 | 166 | 167 | Turning to the locations - I plotted this using the leaflet library. If you haven't used leaflet, start doing so immediately. I'm a huge fan of it due to speed and simplicity. There are a lot of customization options available as well that you can check out. 168 | 169 | The location = spot on! 
While it isn't a shock to me that Google could provide the location of "Linderhof Castle" - it is amazing to me that I don't have to write a web crawler search function to find it myself! That's just one of many little luxuries they have built into this API. 170 | 171 | 172 | ```r 173 | latt = us_landmark$locations[[1]][[1]][[1]] 174 | lon = us_landmark$locations[[1]][[1]][[2]] 175 | m = leaflet() %>% 176 | addProviderTiles(providers$CartoDB.Positron) %>% 177 | setView(lng = lon, lat = latt, zoom = 5) %>% 178 | addMarkers(lng = lon, lat = latt) 179 | m 180 | ``` 181 | 182 | ``` 183 | ## Error in loadNamespace(name): there is no package called 'webshot' 184 | ``` 185 | 186 | 187 | ---- 188 | 189 | #### Face Detection 190 | 191 | My last blog post showed the OpenCV package utilizing the haar cascade algorithm in action. I didn't dig into Google's algorithms to figure out what is under the hood, but it provides similar results. However, rather than layering in each subsequent "find the eyes" and "find the mouth" and ...etc... it returns more than you ever needed to know. 192 | 193 | * Bounding Poly = highest level polygon 194 | * FD Bounding Poly = polygon surrounding each face 195 | * Landmarks = (funny name) includes each feature of the face (left eye, right eye, etc.) 196 | * Roll Angle, Pan Angle, Tilt Angle = all of the different angles you'd need per face 197 | * Confidence (detection and landmarking) = how certain the algorithm is that it's accurate 198 | * Joy, sorrow, anger, surprise, under exposed, blurred, headwear likelihoods = how likely it is that each face contains that emotion or characteristic 199 | 200 | The likelihoods is another amazing piece of information returned! I have run about 20 images through this API and every single one has been accurate - very impressive! 201 | 202 | I wanted to showcase the face detection and headwear first. Here's a picture of my wife and I at "The Bean" in Chicago (side note: it's awesome! 
I thought it was going to be really silly, but you can really have a lot of fun with all of the angles and reflections): 203 | 204 | 205 | ```r 206 | us_hats_pic <- readImage('us_hats.jpg') 207 | plot(us_hats_pic) 208 | ``` 209 | 210 | ![plot of chunk unnamed-chunk-8](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-8-1.png) 211 | 212 | 213 | 214 | ```r 215 | us_hats = getGoogleVisionResponse('us_hats.jpg', 216 | feature = 'FACE_DETECTION') 217 | head(us_hats) 218 | ``` 219 | 220 | ``` 221 | ## vertices 222 | ## 1 295, 410, 410, 295, 164, 164, 297, 297 223 | ## 2 353, 455, 455, 353, 261, 261, 381, 381 224 | ## vertices 225 | ## 1 327, 402, 402, 327, 206, 206, 280, 280 226 | ## 2 368, 439, 439, 368, 298, 298, 370, 370 227 | ## landmarks 228 | ## 1 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 352.00974, 380.68124, 340.27664, 363.16348, 378.64938, 393.6553, 370.78906, 371.99802, 366.30664, 364.23642, 349.47012, 377.17905, 364.7603, 375.62842, 357.7237, 367.20822, 352.4306, 358.9425, 351.23474, 343.64124, 351.10004, 384.32953, 388.21667, 382.08743, 375.90262, 383.87732, 353.08627, 387.7416, 312.06622, 384.56946, 371.5381, 360.62714, 318.48486, 383.87354, 225.86505, 229.50423, 216.51169, 219.28635, 220.92139, 222.02762, 227.35771, 248.94884, 259.51044, 272.9798, 261.36096, 263.89874, 265.76828, 251.38408, 248.46135, 253.8837, 223.93387, 227.20102, 228.33765, 225.31805, 226.13412, 227.22661, 229.89023, 231.8548, 229.34843, 229.54358, 213.85588, 217.43123, 236.95158, 244.45172, 219.76247, 287.00592, 260.99124, 267.68896, -0.0009269835, 12.904515, -2.3585303, -3.3569832, 3.4166863, 20.891703, -0.10083569, -8.568332, -0.32282636, 2.7426949, 3.1502135, 14.79839, 2.401884, 8.115268, 0.19641992, -0.7506992, -2.5084567, 2.8466656, -0.38294473, -0.05908208, -1.2792722, 11.411656, 19.373985, 13.421982, 10.900102, 12.992137, -5.2635217, 9.859322, 31.33588, 62.9466, -1.213793, 7.9232774, 20.887934, 49.40408 229 | ## 2 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 389.67215, 419.01474, 378.68497, 397.29074, 411.57373, 430.68024, 404.34882, 402.9257, 402.77734, 402.28552, 388.3598, 418.2969, 402.50266, 411.8417, 394.88547, 403.11188, 388.78043, 395.5202, 389.06342, 382.62268, 388.21332, 419.86707, 425.98645, 419.21088, 413.45447, 420.1578, 387.80508, 421.56183, 369.29388, 439.9703, 
404.44498, 401.90457, 371.50647, 435.39258, 319.11594, 320.14157, 310.8753, 313.32437, 313.59402, 313.3107, 318.78964, 337.8581, 347.607, 360.56134, 350.5411, 351.10315, 354.41702, 339.4301, 339.11786, 342.46072, 317.141, 320.19537, 321.22644, 318.473, 319.0922, 318.58655, 320.64886, 322.44992, 320.4701, 320.58286, 308.29236, 309.85825, 328.25885, 331.54816, 312.85385, 372.5355, 349.82388, 352.82462, 0.00018637085, -0.63476753, 2.1497552, -7.1008844, -7.460493, 1.0756116, -7.027477, -13.670173, -5.1229305, -1.0671108, 4.793461, 4.4603314, -1.8998832, -1.677745, -1.2933732, -5.9320116, -2.3477247, 0.03832738, 0.0054741018, 2.9924247, -0.7714207, -2.9816942, 2.079318, -0.6419869, -0.3527427, -1.4552351, -5.0709085, -5.7559977, 40.608036, 39.10855, -8.547456, 4.8426514, 30.500828, 29.191824 230 | ## rollAngle panAngle tiltAngle detectionConfidence landmarkingConfidence 231 | ## 1 7.103324 23.46835 -2.816312 0.9877176 0.7072066 232 | ## 2 2.510939 -1.17956 -7.393063 0.9997375 0.7268016 233 | ## joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood 234 | ## 1 VERY_LIKELY VERY_UNLIKELY VERY_UNLIKELY VERY_UNLIKELY 235 | ## 2 VERY_LIKELY VERY_UNLIKELY VERY_UNLIKELY VERY_UNLIKELY 236 | ## underExposedLikelihood blurredLikelihood headwearLikelihood 237 | ## 1 VERY_UNLIKELY VERY_UNLIKELY VERY_LIKELY 238 | ## 2 VERY_UNLIKELY VERY_UNLIKELY VERY_LIKELY 239 | ``` 240 | 241 | 242 | 243 | ```r 244 | us_hats_pic <- readImage('us_hats.jpg') 245 | plot(us_hats_pic) 246 | 247 | xs1 = us_hats$fdBoundingPoly$vertices[[1]][1][[1]] 248 | ys1 = us_hats$fdBoundingPoly$vertices[[1]][2][[1]] 249 | 250 | xs2 = us_hats$fdBoundingPoly$vertices[[2]][1][[1]] 251 | ys2 = us_hats$fdBoundingPoly$vertices[[2]][2][[1]] 252 | 253 | polygon(x=xs1,y=ys1,border='red',lwd=4) 254 | polygon(x=xs2,y=ys2,border='green',lwd=4) 255 | ``` 256 | 257 | ![plot of chunk unnamed-chunk-10](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-10-1.png) 258 | 259 | 260 | Here's a shot that should be familiar (copied directly from my last blog) - and I wanted to highlight the different features that can be detected. 
Look at how many points are perfectly placed: 261 | 262 | 263 | ```r 264 | my_face_pic <- readImage('my_face.jpg') 265 | plot(my_face_pic) 266 | ``` 267 | 268 | ![plot of chunk unnamed-chunk-11](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-11-1.png) 269 | 270 | 271 | 272 | 273 | ```r 274 | my_face = getGoogleVisionResponse('my_face.jpg', 275 | feature = 'FACE_DETECTION') 276 | head(my_face) 277 | ``` 278 | 279 | ``` 280 | ## vertices 281 | ## 1 456, 877, 877, 456, NA, NA, 473, 473 282 | ## vertices 283 | ## 1 515, 813, 813, 515, 98, 98, 395, 395 284 | ## landmarks 285 | ## 1 LEFT_EYE, RIGHT_EYE, LEFT_OF_LEFT_EYEBROW, RIGHT_OF_LEFT_EYEBROW, LEFT_OF_RIGHT_EYEBROW, RIGHT_OF_RIGHT_EYEBROW, MIDPOINT_BETWEEN_EYES, NOSE_TIP, UPPER_LIP, LOWER_LIP, MOUTH_LEFT, MOUTH_RIGHT, MOUTH_CENTER, NOSE_BOTTOM_RIGHT, NOSE_BOTTOM_LEFT, NOSE_BOTTOM_CENTER, LEFT_EYE_TOP_BOUNDARY, LEFT_EYE_RIGHT_CORNER, LEFT_EYE_BOTTOM_BOUNDARY, LEFT_EYE_LEFT_CORNER, LEFT_EYE_PUPIL, RIGHT_EYE_TOP_BOUNDARY, RIGHT_EYE_RIGHT_CORNER, RIGHT_EYE_BOTTOM_BOUNDARY, RIGHT_EYE_LEFT_CORNER, RIGHT_EYE_PUPIL, LEFT_EYEBROW_UPPER_MIDPOINT, RIGHT_EYEBROW_UPPER_MIDPOINT, LEFT_EAR_TRAGION, RIGHT_EAR_TRAGION, FOREHEAD_GLABELLA, CHIN_GNATHION, CHIN_LEFT_GONION, CHIN_RIGHT_GONION, 598.7636, 723.16125, 556.1954, 628.8224, 693.0257, 767.7514, 661.2344, 661.9072, 662.7698, 662.2978, 603.21814, 722.5995, 662.66486, 700.5242, 626.14417, 663.0441, 597.7986, 624.5084, 597.13776, 572.32404, 596.0174, 725.61145, 751.531, 725.60315, 701.6699, 727.3262, 591.74457, 730.4487, 525.0554, 814.0723, 660.71075, 664.25146, 536.8293, 798.8593, 192.19489, 192.49554, 165.28363, 159.90292, 160.66797, 164.28062, 185.05746, 260.90063, 310.77585, 348.6693, 322.57773, 317.35153, 325.9983, 274.28345, 275.32834, 284.49515, 185.37177, 194.59952, 203.61258, 197.56845, 194.79561, 183.56104, 195.62381, 203.60477, 194.94687, 193.0094, 147.70262, 145.74747, 276.10037, 270.00323, 158.95798, 409.86185, 350.27, 346.58624, -0.0018592946, -4.8054757, 15.825399, -23.345352, -25.614508, 7.637372, -29.068363, -74.15371, -48.44018, -43.53211, -10.572805, -14.504428, -40.966953, -26.340576, -23.933197, -46.457916, -8.027897, -0.8318569, -2.181139, 12.514983, -3.5412567, -12.764345, 5.530805, -7.038474, -3.6184528, -8.517615, -10.674338, -15.85011, 152.71716, 142.93324, -29.311995, -31.410963, 93.14353, 83.41843 286 | ## rollAngle panAngle tiltAngle detectionConfidence landmarkingConfidence 287 | ## 1 -0.6375801 -2.120439 5.706552 0.996818 0.8222974 288 | ## joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood 289 | ## 1 VERY_LIKELY VERY_UNLIKELY VERY_UNLIKELY VERY_UNLIKELY 290 | ## underExposedLikelihood blurredLikelihood headwearLikelihood 291 | ## 1 VERY_UNLIKELY VERY_UNLIKELY VERY_UNLIKELY 292 | ``` 293 | 294 | 295 | 296 | 297 | ```r 298 | head(my_face$landmarks) 299 | ``` 300 | 301 | ``` 302 | ## [[1]] 303 | ## type position.x position.y position.z 304 | ## 1 LEFT_EYE 598.7636 192.1949 -0.001859295 305 | ## 2 RIGHT_EYE 723.1612 192.4955 -4.805475700 306 | ## 3 LEFT_OF_LEFT_EYEBROW 556.1954 165.2836 15.825399000 307 | ## 4 RIGHT_OF_LEFT_EYEBROW 628.8224 159.9029 -23.345352000 308 | ## 5 LEFT_OF_RIGHT_EYEBROW 693.0257 160.6680 -25.614508000 309 | ## 6 RIGHT_OF_RIGHT_EYEBROW 767.7514 164.2806 7.637372000 310 | ## 7 MIDPOINT_BETWEEN_EYES 661.2344 185.0575 -29.068363000 311 | ## 8 NOSE_TIP 661.9072 260.9006 -74.153710000 312 | ## 9 UPPER_LIP 662.7698 310.7758 -48.440180000 313 | ## 10 LOWER_LIP 662.2978 348.6693 -43.532110000 314 | ## 11 MOUTH_LEFT 603.2181 
322.5777 -10.572805000 315 | ## 12 MOUTH_RIGHT 722.5995 317.3515 -14.504428000 316 | ## 13 MOUTH_CENTER 662.6649 325.9983 -40.966953000 317 | ## 14 NOSE_BOTTOM_RIGHT 700.5242 274.2835 -26.340576000 318 | ## 15 NOSE_BOTTOM_LEFT 626.1442 275.3283 -23.933197000 319 | ## 16 NOSE_BOTTOM_CENTER 663.0441 284.4952 -46.457916000 320 | ## 17 LEFT_EYE_TOP_BOUNDARY 597.7986 185.3718 -8.027897000 321 | ## 18 LEFT_EYE_RIGHT_CORNER 624.5084 194.5995 -0.831856900 322 | ## 19 LEFT_EYE_BOTTOM_BOUNDARY 597.1378 203.6126 -2.181139000 323 | ## 20 LEFT_EYE_LEFT_CORNER 572.3240 197.5685 12.514983000 324 | ## 21 LEFT_EYE_PUPIL 596.0174 194.7956 -3.541256700 325 | ## 22 RIGHT_EYE_TOP_BOUNDARY 725.6114 183.5610 -12.764345000 326 | ## 23 RIGHT_EYE_RIGHT_CORNER 751.5310 195.6238 5.530805000 327 | ## 24 RIGHT_EYE_BOTTOM_BOUNDARY 725.6032 203.6048 -7.038474000 328 | ## 25 RIGHT_EYE_LEFT_CORNER 701.6699 194.9469 -3.618452800 329 | ## 26 RIGHT_EYE_PUPIL 727.3262 193.0094 -8.517615000 330 | ## 27 LEFT_EYEBROW_UPPER_MIDPOINT 591.7446 147.7026 -10.674338000 331 | ## 28 RIGHT_EYEBROW_UPPER_MIDPOINT 730.4487 145.7475 -15.850110000 332 | ## 29 LEFT_EAR_TRAGION 525.0554 276.1004 152.717160000 333 | ## 30 RIGHT_EAR_TRAGION 814.0723 270.0032 142.933240000 334 | ## 31 FOREHEAD_GLABELLA 660.7107 158.9580 -29.311995000 335 | ## 32 CHIN_GNATHION 664.2515 409.8619 -31.410963000 336 | ## 33 CHIN_LEFT_GONION 536.8293 350.2700 93.143530000 337 | ## 34 CHIN_RIGHT_GONION 798.8593 346.5862 83.418430000 338 | ``` 339 | 340 | 341 | 342 | 343 | ```r 344 | my_face_pic <- readImage('my_face.jpg') 345 | plot(my_face_pic) 346 | 347 | xs1 = my_face$fdBoundingPoly$vertices[[1]][1][[1]] 348 | ys1 = my_face$fdBoundingPoly$vertices[[1]][2][[1]] 349 | 350 | xs2 = my_face$landmarks[[1]][[2]][[1]] 351 | ys2 = my_face$landmarks[[1]][[2]][[2]] 352 | 353 | polygon(x=xs1,y=ys1,border='red',lwd=4) 354 | points(x=xs2,y=ys2,lwd=2, col='lightblue') 355 | ``` 356 | 357 | ![plot of chunk unnamed-chunk-14](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-14-1.png) 358 | 359 | ---- 360 | 361 | #### Logo Detection 362 | 363 | To continue along the Chicago trip, we drove by Wrigley field and I took a really bad photo of the sign from a moving car as it was under construction. It's nice because it has a lot of different lines and writing the Toyota logo isn't incredibly prominent or necessarily fit to brand colors. 
364 | 365 | This call returns: 366 | 367 | * Description = Brand name of the logo detected 368 | * Score = Confidence of prediction accuracy 369 | * Bounding Poly = (Again) coordinates of the logo 370 | 371 | 372 | 373 | ```r 374 | wrigley_image <- readImage('wrigley_text.jpg') 375 | plot(wrigley_image) 376 | ``` 377 | 378 | ![plot of chunk unnamed-chunk-15](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-15-1.png) 379 | 380 | 381 | 382 | 383 | ```r 384 | wrigley_logo = getGoogleVisionResponse('wrigley_text.jpg', 385 | feature = 'LOGO_DETECTION') 386 | head(wrigley_logo) 387 | ``` 388 | 389 | ``` 390 | ## mid description score vertices 391 | ## 1 /g/1tk6469q Toyota 0.3126611 435, 551, 551, 435, 449, 449, 476, 476 392 | ``` 393 | 394 | 395 | 396 | ```r 397 | wrigley_image <- readImage('wrigley_text.jpg') 398 | plot(wrigley_image) 399 | xs = wrigley_logo$boundingPoly$vertices[[1]][[1]] 400 | ys = wrigley_logo$boundingPoly$vertices[[1]][[2]] 401 | polygon(x=xs,y=ys,border='green',lwd=4) 402 | ``` 403 | 404 | ![plot of chunk unnamed-chunk-17](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-17-1.png) 405 | 406 | ---- 407 | 408 | #### Text Detection 409 | 410 | I'll continue using the Wrigley Field picture. There is text all over the place and it's fun to see what is captured and what isn't. It appears as if the curved text at the top "field" isn't easily interpreted as text. However, the rest is caught and the words are captured. 411 | 412 | The response sent back is a bit more difficult to interpret than the rest of the API calls - it breaks things apart by word but also returns everything as one line. Here's what comes back: 413 | 414 | * Locale = language, returned as source 415 | * Description = the text (the first line is everything, and then the rest are indiviudal words) 416 | * Bounding Poly = I'm sure you can guess by now 417 | 418 | 419 | ```r 420 | wrigley_text = getGoogleVisionResponse('wrigley_text.jpg', 421 | feature = 'TEXT_DETECTION') 422 | head(wrigley_text) 423 | ``` 424 | 425 | ``` 426 | ## locale 427 | ## 1 en 428 | ## 2 429 | ## 3 430 | ## 4 431 | ## 5 432 | ## 6 433 | ## description 434 | ## 1 RIGLEY F\nICHICAGO CUBS\nORDER ONLINE AT GIORDANOS.COM\nTOYOTA\nMIDWEST\nFENCE\n773-722-6616\nCAUTION\nCAUTION\n 435 | ## 2 RIGLEY 436 | ## 3 F 437 | ## 4 ICHICAGO 438 | ## 5 CUBS 439 | ## 6 ORDER 440 | ## vertices 441 | ## 1 55, 657, 657, 55, 210, 210, 852, 852 442 | ## 2 343, 482, 484, 345, 217, 211, 260, 266 443 | ## 3 501, 524, 526, 503, 211, 210, 259, 260 444 | ## 4 222, 503, 501, 220, 295, 307, 348, 336 445 | ## 5 527, 627, 625, 525, 308, 312, 353, 349 446 | ## 6 310, 384, 384, 310, 374, 374, 391, 391 447 | ``` 448 | 449 | 450 | ```r 451 | wrigley_image <- readImage('wrigley_text.jpg') 452 | plot(wrigley_image) 453 | 454 | for(i in 1:length(wrigley_text$boundingPoly$vertices)){ 455 | xs = wrigley_text$boundingPoly$vertices[[i]]$x 456 | ys = wrigley_text$boundingPoly$vertices[[i]]$y 457 | polygon(x=xs,y=ys,border='green',lwd=2) 458 | } 459 | ``` 460 | 461 | ![plot of chunk unnamed-chunk-19](https://www.stoltzmaniac.com/wp-content/uploads/2017/07/unnamed-chunk-19-1.png) 462 | 463 | ---- 464 | 465 | That's about it for the basics of using the Google Vision API with the RoogleVision library. I highly recommend tinkering around with it a bit, especially because it won't cost you a dime. 
466 | 467 | While I do enjoy the math under the hood and the thinking required to understand alrgorithms, I do think these sorts of API's will become the way of the future for data science. Outside of specific use cases or special industries, it seems hard to imagine wanting to try and create algorithms that would be better than ones created for mass consumption. As long as they're fast, free and accurate, I'm all about making my life easier! From the hiring perspective, I much prefer someone who can get the job done over someone who can slightly improve performance (as always, there are many cases where this doesn't apply). 468 | 469 | Please comment if you are utilizing any of the Google API's for business purposes, I would love to hear it! 470 | 471 | As always you can find this on my [GitHub](https://github.com/stoltzmaniac/ML-Image-Processing-R) 472 | 473 | 474 | 475 | 476 | -------------------------------------------------------------------------------- /Google Vision API/Google Vision API.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /Google Vision API/dog_mountain.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/dog_mountain.jpg -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-10-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-10-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-11-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-11-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-14-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-14-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-15-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-15-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-17-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision 
API/figure/unnamed-chunk-17-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-19-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-19-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-199-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-199-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-2-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-2-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-4-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-4-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-6-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-6-1.png -------------------------------------------------------------------------------- /Google Vision API/figure/unnamed-chunk-8-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/figure/unnamed-chunk-8-1.png -------------------------------------------------------------------------------- /Google Vision API/my_face.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/my_face.jpg -------------------------------------------------------------------------------- /Google Vision API/originalWebcamShot.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/originalWebcamShot.jpg -------------------------------------------------------------------------------- /Google Vision API/snacks_logos.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/snacks_logos.JPG -------------------------------------------------------------------------------- /Google Vision API/us_castle.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/us_castle.jpg -------------------------------------------------------------------------------- /Google Vision API/us_castle_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/us_castle_2.jpg -------------------------------------------------------------------------------- /Google Vision API/us_dog_mountain.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/us_dog_mountain.jpg -------------------------------------------------------------------------------- /Google Vision API/us_hats.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/us_hats.jpg -------------------------------------------------------------------------------- /Google Vision API/wrigley_text.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Google Vision API/wrigley_text.jpg -------------------------------------------------------------------------------- /Microsoft Vision API/Microsoft Vision API.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /Microsoft Vision API/R - Microsoft Vision API.Rmd: -------------------------------------------------------------------------------- 1 | ```{r setup, include=FALSE} 2 | knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE) 3 | ``` 4 | 5 | # Microsoft Cognitive Services Vision API in R 6 | 7 | ---- 8 | 9 | A little while ago I did a brief tutorial of the [Google Vision API using RoogleVision](https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/) created by Mark Edmonson. I couldn't find anything similar to that in R for the [Microsoft Cognitive Services API](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/python#AnalyzeImage) so I thought I would give it a shot. I whipped this example together quickly to give it a proof-of-concept but I could certainly see myself building an R package to support this (unless someone can point to one - and please do if one exists)! 10 | 11 | 12 | The API is extremely easy to access using RCurl and httr. There are **a lot** of options which can be accessed. In this example, I'll just cover the basics of image detection and descriptions. 13 | 14 | 15 | ### Getting Started With Microsoft Cognitive Services 16 | 17 | In order to get started, all you need is an [Azure Account](https://azure.microsoft.com/en-us/free/) which is **free** if you can keep yourself under certain thresholds and limits. There is even a free trial period (at the time this was written, at least). 
18 | 19 | Once that is taken care of there are a few things you need to do: 20 | 21 | 1. Login to [portal.azure.com](https://portal.azure.com) 22 | 2. On the lefthand menu click "Add" 23 | ![Figure 1](https://i.imgur.com/OPihH39.png) 24 | 25 | 26 | 3. Click on "AI + Cognitive Services" and then the "Computer Vision API" 27 | ![Figure 2](https://i.imgur.com/GMy2LFZ.png) 28 | 29 | 30 | 4. Fill out the information required. You may have "Free Trial" under Subscription. Pay special attention to **Location** because this will be used in your API script 31 | ![Figure 3](https://i.imgur.com/t7vg4vH.png) 32 | 33 | 34 | 5. In the lefthand menu, click "Keys" underneath "Resource Management" and you will find what you need for credentials. Underneath your Endpoint URL, click on "Show access keys..." - **copy your key** and use it in your script (do not make this publicly accessible) 35 | ![Figure 4](https://i.imgur.com/CKkC2nx.png) 36 | 37 | 38 | 39 | ```{r libraries_and_credentials} 40 | library(tidyverse) 41 | library(RCurl) 42 | library(httr) 43 | library(EBImage) 44 | 45 | credentials = read_csv('credentials.csv') 46 | api_key = as.character(credentials$subscription_id) #api key is not subscription id 47 | api_endpoint_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/analyze" 48 | ``` 49 | 50 | ```{r} 51 | image_url = 'https://imgur.com/rapIn0u.jpg' 52 | visualFeatures = "Description,Tags,Categories,Faces" 53 | # options = "Categories, Tags, Description, Faces, ImageType, Color, Adult" 54 | 55 | details = "Landmarks" 56 | # options = Landmarks, Celebrities 57 | 58 | reqURL = paste(api_endpoint_url, 59 | "?visualFeatures=", 60 | visualFeatures, 61 | "&details=", 62 | details, 63 | sep="") 64 | 65 | APIresponse = POST(url = reqURL, 66 | content_type('application/json'), 67 | add_headers(.headers = c('Ocp-Apim-Subscription-Key' = api_key)), 68 | body=list(url = image_url), 69 | encode = "json") 70 | 71 | df = content(APIresponse) 72 | ``` 73 | 74 | ```{r} 75 | my_image <- readImage('SnoozeGenius.jpg') 76 | plot(my_image) 77 | ``` 78 | 79 | 80 | ```{r} 81 | description_tags = df$description$tags 82 | description_tags_tib = tibble(tag = character()) 83 | for(tag in description_tags){ 84 | for(text in tag){ 85 | if(class(tag) != "list"){ ## To remove the extra caption from being included 86 | tmp = tibble(tag = tag) 87 | description_tags_tib = description_tags_tib %>% bind_rows(tmp) 88 | } 89 | } 90 | } 91 | 92 | knitr::kable(description_tags_tib[1:5,]) 93 | ``` 94 | 95 | ```{r} 96 | captions = df$description$captions 97 | captions_tib = tibble(text = character(), confidence = numeric()) 98 | for(caption in captions){ 99 | tmp = tibble(text = caption$text, confidence = caption$confidence) 100 | captions_tib = captions_tib %>% bind_rows(tmp) 101 | } 102 | knitr::kable(captions_tib) 103 | ``` 104 | 105 | ```{r} 106 | metadata = df$metadata 107 | metadata_tib = tibble(width = metadata$width, height = metadata$height, format = metadata$format) 108 | knitr::kable(metadata_tib) 109 | ``` 110 | 111 | ```{r} 112 | faces = df$faces 113 | faces_tib = tibble(faceID = numeric(), 114 | age = numeric(), 115 | gender = character(), 116 | x1 = numeric(), 117 | x2 = numeric(), 118 | y1 = numeric(), 119 | y2 = numeric()) 120 | 121 | n = 0 122 | for(face in faces){ 123 | n = n + 1 124 | tmp = tibble(faceID = n, 125 | age = face$age, 126 | gender = face$gender, 127 | x1 = face$faceRectangle$left, 128 | y1 = face$faceRectangle$top, 129 | x2 = face$faceRectangle$left + face$faceRectangle$width, 130 | y2 
= face$faceRectangle$top + face$faceRectangle$height) 131 | faces_tib = faces_tib %>% bind_rows(tmp) 132 | } 133 | faces_tib 134 | knitr::kable(faces_tib) 135 | ``` 136 | 137 | ```{r} 138 | my_image <- readImage('SnoozeGenius.jpg') 139 | plot(my_image) 140 | 141 | coords = faces_tib %>% select(x1, y1, x2, y2) 142 | for(i in 1:nrow(coords)){ 143 | print(i) 144 | xs = c(coords$x1[i], coords$x1[i], coords$x2[i], coords$x2[i]) 145 | ys = c(coords$y1[i], coords$y2[i], coords$y2[i], coords$y1[i]) 146 | polygon(x = xs, y = ys, border = i+1, lwd = 4) 147 | } 148 | ``` 149 | 150 | Image Caption = `r print(captions_tib$text)` 151 | 152 | 153 | ```{r} 154 | str(df) 155 | ``` 156 | 157 | -------------------------------------------------------------------------------- /Microsoft Vision API/SnoozeGenius.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Microsoft Vision API/SnoozeGenius.jpg -------------------------------------------------------------------------------- /Microsoft Vision API/df.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stoltzmaniac/ML-Image-Processing-R/33ff327eb94fb404aea9c27ff5aa419e65cdf7cf/Microsoft Vision API/df.rds -------------------------------------------------------------------------------- /Microsoft Vision API/sandbox.R: -------------------------------------------------------------------------------- 1 | library(tidyverse) 2 | library(RCurl) 3 | library(httr) 4 | library(EBImage) 5 | 6 | credentials = read_csv('credentials.csv') 7 | api_key = as.character(credentials$subscription_id) #api key is not subscription id 8 | api_endpoint_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/analyze" 9 | 10 | image_url = 'https://imgur.com/rapIn0u.jpg' 11 | visualFeatures = "Description,Tags,Categories,Faces" 12 | # options = "Categories, Tags, Description, Faces, ImageType, Color, Adult" 13 | 14 | details = "Landmarks" 15 | # options = Landmarks, Celebrities 16 | 17 | reqURL = paste(api_endpoint_url, 18 | "?visualFeatures=", 19 | visualFeatures, 20 | "&details=", 21 | details, 22 | sep="") 23 | 24 | APIresponse = POST(url = reqURL, 25 | content_type('application/json'), 26 | add_headers(.headers = c('Ocp-Apim-Subscription-Key' = api_key)), 27 | body=list(url = image_url), 28 | encode = "json") 29 | 30 | df = content(APIresponse) 31 | str(df) 32 | 33 | 34 | 35 | description_tags = df$description$tags 36 | description_tags_tib = tibble(tag = character()) 37 | for(tag in description_tags){ 38 | for(text in tag){ 39 | if(class(tag) != "list"){ ## To remove the extra caption from being included 40 | tmp = tibble(tag = tag) 41 | description_tags_tib = description_tags_tib %>% bind_rows(tmp) 42 | } 43 | } 44 | } 45 | 46 | knitr::kable(description_tags_tib[1:5,]) 47 | 48 | 49 | captions = df$description$captions 50 | captions_tib = tibble(text = character(), confidence = numeric()) 51 | for(caption in captions){ 52 | tmp = tibble(text = caption$text, confidence = caption$confidence) 53 | captions_tib = captions_tib %>% bind_rows(tmp) 54 | } 55 | knitr::kable(captions_tib) 56 | 57 | 58 | metadata = df$metadata 59 | metadata_tib = tibble(width = metadata$width, height = metadata$height, format = metadata$format) 60 | knitr::kable(metadata_tib) 61 | 62 | 63 | 64 | faces = df$faces 65 | faces_tib = tibble(faceID = numeric(), 66 | age = numeric(), 67 | gender 
= character(), 68 | x1 = numeric(), 69 | x2 = numeric(), 70 | y1 = numeric(), 71 | y2 = numeric()) 72 | 73 | n = 0 74 | for(face in faces){ 75 | n = n + 1 76 | tmp = tibble(faceID = n, 77 | age = face$age, 78 | gender = face$gender, 79 | x1 = face$faceRectangle$left, 80 | y1 = face$faceRectangle$top, 81 | x2 = face$faceRectangle$left + face$faceRectangle$width, 82 | y2 = face$faceRectangle$top + face$faceRectangle$height) 83 | faces_tib = faces_tib %>% bind_rows(tmp) 84 | } 85 | faces_tib 86 | knitr::kable(faces_tib) 87 | 88 | 89 | 90 | my_image <- readImage('SnoozeGenius.jpg') 91 | plot(my_image) 92 | 93 | coords = faces_tib %>% select(x1, y1, x2, y2) 94 | for(i in 1:nrow(coords)){ 95 | print(i) 96 | xs = c(coords$x1[i], coords$x1[i], coords$x2[i], coords$x2[i]) 97 | ys = c(coords$y1[i], coords$y2[i], coords$y2[i], coords$y1[i]) 98 | polygon(x = xs, y = ys, border = i+1, lwd = 4) 99 | } 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ML-Image-Processing-R 2 | Image processing in R 3 | 4 | This repository is starting to look at ways in which to process images using R libraries. 5 | 6 | So far you can see: OpenCV (which requires python integration) and Google Vision API. These are incredibly powerful and fast. Please see https://www.stoltzmaniac.com/category/image-processing/ for all posts. 7 | 8 | * [Google Vision API](https://www.stoltzmaniac.com/google-vision-api-in-r-rooglevision/) 9 | * [OpenCV Face Recognition](https://www.stoltzmaniac.com/facial-recognition-in-r/) --------------------------------------------------------------------------------