Giving your application eyes 👀 – inspired by a COMMON conference talk about Watson

At COMMON this year, they seemed to want to expand our usage of tools built on machine learning: advanced algorithms that take a long time to develop and a lot of processing power to run.  The idea is that you send your data offsite to Watson via an API and get back information about your data.  Below are some things I played with.

Given this recent tweet:

[Embedded tweet with a photo of Aaron]

Let’s see what we could do.  We could build an application that scans a certain hashtag or user, sends the images that person or topic posts to a cloud service, and finds out what these services think about them.
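As a rough sketch of that idea – assuming the tweepy library, placeholder Twitter credentials, and a hypothetical hashtag – the scanning loop might look something like this:

import tweepy

# Placeholder credentials – substitute real Twitter API keys.
auth = tweepy.OAuthHandler("consumer-key", "consumer-secret")
auth.set_access_token("access-token", "access-secret")
api = tweepy.API(auth)

# Scan a hypothetical hashtag and collect every photo URL posted with it.
for tweet in api.search(q="#COMMON2017"):
    for media in tweet.entities.get("media", []):
        if media["type"] == "photo":
            image_url = media["media_url"]
            # Hand image_url off to Watson / Cloud Vision here.
            print(image_url)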

Watson – Bluemix – Visual Recognition Demo

https://visual-recognition-demo.mybluemix.net/

Below are the results of submitting this image to Watson.  The first result doesn’t make much sense to me; I’m not sure what a “reformer instrument” is… but the pommel horse and the gymnastic apparatus are clearly related to what Aaron is attempting.  The fifth and sixth results seem to relate to the cover of his computer.  What could we do with this information?  Perhaps you could use it to automatically classify photos from a conference without any humans involved, so attendees could browse images of things they might find interesting, and the photographer only has to worry about snapping interesting photos, not the dreaded work of organizing them.

[Screenshot: Watson’s classification results for the image]
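To try this outside the demo page, a minimal sketch against the Visual Recognition v3 REST endpoint (as it existed around 2017) might look like the following – the API key and file name are placeholders:

import requests

API_KEY = "your-bluemix-api-key"  # placeholder
URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify"

# Upload the image and ask Watson to classify it.
with open("aaron.jpg", "rb") as image:  # hypothetical file name
    response = requests.post(
        URL,
        params={"api_key": API_KEY, "version": "2016-05-20"},
        files={"images_file": image},
    )

# Print each class Watson found along with its confidence score.
for img in response.json()["images"]:
    for classifier in img["classifiers"]:
        for cls in classifier["classes"]:
            print(cls["class"], cls["score"])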

Google Cloud Vision API

https://cloud.google.com/vision/

As developers, we can’t sit on one person’s or one company’s view of the IT world, so I decided to check out what Google’s Cloud Vision API saw.  It knew Aaron was human, in a room, and doing “physical fitness” – all very relevant to the picture.

[Screenshot: Google Cloud Vision label results]

It also told us what the dominant colors of the picture are and where we could crop the image.

[Screenshot: dominant colors and crop hints]

One other thing it can provide is whether the image is “safe”, to make sure obscene images don’t get used.

[Screenshot: Google Cloud Vision safe-search results]
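All four of these features – labels, dominant colors, crop hints, and safe search – can be requested in a single call to the images:annotate endpoint.  A minimal sketch, with a placeholder API key and file name:

import base64
import requests

API_KEY = "your-google-api-key"  # placeholder
URL = "https://vision.googleapis.com/v1/images:annotate"

# Cloud Vision takes the image inline as base64.
with open("aaron.jpg", "rb") as f:  # hypothetical file name
    content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [{
        "image": {"content": content},
        "features": [
            {"type": "LABEL_DETECTION"},
            {"type": "IMAGE_PROPERTIES"},
            {"type": "CROP_HINTS"},
            {"type": "SAFE_SEARCH_DETECTION"},
        ],
    }]
}

response = requests.post(URL, params={"key": API_KEY}, json=body)
annotations = response.json()["responses"][0]

# Labels like "person" or "physical fitness", each with a score.
for label in annotations.get("labelAnnotations", []):
    print(label["description"], label["score"])

# The safe-search verdict, rated VERY_UNLIKELY through VERY_LIKELY per category.
print(annotations.get("safeSearchAnnotation"))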

Tesseract Open Source OCR Engine

https://github.com/tesseract-ocr

Let’s say you don’t want to hand your data over to IBM or Google.  You could look into projects that let you run the algorithms locally, such as Tesseract, which can be used for OCR.  I’m not sure what other libraries are available for the other things Watson and Google Cloud Vision provide, but I believe the open source community will keep coming up with ideas, and with more searching we could probably find libraries that do some of what IBM and Google are doing.
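For example, running OCR locally takes only a few lines with the pytesseract wrapper – assuming the tesseract binary plus the pytesseract and Pillow packages are installed, and with a placeholder file name:

from PIL import Image
import pytesseract

# Extract whatever text Tesseract can find in the image, entirely offline.
text = pytesseract.image_to_string(Image.open("expo-photo.jpg"))
print(text)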

 

Round 2

Let’s see how these services handle a different photo, one with text and faces turned directly toward the camera.

Watson

Watson can see two males who are 18–24 years old, believes one is someone’s little brother and another is someone’s father, and thinks someone is perhaps wearing a wet suit (Watson will need to improve at picking up skinny jeans).

[Screenshot: Watson face detection results]
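Those age and gender guesses come from Watson’s face detection.  A minimal sketch against the v3 detect_faces endpoint (as it existed around 2017), again with a placeholder API key and file name:

import requests

API_KEY = "your-bluemix-api-key"  # placeholder
URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/detect_faces"

with open("expo-photo.jpg", "rb") as image:  # hypothetical file name
    response = requests.post(
        URL,
        params={"api_key": API_KEY, "version": "2016-05-20"},
        files={"images_file": image},
    )

# Print the estimated age range and gender for each detected face.
for img in response.json()["images"]:
    for face in img.get("faces", []):
        age, gender = face["age"], face["gender"]
        print(gender["gender"], "aged", age["min"], "to", age["max"])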

Google’s turn

Now, what Google was able to do was pretty crazy, and I wondered why Watson isn’t doing this – perhaps it will catch up to Google once it has a big enough data set and a proper ML model.

Google was able to identify that the picture had faces expressing joy, and it thought there was possibly headwear (there wasn’t).  It also knew the roll, tilt, and pan of the image, which could be used to automatically straighten a crooked photo.  It knew the image was taken from Twitter, that they were wearing t-shirts, and that they were having a “Good Time”.  It also had links to web pages where this image could be found.  It was able to pick up the text on the shirt and on the marketing displays in the expo, and it could even separate that text into the distinct entities in the picture (the shirt text and the two marketing display texts).  It even kept the text together even though the “o” in “Power” was hidden by some dangling glasses.  It also knew that it was a safe image.
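Everything in that paragraph maps to three Cloud Vision features: FACE_DETECTION (joy, headwear, roll/tilt/pan), TEXT_DETECTION (the OCR), and WEB_DETECTION (the matching web pages).  A minimal sketch, with the same placeholder key and file name conventions as before:

import base64
import requests

API_KEY = "your-google-api-key"  # placeholder
URL = "https://vision.googleapis.com/v1/images:annotate"

with open("expo-photo.jpg", "rb") as f:  # hypothetical file name
    content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [{
        "image": {"content": content},
        "features": [
            {"type": "FACE_DETECTION"},
            {"type": "TEXT_DETECTION"},
            {"type": "WEB_DETECTION"},
        ],
    }]
}

result = requests.post(URL, params={"key": API_KEY}, json=body)
annotations = result.json()["responses"][0]

# Joy, headwear, and pose angles for each face.
for face in annotations.get("faceAnnotations", []):
    print(face["joyLikelihood"], face["headwearLikelihood"],
          face["rollAngle"], face["tiltAngle"], face["panAngle"])

# The first textAnnotation is the full block of recognized text.
texts = annotations.get("textAnnotations", [])
if texts:
    print(texts[0]["description"])

# Web pages where this same image appears.
for page in annotations.get("webDetection", {}).get("pagesWithMatchingImages", []):
    print(page["url"])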

[Screenshots: Google Cloud Vision face, label, web, text, and safe-search results]

Conclusion

This is the future, like it or not, and there are some crazy applications we can create when we combine the power of ML from IBM and Google with open source projects run locally.  I sort of wish we could put Pandora’s box away, but it’s already open, so we can’t bury our heads in the sand; we should start looking into how we can use this innovation for the good of humanity.

 
