At COMMON this year, the push seemed to be toward expanding our use of tools built on machine learning and other advanced algorithms, the kind that take a long time to develop and can require a lot of processing power to run. Rather than building these yourself, you send your data offsite via an API to a service like Watson and get back information about your data. So below are some things I played with.
Given this recent tweet:
Let's see what we could do. We could build an application that scans a certain hashtag or user, sends the images that person or topic posts to a cloud service, and finds out what these services think about them.
Watson – BlueMix – Visual Recognition Demo
Below are the results of submitting this image to Watson and what it thinks about the image. The first result doesn't make much sense to me; I'm not sure what a reformer instrument is, but the pommel horse and the gymnastic apparatus are clearly related to what Aaron is attempting. The fifth and sixth results seem to relate to the cover of his computer. What could we do with this information? Perhaps you could use it to auto-classify images of a conference without any human effort involved: your attendees could see images of things they might find interesting, and the photographer only has to worry about snapping interesting photos, not the dreaded work of organizing them.
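As a rough sketch of how you'd submit an image yourself: the Visual Recognition service is just an HTTP endpoint, so a few lines of Python are enough to classify an image by URL. The endpoint, version date, and response shape below are assumptions based on the v3 API as I understand it, and the API key is a placeholder for your own Bluemix credential.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and version date for the Watson Visual Recognition v3 API;
# the API key is a placeholder for your own Bluemix credential.
WATSON_URL = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify"
API_KEY = "your-bluemix-api-key"

def build_classify_url(image_url, api_key=API_KEY, version="2016-05-20"):
    """Build the GET URL that asks Watson to classify a publicly hosted image."""
    params = urllib.parse.urlencode(
        {"api_key": api_key, "url": image_url, "version": version})
    return WATSON_URL + "?" + params

def classify(image_url):
    """Call the service and return the parsed JSON response."""
    with urllib.request.urlopen(build_classify_url(image_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Hypothetical image URL; the response nests classes under classifiers.
    result = classify("https://example.com/aaron.jpg")
    for cls in result["images"][0]["classifiers"][0]["classes"]:
        print(cls["class"], cls["score"])
```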
Google Cloud Vision API
As developers we can't rely on one person's or one company's view of the IT world, so I decided to check out what Google's Cloud Vision API saw. It knew Aaron was human, in a room, and doing "physical fitness". All very relevant to the picture.
It also gave us the dominant colors of the picture and suggested where we could crop the image.
One other thing it can provide is whether the image is "safe", to make sure obscene images are not used.
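All of those results can be requested in one call to the Vision API's `images:annotate` REST endpoint by listing several features in a single request body. A minimal sketch, assuming you have an API key (the key below is a placeholder); the feature names are the ones the REST API uses:

```python
import base64
import json
import urllib.request

# Placeholder key; the endpoint is the Vision API's REST annotate URL.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY"

def build_annotate_request(image_bytes):
    """One request body asking for labels, dominant colors, crop hints,
    and a safe-search verdict all at the same time."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 10},
                {"type": "IMAGE_PROPERTIES"},
                {"type": "CROP_HINTS"},
                {"type": "SAFE_SEARCH_DETECTION"},
            ],
        }]
    }

def annotate(image_bytes):
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        VISION_URL,
        data=json.dumps(build_annotate_request(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Batching the features like this means one round trip gives you everything the screenshots above show.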
Tesseract Open Source OCR Engine
Let's say you don't want to give your data over to IBM or Google. You could look into other projects that let you run the algorithms locally, such as Tesseract, which can be used for OCR. I'm not sure what other libraries are available for the other things Watson and Google Cloud Vision provide, but I believe the open source community will keep coming up with intuitive ideas, and with more searching we could probably find libraries that do what IBM and Google are doing.
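For OCR specifically, Tesseract runs entirely on your own machine. A minimal sketch that shells out to the `tesseract` binary (assuming it is installed and on your PATH); the special output base `stdout` makes it print the recognized text instead of writing a file:

```python
import subprocess

def build_tesseract_cmd(image_path, lang="eng"):
    """Command line for the tesseract CLI; 'stdout' sends text to standard out."""
    return ["tesseract", image_path, "stdout", "-l", lang]

def ocr(image_path):
    """Run Tesseract locally and return whatever text it recognized."""
    result = subprocess.run(build_tesseract_cmd(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

No API key, no network call, and your images never leave your machine.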
Let's see how these services work with a different photo, one with text and faces turned directly toward the camera.
Watson can see two males that are 18-24 years old; it believes someone is their little brother, someone is their father, and perhaps someone is wearing a wet suit (Watson will need to improve at picking up skinny jeans).
Now, what Google was able to do was pretty crazy, and I wonder why Watson isn't doing this yet; perhaps it will catch up to Google once it has a big enough data set and a proper ML model.
Google was able to identify that the picture had faces expressing joy! It thought there was possibly headwear, but there wasn't. It also knew the roll, tilt, and pan of the image, which could be used to auto-correct the alignment. It knew the image was taken from Twitter, that they were wearing t-shirts, and that they were having a "Good Time". It also had links to web pages where this image could be found. It was able to pick up the text on the shirt and on the marketing displays in the expo, and it could even separate the text into the distinct entities in the picture (the shirt text and the two marketing display texts). It even kept the text together though the "o" in "Power" was hidden by some dangling glasses. It also knew that it was a safe image.
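Everything in that list maps to a Vision API feature: FACE_DETECTION returns the joy likelihood plus the roll/tilt/pan angles, TEXT_DETECTION returns the separated text blocks, WEB_DETECTION returns the matching pages, and SAFE_SEARCH_DETECTION the safety verdict. Here's a sketch of pulling the face fields out of a response; the field names are the ones the REST API returns, but the sample fragment is made up for illustration:

```python
def build_face_text_features():
    """The Vision API feature list covering what Google reported above."""
    return [{"type": "FACE_DETECTION"},
            {"type": "TEXT_DETECTION"},
            {"type": "WEB_DETECTION"},
            {"type": "SAFE_SEARCH_DETECTION"}]

def summarize_face(face_annotation):
    """Extract joy, headwear, and head pose from one faceAnnotations entry."""
    return {"joy": face_annotation.get("joyLikelihood"),
            "headwear": face_annotation.get("headwearLikelihood"),
            "roll": face_annotation.get("rollAngle"),
            "tilt": face_annotation.get("tiltAngle"),
            "pan": face_annotation.get("panAngle")}

# Hypothetical fragment shaped like a real response, for illustration only.
sample = {"joyLikelihood": "VERY_LIKELY", "headwearLikelihood": "POSSIBLE",
          "rollAngle": -2.1, "tiltAngle": 4.7, "panAngle": 1.3}
```

The roll/tilt/pan angles are exactly what you'd feed into an auto-straightening step.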
This is the future, like it or not, and there are some crazy applications we can create when we combine the ML power of IBM, Google, and open source projects run locally. I sort of wish we could put Pandora's box away, but it's already open, so we can't bury our heads in the sand; we should start looking into how we can use this innovation for the good of humanity.