Visual search is a strong trend in the consumer goods market. It lets a user find a product just by taking a picture of it or pointing a camera at it. The program then finds the product in an online store in a fraction of a second and reports its characteristics and price.
Some time ago, Your Remote Controller introduced the “product selection by photo” service. To use it, the buyer photographed the non-working device and sent the photo to a sales manager. An experienced manager then worked out which remote controller was in the photo and searched for the right model in the warehouse.
This approach has an obvious weakness: it depends on the human factor and the experience of the staff. For example, only a manager with extensive experience can recognize a remote control model that a dog has chewed until half the buttons are gone, or identify a spare part for a car. Likewise, only an experienced expert can identify from a photo the insect that has bitten you. A newly hired employee cannot cope with this task without lengthy training.
How We Taught the Machine to Recognize Objects
To turn the idea of object recognition into reality, we trained a neural network, a form of machine learning. The first step was to prepare source materials. In this case, we prepared more than 40 thousand images of a particular object in different positions and on all kinds of backgrounds. Some of the pictures were taken manually, but most were generated in a virtual studio in semi-automatic mode.
Phase 1 of training lasted three weeks. During this time, the neural network took more than 3 million steps. The result did not satisfy us because the network usually recognized only photos taken in ideal conditions. Any shadows, glare, or perspective distortion led to a critical loss of recognition accuracy, so we started Phase 2 of the neural network training.
Since buyers will mainly take photos on a phone, we took a number of factors into account while training the neural network, such as:
- bright or dim lighting
- bad angle
- erased buttons and labels
- bad background
- photos with perspective distortion
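The article does not say how the extra training material was produced, so the following is only a minimal sketch of one common way to imitate two of the factors above (bright or dim lighting, erased buttons and labels) by augmenting existing images. It assumes a grayscale image stored as a list of rows of 0..255 integers; the function names `adjust_brightness`, `erase_patch`, and `augment` are hypothetical and not part of the original system.

```python
import random


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clamp to 0..255 (imitates bright or dim lighting)."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def erase_patch(image, top, left, height, width, fill=128):
    """Overwrite a rectangle with a flat value (imitates a worn-off button or label)."""
    out = [row[:] for row in image]  # copy so the input image is left untouched
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[y]))):
            out[y][x] = fill
    return out


def augment(image, rng):
    """Produce one randomly degraded variant of `image` (assumed parameter ranges)."""
    rows, cols = len(image), len(image[0])
    factor = rng.uniform(0.4, 1.6)  # assumed range: darker to brighter
    top, left = rng.randrange(rows), rng.randrange(cols)
    dimmed = adjust_brightness(image, factor)
    return erase_patch(dimmed, top, left, max(1, rows // 4), max(1, cols // 4))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sample = [[100] * 8 for _ in range(8)]
    variant = augment(sample, rng)
    print(len(variant), len(variant[0]))
```

A real pipeline would more likely use an image library and would also cover perspective warps and background replacement, which this sketch leaves out.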
Phase 2 of network training took a few more weeks. We prepared even more training material and changed the approach. The neural network went through 3 million steps and finally produced a really good result, much less sensitive to the quality of the incoming photo.
Technologies the Object Recognition Program Is Built On
- TensorFlow is the machine learning framework our neural network is built on. We used the Faster-RCNN-Inception-V2 model, which suited further training on our material;
- Google Cloud Vision, Soundex. We used the Google Cloud Vision API to recognize labels. To find the best match among the possible results, we used full-text search, Levenshtein distance, and Soundex. Soundex is an algorithm that compares two strings by how they sound: it assigns the same index to strings that sound similar in English.
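To illustrate how Soundex and Levenshtein distance can work together when matching a recognized label against a catalog, here is a minimal sketch. It implements standard American Soundex and the classic dynamic-programming edit distance. The `best_match` helper and its ranking rule (prefer names with the same Soundex index, then the smallest edit distance) are assumptions made for illustration, not the matching logic of the original system.

```python
# Build the letter-to-digit table used by American Soundex.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex index of `word` ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(label, catalog):
    """Hypothetical ranking: same Soundex index first, then smallest edit distance."""
    key = soundex(label)

    def score(name):
        return (soundex(name) != key, levenshtein(label.lower(), name.lower()))

    return min(catalog, key=score)


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))
    print(levenshtein("kitten", "sitting"))
    print(best_match("Samsong", ["Sony", "Samsung", "Panasonic"]))
```

Soundex tolerates misreadings that keep the sound of a word, while Levenshtein distance breaks ties between candidates that share an index, which is why the two are often combined for this kind of fuzzy lookup.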
How to Integrate Visual Search into Existing Systems?
The object recognition system we created is essentially an API service, so it integrates easily with a chatbot, a website, a mobile application, or a company's internal systems. Because it is exposed as an API, the system can run as a fully automated process without human intervention.
If you have bold ideas and need an image recognition system (for any kind of object or for human faces), please feel free to contact us.