
Visual Search and Object Recognition Engine

Visual search is a growing trend in the consumer goods market: point a camera at the item you are interested in or take a picture of it, and the system will find the product in an online store among millions of options in seconds.

Take a look at the numbers to see how large a role visual search will play in the future of e-commerce.

remote control object recognition stat

With visual object recognition systems, you no longer need to describe a product's color, design, or other characteristics to find what you need and get its specifications and price.

The following global retail companies already use visual search:

ASOS, eBay, Neiman Marcus, Ted Baker, Blippar, EasyJet, Levi’s, Disney, Walmart, Salesforce, Syte, Houseology, CarStory, Snapchat, Farfetch, Marks, Pinterest, Amazon

Possible use cases

  • Industrial goods. When you need to find electronics, machinery parts, or goods for the B2B sector, you have to use the correct terms, models, and serial numbers. Sometimes the term is forgotten, or it takes too much time to type the correct models, dimensions, and numbers. Visual search solves this problem on the fly and finds the needed item from a photo in seconds.

  • Furniture. The concept is very simple: point the camera at any piece of furniture in a house, cafe, store, or even in a printed catalog and immediately get a few links to purchase it. The system identifies a piece of furniture (a sofa, chair, carpet, or lamp), finds it among the options in its internal catalog, and immediately offers to buy it. GrokStyle already uses it.

  • Selling of used cars. CarStory simplified the search for those who want to buy a used car. You take a picture of any vehicle you like, without even knowing its model, and the system finds that car for sale on the website. Thus, any parking lot in a city, or even a country, becomes a virtual car dealership that takes a person to an online car store in a second.

  • Real estate companies. With the help of AI visual search, finding a house becomes much faster and easier for buyers, and more convenient for brokers to control. For example, the system classifies all the houses for sale by specific criteria: “Hi-tech houses”, “Duplexes”, “3-bedroom houses”, “Two-story houses”, etc. Based on the user's request, the system offers matching houses for sale along with information about them: location, cost, and number of bedrooms and bathrooms.

Our Visual Recognition Product

Working at the cutting edge of product development, we spent nearly a year building a solution optimized for speed and accuracy. As a result, we made a product that can find the needed item among thousands of similar ones with high accuracy.

Our product is a constructor that can be adapted to any goods. It provides visual product recognition through a GPU-based ensemble of custom convolutional neural networks (CNNs). To take advantage of this feature, the buyer photographs the non-working device and sends the photo to the sales director. Let’s take a closer look at the system and how it was made.
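
To give a feel for what "ensemble-based" means, here is a minimal sketch of one common way to combine several networks: average the class probabilities of each model and pick the best class. This is an illustration under assumptions, not the product's actual code. The function names, the labels, and the raw scores (logits) are hypothetical, and the real system may combine its networks differently.

```python
import math


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def ensemble_predict(per_model_logits, labels):
    """Average the probabilities of all models and return (label, confidence).

    per_model_logits: one list of raw scores per model (hypothetical inputs).
    labels: product IDs in the same order as the scores.
    """
    probabilities = [softmax(logits) for logits in per_model_logits]
    model_count = len(probabilities)
    averaged = [
        sum(model[i] for model in probabilities) / model_count
        for i in range(len(labels))
    ]
    best = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best], averaged[best]


# Made-up scores from three models for three made-up product IDs.
scores = [
    [2.0, 1.9, 0.1],  # model 1 slightly prefers remote-A
    [0.5, 2.5, 0.2],  # model 2 strongly prefers remote-B
    [1.0, 1.2, 0.3],  # model 3 slightly prefers remote-B
]
label, confidence = ensemble_predict(scores, ["remote-A", "remote-B", "remote-C"])
```

Averaging lets a confident model outvote an uncertain one, which is one reason ensembles tend to cope better with near-identical items than a single network.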

Solution Features

The remote control recognition system can find the correct product ID among a thousand similar products in a database from a single photo, removing the human factor and the dependence on staff experience.

visual recognition 

  1. High recognition results. A neural network can recognize items that look identical. Even the most experienced employee, whose training takes a lot of time and money, could not recognize items this quickly and accurately.

  2. Smartphone-friendly. Since buyers will mainly use a smartphone camera to take photos, we accounted for a number of factors during neural network training:

      • bright or dim lighting
      • bad angle
      • erased buttons and labels
      • complex background
      • photos with perspective.
  3. Neural network power. The neural network ensemble behind the remote control identification function determines the model accurately, regardless of the language of the buttons and labels.

  4. Quick retraining. The solution allows retraining without the participation of a developer, so you can easily and efficiently retrain the network with new products.

How We Taught the Machine to Recognize Objects

To make object recognition a reality, we trained neural networks, a type of machine learning.

Preparing data. We collected thousands of images from all over the country, with different backgrounds, lighting, and positions. We also built a special virtual studio that produces marked-up images (material) for training neural networks. In various positions and against various backgrounds, we produced more than 40 000 images of a particular object. We have also compiled a database of photographs gathered manually.

Phase 1. This phase lasted 3 weeks. The neural network took over 3 million steps during this time.

Phase 2 lasted a few more weeks. We prepared more training material and changed the methodology. The neural network took 3 million more steps. It finally produced an outstanding result that is much less sensitive to the quality of incoming images.
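
As a rough illustration of the virtual studio idea, the sketch below pastes an object onto a background at a random position, applies a random brightness change, and returns the image together with its markup (the bounding box). It is a simplified assumption of how such material could be generated, not a description of the actual studio. Images are plain grayscale lists, and the function name `compose_sample` and all values are hypothetical.

```python
import random


def compose_sample(obj, background, rng):
    """Paste obj onto a copy of background and return (image, box).

    obj, background: 2D lists of grayscale values in 0..255.
    rng: a random.Random instance, so results can be reproduced.
    box: (left, top, width, height) of the pasted object, i.e. the markup.
    """
    obj_h, obj_w = len(obj), len(obj[0])
    bg_h, bg_w = len(background), len(background[0])

    # Random position that keeps the whole object inside the frame.
    top = rng.randint(0, bg_h - obj_h)
    left = rng.randint(0, bg_w - obj_w)

    image = [row[:] for row in background]
    for y in range(obj_h):
        for x in range(obj_w):
            image[top + y][left + x] = obj[y][x]

    # Simulate dim or bright lighting with a random gain, clipped to 0..255.
    gain = rng.uniform(0.5, 1.5)
    image = [[min(255, max(0, round(v * gain))) for v in row] for row in image]
    return image, (left, top, obj_w, obj_h)


# A made-up 2x2 "object" on a made-up 6x6 background.
rng = random.Random(0)
obj = [[200, 200], [200, 200]]
background = [[50] * 6 for _ in range(6)]
image, box = compose_sample(obj, background, rng)
```

Calling the function many times with different backgrounds and seeds yields many labeled variations of the same object without taking each photo by hand.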

Comparison with other Algorithms

There are many algorithms that offer different levels of recognition.

Hash-based. These algorithms are used in Google’s visual search and can quickly distinguish a “cat” from a “car”, but that is where their capabilities end.

example of recognition
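
To show why hash-based comparison is fast but coarse, here is a minimal sketch of an "average hash", one simple technique of this family. It is chosen only for illustration; which hashing method any particular search engine uses is not stated here. The image is assumed to be a grayscale 2D list at least 8 pixels on each side, and the function names are hypothetical.

```python
def average_hash(pixels, hash_size=8):
    """Compute a hash_size*hash_size-bit fingerprint of a grayscale image.

    pixels: 2D list of gray values, all rows the same length,
    with height and width of at least hash_size.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))  # shrink by block averaging

    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: is it brighter than the image average?
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits; a small distance means similar images."""
    return bin(hash_a ^ hash_b).count("1")
```

Two images are compared with a single XOR and a bit count, which is why this approach scales to huge catalogs. The fingerprint keeps only the rough light and dark layout, though, so two products that differ in small details end up with nearly identical hashes.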

SIFT/SURF. These algorithms detect points of interest in an image but are too sensitive to lighting, damage to the original object, and similar factors.

sift/surf comparison

Custom CNN (these are used in Google Lens, Google AutoML Vision, and other off-the-shelf solutions). These algorithms differentiate well between one class and another but make mistakes when items are visually similar.

comparison with custom cnn

Technologies Used

TensorFlow – the machine learning framework.  We used it to construct a neural network that is optimal for further training with our material.

Google Cloud Vision API is used to recognize labels. To find the best match among the possible results, we used Soundex.

Soundex is an algorithm for comparing two strings by their sound, setting the same index for strings that have a similar sound in English.
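
For readers who want to see the idea in code, below is a minimal sketch of the classic American Soundex rules: keep the first letter, encode the following consonants as digits, collapse repeated codes, and pad the result to four characters. It is written for illustration and is not the implementation used in the product.

```python
# Consonant groups that share a Soundex digit.
_GROUPS = (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
)
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(word):
    """Return the 4-character Soundex code of an English word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


# Classic check: two differently spelled names share one code.
same_code = soundex("Robert") == soundex("Rupert")  # both are "R163"
```

Because similar-sounding strings map to the same short code, a label read with a spelling mistake can still be matched to the right catalog entry with a simple equality check.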

We also used a full-text search and Levenshtein distance calculation.
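
Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below shows the standard dynamic-programming calculation and a small helper that picks the closest catalog entry for a recognized label. The helper name `best_match` and the model names are made up for illustration and do not come from the real catalog.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def best_match(query, candidates):
    """Return the candidate with the smallest edit distance to query."""
    return min(candidates, key=lambda item: levenshtein(query.lower(), item.lower()))


# A label read with one wrong character still finds the right (made-up) model.
catalog = ["AKB-7391", "AKB-7397X", "RMT-100"]
closest = best_match("AK8-7391", catalog)
```

In this example the misread "8" costs a single substitution, so "AKB-7391" is the closest entry.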

Visual Recognition System Business Values

  • Visual Search can be integrated into any of your API services and internal company systems, such as a chatbot, website, or the mobile application of your wholesale partners, so that your end customers can find the item they need by photo.
  • Visual Search helps you measure demand: see which models appear in searches more often than others and how many times a particular item has been searched.
  • A priority list for users: if your goods are easy to find, you gain an advantage in your users’ eyes.
  • Unique solution for your company. It doesn’t matter what you sell: the Visual Search System can be easily trained on new products.
  • Cost savings. You will save on staff training and get a trendy tool for end customers.
  • Easy integration through an API service.

Want to Implement Visual Search into your Business?

Already know how your business can benefit from an object or human face recognition system? Or maybe you have bold ideas for your business? Please feel free to contact us.

You can also check our full presentation about remote control visual search.

Remote control object recognition - Evergreen from Kravtsov Sergey


The images used in this article are taken from open sources and are used as illustrations.