Image Annotation for Machine Learning | Evergreen blog

Image Annotation for Machine Learning

#Machine Learning

Image annotation is an integral part of Artificial Intelligence development, and it is one of the basic tasks in computer vision technology. Annotated images are needed to train machine learning algorithms to recognize objects contained in visuals and give computers the ability to ‘see’ almost like we humans do. 

Manual image annotation can be time-consuming and quite expensive, especially when the set of images that need annotation is extremely large. For techno-geeks, we’ve placed the description of some of these tasks below.

Image annotation is the human-powered task of adding labels to an image (annotating) to create training datasets for computer vision algorithms. AI engineers usually predetermine these labels manually using special image annotation software or tools: they define regions in an image and attach text-based descriptions to them.

Image annotation services include:

  • bounding boxes: drawing 2D vector boxes around the objects that need annotation within an image;
  • 3D cuboids: almost the same as 2D boxes, except that they also show the approximate depth of target objects;
  • lines and splines: labeling straight or curved lines on images, such as pathways, sidewalks, and roads. This technique is used for lane and boundary recognition and for trajectory planning in autonomous cars, drones, warehouse robots, and many other cases;
  • semantic segmentation: all objects in a picture are annotated according to a list of segment labels;
  • pixel-precise (pixel-wise) segmentation: every pixel in the image is assigned a label;
  • polygonal segmentation: used to segment objects with irregular shapes and to capture the exact edges of an object;
  • image classification: associating the whole image with a single label.
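To make the annotation types above more concrete, here is a minimal Python sketch of how a single annotated image might be represented in memory. The class names (`BoundingBox`, `Polygon`, `ImageAnnotation`), the field names, and the sample values are assumptions made for illustration; they are not the storage format of any particular annotation tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass
class BoundingBox:
    """A 2D box around one object, given by two opposite corners."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass
class Polygon:
    """An irregular object outline given as an ordered list of vertices."""
    label: str
    points: List[Point]

    def to_bounding_box(self) -> BoundingBox:
        # The tightest axis-aligned box that contains every vertex.
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return BoundingBox(self.label, min(xs), min(ys), max(xs), max(ys))


@dataclass
class ImageAnnotation:
    """All labels attached to one image (hypothetical structure)."""
    file_name: str
    image_label: str            # image classification: one label per image
    boxes: List[BoundingBox]    # bounding-box annotations
    polygons: List[Polygon]     # polygonal segmentation annotations


# Made-up sample data, for illustration only.
example = ImageAnnotation(
    file_name="street.jpg",
    image_label="urban",
    boxes=[BoundingBox("car", 40, 60, 200, 180)],
    polygons=[Polygon("sidewalk", [(0, 300), (120, 250), (400, 260), (400, 300)])],
)

sidewalk_box = example.polygons[0].to_bounding_box()
print(sidewalk_box)
```

A polygon carries more information than a box, which is why it can be reduced to a bounding box but not the other way around.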

Automatic image annotation (AIA, also known as automatic image tagging) is the process in which a computer automatically assigns metadata (captions or labels) to a digital image, using relevant keywords to describe its visual content. You can read more about automatic image captioning in our article.

Existing image annotation algorithms can be divided into two categories: 

  • model-based learning methods: these explore the correlation between visual features and their semantic meaning, using machine learning or knowledge models to discover a mapping function for image annotation;
  • database-based models: these directly provide a sequence of plausible labels for a new image based on the already annotated images in the database.
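The database-based idea can be sketched as a nearest-neighbour label transfer: find the annotated images most similar to the new one and reuse their most frequent labels. The snippet below is only an illustration under simplifying assumptions. The three-number feature vectors, the Euclidean distance, and the function name `suggest_labels` are all hypothetical, and real systems typically compare features extracted by a neural network.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Each database entry: (feature vector, labels assigned by a human annotator).
AnnotatedImage = Tuple[Sequence[float], List[str]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_labels(query: Sequence[float],
                   database: List[AnnotatedImage],
                   k: int = 3,
                   top_n: int = 2) -> List[str]:
    """Propose labels for a new image from its k most similar annotated images."""
    neighbours = sorted(database, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return [label for label, _ in votes.most_common(top_n)]


# Toy database with made-up feature vectors.
database = [
    ([0.9, 0.1, 0.0], ["beach", "sea"]),
    ([0.8, 0.2, 0.1], ["sea", "sky"]),
    ([0.1, 0.9, 0.2], ["forest", "tree"]),
    ([0.2, 0.8, 0.1], ["tree", "grass"]),
]

suggested = suggest_labels([0.85, 0.15, 0.05], database, k=2, top_n=1)
print(suggested)
```

A model-based method would instead train a classifier on the same annotated images and apply the learned mapping to new ones, without consulting the database at prediction time.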

AI annotation tools let users label more images in less time by automating the majority of manual tasks, and they can be trained further to annotate new images more precisely.

Automatic Image Annotation


Open Source Annotation Tools

We’ve picked some open source solutions that can facilitate the data annotation process, or can be used as a base to develop custom AI annotation tools.


LabelImg is a free, open-source graphical image annotation tool written in Python, used for labeling objects in images. Annotations can be saved as XML files in PASCAL VOC format or as text files in YOLO format. LabelImg lets you draw bounding boxes around objects through a Qt graphical interface. We at Evergreen used this solution to prepare datasets for neural network training in several projects that we developed.
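As an illustration of how the two formats relate, the sketch below converts a PASCAL VOC style XML annotation into YOLO style text lines using only the Python standard library. It assumes the commonly documented layouts: VOC stores absolute pixel corners (`xmin`, `ymin`, `xmax`, `ymax`), while YOLO stores a class index followed by the box centre and size normalised by the image dimensions. The sample XML, the class list, and the function name `voc_to_yolo` are made up for the example, and this is not LabelImg's own code.

```python
import xml.etree.ElementTree as ET
from typing import List

# A hand-written sample annotation in PASCAL VOC style (illustrative only).
VOC_XML = """<annotation>
  <filename>street.jpg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""


def voc_to_yolo(xml_text: str, class_names: List[str]) -> List[str]:
    """Return one 'class x_center y_center width height' line per object."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))

    lines = []
    for obj in root.iter("object"):
        class_id = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO coordinates are fractions of the image width and height.
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


yolo_lines = voc_to_yolo(VOC_XML, class_names=["person", "car"])
print("\n".join(yolo_lines))
```

Because YOLO lines store only a class index, the ordered list of class names has to be kept alongside the annotation files.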


CVAT is a free, open-source image and video annotation tool that provides easy labeling of datasets for computer vision. It allows users to annotate data for several machine learning tasks, including object recognition, image classification, and image segmentation. CVAT supports additional optional components: Deep Learning Deployment Toolkit (Intel Distribution of OpenVINO toolkit element), NVIDIA CUDA Toolkit, TensorFlow Object Detection API, and more.

CVAT example


Auto Annotate

It is an open-source solution for automatic image annotation. A Python class called ‘generate XML’ runs inference with a pre-trained model to obtain the positions of the bounding boxes and annotates the images with them. The script also relies on the TensorFlow repository for training. The resulting images (with bounding boxes) and XML files can later be opened in LabelImg.
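The general idea of turning model predictions into an XML annotation can be sketched as follows. This is not the Auto Annotate project's code: the function `detections_to_voc_xml`, the structure of the `detections` list, the `min_score` threshold, and the fixed depth of 3 channels are all assumptions for illustration, and the detections are hard-coded where a real pipeline would take them from a pre-trained detector.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def detections_to_voc_xml(file_name: str, width: int, height: int,
                          detections: List[Dict], min_score: float = 0.5) -> str:
    """Build a PASCAL VOC style XML string from a list of detections.

    Each detection is a dict with 'label', 'score' and 'box' keys, where
    'box' is (xmin, ymin, xmax, ymax) in pixels (hypothetical structure).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = file_name

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB image

    for det in detections:
        if det["score"] < min_score:
            continue  # skip low-confidence predictions
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = det["label"]
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), det["box"]):
            ET.SubElement(box, tag).text = str(int(value))

    return ET.tostring(root, encoding="unicode")


# Hard-coded stand-in for the output of a pre-trained detector.
detections = [
    {"label": "dog", "score": 0.92, "box": (30, 40, 220, 310)},
    {"label": "cat", "score": 0.31, "box": (250, 60, 380, 200)},
]

xml_text = detections_to_voc_xml("pets.jpg", 400, 320, detections)
print(xml_text)
```

Saving the returned string next to the image gives a file that a human annotator can then review and correct.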

Auto Annotation example


Commercial Image Annotation Tools

Several interesting solutions already exist on the market, developed to facilitate the process of image annotation for further application in commercial and academic projects. Here are some of our picks.

The platform provides several AI-powered annotation tools (DEXTR, Classification Predictor, Object Detection Assistant, and Instance Segmentation Assistant) along with manual annotation tools. Auto-generated annotations can be manually adjusted for better precision and quality. It is also possible to train object detection, semantic segmentation, and instance segmentation models on your own datasets.

V7 Darwin

It is an automated AI-powered annotation tool that works for all data and automatically generates polygon and pixel-precise masks. You can define a region of interest where the object is present, and the deep learning algorithm will detect the most salient object or part visible and segment it. Darwin’s Auto-Annotate AI can generate very precise masks as the first guess.


Dataloop is a cloud-based annotation platform comprising multiple applications that automate data preparation for retail, robotics, autonomous vehicles, precision agriculture, and more. Dataloop's annotation tools can be used on all prevalent types of visual media (images, video). You can integrate deep learning models and run automatic annotations using pre-trained classes. Human annotators then only need to fix or validate the labeled data, which accelerates the annotation process.

Our team at Evergreen has been successfully using TensorFlow, an open-source machine learning framework, to train deep neural networks in our projects, and we have experience in implementing visual search and object recognition solutions for our clients. Are you interested in learning more about our business cases? Please contact us right away.

To Sum It Up

Labeling data in images, text, or video is essential to train deep learning algorithms by ‘feeding’ the model with information about what is shown in the picture and making objects recognizable for computers and machines.

The specialists at Evergreen have many years of experience in using machine learning and artificial intelligence technologies to develop visual search, face recognition, and object recognition projects for different businesses. We can develop, support, and enhance an individual solution for a client: build an MVP on open-source solutions quickly and cost-effectively, lay the foundation for future development from the start, and support and maintain the project at every growth stage.

If you are interested in creating a personalized AI-powered solution for your business or eCommerce project with elements of object recognition, don’t hesitate to contact us. We will be happy to offer you a unique product that uses innovative machine learning technology. Let's start today!

The images used in this article are taken from open sources and are used as illustrations.