
SentiSight object learning and recognition processes

Object learning process

In order to recognize an object in an image, the appearance of an object must first be memorized. In the learning phase, SentiSight algorithms extract specific object features from a video stream or single image and save them into what is known as an object's model.

In many cases a video or single image contains more than just the object you want SentiSight to learn, such as a background, other objects in the room or a hand holding the object. Therefore, to learn an object, the exact location of the object in the image must be provided.

SentiSight supports two methods of object learning: manual and automatic.

Manual object learning is suitable for most situations. A user must perform these steps for manual object learning in the SentiSight 3.0 SDK:

  1. Outline the object's shape on an image by marking the object's corner points to build a polygon. The image can be provided by the user from an image file, a video file or a live video stream.
  2. Choose whether to use the local features-based algorithm, the shape-based algorithm, or both.
  3. Optionally choose more images of the object and repeat Step 1 for each image. The algorithm assists the user by estimating an approximate shape of the object if the object in an image is recognized using data from previous images. Learning the object from different sides and angles results in better recognition quality.
  4. Input the learned object name (ID) into the system.
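
Step 1 amounts to turning the marked corner points into a region of the image. The following is a minimal sketch of that idea in plain Python, not the SentiSight API: the names `point_in_polygon` and `polygon_mask` are hypothetical, and the ray-casting test is a standard technique used here as a stand-in for whatever the SDK does internally.

```python
def point_in_polygon(x, y, corners):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(width, height, corners):
    """Return a height x width grid of booleans; True marks object pixels."""
    # Each pixel is sampled at its centre.
    return [
        [point_in_polygon(col + 0.5, row + 0.5, corners) for col in range(width)]
        for row in range(height)
    ]


# Hypothetical corner points marked by a user on an 8 x 6 image.
corners = [(1, 1), (6, 1), (6, 4), (1, 4)]
mask = polygon_mask(8, 6, corners)
object_pixels = sum(cell for row in mask for cell in row)
```

Only the pixels marked True would be passed on to feature extraction, so the background and other objects in the image do not end up in the model.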

Automatic object learning is suitable for lightweight movable objects. This learning procedure detects an object by excluding the static background and the object's holder (usually a hand) from an image.

A user must perform these steps for automatic object learning in the SentiSight 3.0 SDK:

  1. Choose a background and direct the camera to it.
  2. Choose a holder – an object that will be used to hold and move the learned object. A user's hand can be the "holder".
  3. Present the holder to the camera first, in various poses and configurations (if it is not a rigid object), so that it can be learned by SentiSight.
  4. Choose whether to use the local features-based algorithm, the shape-based algorithm, or both.
  5. After the holder has been learned, SentiSight is ready to learn the object itself, by having the holder rotate and move the object closer and further from the camera.
  6. Input the learned object name (ID) into the system.

Therefore, the automatic method requires either live video or separate videos or image sets of the background, the holder and the object. Also, background elements may be learned together with the object if the object is hard to separate from the background. This can affect the ability of the algorithm to recognize the unique qualities of the object and may result in the object being confused with other objects that have the same background.
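
The exclusion idea behind automatic learning can be illustrated with a small sketch: a pixel is attributed to the object if it differs from the static background and does not belong to the holder. This is an assumption-laden illustration, not SentiSight code. The function `object_mask`, the intensity threshold of 30 and the ready-made holder mask are all hypothetical; in the SDK the holder is learned rather than supplied.

```python
def object_mask(frame, background, holder_mask, threshold=30):
    """Mark pixels that differ from the background and are not the holder."""
    mask = []
    for frame_row, bg_row, holder_row in zip(frame, background, holder_mask):
        row = []
        for pixel, bg_pixel, is_holder in zip(frame_row, bg_row, holder_row):
            changed = abs(pixel - bg_pixel) > threshold
            row.append(changed and not is_holder)
        mask.append(row)
    return mask


# Tiny 3 x 4 grayscale example with made-up intensity values.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 200, 200, 10],
         [10, 200, 200, 10],
         [10, 120, 120, 12]]
# Assumed holder region (bottom row, middle columns), e.g. a hand.
holder_mask = [[False, False, False, False],
               [False, False, False, False],
               [False, True, True, False]]

mask = object_mask(frame, background, holder_mask)
```

The sketch also shows why a poorly separable object is a problem: when the object's pixels are close to the background values, a simple difference test cannot tell the two apart.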

Manual object learning should be used for objects that cannot be moved, or when there is no way to provide separate media of the object's background and/or holder. Thus, automatic learning requires less user interaction with the system, but it is not as precise as manual learning. Manual learning is also suitable for a wider range of cases.

Object recognition process

Object recognition requires no user interaction apart from providing a video file with the object or pointing a camera at the scene where the learned object is present or will appear. When the object appears in the field of view, SentiSight tries to recognize it. If the object is recognized, the object's name (ID) and coordinates are returned.

During the object learning stage, the SentiSight algorithm creates a model with possible views from different sides, in different 3D poses and in different lighting conditions. This model improves recognition capability.

SentiSight algorithms and technology capabilities

All performance tests were made on an Intel Core i7 processor with 4 cores running at 2.67 GHz.

SentiSight is designed to be as universal as possible and is able to perform fully automatic and manual object learning. The technology can be used for a wide range of tasks, including:

  • Recognition of documents, stamps, labels, packaging and other items for sorting, logo masking, usage monitoring and similar applications
  • Object counting and inspection for assembly lines and other industrial applications
  • Augmented and extended reality applications for toys, games, device and Web applications such as: smart toys for children that recognize cards, images, pictograms, etc.; recognition of places based on photographs and recognition of products such as beverages, foods and other consumer goods.
  • Robotic vision for navigation and manipulation
  • Law enforcement applications for identification, such as tattoo recognition

The SentiSight 3.0 technology has these capabilities for advanced visual-based object learning and recognition:

  • Accurate object detection. The SentiSight algorithm is able to find out:
    • whether a particular object is present in a scene;
    • where the object is located in the scene;
    • how many instances of the object there are in the scene.
  • Two algorithms for object recognition. Depending on the object type, one of these algorithms (or both) may be used for successful recognition:
    • The local features based algorithm uses small details of an object as distinctive features that are extracted into an object model and are used later to recognize the object. This algorithm is fast but is not suitable for solid-colored, reflective or transparent (glass etc.) objects.
    • The shape based algorithm is useful for objects that do not have any distinctive details but have stable external edges (boundaries) and / or internal edges. This algorithm is slower but can recognize most objects that are not recognized by the local features based algorithm.
  • Simultaneous multiple object recognition. The SentiSight algorithm provides simultaneous multiple 2D and 3D object detection and recognition.
  • Object evaluation. The algorithm is also able to estimate the region an object occupies in a scene, providing additional information about the size, orientation and scale of the recognized object.
  • Fast image processing. SentiSight can process video streams in real time, so it can be used for real-time applications. The algorithm is also able to run several threads on multi-core processors, making recognition several times faster.
  • Object tracking mode. The SentiSight 3.0 library has a tracking mode for tasks that need very fast image processing during the object recognition stage. The tracking works with complex backgrounds and fast moving objects. Tracking is initialized when an object is recognized and located; the object is then tracked until it changes somewhat in appearance, at which point tracking is reinitialized by recognition. In tracking mode SentiSight is able to process more than 100 frames per second (320 x 240 pixels).
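
The tracking mode described in the last item alternates between a slow recognition step and a fast tracking step. The control flow can be sketched as below. This is only an illustration of the described behaviour: `recognize` and `track` are hypothetical callables standing in for the SDK, and the toy frames are made up.

```python
def process_stream(frames, recognize, track):
    """Alternate between slow recognition and fast tracking.

    recognize(frame) -> location or None          (hypothetical, slow path)
    track(frame, last_location) -> location or None  (hypothetical, fast path)
    Returns a list of (mode, location) tuples, one per frame.
    """
    results = []
    location = None
    for frame in frames:
        if location is not None:
            # Fast path: follow the object found earlier.
            location = track(frame, location)
            if location is not None:
                results.append(("tracking", location))
                continue
        # Tracking not started yet or lost: fall back to recognition.
        location = recognize(frame)
        results.append(("recognition", location))
    return results


# Toy stand-ins: each frame carries the object's position and a flag
# saying whether its appearance changed too much to keep tracking.
def toy_recognize(frame):
    return frame["position"]


def toy_track(frame, last_location):
    return None if frame["changed"] else frame["position"]


frames = [
    {"position": (10, 10), "changed": False},
    {"position": (12, 10), "changed": False},
    {"position": (15, 11), "changed": True},
    {"position": (16, 11), "changed": False},
]
results = process_stream(frames, toy_recognize, toy_track)
modes = [mode for mode, _ in results]
```

In this toy run the third frame forces a fall back to recognition, after which tracking resumes.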

Technical Specifications

All specifications are given for an Intel Core i7 processor with 4 cores running at 2.67 GHz.

The specifications are given for SentiSight 3.0 local features recognition and shape recognition algorithms. These algorithms can be used separately depending on object type, or together.

The specifications are provided for 320 x 240 pixel images. For the same images at different resolutions, performance depends on image area as follows:

  • The local features based algorithm has linear dependence for object learning and linearithmic (n log n) dependence for object recognition.
  • The shape based algorithm has linearithmic (n log n) dependence for object learning and quadratic dependence for object recognition.
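
As a worked example of these dependencies, the sketch below estimates how much the cost grows when moving from 320 x 240 to 768 x 576 pixels, where the image area increases 5.76 times. The helper `relative_cost` is hypothetical, and the estimate ignores constant factors and lower-order terms, so it should be read as a rough guide only.

```python
import math


def relative_cost(area_from, area_to, dependence):
    """Estimated cost ratio when the image area changes."""
    ratio = area_to / area_from
    if dependence == "linear":
        return ratio
    if dependence == "linearithmic":
        # n log n: the ratio of the logarithms does not depend on the base
        return ratio * math.log(area_to) / math.log(area_from)
    if dependence == "quadratic":
        return ratio ** 2
    raise ValueError(f"unknown dependence: {dependence}")


small = 320 * 240   # 76,800 pixels
large = 768 * 576   # 442,368 pixels

for name in ("linear", "linearithmic", "quadratic"):
    print(f"{name}: x{relative_cost(small, large, name):.2f}")
```

The quadratic estimate is roughly 33 times. For comparison, the single-thread shape recognition speeds reported in the tests on this page (3614 and 106 templates per second) differ by a factor of about 34.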

Object model size depends on how feature-rich an object is, and is therefore individual for each object.

These conditions may alter the algorithms' performance:

  • Rotation and translation. The algorithm is generally rotation and translation invariant in a plane perpendicular to the camera. The algorithm is also invariant to rotations of up to 10-15 degrees out of a plane perpendicular to the camera. Different views of an object can be added to a model to handle larger rotations.
  • Resolution and scale changes. The scale (size in image) of the object's model and of the object itself can differ by up to 2-3 times. Objects should contain enough details and be large enough to be recognized.
  • Occlusions. The algorithm is robust to occlusions of up to 50 % of the object's size if enough unique edges remain visible.
  • Lighting conditions (illumination, shadows and reflectance).
    • Planar objects only have problems with reflectance.
    • 3D objects have problems with varying lighting conditions, but constant lighting conditions do not cause many problems.
  • Transparency. In general, transparent objects are difficult to recognize.
  • Rigidity. The algorithm can recognize only rigid objects. At least a significant part of the object should be rigid.
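
The numeric limits in this list can be turned into a simple pre-check for a planned setup. The sketch below is an illustration only: the function `within_stated_limits` is hypothetical, it uses the conservative end of each stated range (10 degrees, 2 times, 50 %), and it assumes the scale limit applies in both directions (object larger or smaller than in the model), which the text does not state explicitly.

```python
def within_stated_limits(out_of_plane_deg, scale_ratio, occluded_fraction):
    """Return the list of conditions that exceed the stated limits."""
    problems = []
    if abs(out_of_plane_deg) > 10:                 # stated range: 10-15 degrees
        problems.append("out-of-plane rotation")
    # Assumption: the scale difference counts in both directions.
    if max(scale_ratio, 1 / scale_ratio) > 2:      # stated range: 2-3 times
        problems.append("scale difference")
    if occluded_fraction > 0.5:                    # stated limit: 50 %
        problems.append("occlusion")
    return problems


ok_case = within_stated_limits(5, 1.5, 0.2)
bad_case = within_stated_limits(20, 0.4, 0.6)
```

A non-empty result suggests adding more views of the object to the model or changing the camera setup.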

Object recognition algorithms can be run in more than one thread on multi-core processors, which increases object model matching speed. The table below provides object recognition speeds as a range, where the smaller number is the recognition speed using 1 thread and the larger number is the recognition speed using 8 threads. Note that the specified processor executes 2 threads per processor core in parallel.

SentiSight 3.0 object recognition algorithms technical specifications

                                                          Local features        Shape
                                                          recognition algorithm recognition algorithm
Static background extraction / object mask separation     25 frames per second
Learning: processing of a single object's frame           0.03 seconds          0.58 seconds
Learning: generalization time (for 100 frames of object)  0.5 seconds           Not applicable
Recognition speed (1) (models per second)                 13,000 - 48,000       1,800 - 4,500

(1) When the object model contains one template. An object model may contain multiple templates (usually corresponding to different viewpoints), in which case the algorithm compares an object against all templates in the model before returning the recognition result. Also, this recognition speed is reached with sufficiently big databases (2,000 images and more); with smaller databases the recognition is slower.
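
Note (1) implies that the effective number of models matched per second drops when each model holds several templates. A rough estimate is sketched below, assuming every template costs the same to compare. The helper names are hypothetical; 13,000 is the single-thread local features figure from the table, while 36 templates per model and 2,000 models are illustrative values only.

```python
def models_per_second(templates_per_second, templates_per_model):
    """Effective model throughput when every template must be compared."""
    return templates_per_second / templates_per_model


def search_time(database_models, templates_per_model, templates_per_second):
    """Seconds needed to compare one object against the whole database."""
    total_templates = database_models * templates_per_model
    return total_templates / templates_per_second


rate = models_per_second(13_000, 36)       # about 361 models per second
seconds = search_time(2_000, 36, 13_000)   # about 5.5 seconds
```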

Reliability and Performance Tests

All tests were performed on an Intel Core i7 processor with 4 cores running at 2.67 GHz.

The SentiSight 3.0 algorithm was tested with a subset of the Amsterdam Library of Object Images (ALOI).

  • The subset contained objects 1-100 from ALOI.
  • Images with object viewpoint variations (ALOI-VIEW collection) were used. 36 images per object were used.

Local features and shape based algorithms from SentiSight 3.0 were tested separately.

SentiSight 3.0 performance was tested on these image resolutions:

  • 768 x 576 pixels – the original full resolution images from ALOI.
  • 320 x 240 pixels – obtained by resizing the 768 x 576 images before testing.

At 0.1% False Acceptance Rate (FAR), the recognition rate is from 70% to more than 99% depending on object structural appearance, transparency, etc. For objects with well-defined internal structure, the recognition rate is 98% - 99% at 0.1% FAR.
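
To make the two quantities concrete, the sketch below computes FAR and recognition rate for one acceptance threshold, using the common definitions: FAR is the fraction of non-matching (impostor) comparisons that are wrongly accepted, and the recognition rate is the fraction of matching (genuine) comparisons that are accepted. The source does not describe its exact evaluation protocol, so the function name, the scores and the threshold here are all made up for illustration.

```python
def far_and_recognition_rate(genuine_scores, impostor_scores, threshold):
    """Return (FAR, recognition rate) for a given acceptance threshold."""
    false_accepts = sum(score >= threshold for score in impostor_scores)
    true_accepts = sum(score >= threshold for score in genuine_scores)
    far = false_accepts / len(impostor_scores)
    recognition_rate = true_accepts / len(genuine_scores)
    return far, recognition_rate


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77, 0.95]
impostor = [0.10, 0.22, 0.35, 0.05, 0.81, 0.15, 0.30, 0.12, 0.08, 0.20]

far, rate = far_and_recognition_rate(genuine, impostor, threshold=0.5)
```

Raising the threshold lowers the FAR but can also lower the recognition rate, which is why the two figures are reported together.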

SentiSight 3.0 algorithms tests

                                                             Local features recognition  Shape recognition
                                                             768 x 576     320 x 240     768 x 576     320 x 240
Average learning time for 1 image (seconds)                  0.0791        0.0143        1.4718        0.2968
Average learning time for 1 object (36 images) (seconds)     2.8484        0.5148        52.9835       10.6865
Average recognition speed, 1 thread (templates per second)   4561          25435         106           3614
Average recognition speed, 8 threads (templates per second)  17455         96435         265           9035
Average object model size (kilobytes)                        722.80        222.17        3258.69       489.53
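
The multi-threading gain can be read off the table by dividing the 8-thread speed by the 1-thread speed. The short sketch below does this arithmetic; the numbers are copied from the table, and the dictionary layout is only an illustration.

```python
# Recognition speeds (templates per second) copied from the table:
# (algorithm, resolution) -> (1 thread, 8 threads)
speeds = {
    ("local features", "768 x 576"): (4561, 17455),
    ("local features", "320 x 240"): (25435, 96435),
    ("shape", "768 x 576"): (106, 265),
    ("shape", "320 x 240"): (3614, 9035),
}

speedups = {key: eight / one for key, (one, eight) in speeds.items()}

for (algorithm, resolution), factor in speedups.items():
    print(f"{algorithm:15s} {resolution}: x{factor:.2f}")
```

With 8 threads on the 4-core test processor, the local features algorithm runs about 3.8 times faster and the shape algorithm 2.5 times faster than with 1 thread.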
 
Copyright © 1998 - 2012 Neurotechnology