VeriSpeak Algorithm Features and Capabilities

Performance numbers are provided for a PC with an Intel Core 2 Q9400 processor (2.67 GHz).

Neurotechnology has developed VeriSpeak, a PC-based speaker recognition algorithm designed for biometric system integrators. The VeriSpeak algorithm implements voice enrollment and voiceprint matching using proprietary sound processing technologies:

  • Text-dependent algorithm. The text-dependent speaker recognition is based on saying the same phrase for enrollment and verification. The VeriSpeak algorithm determines if a voice sample matches the template that was extracted from a specific phrase. During enrollment, one or more phrases are requested from the person being enrolled. Later that person may be asked to pronounce a specific phrase for verification. This method assures protection against the use of a covertly recorded random phrase from that person.
  • Two-factor authentication with a passphrase. The VeriSpeak voiceprint matching algorithm can be configured for a scenario in which each user records a unique phrase (such as a passphrase or an answer to a "secret question" that is known only to the person being enrolled). Later, the person is recognized by his or her own specific phrase with a high degree of accuracy. Overall system security increases because both voice authenticity and the passphrase are checked.
  • Liveness detection. A system may request each user to enroll a set of unique phrases. Later the user will be requested to say a specific phrase from the enrolled set. This way the system can ensure that a live person is being verified (as opposed to an impostor who uses a voice recording).
  • Identification capability. VeriSpeak functions can be used in 1-to-1 matching (verification) and 1-to-many (identification) modes.
  • Multiple samples of the same phrase. A template may store several voice records with the same phrase to improve recognition reliability. Certain natural voice variations (e.g. a hoarse voice) or environment changes (e.g. office and outdoors) can be stored in the same template.
  • Fused matching. A system may ask users to pronounce several specific phrases during speaker verification or identification and match each audio sample against records in the database. The VeriSpeak algorithm can fuse the matching results for each phrase together to improve matching reliability.
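
The liveness check and fused matching described in the list above can be sketched in a few lines of Python. This is a minimal illustration and not the VeriSpeak API: the function names, the random choice of the challenge phrase, the averaging rule, and the score scale are all assumptions, since the fusion method of the proprietary algorithm is not described here.

```python
import secrets
from statistics import mean


def pick_challenge_phrase(enrolled_phrases):
    """Pick one enrolled phrase at random to request from the user.

    An unpredictable choice makes it unlikely that a pre-recorded reply
    matches the requested phrase (hypothetical helper).
    """
    if not enrolled_phrases:
        raise ValueError("the user has no enrolled phrases")
    return secrets.choice(enrolled_phrases)


def fuse_scores(phrase_scores):
    """Fuse per-phrase matching scores by averaging (an assumed rule)."""
    if not phrase_scores:
        raise ValueError("at least one phrase score is required")
    return mean(phrase_scores)


def is_accepted(phrase_scores, threshold):
    """Accept when the fused score reaches the threshold.

    Assumes that higher scores mean greater similarity.
    """
    return fuse_scores(phrase_scores) >= threshold
```

Averaging is only one possible fusion rule; a real deployment would use whatever fusion the matching engine itself provides.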

Technical Specifications

Voice recordings should use a sampling rate of at least 11,025 Hz and a bit depth of at least 16 bits.

Voice samples at least 2 seconds long are recommended to assure recognition quality. Longer voice samples will improve the recognition quality.
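
The recording requirements above can be checked before a sample is passed to enrollment. The following is a minimal sketch using Python's standard `wave` module, assuming the sample is stored as an uncompressed PCM WAV file; the function name `check_recording` and the constant names are hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 11025   # stated minimum sampling rate
MIN_SAMPLE_WIDTH_BYTES = 2   # 16-bit depth
MIN_DURATION_S = 2.0         # recommended minimum sample length


def check_recording(path):
    """Return a list of problems found in a PCM WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()

    duration = frames / rate if rate else 0.0
    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {width * 8} is below 16 bits")
    if duration < MIN_DURATION_S:
        problems.append(f"duration {duration:.2f} s is below {MIN_DURATION_S} s")
    return problems
```

An empty list means the file meets all three constraints; it says nothing about noise or other recording conditions.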

See also the whole list of recommendations and constraints for speaker recognition.

All voice templates should be loaded into RAM before identification, so the maximum voice template database size is limited by the amount of available RAM.
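
As a rough illustration of this limit, the sketch below estimates how much RAM a template database needs and how many templates fit in a given amount of RAM. The default of 5,000 bytes per template is taken from the upper end of the template size reported on this page for 5-second samples; the function names are hypothetical, and any per-template overhead of the actual SDK is unknown and ignored here.

```python
def database_ram_bytes(n_templates, template_size_bytes=5000):
    """Approximate RAM needed to hold all templates (payload only)."""
    return n_templates * template_size_bytes


def max_templates(available_ram_bytes, template_size_bytes=5000):
    """Approximate number of templates that fit in the given RAM."""
    return available_ram_bytes // template_size_bytes
```

For example, one million templates of 5,000 bytes each would need about 5 GB for the template data alone.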

VeriSpeak 1.1 algorithm specifications depend on the voice sample length as follows:

  • Linear dependence for voice template extraction time and template size in the database. For example, when using voice samples that are 2 times shorter, these values will be 2 times smaller.
  • Quadratic dependence for template matching: the time to match one template grows with the square of the sample length. For example, when using voice samples that are 2 times shorter, the matching will be 4 times faster.
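
The two dependencies above can be written as simple scaling rules. The sketch below extrapolates from a 5-second baseline, which is the sample length used for the specification figures on this page; the function names are hypothetical, and the results are only what the stated linear and quadratic rules imply, not measured values.

```python
BASELINE_LENGTH_S = 5.0  # sample length used for the published specifications


def scale_linear(baseline_value, sample_length_s):
    """Scale extraction time or template size linearly with sample length."""
    return baseline_value * sample_length_s / BASELINE_LENGTH_S


def scale_matching_speed(baseline_speed, sample_length_s):
    """Scale matching speed with the inverse square of the sample length."""
    return baseline_speed * (BASELINE_LENGTH_S / sample_length_s) ** 2
```

With samples half as long (2.5 seconds), a 5,000-byte template shrinks to about 2,500 bytes, and a matching speed of 230 voiceprints per second rises to about 920.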

VeriSpeak 1.1 can perform template matching in two modes:

  • Fixed phrase – all subjects in the database have recorded the same phrase. This mode provides faster matching, but lower reliability.
  • Unique phrase – each subject in the database has recorded a specific phrase. This mode provides higher reliability, but slower matching.

The VeriSpeak voice template matching algorithm can be run on more than one processor core on multi-core processors, enabling an increase in template matching speed. The template matching speeds in the table below are given as a range, where the smaller number means matching speed using 1 processor core, while the larger number means matching speed using 4 processor cores. The specifications are provided for these processors:

  • Intel Core 2 Q9400 (4 cores), running at 2.67 GHz clock rate;
  • Intel Core i7-2600 (4 cores), running at 3.4 GHz clock rate.

VeriSpeak 1.1 algorithm specifications (for 5 second long voice samples)

                                           Intel Core 2 Q9400              Intel Core i7-2600
                                           Fixed phrase    Unique phrase   Fixed phrase    Unique phrase
Voice template extraction time (seconds)   0.12 - 0.15     0.12 - 0.15     0.08 - 0.10     0.08 - 0.10
Matching speed (voiceprints per second)    230 - 920       140 - 560       450 - 1,800     250 - 1,000
Template size in database (1) (bytes)      4,500 - 5,000   4,500 - 5,000   4,500 - 5,000   4,500 - 5,000

(1) When 1 voiceprint record is stored in a template. Template size increases proportionally when multiple voiceprint records are stored in the same template.
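
The matching speeds and the footnote above translate directly into identification time and template size estimates. The sketch below is a back-of-the-envelope calculation with hypothetical function names; it assumes every database record is compared once per identification and ignores any overhead outside matching.

```python
def identification_time_s(n_voiceprints, voiceprints_per_second):
    """Seconds needed to compare one sample against n_voiceprints records."""
    return n_voiceprints / voiceprints_per_second


def template_size_bytes(record_size_bytes, n_records):
    """Template size when several voiceprint records share one template."""
    return record_size_bytes * n_records


# Fixed-phrase figures for the Core i7-2600 from the table above.
one_core = identification_time_s(9000, 450)     # 20.0 seconds
four_cores = identification_time_s(9000, 1800)  # 5.0 seconds
```

In every column of the table, the 4-core speed is 4 times the 1-core speed, so identification time in this estimate drops by the same factor.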

Reliability and Performance Tests

Experiment 1: VeriSpeak ROC chart calculated using voice samples from the XM2VTS database.

Experiments 2 and 3: VeriSpeak ROC chart calculated using voice samples from the Neurotechnology internal database.

The VeriSpeak 1.1 algorithm has been tested with voice samples taken from the XM2VTS Database, as well as with voice samples from the Neurotechnology internal databases.

These voice template extraction and matching experiments were performed:

  • Experiment 1 – used voice samples from the XM2VTS database. All samples include the same fixed phrase pronounced by all subjects.
  • Experiment 2 – used voice samples from Neurotechnology internal voice database 1. All samples include the same fixed phrase pronounced by all subjects.
  • Experiment 3 – used voice samples from Neurotechnology internal voice database 2. Each subject pronounced a unique phrase during his/her recording.

Receiver operating characteristic (ROC) curves are commonly used to demonstrate the recognition quality of an algorithm. A ROC curve shows how the false rejection rate (FRR) depends on the false acceptance rate (FAR). The ROC charts for both databases are listed at the start of this section.
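
For readers unfamiliar with how such a curve is obtained, the sketch below computes FAR and FRR at a given decision threshold and sweeps the threshold to produce ROC points. It is a generic illustration, not the procedure used for the charts on this page: it assumes that higher scores mean greater similarity and that a comparison is accepted when its score reaches the threshold, and the function names are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) at one threshold.

    FAR is the share of impostor comparisons that are accepted;
    FRR is the share of genuine comparisons that are rejected.
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def roc_points(genuine_scores, impostor_scores):
    """Return (threshold, FAR, FRR) for every distinct observed score."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    return [(t, *far_frr(genuine_scores, impostor_scores, t)) for t in thresholds]
```

Raising the threshold lowers FAR and raises FRR, which is the trade-off the ROC curve makes visible.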

Template matching was performed using all 4 cores of the specified processors. The performance tests were performed on PCs with these processors:

  • Intel Core 2 Q9400, running at 2.67 GHz clock rate;
  • Intel Core i7-2600, running at 3.4 GHz clock rate.

VeriSpeak 1.1 algorithm tests with XM2VTS and Neurotechnology internal databases

                                                   Exp. 1   Exp. 2   Exp. 3
Total voice samples in the database                2360     309      305
Subjects in the database                           295      42       42
Recording sessions per subject                     8        1 - 10   1 - 10
Average voice sample length (seconds)              7.112    4.975    6.214
Average template extraction time (seconds)
  Core 2 Q9400                                     0.203    0.109    0.141
  Core i7-2600                                     0.128    0.082    0.101
Average voiceprint template size (bytes)           6405     4503     5623
Template matching speed (voiceprints per second)
  Core 2 Q9400                                     532      1056     368
  Core i7-2600                                     920      1848     600

Copyright © 1998 - 2012 Neurotechnology