VeriSpeak Algorithm Features and Capabilities
Performance numbers are provided for a PC with Intel Core 2 Q9400 processor (2.67 GHz).
Neurotechnology has developed VeriSpeak, a PC-based speaker recognition algorithm designed for biometric system integrators.
The VeriSpeak algorithm implements voice enrollment and voiceprint matching using proprietary sound processing technologies:
- Text-dependent algorithm. Text-dependent speaker recognition relies on the same phrase being spoken for enrollment and verification. The VeriSpeak algorithm determines whether a voice sample matches the template that was extracted from a specific phrase. During enrollment, one or more phrases are requested from the person being enrolled. Later, that person may be asked to pronounce a specific phrase for verification. This method protects against the use of a covertly recorded random phrase from that person.
- Two-factor authentication with a passphrase. The VeriSpeak voiceprint matching algorithm can be configured for a scenario in which each user records a unique phrase (such as a passphrase or an answer to a "secret question" that is known only to the person being enrolled). Later, the person is recognized by his or her own specific phrase with a high degree of accuracy. Overall system security increases because both voice authenticity and the passphrase are checked.
- Liveness detection. A system may request each user to enroll a set of unique phrases. Later, the user is asked to say a specific phrase from the enrolled set. This way the system can ensure that a live person is being verified (as opposed to an impostor who uses a voice recording).
- Identification capability. VeriSpeak functions can be used in 1-to-1 matching (verification) and 1-to-many matching (identification) modes.
- Multiple samples of the same phrase. A template may store several voice records of the same phrase to improve recognition reliability. Certain natural voice variations (e.g. a hoarse voice) or environment changes (e.g. office and outdoors) can be stored in the same template.
- Fused matching. A system may ask users to pronounce several specific phrases during speaker verification or identification and match each audio sample against records in the database. The VeriSpeak algorithm can fuse the matching results for each phrase together to improve matching reliability.
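The idea behind fused matching can be illustrated with a small sketch. The fusion rule VeriSpeak actually uses is proprietary and is not described here; the example below assumes a plain mean of per-phrase similarity scores (the common "sum rule") compared against a decision threshold. The function name `fuse_scores`, the score scale, and the threshold are hypothetical.

```python
from statistics import mean


def fuse_scores(phrase_scores, threshold):
    """Fuse per-phrase similarity scores with a simple mean.

    Assumptions: a higher score means a better match, and all phrases are
    weighted equally. This is NOT VeriSpeak's actual fusion rule.
    """
    if not phrase_scores:
        raise ValueError("at least one phrase score is required")
    fused = mean(phrase_scores)
    return fused, fused >= threshold


# Hypothetical scores for three phrases spoken by the same claimant.
fused, accepted = fuse_scores([62.0, 48.0, 70.0], threshold=55.0)
print(fused, accepted)
```

In this toy case the second phrase alone would fall below the threshold, but the fused score does not, which is the sense in which combining several phrases can make the decision more robust.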
Technical Specifications
Voice should be recorded at a sampling rate of at least 11,025 Hz and a depth of at least 16 bits.
Voice samples at least 2 seconds long are recommended to assure recognition quality.
Longer voice samples will improve the recognition quality.
See also the whole list of recommendations and constraints for speaker recognition.
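The recording requirements above can be checked before a sample is submitted for enrollment. The following is a minimal sketch using Python's standard `wave` module; it assumes the recording is an uncompressed PCM WAV file, and the function name `check_recording` is hypothetical rather than part of the VeriSpeak SDK.

```python
import wave

MIN_SAMPLE_RATE_HZ = 11_025      # minimum sampling rate stated above
MIN_SAMPLE_WIDTH_BYTES = 2       # 16-bit depth = 2 bytes per sample
RECOMMENDED_MIN_SECONDS = 2.0    # recommended minimum sample length


def check_recording(path):
    """Return a list of problems found in a PCM WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    # Duration is frames per channel divided by frames per second.
    duration = frames / rate if rate else 0.0
    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {8 * width} is below {8 * MIN_SAMPLE_WIDTH_BYTES}")
    if duration < RECOMMENDED_MIN_SECONDS:
        problems.append(f"length {duration:.2f} s is below {RECOMMENDED_MIN_SECONDS} s")
    return problems
```

A caller might run `check_recording("enrollment.wav")` (a hypothetical file name) and ask the user to record again if the returned list is not empty.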
All voice templates should be loaded into RAM before identification, so the maximum voice template database size is limited by the amount of available RAM.
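As a rough illustration of this RAM limit, the sketch below estimates memory use from the template size. It assumes 5,000 bytes per voiceprint record (the upper figure given in the specifications for 5-second samples) and ignores any indexing or runtime overhead, which this page does not quantify. The function names are hypothetical.

```python
def estimate_ram_bytes(num_templates, bytes_per_record=5_000, records_per_template=1):
    """Estimate the RAM needed to hold all templates (overhead not included)."""
    return num_templates * bytes_per_record * records_per_template


def max_templates(available_ram_bytes, bytes_per_record=5_000, records_per_template=1):
    """Estimate how many templates fit into the given amount of RAM."""
    return available_ram_bytes // (bytes_per_record * records_per_template)


needed = estimate_ram_bytes(1_000_000)
print(f"1,000,000 templates need about {needed / 2**30:.2f} GiB")
print(f"4 GiB holds about {max_templates(4 * 2**30):,} templates")
```

Storing several voiceprint records per template scales the estimate proportionally, which is why `records_per_template` is a separate parameter.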
VeriSpeak 1.1 algorithm specifications depend on the voice sample length as follows:
- Linear dependence for voice template extraction time and template size in the database. For example, with voice samples that are 2 times shorter, these values will be 2 times smaller.
- Quadratic dependence for the template matching speed. For example, with voice samples that are 2 times shorter, the matching will be 4 times faster.
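The two dependencies above can be written as simple scaling rules relative to the 5-second samples used in the specifications table. The sketch below assumes the linear and quadratic relations hold exactly at other sample lengths, which the source states only through the "2 times shorter" examples; the function names are hypothetical.

```python
BASELINE_SECONDS = 5.0  # sample length used in the specifications table


def scale_linear(baseline_value, sample_seconds):
    """Scale extraction time or template size linearly with sample length."""
    return baseline_value * sample_seconds / BASELINE_SECONDS


def scale_matching_speed(baseline_speed, sample_seconds):
    """Scale matching speed with the inverse square of the sample length."""
    return baseline_speed * (BASELINE_SECONDS / sample_seconds) ** 2


# Baseline figures for 5-second samples (Core 2 Q9400, fixed phrase, 1 core).
half = 2.5  # samples that are 2 times shorter
print(scale_linear(0.12, half))          # extraction time, seconds
print(scale_linear(5_000, half))         # template size, bytes
print(scale_matching_speed(230, half))   # voiceprints per second
```

Halving the sample length halves the extraction time and template size and quadruples the matching speed, matching the examples in the list above.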
VeriSpeak 1.1 can perform template matching in two modes:
- Fixed phrase – all subjects in the database have recorded the same phrase. This mode provides faster matching, but lower reliability.
- Unique phrase – each subject in the database has recorded a specific phrase. This mode provides higher reliability, but slower matching.
The VeriSpeak voice template matching algorithm can be run on more than one processor core on multi-core processors, enabling an increase in template matching speed.
The template matching speeds in the table below are given as a range, where the smaller number means matching speed using 1 processor core, while the larger number means matching speed using 4 processor cores.
The specifications are provided for these processors:
- Intel Core 2 Q9400 (4 cores), running at 2.67 GHz clock rate;
- Intel Core i7-2600 (4 cores), running at 3.4 GHz clock rate.
VeriSpeak 1.1 algorithm specifications (for 5-second-long voice samples)

| | Intel Core 2 Q9400, fixed phrase | Intel Core 2 Q9400, unique phrase | Intel Core i7-2600, fixed phrase | Intel Core i7-2600, unique phrase |
| Voice template extraction time (seconds) | 0.12 - 0.15 | 0.12 - 0.15 | 0.08 - 0.10 | 0.08 - 0.10 |
| Matching speed (voiceprints per second) | 230 - 920 | 140 - 560 | 450 - 1,800 | 250 - 1,000 |
| Template size in database (1) (bytes) | 4,500 - 5,000 | 4,500 - 5,000 | 4,500 - 5,000 | 4,500 - 5,000 |

(1) When 1 voiceprint record is stored in a template.
Template size increases proportionally when multiple voiceprint records are stored in the same template.
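The matching speeds in the table above translate directly into an estimate of how long a 1-to-many search takes. The sketch below assumes that identification compares the probe against every template in the database, that samples are 5 seconds long, and that each template holds one voiceprint record; the dictionary layout and function name are hypothetical.

```python
# Matching speeds (voiceprints per second) from the table, as (1 core, 4 cores).
MATCHING_SPEED = {
    ("Core 2 Q9400", "fixed"): (230, 920),
    ("Core 2 Q9400", "unique"): (140, 560),
    ("Core i7-2600", "fixed"): (450, 1800),
    ("Core i7-2600", "unique"): (250, 1000),
}


def identification_seconds(num_templates, cpu, mode, four_cores=True):
    """Estimate the time for one full scan of the template database."""
    one_core, all_cores = MATCHING_SPEED[(cpu, mode)]
    speed = all_cores if four_cores else one_core
    return num_templates / speed


for cores in (False, True):
    seconds = identification_seconds(9_200, "Core 2 Q9400", "fixed", four_cores=cores)
    print(f"{'4 cores' if cores else '1 core'}: {seconds:.1f} s")
```

For a database of 9,200 fixed-phrase templates on the Core 2 Q9400 this gives 40 seconds on one core and 10 seconds on four, which shows why the multi-core mode matters for identification.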
Reliability and Performance Tests
The VeriSpeak 1.1 algorithm has been tested with voice samples taken from the XM2VTS Database, as well as with voice samples from Neurotechnology's internal databases.
The following voice template extraction and matching experiments were performed:
- Experiment 1 – used voice samples from the XM2VTS database. All samples include the same fixed phrase pronounced by all subjects.
- Experiment 2 – used voice samples from Neurotechnology internal voice database 1. All samples include the same fixed phrase pronounced by all subjects.
- Experiment 3 – used voice samples from Neurotechnology internal voice database 2. Each subject pronounced a unique phrase during his/her recording.
Receiver operating characteristic (ROC) curves are commonly used to demonstrate the recognition quality of an algorithm. ROC curves show the dependence of the false rejection rate (FRR) on the false acceptance rate (FAR).
Charts with ROC curves for both databases are available on the right.
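For readers unfamiliar with these terms, the sketch below shows how FAR and FRR are typically computed from matching scores, and how sweeping the decision threshold produces the points of a ROC curve. It assumes a higher score means a closer match and that a comparison is accepted when the score reaches the threshold. The scores are made-up toy values, not VeriSpeak results, and the function names are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) at a threshold; a score >= threshold is accepted."""
    # FAR: share of impostor comparisons that were wrongly accepted.
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    # FRR: share of genuine comparisons that were wrongly rejected.
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def roc_points(genuine_scores, impostor_scores):
    """Return (threshold, FAR, FRR) for every distinct observed score."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    return [(t, *far_frr(genuine_scores, impostor_scores, t)) for t in thresholds]


# Toy scores: same-speaker comparisons and different-speaker comparisons.
genuine = [80, 75, 60, 90]
impostor = [20, 35, 60, 10]
for threshold, far, frr in roc_points(genuine, impostor):
    print(f"threshold={threshold:>3}  FAR={far:.2f}  FRR={frr:.2f}")
```

Raising the threshold lowers FAR at the cost of a higher FRR; plotting FRR against FAR over all thresholds gives the ROC curve.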
Template matching was performed using all 4 cores of the specified processors.
The performance tests were performed on PCs with these processors:
- Intel Core 2 Q9400, running at 2.67 GHz clock rate;
- Intel Core i7-2600, running at 3.4 GHz clock rate.
VeriSpeak 1.1 algorithm tests with XM2VTS and Neurotechnology internal databases

| | Exp. 1 | Exp. 2 | Exp. 3 |
| Total voice samples in the database | 2360 | 309 | 305 |
| Subjects in the database | 295 | 42 | 42 |
| Recording sessions per subject | 8 | 1 - 10 | 1 - 10 |
| Average voice sample length (seconds) | 7.112 | 4.975 | 6.214 |
| Average template extraction time (seconds), Core 2 Q9400 | 0.203 | 0.109 | 0.141 |
| Average template extraction time (seconds), Core i7-2600 | 0.128 | 0.082 | 0.101 |
| Average voiceprint template size (bytes) | 6405 | 4503 | 5623 |
| Template matching speed (voiceprints per second), Core 2 Q9400 | 532 | 1056 | 368 |
| Template matching speed (voiceprints per second), Core i7-2600 | 920 | 1848 | 600 |