We are often asked about the accuracy of the Facewatch algorithm and what sets us apart from other providers. It is important to bear in mind that we detect faces in live conditions and achieve an accuracy rate of 99.87%.
In facial recognition, bias refers to variation in accuracy, as quantified by the false match rate or false non-match rate, across a multi-dimensional landscape of faces differing in age, gender, skin tone and other facial attributes.
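As a rough illustration of how these two metrics are computed (a generic sketch, not Facewatch's or NIST's evaluation code), the false match rate is the share of impostor comparisons scoring above the decision threshold, and the false non-match rate is the share of genuine comparisons scoring below it. The input format used here, tuples of (group, score, is_same_person), is an assumption made for the example:

```python
from collections import defaultdict

def rates_by_group(comparisons, threshold):
    """Compute false match rate (FMR) and false non-match rate (FNMR)
    per demographic group.

    `comparisons` is assumed to be an iterable of
    (group, similarity_score, is_same_person) tuples -- a hypothetical
    format chosen for this sketch, not a real vendor API.
    """
    impostor_total = defaultdict(int)   # non-mated comparisons per group
    impostor_false = defaultdict(int)   # of those, scores at or above threshold
    genuine_total = defaultdict(int)    # mated comparisons per group
    genuine_false = defaultdict(int)    # of those, scores below threshold

    for group, score, is_same_person in comparisons:
        if is_same_person:
            genuine_total[group] += 1
            if score < threshold:
                genuine_false[group] += 1   # false non-match
        else:
            impostor_total[group] += 1
            if score >= threshold:
                impostor_false[group] += 1  # false match

    results = {}
    for group in set(impostor_total) | set(genuine_total):
        fmr = impostor_false[group] / impostor_total[group] if impostor_total[group] else None
        fnmr = genuine_false[group] / genuine_total[group] if genuine_total[group] else None
        results[group] = {"FMR": fmr, "FNMR": fnmr}
    return results
```

Comparing these rates across groups is what reveals bias: an unbiased algorithm shows similar FMR and FNMR values whichever demographic slice you look at.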
It is true that facial recognition algorithms previously had an issue with bias. However, over the past 5-10 years, software developers such as SAFR have focused their attention on combating bias in facial recognition.
Modern algorithms have been trained on enough data to largely eliminate racial bias. Facewatch takes this a step further by using two algorithms, with SAFR as the primary one, alongside trained facial analysts who have been selected after being tested against disparate datasets of faces.
In 2000, the National Institute of Standards and Technology (NIST), a US government agency, established the Face Recognition Vendor Test (FRVT) program to evaluate the performance of face recognition algorithms submitted by software developers from around the world.
In 2019, NIST tested nearly 200 face recognition algorithms from nearly 100 developers, using databases of more than 18 million images of more than 8 million people, to quantify demographic differences by sex, age, and race or country of birth. The results, published in the NIST FRVT Part 3: Demographic Effects report, found that most, but not all, of the algorithms did have a problem with bias: false match rates were usually highest for African and East Asian people, and lowest for Eastern European individuals.
For algorithms developed in China, however, this effect was reversed, with the lowest false positive rates on East Asian faces. This suggests that the training datasets for most algorithms were predominantly made up of white faces, whereas the datasets of Chinese software developers were predominantly made up of Chinese faces.
One important exception NIST noted in its report was that developers such as SAFR supplied identification algorithms "for which false positive differentials are undetectable". This shows that, by curating a diverse and representative dataset, software developers can train a facial recognition algorithm with virtually no bias.
An AI or machine learning model can only be as good as the dataset it is trained on. With this in mind, SAFR curated from the outset a diverse and representative training dataset of high-quality face images to train its facial recognition algorithms. SAFR also continues to test and measure demographic differentials and to further train its algorithm, improving accuracy whilst maintaining low bias.
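To make "measuring demographic differentials" concrete, one simple summary (chosen here for illustration, not SAFR's or NIST's actual metric) is the ratio between the highest and lowest per-group false match rates; a value close to 1 indicates little detectable differential:

```python
def fmr_differential(per_group_rates):
    """Summarise demographic differential as max/min of per-group FMRs.

    `per_group_rates` is assumed to map group -> {"FMR": float, "FNMR": float},
    as produced by a per-group evaluation such as the sketch above.
    A value near 1.0 means false match rates are similar across groups.
    """
    fmrs = [r["FMR"] for r in per_group_rates.values() if r["FMR"] is not None]
    if not fmrs or min(fmrs) == 0:
        return None  # undefined if a group has no impostor data or no observed false matches
    return max(fmrs) / min(fmrs)
```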