Face Recognition

The sections below give some notes on the face-recognition process.

Face Recognition Process

Face Recognition is the process of automatically identifying or verifying a person from digital images or video frames containing a unique face. There are two main applications of face-recognition: Verification and Identification. However, before any face-recognition can be performed you first need to perform enrollment.


Associated command line tool: PapillonEnrollment

In order to perform face-recognition (either verification or identification) you first need to generate a description for the subject from their detected faces in the input images or video frames. In face-recognition this description is more commonly known as a biometric template. The process of turning detected faces into a description is often referred to as feature-extraction, with the data held in the description referred to as the features.

In the early days of face-recognition these features were handcrafted and measured things like the distance between the eyes or the length of the nose. However, with modern algorithms these features are implicitly learned by powerful machine-learning algorithms.

The diagram below shows the process for generating a description for a subject from a set of images or a video.


Note, all the faces in the input images and/or video need to be examples of the same subject. However, these face images can come from a mixture of sources.

The Enrollment Engine

In Papillon there is an enrollment engine, papillon::PEnrollment, which can be used to help generate descriptions of subjects from videos, video streams (cameras) and images. This class will perform the whole processing chain, including loading the default models with sensible parameters, performing face-detection and generating the descriptions. Optionally it will add the descriptions to a watchlist. The initialisation of the enrollment class is shown in the code snippet below.

papillon::PEnrollment enrollment;

The following code then generates a template for a subject from an image stored in a file...

papillon::PDescription descriptionOfSubject;
enrollment.EnrollFromImage("/path/to/image/subject.jpg", descriptionOfSubject);

The following code generates a template for a subject from a video clip...

papillon::PDescription descriptionOfSubject;
enrollment.EnrollFromVideo("/path/to/video/subject.mp4", descriptionOfSubject);

Note, in both instances only a single subject should be present in the input images or video. If more than one face is detected in an image or frame of video then these functions return an error. If your application requires more than one face in your video then please look at FaceLog2, which has been designed to cater for this situation.

A complete example showing enrollment and subsequent recognition is given in Learn how to do face-recognition with this SDK.

Within the SDK there is a command-line tool for performing enrollment from images or video streams (including H264 video files, webcam streams, RTSP streams, MJPEG streams, TVI streams). The tool also allows you to manage your watchlist: listing, viewing and deleting subjects. Detailed instructions on the command-line tool can be found at PapillonEnrollment. The source code to this command-line tool can be found here: Command line tool to perform Enrollment.

If you wish to learn how to perform the different steps in the enrollment process yourself, then read the rest of this section.

Face Detection

The first step in the enrollment process is to find all the faces in the input image/frame. This is done using a Face Detector which provides the locations of all the faces in the input image/frame. Papillon has a very powerful and quick face-detector.

The following code creates the default face detector.

papillon::PDetector detector;
papillon::PDetector::Create("FaceDetector2", "", detector);

You can then detect faces in an input papillon::PFrame.

papillon::PDetectionList detectionList;
papillon::PDetectorOptions options;
detector.Detect(frame, options, detectionList);

The faces are stored as a list of papillon::PDetection's. Note that there may be many faces in a single frame and these may belong to different subjects.

A detailed example of using the Face Detector can be found in the supplied example Detect faces.

The Face Description

The describing process takes the detected faces and generates a description (papillon::PDescription) for those faces. This description contains information that is unique to that subject; it is this information that gets used to match subjects. It is very important that only faces of the same identity are placed inside a single description. For each face that is added to a description, a descriptor (papillon::PDescriptor) is added to the description. You can keep adding faces to a description (as long as you know they belong to the same identity). Generally this improves performance; however, you have to be careful not to add too many, or the description can become very large, slowing down search times.

Note, you can also specify that a thumbnail of the face used to generate each descriptor is stored alongside it. This can be useful for displaying information about the subject; however, it increases the size of the description significantly.

The default face describer in Papillon is based on very powerful Deep Learning algorithms which have been trained on millions of labelled face images of 1,000s of subjects. For each face added to the description, a Deep Learning based descriptor is added to it.

More information on the describer plugins and how to use them can be found in Describer (PDescriber).

See Describer (PDescriber) for how to create the default face describer.

You can also generate a description from an example set (papillon::PExampleSet). An example set is a collection of detections of the same subject, which can come from a variety of sources. An example set can have an identity ID and a friendly name. This information gets added to the description when you generate it.

Note, descriptions are fairly generic and have been designed to hold information other than face information. For example, you can pass the same description to a meta-data classifier, and meta-information about the face will be added to the same description as a separate descriptor.


Associated command line tool: PapillonVerification

Verification asks: are these two faces the same identity?

Verification is the process of matching two papillon::PDescription's against each other to determine whether they belong to the same identity. The matching is performed by papillon::PComparer.


In order to compare two descriptions you need a papillon::PComparer, which is capable of matching the information held in the descriptions. The job of the comparer is to provide a match score which determines how similar the two descriptions are. The higher the score, the more probable the match.

For most purposes the default comparer, which can be constructed using the following code, is good enough.

papillon::PComparer comparer;
papillon::PComparer::Create(comparer, 0.75);

This comparer has been given a threshold of 0.75. This is the threshold used to determine whether the two descriptions match. Note, the comparer gets used both for verification and identification tasks.

For example, if you have previously made two descriptions A and B, and had a valid comparer, then you can compare two descriptions using the code snippet below.

papillon::PMatchScore matchScore;
comparer.Compare(descriptionA, descriptionB, matchScore);
P_LOG_INFO << "Match score: " << matchScore.GetScore() << " IsVerified: " << matchScore.IsVerified();

In all cases the higher the score, the more similar the face-recognition engine considers the faces to be. In verification applications a threshold is applied to determine whether the match is successful.

A complete example demonstrating verification is shown in Learn how to do face-recognition with this SDK.

Also, within the SDK there is a command-line tool for performing verification. This gives the ability to compare two faces coming from a watchlist, binary description files, a single image, multiple images or video streams (including H264 video files, webcam streams, RTSP streams, MJPEG streams, TVI streams). Detailed instructions on its use can be found at PapillonVerification.

The source code to this command-line tool can be found here Command line tool to perform Verification (1:1 matching) .


Associated command line tool: PapillonIdentification

Identification asks: what is the identity of this face? Are they in my database and, if so, who are they?

Identification is the process of matching an unknown description of an identity against a set of known descriptions stored in a watchlist, in an attempt to ascertain their identity. The output of identification is an ordered list of the top N matching subjects that have a match score over a chosen threshold. In this SDK there is a class papillon::PWatchlist which holds a collection of papillon::PDescription's (typically known as a gallery in face-recognition). The helper class papillon::PEnrollment can automatically generate descriptions and insert/update them in your watchlist. You can then search the watchlist (papillon::PWatchlist::Search) by providing a probe description and obtain a set of identification results.

Below is some pseudo code that generates a small watchlist, performs a search and prints out the top match.

// Populate the watchlist with some known descriptions
PWatchlist watchlist;
// Perform search using an unknown description
PIdentifyResults identifyResults;
int32 topN = 1;
float threshold = 0.75;
watchlist.Search(unknownDescription, comparer, identifyResults, topN, threshold);
// Look at top result
PIdentifyResult topMatch = identifyResults.Get(0);
P_LOG_INFO << topMatch.GetIdentityId() << " " << topMatch.GetScore();

A complete example demonstrating identification is shown in Learn how to do face-recognition with this SDK.

Also, within the SDK there is a command-line tool for performing identification of faces found in a set of images, video-stream, web-camera or RTSP stream. Detailed instructions on how to use this tool can be found at PapillonIdentification.

The source code to this command-line tool can be found here Command line tool to perform Identification (1:N matching).

Live Video Face Recognition

Within the SDK there exists a very powerful analytic which has been tuned for face-recognition on a live video stream, given a watchlist (FaceLog2 (DEPRECATED, favour FaceLog6)). This analytic will track faces in the video stream, releasing sighting events.

Face Recognition Performance

There are many factors which determine how successful a deployed face-recognition system will be. Automatic Face Recognition works best when the face in the image has a frontal pose (within +/- 30 degrees) and has an inter-ocular distance (distance between the eyes) of at least 50 pixels. Ideally the face in the image should also be uniformly illuminated, with no cast shadows. However, the algorithms supplied in this current SDK release will also work with lower quality data, but the results will be less reliable.

In general there are two main aspects to consider when designing a face-recognition application. Is the application going to be

  • Consensual or Non-Consensual? and
  • Constrained or Non-Constrained?

Consensual or Non-Consensual

This refers to whether the subjects who are using the face-recognition system are aware they are being surveyed. Typically, in verification systems the subject is aware and they actively help. For example, in an access control system based on face-recognition the subject wants to get into the building so they quickly learn to present their face to the camera. This results in a high-resolution, frontal pose face image with neutral expressions.

Conversely, in more covert applications the camera may be hidden and the subjects are unaware they are being captured. In this situation the faces can be lower resolution, off-frontal and of varying pose. Often there will also be more motion blur as the subjects are moving.

Obviously, face-recognition will perform best in consensual applications, and different applications of face-recognition fall in between these two extremes. For example, with correct camera placement at pinch points (i.e. door entrances, the top of escalators) the application can be made semi-consensual.

Constrained or Non-Constrained

This refers to the capture environment. A constrained environment has ideal conditions for face-recognition: you have complete control over the lighting and other external factors. A good example of a constrained environment is that in which you get your passport photo taken. ISO has released a standard (ISO/IEC 19794-5) which relates to best practices for taking images of faces in ideal conditions for both human and computer recognition of faces.

Conversely, non-constrained conditions are those in which you have little or no control over the capture environment. The extreme example of this is trying to perform face-recognition in large open spaces, such as parks or football stadiums. Here, the scene is lit by the moving sun during the day and by little or no light at night. Under these circumstances the face of the same subject can look completely different, representing the most difficult challenge for automatic face-recognition.

Again, for most applications some control can be applied to make your application semi-constrained and achieve the best performance.

Face Recognition Use Cases

The following diagram helps show the trade-off between the different use-cases and the level of difficulty of the application.

Human Face Recognition?

It has been shown that humans are very good at familiar face-matching. Over time we build complex models of known individuals in our brain and are able to recognise them under extreme transformations (i.e. bad lighting, shadow, deformation).

However, it has also been shown that humans are very bad at unfamiliar face-recognition. This is akin to a security guard trying to match the face of an unknown person against their passport photo. The security guard is unable to do a very good or consistent job. It has been shown that computer face-recognition far exceeds human performance in these situations, which actually represent most use-cases of face-recognition technology.

Error Measurements

Defined below are some typical error measurements, such as Failure To Enroll (FTE), False Acceptance (FA) and False Rejection (FR), which are used to measure the performance of a face-recognition system.

Failure To Enroll

This occurs when the engine is unable to detect a face in the input image/video. This typically happens because the face is too small in the image. However, there are occasions when this is not the case and the face detector has failed to detect the face. In these cases a more manual process will be needed.

False Acceptance

False acceptance is when you compare two descriptions of different identities and the system says they are the same, i.e. the match score is above the threshold. It is potentially a very dangerous error, as you could have allowed someone access to resources they should not have access to. The false acceptance rate can be lowered by increasing the threshold value. However, this will increase the number of false rejections.

Typically a FA rate is reported as a percentage (or fraction) and scales between 0 and 100 (or 0 and 1). For example, an FA rate of 0.1% means that 1 in every 1,000 comparisons (if random) will result in a false acceptance.

False Rejection (or True Positive)

False rejection is when you compare two descriptions of the same identity and the system says they are different, i.e. the match score is below the threshold value. This can be annoying to a user, as you could be blocking their access to a resource they need quickly. The false rejection rate can be lowered by lowering the threshold. However, this will increase the chance of false acceptance.

Typically a FR rate is reported as a percentage (or fraction) and scales between 0 and 100 (or 0 and 1). For example, an FR rate of 0.1% means that 1 in every 1,000 comparisons (if random) will result in a false rejection.

The True Positive rate is the opposite of the False Rejection rate (i.e. TP = 100 - FR). Some people prefer this metric as it gives the success rate of the system. For example, a system with a 99.9% True Positive rate will verify 999 out of every 1,000 people successfully.

The Receiver Operator Characteristic Curve

The false acceptance and false rejection rates are dependent on one another and are linked by the threshold. Often this relationship is shown on a ROC curve. It is usual to report the FR rates at key FA points taken from the ROC curve, for example at the 0.001% (1 in 100,000), 0.01% (1 in 10,000), 0.1% (1 in 1,000) and 1% (1 in 100) levels.

Expected Performance

Below is a table of the performance we typically see on different data sets.

The first is an internal ISO data set which was taken in controlled conditions with consensual subjects.

The Non-ISO data sets are much more challenging data sets that are taken in less controlled conditions with non-consensual subjects.

  • LFW. A commonly used dataset known as the Labelled Faces in the Wild.
  • Hard. An internal data set collected by Digital Barriers consisting of a balance of 6 ethnic groups.
  • IJB-C. A large data set made available for facial recognition testing by NIST.

Papillon v4.9 and higher (model FR09)

Dataset            FA=1%    FA=0.1%   FA=0.01%   FA=0.001%   EER
ISO                0.14%    0.14%     0.14%      0.17%       0.14%
Non-ISO (lfw)      0.23%    0.31%     1.37%      2.12%       2.27%
Non-ISO (hard)     0.02%    0.12%     0.76%      2.95%       0.11%
Non-ISO (IJB-C)    6.2%     8.6%      11.4%      17.4%       -

Note, for Papillon v4.9 tests FaceDetector5 was used. In all previous tests FaceDetector2 was used.

Papillon v4.6 and 4.6.1 (model FR08)

Dataset            FA=1%    FA=0.1%   FA=0.01%   FA=0.001%   EER
ISO                0.23%    0.23%     0.23%      -           0.23%
Non-ISO (lfw)      0.23%    0.31%     1.37%      -           0.23%
Non-ISO (hard)     0.85%    1.05%     2.15%      -           0.85%
Non-ISO (IJB-C)    13.2%    32.7%     53.3%      70.9%       -

Papillon v4.3, v4.4, v4.5 (model FR07)

Dataset            FA=1%    FA=0.1%   FA=0.01%   EER
ISO                0.23%    0.23%     0.23%      0.23%
Non-ISO (lfw)      0.23%    0.75%     2.42%      0.31%
Non-ISO (hard)     0.87%    1.38%     3.80%      0.89%

Papillon v4.2 (model FR06)

Dataset            FA=1%    FA=0.1%   FA=0.01%   EER
ISO                0.24%    0.35%     0.85%      0.27%
Non-ISO (lfw)      0.61%    2.64%     8.25%      0.68%
Non-ISO (hard)     1.09%    3.06%     10.40%     1.06%

Papillon v4.1 (model FR05)

Dataset            FA=1%    FA=0.1%   FA=0.01%   EER
ISO                0.24%    0.49%     1.61%      0.25%
Non-ISO (lfw)      1.51%    5.60%     15.22%     1.29%
Non-ISO (hard)     1.64%    7.64%     24.25%     1.38%

Papillon v4.0

Dataset            FA=1%    FA=0.1%   FA=0.01%   EER
ISO                0.34%    0.99%     3.49%      0.45%
Non-ISO            2.26%    9.27%     21.66%     1.55%

Note, the error rates reported are the False Rejection rates at different False Acceptance rates. The Equal Error Rate (EER) refers to where the False Acceptance Rate equals the False Rejection Rate.

In your application, if you think you are not getting these performance levels please contact us.

Threshold Levels

In many Face Recognition applications you have to set a threshold which determines whether two faces match. The table below suggests some threshold values to use for different security levels, depending on the type of data being analysed. This table corresponds to the FaceRecognition09 model, included with SDK 4.9 or higher.

False Acceptance Rate     Security Level   ISO     Non-ISO
1.0%   (1 in 100)         Low              0.288   0.211
0.1%   (1 in 1,000)       Medium           0.398   0.316
0.01%  (1 in 10,000)      High             0.485   0.409
0.001% (1 in 100,000)     Very High        0.581   0.491

Computational Requirements

The speed of the face-detector depends on the input resolution of the image and the minimum detection size requested. It is recommended that the minimum size is set no lower than 80 pixels (this is the default value). Significant processing-performance gains can be expected if this minimum is raised. However, this should only be done if you do not expect any faces of a lower width to appear in your input images or video stream.