I dissected Apple’s patent to figure out how iPhone X Face ID works

By Kathy Li
September 21, 2017


USERS OF THE Samsung Galaxy S8 have reported that they are able to trick the smartphone’s facial recognition by presenting a photograph of the owner. At its latest launch event, held at the Steve Jobs Theater in Cupertino, Apple unveiled the iPhone X. Apple’s newest iPhone also boasts facial recognition, but apparently without the S8’s pitfall.

Have you ever wondered how Apple was able to achieve this? I, for one, have. A few months prior to the launch, TechCrunch revealed a patent application filed by Apple, spawning rumors about a potential face-identification feature. Now that Face ID has been officially announced, how closely does it match what the patent application describes? I read the full 22-page document to investigate.

(Note: For simplicity, I will be referring to Apple’s Presence Sensing patent application as “the patent” from here on out.)

(Note #2: Don’t get me wrong, I have talked to a number of friends who currently use the S8 and swear by it. It seems like a wonderful device overall.)

What we already know

Had I read the legalese-heavy patent before knowing for sure what would be released, I would probably have ended up grasping at straws. With the iPhone X cat now out of the bag, we at least have some confirmed hardware and features to work from.

At the launch event, Apple’s Phil Schiller already walked us through a brief introduction of how Face ID works from a user’s perspective. Let’s revisit it here.


Source: Apple

According to Schiller, the verification is performed by the TrueDepth camera system, which is mostly packed into a tiny black bar sitting atop the screen. When you grab the iPhone X, the flood illuminator lights up your face with infrared (IR) light -- a type of light that is invisible to the human eye. The dot projector then projects over 30,000 invisible IR dots onto your face. The IR camera captures an IR image along with the dot pattern.

The data set is passed into a neural network (powered by the A11 Bionic chip) to create a 3D mathematical model of your face, which is checked against the one stored during setup. If the system returns a match, your phone will be unlocked.

While that’s all fine and dandy, my question is: How does the iPhone X know when to activate the 3D mapping captures? After all, it doesn’t seem very power efficient for the full-fledged Face ID function to be running all the time. Or so I would imagine. For example, when you are sleeping at night, you wouldn’t want your phone to be constantly taking 3D images of you, right?

Based on what I have read in the document, I am guessing that the Presence Sensing component might be the key to the gate here. And if it’s smart enough to determine whether a real human is present or not, that would also explain why Face ID cannot be fooled by photographs.

Presence Sensing

As with most patent applications, the published document is basically full of legalese and complex mathematical formulas. I will do my best to translate the technobabble into a more readable form.

In essence, Presence Sensing (PS) represents a system in a device that utilizes multiple sensors to help determine the device’s next state. It can operate even when the device is in sleep mode.


Source: US Patent & Trademark Office

This alone already sounds like what Face ID is equipped to do, doesn’t it? At least at a high level, anyway. Now, we’ll take a deeper look to see if there are more clues.

In the summary section, it is stated that PS utilizes at least one of three parameters:

  1. Skin tone detection
  2. Face detection
  3. Movement detection

In the detailed description, it is stipulated that a “light level determination” may be made (block 204 in FIG. 6) and provided to the skin tone detection routine, and possibly other routines as well. This could be what the Ambient Light Sensor (confirmed for iPhone X) is meant for, and could also be part of why Face ID is able to work in low light conditions.

Skin Tone Detection


Source: US Patent & Trademark Office

From FIG. 7, we can see that depending upon the results of the light level determination, different routines will be triggered for further skin tone detection. Beyond that, the actual routines look to me like a tangle of jargon that will take me much longer to investigate. Let’s skip those for now.

From what I can tell, the output of this tier will be a probability of user presence due to detecting a skin tone.
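
To make that flow concrete, here is a minimal Python sketch of my own. It is not taken from the patent: the brightness cutoff, both color rules, and every function name are assumptions I made up for illustration. It only shows the shape of the idea, which is that a light level reading picks a routine, and the routine returns a score between 0 and 1 that stands in for a probability.

```python
# Illustrative only: thresholds and color rules are hypothetical, not from the patent.
LOW_LIGHT_THRESHOLD = 50  # assumed cutoff on a 0-255 brightness scale


def mean_brightness(pixels):
    """Average of the per-pixel channel means, for a list of (r, g, b) tuples."""
    return sum(sum(p) / 3 for p in pixels) / len(pixels)


def skin_fraction_bright(pixels):
    """Assumed bright-light routine: share of pixels passing a crude RGB rule."""
    hits = sum(1 for r, g, b in pixels if r > 95 and r > g > b and r - b > 15)
    return hits / len(pixels)


def skin_fraction_dim(pixels):
    """Assumed low-light routine: share of pixels whose red chromaticity is in range."""
    hits = 0
    for r, g, b in pixels:
        total = r + g + b
        if total > 0 and 0.36 <= r / total <= 0.55:
            hits += 1
    return hits / len(pixels)


def skin_tone_probability(pixels):
    """Pick a routine based on the light level, then return its score in [0, 1]."""
    if mean_brightness(pixels) < LOW_LIGHT_THRESHOLD:
        return skin_fraction_dim(pixels)
    return skin_fraction_bright(pixels)
```

Calling `skin_tone_probability` on a bright patch uses the RGB rule, while a dark patch falls through to the chromaticity rule. A real system would presumably use far more careful routines than these two.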

Face Detection

I am guessing that the face detection described here is not necessarily the same as the capturing and processing of 3D face models. Face detection has been around for a relatively long time. Even on iPhones from several generations ago, every time you tried to take a selfie, a square would appear on the screen -- predicting where your face might be. It’s nothing new per se.


Source: US Patent & Trademark Office

The face detection component here is most likely used only to determine whether a face (probably any face at all) is present. And like the skin tone detection component, it looks like a probability value will be returned for further calculations.

Motion Detection

If I had to guess, I would say that the Proximity Sensor (also confirmed for iPhone X) is part of the movement detection component. It probably works a little like a vehicle’s park assist, which can tell when you are approaching an object.

Tiered system

What about our “always on” concern? According to the patent, PS comprises multiple tiers. One reduced set of routines will indeed run all the time -- as long as the device is powered on -- to serve as a first gateway. It is possible that the TrueDepth system isn’t running in its entirety 24/7, but in a tiered manner instead.


Source: US Patent & Trademark Office

The document does not clearly state which exact component(s) belong to the first pass.
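
To illustrate the tiering idea, here is a minimal Python sketch of my own, not anything specified in the patent. The function name `run_tiers` and the 0.5 threshold are assumptions. Each tier is simply a function that returns a score between 0 and 1, ordered from cheapest to most expensive, and a later tier only runs if every earlier one clears the threshold.

```python
def run_tiers(tiers, threshold=0.5):
    """Evaluate tiers in order, stopping at the first score below the threshold.

    `tiers` is a list of zero-argument callables, cheapest first, each
    returning a score in [0, 1]. Returns (present, tiers_run).
    The threshold value is a hypothetical placeholder.
    """
    tiers_run = 0
    for tier in tiers:
        tiers_run += 1
        if tier() < threshold:
            # A cheap tier said "probably nobody there": skip the costly ones.
            return False, tiers_run
    return True, tiers_run
```

The appeal of this arrangement is that the costly checks are never even called when the cheap first pass sees nothing, which is how I picture the phone avoiding full captures while you sleep.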

The document does point out, however, that the fusion of the parameters and the detection logic may include the use of neural networks, support vector machines, and/or some other form of probabilistic machine learning-based algorithm. This also means that individual parts of the system can be further trained over time to produce progressively better results.
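
As a rough illustration of what such a fusion could look like, here is a small Python sketch. To be clear, this is my own simplification: it uses a logistic-regression-style weighted sum, which is one of the simplest probabilistic models and not necessarily what Apple uses. The function name, the weights, and the bias are all hypothetical; in a trained system they would be learned from data.

```python
import math


def fuse(probabilities, weights, bias):
    """Combine per-detector scores into one presence probability.

    Logistic-regression-style fusion: a weighted sum squashed into (0, 1).
    The weights and bias are placeholders that training would normally set.
    """
    z = bias + sum(w * p for w, p in zip(weights, probabilities))
    return 1 / (1 + math.exp(-z))


# Hypothetical parameters for skin tone, face, and movement, in that order.
WEIGHTS = [2.0, 3.0, 1.5]
BIAS = -3.0

likely_present = fuse([0.9, 0.8, 0.7], WEIGHTS, BIAS)
likely_absent = fuse([0.1, 0.2, 0.1], WEIGHTS, BIAS)
```

With these made-up numbers, strong readings from all three detectors push the fused value above 0.5 and weak readings keep it below. Retraining would amount to adjusting `WEIGHTS` and `BIAS`.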

What about the 3D mappings?

Within the Presence Sensing patent, there is no explicit description of the capturing and processing of 3D face models. And that makes sense, as each patent is only supposed to cover one invention. The complexity of 3D modeling does seem to be out of scope here.

After some further searching, I came across another patent application filed earlier in 2017 -- “Scanning projectors and image capture modules for 3D mapping”. The filing date came just a little after Apple’s acquisition of several Israeli face recognition-focused companies. Coincidentally, one Israeli inventor is named in the document as a team member.

Could this particular patent application answer more of the questions? Perhaps it should be my next reading assignment.

Implications for AI’s future

As of today, the iPhone X has not officially shipped yet, so there are some nuts and bolts that have yet to be tested. Like many, I am skeptical about using Face ID in sensitive features like authorizing Apple Pay.

That being said, there is one thing about the whole TrueDepth camera system that draws my attention -- namely, the A11 Bionic chip. I am truly impressed by how fast it appears to handle all the machine learning-related computations. This raises hopes for further research and development of standalone AI-powered apps -- standalone being the operative word here. Why? Because on-device processing reduces the reliance on cloud computing (not always secure yet, especially as far as sensitive data is concerned) and on connectivity.