# CVE-2021-1048: Android kernel refcount increment on mid-destruction file
*Jann Horn*

## The Basics

**NOTE: The original vulnerability was in the Linux kernel, but in-the-wild
exploitation was only seen on Android-based devices, which run Android-specific
kernel forks.**

**Disclosure or Patch Date:** it's complicated (but the Android bulletin is from 6 November 2021)

**Product:** Android / Linux kernel

**Advisory:** [ASB 2021-11](https://source.android.com/security/bulletin/2021-11-01#kernel-components_1)

**Affected Versions (upstream Linux):**
 - 5.9-rc2 - 5.9-rc3 (mainline: only release candidates affected)
 - 5.8.4 - 5.8.7 (short-lived stable branch)
   - date range: 2020-08-26 - 2020-09-09
 - 5.7.18 and higher (short-lived stable branch, EOL before fix)
   - date range: 2020-08-26 - EOL
 - 5.4.61 - 5.4.63 (LTS stable branch)
   - date range: 2020-08-26 - 2020-09-09
 - 4.19.142 - 4.19.143 (LTS stable branch)
   - date range: 2020-08-26 - 2020-09-09
 - 4.14.195 - 4.14.196 (LTS stable branch)
   - date range: 2020-08-26 - 2020-09-09
 - 4.9.234 - 4.9.235 (LTS stable branch)
   - date range: 2020-08-26 - 2020-09-12
 - 4.4.234 - 4.4.235 (LTS stable branch)
   - date range: 2020-08-26 - 2020-09-12

**Affected Versions (Android devices):** possibly some Android devices before SPL 2021-11-06, depending on LTS syncs

**First Patched Version:**
 - upstream: 5.9-rc4, 5.8.8, 5.4.64, 4.19.144, 4.14.197, 4.9.236, 4.4.236
 - Android devices: SPL 2021-11-06 or earlier (see the "context of bug" section for an explanation)

**Issue/Bug Report (upstream Linux):** https://lore.kernel.org/linux-fsdevel/000000000000dc862405ae31ae9b@google.com/T/#u

**Issue/Bug Report (Android devices):** unknown

**Patch CL:** https://git.kernel.org/linus/77f4689de17c

**Bug-Introducing CL:** https://git.kernel.org/linus/a9ed4a6560b8 (bugfix for another memory corruption)

**Reporter(s) (upstream Linux):** syzbot/syzkaller

**Reporter(s) (Android devices):** unknown

## The Code

**Proof-of-concept:** N/A

**Exploit sample:** N/A

**Did you have access to the exploit sample when doing the analysis?** no

## The Vulnerability

**Bug class:** object state confusion leading to use-after-free

**Vulnerability details:**

`ep_loop_check_proc()` attempts to increment the refcount of a file with
`get_file()`. However, `get_file()` may only be used when the caller already
holds a refcounted reference to the file. `ep_loop_check_proc()` instead relies
on holding `ep->mtx` to protect its weak reference to the file from concurrent
removal by `eventpoll_release()`. That lock does not prevent it from
encountering a file whose refcount has already dropped to zero.

Here is a diagram of the relevant lifetime states of `struct file`:

![](CVE-2021-1048-file-states.png)

Essentially, `get_file()` is called on an object that may be in a state in which
`get_file()` is not permitted.
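
The following userspace C model serializes the problematic interleaving. It shows why an unconditional increment is not permitted once the refcount has reached zero. All names are hypothetical and this is not kernel code. The model assumes, as in the kernel, that dropping the last reference commits the object to teardown, and that nothing re-checks the refcount afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a refcounted file; NOT kernel code. */
struct model_file {
    atomic_long f_count;   /* plays the role of struct file's f_count */
    bool teardown_started; /* set once the last reference has been dropped */
};

static void model_file_init(struct model_file *f)
{
    atomic_init(&f->f_count, 1); /* the creator holds the first reference */
    f->teardown_started = false;
}

/* Mirrors get_file(): an unconditional increment. This is only correct if
 * the caller already owns a reference, i.e. f_count >= 1 beforehand. */
static void model_get_file(struct model_file *f)
{
    atomic_fetch_add(&f->f_count, 1);
}

/* Mirrors fput(): dropping the last reference commits the object to
 * teardown; later increments do not cancel that decision. */
static void model_fput(struct model_file *f)
{
    if (atomic_fetch_sub(&f->f_count, 1) == 1)
        f->teardown_started = true;
}

/* Serialized version of the race, starting from f_count == 1: the owner
 * drops the last reference, then a path holding only a lock-protected weak
 * pointer calls the get_file() equivalent. Returns the resulting refcount. */
static long simulate_get_after_last_put(struct model_file *f)
{
    model_fput(f);     /* f_count 1 -> 0: teardown starts */
    model_get_file(f); /* f_count 0 -> 1: a "reference" to a dying object */
    assert(f->teardown_started); /* the increment did not stop teardown */
    return atomic_load(&f->f_count);
}
```

After this sequence, the object has a refcount of 1, but its teardown has already started. Once teardown frees the object, the reference taken by the weak-pointer path dangles. That dangling reference is the source of the use-after-free.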

**Patch analysis:**

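`get_file()` is replaced with `get_file_rcu()`, which only takes a reference if
the refcount is still non-zero. `get_file_rcu()` is therefore valid in all the
states the file can be in at that point (and in more states besides).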

**Thoughts on how this vuln might have been found _(fuzzing, code auditing, variant analysis, etc.)_:**
Since the bug was quickly fixed in upstream Linux, but not in all Android
devices, there's a good chance that the attackers specifically searched for
memory corruption fixes that are present upstream but not in Android devices.

This reminds me of
[Bad Binder](https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html),
another case where a bug was fixed upstream but not in all Android kernels.

**(Historical/present/future) context of bug:** 

The commit that introduced the bug (and fixed another one) was included in the
Android Security Bulletin for December 2020, forcing all Android vendors to
include that commit. However, the fix for this bug, despite quickly landing in
upstream stable kernels (see "Affected Versions" above), was only included in an
Android Security Bulletin in November 2021.

This means that devices from Android vendors who only cherry-pick bugfixes
referenced in Android Security Bulletins, rather than pulling the complete
Android common kernel tree, will have been vulnerable for almost a year. By
contrast, upstream stable releases (and Android common kernels) were only
affected for ~2-3 weeks.

That doesn't necessarily mean that all Android devices were affected for that
long, though. For example, Pixel 4 XL devices seem to have been patched in their
March 2021 security update through the periodic LTS update from 4.14.191 to
4.14.199.
The kernel versions that were shipped to Pixel 4 XL devices are (from running
`strings` on `boot.img` in the firmware images):

 - in the December 2020 update: `4.14.191-gf6c9439f069c-ab6924784` (still vulnerable?)
 - in the January 2021 update: `4.14.191-gd36f32db91a3-ab6960308` (still vulnerable?)
 - in the February 2021 update: `4.14.191-gd36f32db91a3-ab7006457` (still vulnerable?)
 - in the March 2021 update: `4.14.199-g815ef3fd6754-ab7079165` (fixed)
 - in the April 2021 update: `4.14.199-gb0863551cb91-ab7132611` (fixed)
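
The reasoning about which Pixel 4 XL builds include the fix can be sketched as a comparison of the kernel release string against the upstream stable ranges listed under "Affected Versions". This is a hypothetical helper, and it only describes the LTS base. A build below the first affected sublevel (like 4.14.191) can still contain the buggy commit through the December 2020 ASB cherry-pick, which is why those builds are marked "still vulnerable?" above. Vendor-specific backports or reverts are invisible to it.

```c
#include <assert.h>
#include <stdio.h>

enum lts_status {
    LTS_UNKNOWN_BRANCH, /* not one of the stable branches listed above */
    LTS_PREDATES_BUG,   /* LTS base older than the bug; ASB cherry-pick may add it */
    LTS_AFFECTED,       /* LTS base contains the bug but not the fix */
    LTS_FIXED,          /* LTS base contains the fix */
};

/* Upstream stable ranges from "Affected Versions"; first_fixed == 0 means
 * the branch went EOL without the fix. */
struct branch_range {
    int major, minor, first_affected, first_fixed;
};

static const struct branch_range ranges[] = {
    {5, 8, 4, 8},      {5, 7, 18, 0},     {5, 4, 61, 64},
    {4, 19, 142, 144}, {4, 14, 195, 197}, {4, 9, 234, 236},
    {4, 4, 234, 236},
};

/* Hypothetical helper: classifies a release string such as
 * "4.14.199-g815ef3fd6754-ab7079165" by its MAJOR.MINOR.SUBLEVEL prefix. */
static enum lts_status classify_lts_base(const char *release)
{
    int major, minor, sub;
    assert(release != NULL);
    if (sscanf(release, "%d.%d.%d", &major, &minor, &sub) != 3)
        return LTS_UNKNOWN_BRANCH;
    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        const struct branch_range *r = &ranges[i];
        if (r->major != major || r->minor != minor)
            continue;
        if (sub < r->first_affected)
            return LTS_PREDATES_BUG;
        if (r->first_fixed == 0 || sub < r->first_fixed)
            return LTS_AFFECTED;
        return LTS_FIXED;
    }
    return LTS_UNKNOWN_BRANCH;
}
```

The helper classifies the December 2020 to February 2021 builds (4.14.191) as predating the upstream bug, and the March and April 2021 builds (4.14.199) as having the fix in their LTS base.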


## The Exploit

(The terms *exploit primitive*, *exploit strategy*, *exploit technique*, and *exploit flow* are [defined here](https://googleprojectzero.blogspot.com/2020/06/a-survey-of-recent-ios-kernel-exploits.html).)

**Exploit strategy (or strategies):** N/A - no exploit sample to analyze

**Exploit flow:** 

**Known cases of the same exploit flow:**

**Part of an exploit chain?**

## The Next Steps

### Variant analysis

**Areas/approach for variant analysis (and why):**

I think there are two approaches for variant analysis here:

1. Check whether any Linux kernel patches listed in Android Security Bulletins
   are referenced in the `Fixes:` tag of other commits. For each hit, verify
   that the followup either isn't security-relevant or has also been included
   in an ASB.
2. Look for other codepaths that extract a file from an epoll item and assume
   that its refcount is non-zero.
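
Approach #1 can be partially automated. The sketch below is a hypothetical helper, not an existing tool. It scans a commit message for `Fixes:` lines that reference a watched set of commits, such as those listed in ASBs. It assumes messages in the usual upstream format (`Fixes: <sha> ("subject")`) with lowercase hex hashes. Hashes are compared by prefix, with a minimum of 7 hex digits, since different sources abbreviate SHA-1s to different lengths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if any line of 'msg' starting with
 * "Fixes: " names one of the 'watched' commits (prefix match, >= 7 digits). */
static bool fixes_tag_references(const char *msg, const char *const *watched,
                                 size_t n_watched)
{
    static const char tag[] = "Fixes: ";
    const size_t tag_len = sizeof(tag) - 1;
    assert(msg != NULL && (watched != NULL || n_watched == 0));

    for (const char *line = msg; line != NULL && *line != '\0';) {
        const char *eol = strchr(line, '\n');
        if (strncmp(line, tag, tag_len) == 0) {
            const char *sha = line + tag_len;
            /* strspn() stops at the first non-hex character (space, '\n'). */
            size_t sha_len = strspn(sha, "0123456789abcdef");
            for (size_t i = 0; i < n_watched; i++) {
                size_t w_len = strlen(watched[i]);
                size_t n = sha_len < w_len ? sha_len : w_len;
                if (n >= 7 && strncmp(sha, watched[i], n) == 0)
                    return true;
            }
        }
        line = eol ? eol + 1 : NULL; /* advance to the next line, if any */
    }
    return false;
}
```

The messages could come from something like `git log --format=%B` over an upstream stable branch. For a single watched commit, `git log --grep` gives a quick approximation.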

**Found variants:**

I found no variants with clear security implications.

Re #1: the following upstream Linux commits listed in bulletins from 2020
and 2021 are referenced by followup fix commits:

 - d0cb50185ae9 (`do_last(): fetch directory ->i_mode and ->i_uid before it's too late`)
   - followup: 6404674acd59 (`vfs: fix do_last() regression`)
     - reported by syzkaller: https://syzkaller.appspot.com/bug?extid=190005201ced78a74ad6
     - looks like just a NULL deref when racing?
 - 07e6124a1a46 (`vt: selection, close sel_buffer race`)
   - followup: e8c75a30a23c (`vt: selection, push sel_lock up`)
     - deadlock fix
   - followup: 4b70dd57a15d (`vt: selection, push console lock down`)
     - deadlock fix
 - 594cc251fdd0 (`make 'user_access_begin()' do 'access_ok()'`)
   - followup: ab10ae1c3bef (`lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()`)
     - looks like a powerpc-specific performance regression fix?
 - 6d390e4b5d48 (`locks: fix a potential use-after-free problem when wakeup a waiter`)
   - followup: dcf23ac3e846 (`locks: reinstate locks_delete_block optimization`)
     - performance regression fix
 - a9ed4a6560b8 (`epoll: Keep a reference on files added to the check list`)
   - followup: 77f4689de17c (`fix regression in "epoll: Keep a reference on files added to the check list"`)
     - original case
 - 21998a351512 (`x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.`)
   - followup: 33fc379df76b (`x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb`)
     - fixes incorrect reporting of speculation mitigation status on x86
   - followup: 1978b3a53a74 (`x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP`)
     - fixes being unable to turn on IBPB on x86
 - 8019ad13ef7f (`futex: Fix inode life-time issue`)
   - followup: 8d67743653dc (`futex: Unbreak futex hashing`)
     - performance regression fix, theoretically also correctness fix

Re #2: the only place that looks vaguely interesting in that regard is
`ep_item_poll()`. From what I can tell, it can invoke `vfs_poll()` on a file
whose refcount is already zero, but only before the file's `->release()` handler
is called. I think that's fine.

### Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

**Ideas to kill the bug class:**
In my opinion, the bug class here is "object state confusion". Killing the bug
class would require static analysis and annotations that check whether an
object's state matches the requirements of each operation performed on it.
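
Here is a lightweight illustration of the idea. It is not something the kernel does and not a substitute for real static analysis. The sketch uses hypothetical wrapper types to encode "caller holds a reference" in C's type system, so the unconditional increment can only be applied to a strong reference. Passing a weak pointer to `obj_ref_dup()` fails to compile. A weak pointer only offers the conditional `obj_weak_tryget()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration; none of these types exist in the kernel. */
struct obj {
    atomic_long refcount;
};

/* Strong reference: should only be produced by code that took a ref. */
struct obj_ref {
    struct obj *o;
};

/* Weak pointer, e.g. one that is only protected by some lock. */
struct obj_weak {
    struct obj *o;
};

/* The unconditional increment is only offered on a strong reference, so
 * obj_ref_dup((struct obj_weak){...}) is a compile-time type error. */
static struct obj_ref obj_ref_dup(struct obj_ref r)
{
    assert(atomic_load(&r.o->refcount) > 0); /* guaranteed by holding r */
    atomic_fetch_add(&r.o->refcount, 1);
    return r;
}

/* From a weak pointer, only the increment-unless-zero form is offered. */
static bool obj_weak_tryget(struct obj_weak w, struct obj_ref *out)
{
    long old = atomic_load(&w.o->refcount);
    do {
        if (old == 0)
            return false; /* object is mid-destruction */
    } while (!atomic_compare_exchange_weak(&w.o->refcount, &old, old + 1));
    out->o = w.o;
    return true;
}
```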

**Ideas to mitigate the exploit flow:** N/A

**Other potential improvements:**
When cherry-picking specific security fixes, it would probably be a good idea to
at least monitor the upstream repository for commits whose `Fixes:` tag refers to
the cherry-picked patch.

### 0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected **as a 0-day**?

## Other References 
