Jann Horn

The Basics

Disclosure or Patch Date: March 7, 2022

Product: Arm Mali GPU driver for Linux/Android

Advisory:

Affected Versions: see Arm advisory (note that the affected version range for the Bifrost version of the related CVE-2021-28664 seems to be off-by-one)

First Patched Version:

  • for Arm: see Arm advisory
  • for Pixel: patch level 2022-03-05

Issue/Bug Report: N/A

Patch CL: https://android.googlesource.com/kernel/google-modules/gpu/+/5381ff7b4106b277ff207396e293ede2bf959f0c%5E%21/

Bug-Introducing CL: N/A, Arm usually only publishes driver versions as tarballs

Reporter(s): unknown

The Code

Proof-of-concept:

Exploit sample: N/A

Did you have access to the exploit sample when doing the analysis? no

The Vulnerability

Bug class: Broken access control logic

Vulnerability details:

The out-of-tree Mali driver allows userspace to create GPU memory objects from host-virtual memory areas using the memory type KBASE_MEM_TYPE_IMPORTED_USER_BUF, which grabs page references using pin_user_pages_remote() (or get_user_pages_remote() on older kernels). I think this is somewhat frowned upon in upstream GPU drivers nowadays; for comparison, the upstream Intel GPU driver i915 has a similar mechanism under the name userptr, but the function i915_gem_userptr_ioctl implementing this interface has the following comment on top of it:

https://elixir.bootlin.com/linux/v5.18.14/source/drivers/gpu/drm/i915/gem/i915_gem_userptr.c#L477

 * Also note, that the object created here is not currently a "first class"
 * object, in that several ioctls are banned. These are the CPU access
 * ioctls: mmap(), pwrite and pread. In practice, you are expected to use
 * direct access via your pointer rather than use those ioctls. Another
 * restriction is that we do not allow userptr surfaces to be pinned to the
 * hardware and so we reject any attempt to create a framebuffer out of a
 * userptr.
 *
 * If you think this is a good interface to use to pass GPU memory between
 * drivers, please use dma-buf instead. In fact, wherever possible use
 * dma-buf instead.

Unlike i915, the Mali driver makes it possible for host userspace to create a GPU memory object from a userspace memory area and then access this object from host userspace (through a host-side VMA for the Mali memory object).

The driver uses flags on the GPU memory object to track access permissions:

  • KBASE_REG_GPU_RD and KBASE_REG_GPU_WR for read / write access from jobs running on the GPU through GPU-virtual addresses; this mainly works by controlling the ENTRY_ACCESS_RW and ENTRY_ACCESS_RO bits in the GPU page tables
  • KBASE_REG_CPU_RD and KBASE_REG_CPU_WR for read / write access from host kernel code (on behalf of host userspace) and host userspace; these flags affect VMA permission flags in the host kernel (which control permission bits in host page tables) and are also used for explicit permission checks in kernel code

However, in vulnerable versions of the driver, kbase_jd_user_buf_pin_pages() only checks the KBASE_REG_GPU_WR flag to determine whether pin_user_pages_remote() should request write access, and wrongly ignores the KBASE_REG_CPU_WR flag. The fix is essentially (with lots of duplicate changes to handle different kernel versions):

@@ -4556,65 +4557,62 @@ int kbase_jd_user_buf_pin_pages(struct kbase_context *kctx,
                struct kbase_va_region *reg)
 {
        struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc;
        struct page **pages = alloc->imported.user_buf.pages;
        unsigned long address = alloc->imported.user_buf.address;
        struct mm_struct *mm = alloc->imported.user_buf.mm;
        long pinned_pages;
        long i;
+       int write;
 
        if (WARN_ON(alloc->type != KBASE_MEM_TYPE_IMPORTED_USER_BUF))
                return -EINVAL;
 
        if (alloc->nents) {
                if (WARN_ON(alloc->nents != alloc->imported.user_buf.nr_pages))
                        return -EINVAL;
                else
                        return 0;
        }
 
        if (WARN_ON(reg->gpu_alloc->imported.user_buf.mm != current->mm))
                return -EINVAL;
 
+       write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR);
+
[...]
        pinned_pages = pin_user_pages_remote(
                mm, address, alloc->imported.user_buf.nr_pages,
-               reg->flags & KBASE_REG_GPU_WR ? FOLL_WRITE : 0, pages, NULL,
-               NULL);
+               write ? FOLL_WRITE : 0, pages, NULL, NULL);
[...]
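The effect of the one-line change can be checked by enumerating the flag combinations. The following is a minimal sketch, not driver code: the MODEL_REG_* bit values and all function names are assumptions made up for illustration, and only the two boolean expressions mirror the removed and added lines of the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real driver defines
 * KBASE_REG_CPU_WR and KBASE_REG_GPU_WR with its own values. */
#define MODEL_REG_CPU_WR (1u << 0)
#define MODEL_REG_GPU_WR (1u << 1)

/* Mirrors the removed line: only GPU write access requests a writable pin. */
bool pin_requests_write_vulnerable(unsigned int reg_flags)
{
	return (reg_flags & MODEL_REG_GPU_WR) != 0;
}

/* Mirrors the added line: write access from either side requests one. */
bool pin_requests_write_fixed(unsigned int reg_flags)
{
	return (reg_flags & (MODEL_REG_CPU_WR | MODEL_REG_GPU_WR)) != 0;
}

/* True when the object is CPU-writable although the pin did not request
 * write access, which is the state the bug made reachable. */
bool cpu_writable_but_pinned_readonly(unsigned int reg_flags,
				      bool pin_requested_write)
{
	return (reg_flags & MODEL_REG_CPU_WR) != 0 && !pin_requested_write;
}

/* Walks all four combinations of the two write flags. */
void check_all_flag_combinations(void)
{
	for (unsigned int flags = 0; flags < 4; flags++) {
		/* The fixed logic never reaches the bad state. */
		assert(!cpu_writable_but_pinned_readonly(
			flags, pin_requests_write_fixed(flags)));
	}
	/* The vulnerable logic reaches it for CPU_WR without GPU_WR. */
	assert(cpu_writable_but_pinned_readonly(
		MODEL_REG_CPU_WR,
		pin_requests_write_vulnerable(MODEL_REG_CPU_WR)));
}
```

In this model, the only combination where the two versions disagree is CPU_WR set with GPU_WR clear.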

So in a vulnerable version, an attacker can write into read-only pages (for example, pages from shared libraries) as follows:

  • Map some page from a shared library as read-only
  • Create a Mali KBASE_MEM_TYPE_IMPORTED_USER_BUF with KBASE_REG_CPU_WR but without KBASE_REG_GPU_WR from the victim page mapping; this involves creating a host-side VMA for the Mali memory object. The buffer has to be created in a way that doesn't set KBASE_REG_SHARE_BOTH.
  • Trigger kbase_jd_user_buf_pin_pages() on this memory object (either via KBASE_IOCTL_KCPU_QUEUE_ENQUEUE with a BASE_KCPU_COMMAND_TYPE_MAP_IMPORT command, or by submitting an atom with BASE_JD_REQ_EXTERNAL_RESOURCES) to execute the page-pinning call without the write flag it should have requested
  • Write into the Mali memory object from host userspace
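The steps above can be walked through in a toy state model. This is a sketch in plain userspace C that never touches the real driver: every struct, function, and flag value is hypothetical, and it assumes that a pin requesting write access is refused with -EFAULT when the source mapping is read-only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for the model. */
#define MODEL_REG_CPU_WR (1u << 0)
#define MODEL_REG_GPU_WR (1u << 1)

/* A page as seen through the attacker's original mapping. */
struct model_page {
	bool mapping_writable; /* false for a read-only library mapping */
	char data[16];
};

/* An imported user buffer object in the model. */
struct model_import {
	unsigned int reg_flags;
	struct model_page *pinned; /* NULL until pinning succeeds */
};

/* Stand-in for the pinning call: a write-requesting pin is refused when
 * the source mapping is read-only (assumption stated in the lead-in). */
int model_pin(struct model_import *obj, struct model_page *page,
	      bool request_write)
{
	if (request_write && !page->mapping_writable)
		return -EFAULT;
	obj->pinned = page;
	return 0;
}

/* Deferred pinning before the fix: looks at GPU_WR only. */
int model_pin_vulnerable(struct model_import *obj, struct model_page *page)
{
	return model_pin(obj, page, (obj->reg_flags & MODEL_REG_GPU_WR) != 0);
}

/* Deferred pinning after the fix: looks at CPU_WR and GPU_WR. */
int model_pin_fixed(struct model_import *obj, struct model_page *page)
{
	unsigned int wr = MODEL_REG_CPU_WR | MODEL_REG_GPU_WR;

	return model_pin(obj, page, (obj->reg_flags & wr) != 0);
}

/* Stand-in for a host userspace write through the object's own mapping. */
int model_cpu_write(struct model_import *obj, const char *src)
{
	if (!obj->pinned)
		return -EINVAL;
	if (!(obj->reg_flags & MODEL_REG_CPU_WR))
		return -EACCES;
	snprintf(obj->pinned->data, sizeof(obj->pinned->data), "%s", src);
	return 0;
}

void check_attack_model(void)
{
	struct model_page lib = { .mapping_writable = false, .data = "original" };
	struct model_import obj = { .reg_flags = MODEL_REG_CPU_WR, .pinned = NULL };

	/* Vulnerable path: the pin succeeds and the read-only page changes. */
	assert(model_pin_vulnerable(&obj, &lib) == 0);
	assert(model_cpu_write(&obj, "patched") == 0);
	assert(strcmp(lib.data, "patched") == 0);

	struct model_page lib2 = { .mapping_writable = false, .data = "original" };
	struct model_import obj2 = { .reg_flags = MODEL_REG_CPU_WR, .pinned = NULL };

	/* Fixed path: the pin is refused, so nothing can be written. */
	assert(model_pin_fixed(&obj2, &lib2) == -EFAULT);
	assert(model_cpu_write(&obj2, "patched") == -EINVAL);
	assert(strcmp(lib2.data, "original") == 0);
}
```

The model deliberately leaves out the KBASE_REG_SHARE_BOTH distinction and the ioctl plumbing; it only shows why a pin without write intent, combined with a CPU-writable object, defeats the read-only protection of the source mapping.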

Patch analysis:

The patch addresses the remaining site that was missed in the CVE-2021-28664 fix (see below). At this point, I see no remaining places in the driver that look up page pointers with access flags that don't match the corresponding Mali memory object.

Thoughts on how this vuln might have been found:

This vulnerability is a straightforward variant of a previous Mali bug, CVE-2021-28664, which was fixed as follows around 10 months earlier (from the diff between Mali Bifrost r29p0 and r30p0):

 static struct kbase_va_region *kbase_mem_from_user_buffer(
                struct kbase_context *kctx, unsigned long address,
                unsigned long size, u64 *va_pages, u64 *flags)
 {
[...]
+       int write;
[...]
+       write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR);
+
 #if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE
        faulted_pages = get_user_pages(current, current->mm, address, *va_pages,
 #if KERNEL_VERSION(4, 4, 168) <= LINUX_VERSION_CODE && \
 KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE
-                       reg->flags & KBASE_REG_CPU_WR ? FOLL_WRITE : 0,
-                       pages, NULL);
+                       write ? FOLL_WRITE : 0, pages, NULL);
 #else
-                       reg->flags & KBASE_REG_CPU_WR, 0, pages, NULL);
+                       write, 0, pages, NULL);
 #endif
 #elif KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE
        faulted_pages = get_user_pages(address, *va_pages,
-                       reg->flags & KBASE_REG_CPU_WR, 0, pages, NULL);
+                       write, 0, pages, NULL);
 #else
        faulted_pages = get_user_pages(address, *va_pages,
-                       reg->flags & KBASE_REG_CPU_WR ? FOLL_WRITE : 0,
-                       pages, NULL);
+                       write ? FOLL_WRITE : 0, pages, NULL);
 #endif

This is very similar to the patch linked above: essentially, this was a bug in duplicated code, and only one instance of it was patched. Both copies of the code call get_user_pages() (or a related page-pinning function) to grab page references for a KBASE_MEM_TYPE_IMPORTED_USER_BUF memory object, and both of them looked at only one of the two write flags: going by the removed lines in the diffs, kbase_mem_from_user_buffer() checked only KBASE_REG_CPU_WR, while kbase_jd_user_buf_pin_pages() checked only KBASE_REG_GPU_WR. The other difference between them is that one copy is for the case where pages are pinned directly at object creation, while the other copy handles the case where pages are pinned at a later point. Which one of these codepaths is used depends on the KBASE_REG_SHARE_BOTH flag.
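Going by the removed lines in the two diffs above, each pre-fix call site tested a single write flag, and both fixes converge on the same combined expression. A minimal sketch that compares the two pre-fix predicates against that combined one follows; the MODEL_REG_* values and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration. */
#define MODEL_REG_CPU_WR (1u << 0)
#define MODEL_REG_GPU_WR (1u << 1)

/* Removed expression in the earlier fix (pinning at object creation). */
bool creation_site_before_fix(unsigned int reg_flags)
{
	return (reg_flags & MODEL_REG_CPU_WR) != 0;
}

/* Removed expression in the later fix (pinning at a later point). */
bool deferred_site_before_fix(unsigned int reg_flags)
{
	return (reg_flags & MODEL_REG_GPU_WR) != 0;
}

/* Expression that both fixes introduce. */
bool any_writer(unsigned int reg_flags)
{
	return (reg_flags & (MODEL_REG_CPU_WR | MODEL_REG_GPU_WR)) != 0;
}

/* Counts the flag combinations on which a site disagrees with any_writer(). */
int count_mismatches(bool (*site)(unsigned int))
{
	int mismatches = 0;

	for (unsigned int flags = 0; flags < 4; flags++) {
		if (site(flags) != any_writer(flags))
			mismatches++;
	}
	return mismatches;
}

void check_sites(void)
{
	/* Each pre-fix site is wrong for exactly one combination. */
	assert(count_mismatches(creation_site_before_fix) == 1);
	assert(count_mismatches(deferred_site_before_fix) == 1);
	assert(count_mismatches(any_writer) == 0);
}
```

Routing both call sites through one shared predicate, as any_writer() does in the model, is one way to keep duplicated logic from drifting apart again.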

It seems likely that an attacker could have discovered this issue by looking at the fix for CVE-2021-28664 and searching for other get_user_pages() callers in the Mali driver.

There has also been at least one very similar issue in an upstream graphics driver: https://git.kernel.org/linus/cd5297b0855f

(Historical/present/future) context of bug:

See previous section. Additionally:

Looking through the list of public Mali bugs for issues described as "Mali GPU Kernel Driver elevates CPU RO pages to writable", there is a third bug, CVE-2021-44828, with this description. This bug doesn't involve get_user_pages(), but it again involves a missing check for the KBASE_REG_CPU_WR flag.

Several functions across the driver (kbase_kcpu_jit_allocate_process(), kbasep_write_soft_event_status() and kbase_jit_allocate_process()) write to Mali memory objects on behalf of the user. Instead of writing directly to the corresponding userspace-virtual addresses, they map the corresponding page into kernel-virtual memory using kbase_vmap() and then write to this kernel-virtual address. The bug was that nothing checked whether the Mali memory object was actually marked as writable using KBASE_REG_CPU_WR. This was addressed by using kbase_vmap_prot() instead, which performs the necessary access check.
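The difference between an unchecked and a checked kernel-side mapping helper can be sketched as follows. This is a simplified model and not the driver's API: model_vmap(), model_vmap_prot(), the struct, and the flag values are hypothetical stand-ins, and the real kbase_vmap() and kbase_vmap_prot() take different arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for the model. */
#define MODEL_REG_CPU_RD (1u << 0)
#define MODEL_REG_CPU_WR (1u << 1)

struct model_object {
	unsigned int reg_flags;
	unsigned char backing[8];
};

/* Unchecked mapping: hands out a pointer regardless of the object's flags. */
unsigned char *model_vmap(struct model_object *obj)
{
	return obj->backing;
}

/* Checked mapping: returns NULL unless every requested access bit is set. */
unsigned char *model_vmap_prot(struct model_object *obj,
			       unsigned int prot_request)
{
	if ((obj->reg_flags & prot_request) != prot_request)
		return NULL;
	return obj->backing;
}

/* Kernel-side write on behalf of the user, using the checked mapping. */
int model_write_status(struct model_object *obj, unsigned char value)
{
	unsigned char *p = model_vmap_prot(obj, MODEL_REG_CPU_WR);

	if (!p)
		return -EACCES;
	p[0] = value;
	return 0;
}

void check_vmap_model(void)
{
	struct model_object ro = { .reg_flags = MODEL_REG_CPU_RD, .backing = { 0 } };
	struct model_object rw = {
		.reg_flags = MODEL_REG_CPU_RD | MODEL_REG_CPU_WR,
		.backing = { 0 },
	};

	/* The unchecked helper maps even a read-only object. */
	assert(model_vmap(&ro) != NULL);

	/* The checked write path refuses it and leaves the data untouched. */
	assert(model_write_status(&ro, 1) == -EACCES);
	assert(ro.backing[0] == 0);

	/* A CPU-writable object is written as expected. */
	assert(model_write_status(&rw, 1) == 0);
	assert(rw.backing[0] == 1);
}
```

Putting the permission check inside the mapping helper means a caller cannot obtain a writable pointer without stating the access it needs.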

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies):

Exploit flow:

Known cases of the same exploit flow:

Part of an exploit chain?

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

  • Audit permission flag checks in Mali and other GPU drivers for memory imported via get_user_pages(). (TODO)
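A first pass for such an audit could simply list every source line that mentions a page-pinning helper, so that the access flags at each call can be reviewed by hand. The sketch below is a crude substring match, assuming that the two prefixes get_user_pages and pin_user_pages cover the helper families of interest; the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prefixes covering the get_user_pages*() and pin_user_pages*() families. */
static const char *const gup_markers[] = { "get_user_pages", "pin_user_pages" };

/* Returns true if the line mentions one of the page-pinning helpers. */
bool line_mentions_gup(const char *line)
{
	size_t n = sizeof(gup_markers) / sizeof(gup_markers[0]);

	for (size_t i = 0; i < n; i++) {
		if (strstr(line, gup_markers[i]) != NULL)
			return true;
	}
	return false;
}

/* Counts matching lines in an array of source lines. */
size_t count_gup_lines(const char *const *lines, size_t nr_lines)
{
	size_t hits = 0;

	for (size_t i = 0; i < nr_lines; i++) {
		if (line_mentions_gup(lines[i]))
			hits++;
	}
	return hits;
}

void check_scanner(void)
{
	const char *const sample[] = {
		"pinned_pages = pin_user_pages_remote(",
		"faulted_pages = get_user_pages(address, *va_pages,",
		"write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR);",
	};

	assert(count_gup_lines(sample, 3) == 2);
}
```

A match only marks a line for manual review; deciding whether the requested access matches the memory object's permission flags still requires reading the surrounding code.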

Found variants:

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

  • Maybe get rid of the get_user_pages-based interface if it's unnecessary, since having KBASE_MEM_TYPE_IMPORTED_USER_BUF makes the impact of these types of bugs much worse?

Ideas to mitigate the exploit flow:

Other potential improvements:

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

Other References