Clement Lecigne, Google Threat Analysis Group

The Basics

Disclosure or Patch Date: October 2, 2023

Product: Qualcomm Adreno GPU

Advisory: https://docs.qualcomm.com/product/publicresources/securitybulletin/october-2023-bulletin.html

Affected Versions: N/A

First Patched Version: N/A

Issue/Bug Report: N/A

Patch CL: https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/502b225c16ccd1e6adab632faf2637ff6fe74569

Bug-Introducing CL: https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/36b196ce5c65ccade05e5e63ffe25c5661ad096e

Reporter(s): Clement Lecigne of Google's Threat Analysis Group

We would like to thank the Qualcomm product security team for their help while figuring out this vulnerability.

The Code

Proof-of-concept:

Pseudo-code triggering the overflow.

// Opening KGSL.
kgsl = open("/dev/kgsl-3d0", O_RDWR, 0);
// Drawing context.
ioctl(fd_kgsl, IOCTL_KGSL_DRAWCTXT_CREATE, ...);
// Create timeline.
ioctl(fd_kgsl, IOCTL_KGSL_TIMELINE_CREATE, ...);
// Issue GPU aux command.
struct kgsl_gpu_aux_command aux;
aux.flags =  KGSL_CONTEXT_SYNC|KGSL_GPU_AUX_COMMAND_TIMELINE;
aux.numsyncs = 0x1337;  // > 32.
ioctl(fd_kgsl, IOCTL_KGSL_GPU_AUX_COMMAND, &aux);

Exploit sample: Not public.

Did you have access to the exploit sample when doing the analysis? Yes

The Vulnerability

Bug class: Memory out-of-bounds write

Vulnerability details:

long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
        unsigned int cmd, void *data)   // [0]
{
...
    if (param->flags & KGSL_GPU_AUX_COMMAND_SYNC) { // [1]
        ...
        ret = kgsl_drawobj_sync_add_synclist(device, syncobj,
                u64_to_user_ptr(param->synclist),
                param->syncsize, param->numsyncs);  // [2]
        if (ret)
            goto err;
    }
...
}
int kgsl_drawobj_sync_add_synclist(struct kgsl_device *device,
        struct kgsl_drawobj_sync *syncobj, void __user *ptr,
        unsigned int size, unsigned int count)
{
...
    syncobj->synclist = kcalloc(count,
        sizeof(struct kgsl_drawobj_sync_event), GFP_KERNEL);
...
    for (i = 0; i < count; i++) {
...
        ret = kgsl_drawobj_sync_add_sync(device, syncobj, &sync);   // [3]
...
    }
    return 0;
}
int kgsl_drawobj_sync_add_sync(struct kgsl_device *device,
    struct kgsl_drawobj_sync *syncobj,
    struct kgsl_cmd_syncpoint *sync)
{
    struct kgsl_drawobj *drawobj = DRAWOBJ(syncobj);
...
    else if (sync->type == KGSL_CMD_SYNCPOINT_TYPE_TIMELINE)
        return drawobj_add_sync_timeline(device,
            syncobj, sync->priv, sync->size);   // [4]
...
}
static int drawobj_add_sync_timeline(struct kgsl_device *device,
        struct kgsl_drawobj_sync *syncobj, void __user *uptr,
        u64 usize)
{
...
    if (copy_struct_from_user(&sync, sizeof(sync), uptr, usize))
        return -EFAULT;
...
    id = syncobj->numsyncs++;   // [5]
...
    event->id = id;
...
    /* Set pending flag before adding callback to avoid race */
    set_bit(event->id, &syncobj->pending);  // [6] - event->id is user-controlled.
...
}
/**
...
 * @pending: Bitmask of sync events that are active
...
 */
struct kgsl_drawobj_sync {
...
    unsigned long pending;  // [7]
    struct timer_list timer;
    unsigned long timeout_jiffies;
};

The vulnerability is due to the lack of sanitization checks arround the number of sync points param->numsyncs [2] passed to IOCTL_KGSL_GPU_AUX_COMMAND ioctl handler [0] when KGSL_GPU_AUX_COMMAND_SYNC [1] is specified. Down the line this is leading down to a set of calls [3][4] ultimately reaching out to a set_bit call [6] with a first parameter nr directly derived [5] from the user supplied param->numsyncs from the ioctl. Depending on the underlying implementation of set_bit this could lead to a linear out-of-bounds write like it's the case on ARM CPU. The kgsl driver allows only KGSL_MAX_SYNCPOINTS sync points which is hard coded to 32 and logically the size in bits of syncobj->pending [7].

Patch analysis:

diff --git a/drivers/gpu/msm/kgsl.c b/drivers/gpu/msm/kgsl.c
index fc6f6ee55b814c6942c42528b1430b1368ff8cd4..57fd3c091255cc843bad207d8452b3fc3abf27f8 100644
--- a/drivers/gpu/msm/kgsl.c
+++ b/drivers/gpu/msm/kgsl.c
@@ -2091,6 +2091,10 @@ long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
        (KGSL_GPU_AUX_COMMAND_TIMELINE)))
        return -EINVAL;

+   if ((param->flags & KGSL_GPU_AUX_COMMAND_SYNC) &&
+       (param->numsyncs > KGSL_MAX_SYNCPOINTS))
+       return -EINVAL;
+
    context = kgsl_context_get_owner(dev_priv, param->context_id);
    if (!context)
        return -EINVAL;

kgsl_ioctl_gpu_aux_command has simply been modified to make sure param->numsyncs is lower or equal to KGSL_MAX_SYNCPOINTS before doing any processing otherwise the ioctl returns an error.

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

This vulnerability could have been found via manual code review or using a fuzzer (e.g. in-memory specific kgsl fuzzer).

(Historical/present/future) context of bug: N/A

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies): N/A

Exploit flow: N/A

Known cases of the same exploit flow: N/A

Part of an exploit chain? N/A

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

Two approaches can be done here:

  • Find and review all set_bit, change_bit and clear_bit calls where first parameter can be user controlled and ensure proper sanitization checks have been made.
  • Review newly added kgsl GPU ioctl handlers and make sure all params are properly sanitized.

Found variants: N/A

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

Creating a safe implementation of xxx_bit APIs by, for example, adding the size of the bitmap and defining new APIs on top for commonly used sizes, e.g. set_bit32(nr, bitmap).

Ideas to mitigate the exploit flow:

Upcoming memory tagging designs such as MTE are likely to prevent the exploit flow used to exploit this bug.

Other potential improvements: N/A

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

Other References