# CVE-2023-33106: Qualcomm Adreno GPU KGSL_GPU_AUX_COMMAND_SYNC OOB

Clement Lecigne, Google Threat Analysis Group

## The Basics

**Disclosure or Patch Date:** October 2, 2023

**Product:** Qualcomm Adreno GPU

**Advisory:**
https://docs.qualcomm.com/product/publicresources/securitybulletin/october-2023-bulletin.html

**Affected Versions:** N/A

**First Patched Version:** N/A

**Issue/Bug Report:** N/A

**Patch CL:**
https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/502b225c16ccd1e6adab632faf2637ff6fe74569

**Bug-Introducing CL:**
https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/36b196ce5c65ccade05e5e63ffe25c5661ad096e

**Reporter(s):** Clement Lecigne of Google's Threat Analysis Group

We would like to thank the Qualcomm product security team for their help while
figuring out this vulnerability.

## The Code

**Proof-of-concept:**

Pseudo-code triggering the overflow.

```c
// Opening KGSL.
kgsl = open("/dev/kgsl-3d0", O_RDWR, 0);
// Drawing context.
ioctl(fd_kgsl, IOCTL_KGSL_DRAWCTXT_CREATE, ...);
// Create timeline.
ioctl(fd_kgsl, IOCTL_KGSL_TIMELINE_CREATE, ...);
// Issue GPU aux command.
struct kgsl_gpu_aux_command aux;
aux.flags =  KGSL_CONTEXT_SYNC|KGSL_GPU_AUX_COMMAND_TIMELINE;
aux.numsyncs = 0x1337;  // > 32.
ioctl(fd_kgsl, IOCTL_KGSL_GPU_AUX_COMMAND, &aux);
```

**Exploit sample:** Not public.

**Did you have access to the exploit sample when doing the analysis?** Yes

## The Vulnerability

**Bug class:** Memory out-of-bounds write

**Vulnerability details:**

```c
long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
        unsigned int cmd, void *data)   // [0]
{
...
    if (param->flags & KGSL_GPU_AUX_COMMAND_SYNC) { // [1]
        ...
        ret = kgsl_drawobj_sync_add_synclist(device, syncobj,
                u64_to_user_ptr(param->synclist),
                param->syncsize, param->numsyncs);  // [2]
        if (ret)
            goto err;
    }
...
}
```

```c
int kgsl_drawobj_sync_add_synclist(struct kgsl_device *device,
        struct kgsl_drawobj_sync *syncobj, void __user *ptr,
        unsigned int size, unsigned int count)
{
...
    syncobj->synclist = kcalloc(count,
        sizeof(struct kgsl_drawobj_sync_event), GFP_KERNEL);
...
    for (i = 0; i < count; i++) {
...
        ret = kgsl_drawobj_sync_add_sync(device, syncobj, &sync);   // [3]
...
    }
    return 0;
}
```

```c
int kgsl_drawobj_sync_add_sync(struct kgsl_device *device,
    struct kgsl_drawobj_sync *syncobj,
    struct kgsl_cmd_syncpoint *sync)
{
    struct kgsl_drawobj *drawobj = DRAWOBJ(syncobj);
...
    else if (sync->type == KGSL_CMD_SYNCPOINT_TYPE_TIMELINE)
        return drawobj_add_sync_timeline(device,
            syncobj, sync->priv, sync->size);   // [4]
...
}
```

```c
static int drawobj_add_sync_timeline(struct kgsl_device *device,
        struct kgsl_drawobj_sync *syncobj, void __user *uptr,
        u64 usize)
{
...
    if (copy_struct_from_user(&sync, sizeof(sync), uptr, usize))
        return -EFAULT;
...
    id = syncobj->numsyncs++;   // [5]
...
    event->id = id;
...
    /* Set pending flag before adding callback to avoid race */
    set_bit(event->id, &syncobj->pending);  // [6] - event->id is user-controlled.
...
}
```

```c
/**
...
 * @pending: Bitmask of sync events that are active
...
 */
struct kgsl_drawobj_sync {
...
    unsigned long pending;  // [7]
    struct timer_list timer;
    unsigned long timeout_jiffies;
};
```

The vulnerability is due to the lack of sanitization checks arround the number
of sync points `param->numsyncs` [2] passed to `IOCTL_KGSL_GPU_AUX_COMMAND`
ioctl handler [0] when `KGSL_GPU_AUX_COMMAND_SYNC` [1] is specified. Down the
line this is leading down to a set of calls [3][4] ultimately reaching out to a
`set_bit` call [6] with a first parameter `nr` directly derived [5] from the
user supplied `param->numsyncs` from the ioctl. Depending on the underlying
implementation of `set_bit` this could lead to a linear out-of-bounds write like
it's the case on ARM CPU. The kgsl driver allows only `KGSL_MAX_SYNCPOINTS` sync
points which is hard coded to 32 and logically the size in bits of
`syncobj->pending` [7].

**Patch analysis:**

```diff
diff --git a/drivers/gpu/msm/kgsl.c b/drivers/gpu/msm/kgsl.c
index fc6f6ee55b814c6942c42528b1430b1368ff8cd4..57fd3c091255cc843bad207d8452b3fc3abf27f8 100644
--- a/drivers/gpu/msm/kgsl.c
+++ b/drivers/gpu/msm/kgsl.c
@@ -2091,6 +2091,10 @@ long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
        (KGSL_GPU_AUX_COMMAND_TIMELINE)))
        return -EINVAL;

+   if ((param->flags & KGSL_GPU_AUX_COMMAND_SYNC) &&
+       (param->numsyncs > KGSL_MAX_SYNCPOINTS))
+       return -EINVAL;
+
    context = kgsl_context_get_owner(dev_priv, param->context_id);
    if (!context)
        return -EINVAL;
```

`kgsl_ioctl_gpu_aux_command` has simply been modified to make sure
`param->numsyncs` is lower or equal to `KGSL_MAX_SYNCPOINTS` before doing any
processing otherwise the ioctl returns an error.

**Thoughts on how this vuln might have been found *(fuzzing, code auditing,
variant analysis, etc.)*:**

This vulnerability could have been found via manual code review or using a
fuzzer (e.g. in-memory specific kgsl fuzzer).

**(Historical/present/future) context of bug:** N/A

## The Exploit

(The terms *exploit primitive*, *exploit strategy*, *exploit technique*, and
*exploit flow* are
[defined here](https://googleprojectzero.blogspot.com/2020/06/a-survey-of-recent-ios-kernel-exploits.html).)

**Exploit strategy (or strategies):** N/A

**Exploit flow:** N/A

**Known cases of the same exploit flow:** N/A

**Part of an exploit chain?** N/A

## The Next Steps

### Variant analysis

**Areas/approach for variant analysis (and why):**

Two approaches can be done here:

*   Find and review all `set_bit`, `change_bit` and `clear_bit` calls where
    first parameter can be user controlled and ensure proper sanitization checks
    have been made.
*   Review newly added kgsl GPU ioctl handlers and make sure all params are
    properly sanitized.

**Found variants:** N/A

### Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the
introduction of this vulnerability, mitigate the exploit flow, make this type of
vulnerability harder to exploit, etc.?

**Ideas to kill the bug class:**

Creating a safe implementation of `xxx_bit` APIs by, for example, adding the
size of the bitmap and defining new APIs on top for commonly used sizes, e.g.
`set_bit32(nr, bitmap)`.

**Ideas to mitigate the exploit flow:**

Upcoming memory tagging designs such as MTE are likely to prevent the exploit
flow used to exploit this bug.

**Other potential improvements:** N/A

### 0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any
ideas of how this exploit or similar exploits could be detected **as a 0-day**?

## Other References
