CVE-2023-33106: Qualcomm Adreno GPU KGSL_GPU_AUX_COMMAND_SYNC OOB
Clement Lecigne, Google Threat Analysis Group
The Basics
Disclosure or Patch Date: October 2, 2023
Product: Qualcomm Adreno GPU
Advisory: https://docs.qualcomm.com/product/publicresources/securitybulletin/october-2023-bulletin.html
Affected Versions: N/A
First Patched Version: N/A
Issue/Bug Report: N/A
Patch CL: https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/502b225c16ccd1e6adab632faf2637ff6fe74569
Bug-Introducing CL: https://git.codelinaro.org/clo/la/kernel/msm-5.4/-/commit/36b196ce5c65ccade05e5e63ffe25c5661ad096e
Reporter(s): Clement Lecigne of Google's Threat Analysis Group
We would like to thank the Qualcomm product security team for their help while figuring out this vulnerability.
The Code
Proof-of-concept:
Pseudo-code triggering the overflow.
// Opening KGSL.
kgsl = open("/dev/kgsl-3d0", O_RDWR, 0);
// Drawing context.
ioctl(fd_kgsl, IOCTL_KGSL_DRAWCTXT_CREATE, ...);
// Create timeline.
ioctl(fd_kgsl, IOCTL_KGSL_TIMELINE_CREATE, ...);
// Issue GPU aux command.
struct kgsl_gpu_aux_command aux;
aux.flags = KGSL_CONTEXT_SYNC|KGSL_GPU_AUX_COMMAND_TIMELINE;
aux.numsyncs = 0x1337; // > 32.
ioctl(fd_kgsl, IOCTL_KGSL_GPU_AUX_COMMAND, &aux);
Exploit sample: Not public.
Did you have access to the exploit sample when doing the analysis? Yes
The Vulnerability
Bug class: Memory out-of-bounds write
Vulnerability details:
long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
unsigned int cmd, void *data) // [0]
{
...
if (param->flags & KGSL_GPU_AUX_COMMAND_SYNC) { // [1]
...
ret = kgsl_drawobj_sync_add_synclist(device, syncobj,
u64_to_user_ptr(param->synclist),
param->syncsize, param->numsyncs); // [2]
if (ret)
goto err;
}
...
}
int kgsl_drawobj_sync_add_synclist(struct kgsl_device *device,
struct kgsl_drawobj_sync *syncobj, void __user *ptr,
unsigned int size, unsigned int count)
{
...
syncobj->synclist = kcalloc(count,
sizeof(struct kgsl_drawobj_sync_event), GFP_KERNEL);
...
for (i = 0; i < count; i++) {
...
ret = kgsl_drawobj_sync_add_sync(device, syncobj, &sync); // [3]
...
}
return 0;
}
int kgsl_drawobj_sync_add_sync(struct kgsl_device *device,
struct kgsl_drawobj_sync *syncobj,
struct kgsl_cmd_syncpoint *sync)
{
struct kgsl_drawobj *drawobj = DRAWOBJ(syncobj);
...
else if (sync->type == KGSL_CMD_SYNCPOINT_TYPE_TIMELINE)
return drawobj_add_sync_timeline(device,
syncobj, sync->priv, sync->size); // [4]
...
}
static int drawobj_add_sync_timeline(struct kgsl_device *device,
struct kgsl_drawobj_sync *syncobj, void __user *uptr,
u64 usize)
{
...
if (copy_struct_from_user(&sync, sizeof(sync), uptr, usize))
return -EFAULT;
...
id = syncobj->numsyncs++; // [5]
...
event->id = id;
...
/* Set pending flag before adding callback to avoid race */
set_bit(event->id, &syncobj->pending); // [6] - event->id is user-controlled.
...
}
/**
...
* @pending: Bitmask of sync events that are active
...
*/
struct kgsl_drawobj_sync {
...
unsigned long pending; // [7]
struct timer_list timer;
unsigned long timeout_jiffies;
};
The vulnerability is due to the lack of sanitization checks arround the number
of sync points param->numsyncs
[2] passed to IOCTL_KGSL_GPU_AUX_COMMAND
ioctl handler [0] when KGSL_GPU_AUX_COMMAND_SYNC
[1] is specified. Down the
line this is leading down to a set of calls [3][4] ultimately reaching out to a
set_bit
call [6] with a first parameter nr
directly derived [5] from the
user supplied param->numsyncs
from the ioctl. Depending on the underlying
implementation of set_bit
this could lead to a linear out-of-bounds write like
it's the case on ARM CPU. The kgsl driver allows only KGSL_MAX_SYNCPOINTS
sync
points which is hard coded to 32 and logically the size in bits of
syncobj->pending
[7].
Patch analysis:
diff --git a/drivers/gpu/msm/kgsl.c b/drivers/gpu/msm/kgsl.c
index fc6f6ee55b814c6942c42528b1430b1368ff8cd4..57fd3c091255cc843bad207d8452b3fc3abf27f8 100644
--- a/drivers/gpu/msm/kgsl.c
+++ b/drivers/gpu/msm/kgsl.c
@@ -2091,6 +2091,10 @@ long kgsl_ioctl_gpu_aux_command(struct kgsl_device_private *dev_priv,
(KGSL_GPU_AUX_COMMAND_TIMELINE)))
return -EINVAL;
+ if ((param->flags & KGSL_GPU_AUX_COMMAND_SYNC) &&
+ (param->numsyncs > KGSL_MAX_SYNCPOINTS))
+ return -EINVAL;
+
context = kgsl_context_get_owner(dev_priv, param->context_id);
if (!context)
return -EINVAL;
kgsl_ioctl_gpu_aux_command
has simply been modified to make sure
param->numsyncs
is lower or equal to KGSL_MAX_SYNCPOINTS
before doing any
processing otherwise the ioctl returns an error.
Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):
This vulnerability could have been found via manual code review or using a fuzzer (e.g. in-memory specific kgsl fuzzer).
(Historical/present/future) context of bug: N/A
The Exploit
(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)
Exploit strategy (or strategies): N/A
Exploit flow: N/A
Known cases of the same exploit flow: N/A
Part of an exploit chain? N/A
The Next Steps
Variant analysis
Areas/approach for variant analysis (and why):
Two approaches can be done here:
- Find and review all
set_bit
,change_bit
andclear_bit
calls where first parameter can be user controlled and ensure proper sanitization checks have been made. - Review newly added kgsl GPU ioctl handlers and make sure all params are properly sanitized.
Found variants: N/A
Structural improvements
What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?
Ideas to kill the bug class:
Creating a safe implementation of xxx_bit
APIs by, for example, adding the
size of the bitmap and defining new APIs on top for commonly used sizes, e.g.
set_bit32(nr, bitmap)
.
Ideas to mitigate the exploit flow:
Upcoming memory tagging designs such as MTE are likely to prevent the exploit flow used to exploit this bug.
Other potential improvements: N/A
0-day detection methods
What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?