CVE-2021-33742: Internet Explorer out-of-bounds write in MSHTML

Maddie Stone, Google Project Zero & Threat Analysis Group

The Basics

Disclosure or Patch Date: 03 June 2021

Product: Microsoft Internet Explorer

Advisory: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-33742

Affected Versions: For Windows 10 20H2 x64, KB5003173 and previous

First Patched Version: For Windows 10 20H2 x64, KB5003637

Issue/Bug Report: N/A

Patch CL: N/A

Bug-Introducing CL: N/A

Reporter(s): Clément Lecigne of Google’s Threat Analysis Group

The Code

Proof-of-concept:

Proof-of-concept by Ivan Fratric of Project Zero

<script>
var b = document.createElement("html");
b.innerHTML = Array(40370176).toString();
b.innerHTML = "";
</script>

Exploit sample:

Examples of the Word documents used to distribute this exploit:

Did you have access to the exploit sample when doing the analysis? Yes

The Vulnerability

Bug class: Out-of-bounds write

Vulnerability details:

The vulnerability occurs because the length of the string assigned to the element's inner HTML is truncated (size&0x1FFFFFF) in the CTreePos structure, while the non-truncated size is still stored in the text data object. The memory at [1] is allocated based on the truncated size held in the CTreePos structure.
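As a worked check of the truncation for the proof-of-concept above, the following sketch computes the string length produced by Array(40370176).toString() and applies the 0x1FFFFFF mask. The name truncatedLength is hypothetical and only models the arithmetic; it is not MSHTML code.

```javascript
// Hypothetical model of the 25-bit truncation described above (not MSHTML code).
const CP_MASK = 0x1FFFFFF; // 25 bits, per the analysis

function truncatedLength(fullLength) {
  return fullLength & CP_MASK;
}

// Array(n).toString() joins n empty slots with commas, giving n - 1 characters.
const pocLength = 40370176 - 1;
const stored = truncatedLength(pocLength);

console.log(pocLength.toString(16)); // 267ffff
console.log(stored);                 // 6815743
```

Under these assumptions the text data would hold 40370175 characters while the truncated field records only 6815743, which is the mismatch the rest of the analysis relies on.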

The text data returned by MSHTML!Tree::TextData::GetText [2] includes the full non-truncated length of the string. The non-truncated length is then passed as the source length to wmemcpy_s [3], while the allocated destination memory uses the truncated length. Although wmemcpy_s protects against a buffer overflow here, the source size is still used as the increment even though that was not the number of characters actually copied: the size of the allocation was. The index (v190) is therefore incremented by the larger number. When that index is then used to access the memory allocated at [1], it leads to the out-of-bounds write at MSHTML!CSpliceTreeEngine::RemoveSplice+0xb1f.

if ( v172 >= 90000 && ((_BYTE)v4[21] & 4) != 0 )
{
  v70 = 1 - CTreePos::GetCp(v4[5]);
  v71 = CTreePos::GetCp(v4[6]); /*** v71 = Truncated size (orig_sz&0x1ffffff) ***/
  v72 = v4[6];
  v104 = (*(_BYTE *)v72 & 4) == 0;
  v189 = (CTreeNode *)(v70 + v71); 
  if ( !v104 )
  {
    v73 = CTreeDataPos::GetTextLength(v72);
    v189 = (CTreeNode *)(v73 + v74 - 1);
  }
  if ( v184 <= (int)v187 )
  {
    v77 = (struct CMarkup *)operator new[]( /*** [1] allocates based on truncated size ***/
                              (unsigned int)newAlloc,
                              (const struct MemoryProtection::leaf_t *)newAllocSz);
    v4[23] = v77;
    if ( v77 )
    {
      for ( i = v4[5]; i != *((struct CMarkup **)v4[6] + 5); i = (struct CMarkup *)*((_DWORD *)i + 5) )
      {
        if ( (*(_BYTE *)i & 4) != 0 )
        {

          /*** [2] srcTextSz is non truncated size ***/
          srcText = Tree::TextData::GetText(*((Tree::TextData **)i + 8), 0, &srcTextSz);    

          /*** [3] -- srcTextSz > newAllocSz ***/
          wmemcpy_s(newAlloc, newAllocSz, (const wchar_t *)srcText, (rsize_t)srcTextSz);
          
          /*** memcpy only copied newAllocSz not srcTextSz so v190 is now > max ***/
          v190 += srcTextSz; 
        }
        else if ( (*(_BYTE *)i & 3) != 0 && (*(_BYTE *)i & 0x40) != 0 )
        {
          v80 = v190;
          *((_WORD *)v4[23] + (_DWORD)v190) = 0xFDEF;
          v190 = v80 + 1;
        }
      }
    }
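
The index overrun in the excerpt above can be modelled with small numbers. This is a toy sketch, not a reconstruction of RemoveSplice: simulateRemoveSplice and its parameters are hypothetical, and the bounded copy follows the description in this analysis rather than verified wmemcpy_s behaviour.

```javascript
// Toy model of the loop above with small numbers; all names are hypothetical.
function simulateRemoveSplice(textRunLength, allocLength) {
  const buffer = new Uint16Array(allocLength); // stands in for the allocation at [1]
  let index = 0;                               // stands in for v190

  // Bounded copy: never writes more than the destination can hold.
  const copied = Math.min(textRunLength, allocLength);
  buffer.fill(0x2c, 0, copied); // 0x2c is ','

  // The bug: advance by the source length, not by what fit in the buffer.
  index += textRunLength;

  // A following node marker (0xFDEF in the excerpt) would be written at buffer[index].
  return { index, copied, outOfBounds: index >= allocLength };
}

console.log(simulateRemoveSplice(40, 8));
console.log(simulateRemoveSplice(8, 16));
```

In the first call the index ends at 40 against an 8-element buffer, so the next marker write would land outside the allocation. In the second call the lengths agree and the index stays in bounds.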

Patch analysis:

The patch is in Tree::TreeWriter::NewTextPosInternal. It triggers a release assert if there is an attempt to add TextData larger than 0x1FFFFFF to the HTML tree.
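
The kind of check the patch is described as adding can be sketched as follows, assuming the 25-bit limit (0x1FFFFFF) from the vulnerability details. checkedTextLength is a hypothetical name, and throwing an exception stands in for the release assert; this is not the actual patch.

```javascript
// Hypothetical guard modelled on the described patch behaviour (not the actual patch).
const MAX_TEXT_LENGTH = 0x1FFFFFF; // largest length a 25-bit field can hold

function checkedTextLength(length) {
  if (!Number.isInteger(length) || length < 0 || length > MAX_TEXT_LENGTH) {
    // The analysis describes a release assert here; this sketch throws instead.
    throw new RangeError(`text length ${length} does not fit in 25 bits`);
  }
  return length;
}

try {
  checkedTextLength(40370175); // length of the PoC string
} catch (err) {
  console.log(err.message);
}
```

Rejecting the oversized length before it is stored means the truncated and non-truncated values can never disagree later in the splice code.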

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

This vulnerability was likely found via fuzzing. A fuzzer running with a tight timeout may have missed it, since the vulnerability takes a few seconds to trigger. Even so, it seems more likely that it was found via fuzzing than by manual review.

(Historical/present/future) context of bug:

See this Google TAG blogpost for more info. Malicious Office documents loaded web content within Internet Explorer. The attackers would fingerprint the device and then serve the Internet Explorer exploit to targeted users.

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies): Still under analysis.

Exploit flow:

Known cases of the same exploit flow:

Part of an exploit chain?

This vulnerability was likely paired with a sandbox escape, but that was not collected.

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

It seems possible that more instances of this pattern exist throughout the code base if CTreePos structures truncate sizes to 25 bits while other areas, such as TextData, do not. The top 7 bits of the size in the CTreePos struct are used as flags.
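
To illustrate why a shared 32-bit field invites this class of bug, here is a sketch of a packed word with 7 flag bits above a 25-bit length. The layout follows the description above, but pack, unpackLength, and lengthsDisagree are hypothetical helpers, not MSHTML functions.

```javascript
// Hypothetical layout: top 7 bits are flags, low 25 bits are the length.
const LENGTH_BITS = 25;
const LENGTH_MASK = (1 << LENGTH_BITS) - 1; // 0x1FFFFFF

function pack(flags, length) {
  // >>> 0 keeps the result as an unsigned 32-bit value.
  return (((flags & 0x7f) << LENGTH_BITS) | (length & LENGTH_MASK)) >>> 0;
}

function unpackLength(word) {
  return word & LENGTH_MASK;
}

// A variant can exist wherever the stored length differs from the real one.
function lengthsDisagree(realLength) {
  return unpackLength(pack(0, realLength)) !== realLength;
}

console.log(lengthsDisagree(1000));     // false
console.log(lengthsDisagree(40370175)); // true
```

For variant analysis, the interesting call sites are those that read a length from the packed field in one place and from an untruncated source in another.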

Found variants: N/A

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

If truncating the size/length of an object, do the bounds checking/input validation of the size at the earliest point and only store the truncated size.
Kill the tab process when a size grows beyond what can be properly represented.

Ideas to mitigate the exploit flow: N/A

Other potential improvements:

Microsoft has announced that Internet Explorer will be retired in June 2022. However, it also says that the retirement does not affect the MSHTML (Trident) engine. This means that mshtml.dll, where this vulnerability exists, is not planned for retirement. In the future, if a user enables IE mode in Edge, the mshtml engine would be used. It also seems likely that Office will still have access to mshtml. Microsoft should limit access to mshtml and audit the applications that use it.

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

Variants of this bug could potentially be detected by looking for JavaScript that tries to create objects with sizes greater than the allowed bounds.
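
One very rough way to approximate this idea is a static scan for oversized numeric literals in common size-bearing calls. The sketch below is illustrative only: findSuspiciousSizes, the two patterns it checks, and the 0x1FFFFFF threshold are assumptions, and a scan like this is trivially evaded by computing the size at runtime.

```javascript
// Illustrative heuristic only; the patterns and threshold are assumptions.
const SIZE_THRESHOLD = 0x1FFFFFF;

function findSuspiciousSizes(scriptSource) {
  // Matches Array(N) and .repeat(N) where N is a decimal or hex literal.
  const pattern = /(?:\bArray|\.repeat)\s*\(\s*(0x[0-9a-fA-F]+|\d+)\s*\)/g;
  const hits = [];
  for (const match of scriptSource.matchAll(pattern)) {
    const value = Number(match[1]);
    if (value > SIZE_THRESHOLD) {
      hits.push({ text: match[0], value });
    }
  }
  return hits;
}

console.log(findSuspiciousSizes('b.innerHTML = Array(40370176).toString();'));
```

Applied to the proof-of-concept line, the scan reports the Array(40370176) call. A practical detector would more likely need to instrument the engine and observe the sizes actually requested at runtime.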

Other References

July 2021: "How We Protect Users From 0-Day Attacks" by Google's Threat Analysis Group gives context about how this exploit was used.