CVE-2021-1647: Windows Defender mpengine remote code execution

Maddie Stone, Project Zero

The Basics

Disclosure or Patch Date: 12 January 2021

Product: Microsoft Windows Defender

Advisory: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-1647

Affected Versions: Version 1.1.17600.5 and previous

First Patched Version: Version 1.1.17700.4

Issue/Bug Report: N/A

Patch CL: N/A

Bug-Introducing CL: N/A

Reporter(s): Anonymous

The Code

Proof-of-concept:

Exploit sample: 6e1e9fa0334d8f1f5d0e3a160ba65441f0656d1f1c99f8a9f1ae4b1b1bf7d788

Did you have access to the exploit sample when doing the analysis? Yes

The Vulnerability

Bug class: Heap buffer overflow

Vulnerability details:

There is a heap buffer overflow when Windows Defender (mpengine.dll) processes the section table while unpacking an ASProtect-packed executable. Each section entry has two values: the virtual address and the size of the section. The code in CAsprotectDLLAndVersion::RetrieveVersionInfoAndCreateObjects only records a new section when the next entry's address is strictly greater than the previous one. It does not reject an entry whose address is equal to the previous one. Take the section table used in this exploit sample: [ (0,0), (0,0), (0x2000,0), (0x2000,0x3000) ]. 0 bytes are allocated for the section at address 0x2000. When the code reaches the next entry, also at 0x2000, it simply skips over it without exiting or updating the size of the section. During decompression, 0x3000 bytes are then copied into that section, causing the heap buffer overflow.

if ( next_sect_addr > sect_addr ) // current va is greater than prev (not also eq)
{
    sect_addr = next_sect_addr;
    sect_sz = (next_sect_sz + 0xFFF) & 0xFFFFF000;
}
// if next_sect_addr <= sect_addr we continue on to next entry in the table

[...]
            new_sect_alloc = operator new[](sect_sz + sect_addr); // allocate new section
[...]
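To make the size mismatch concrete, here is a simplified C model of the comparison in the excerpt. It is a sketch for illustration, not mpengine.dll code. The sect_entry type, the helper names, the starting address of 0, and treating sect_addr + sect_sz as the whole allocated buffer are all assumptions. Fed the exploit sample's table, model_vulnerable_alloc returns 0x2000, leaving 0 bytes for the section at 0x2000. overflow_bytes(0x2000, 0x2000, 0x3000) then returns 0x3000, the number of bytes that would be written past the end if the skipped entry's 0x3000 bytes were copied to offset 0x2000.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one section-table entry: (virtual address, size). */
typedef struct {
    uint32_t va;
    uint32_t size;
} sect_entry;

/* Round a size up to a 4 KiB page boundary, mirroring
 * (next_sect_sz + 0xFFF) & 0xFFFFF000 in the decompiled code. */
static uint32_t page_align(uint32_t sz) {
    return (sz + 0xFFFu) & 0xFFFFF000u;
}

/* Simplified model of the vulnerable walk: an entry is only recorded when its
 * address is strictly greater than the current one. Any other entry is
 * skipped without updating sect_sz. Returns sect_addr + sect_sz, the size
 * passed to operator new[] for the last recorded section. */
static uint32_t model_vulnerable_alloc(const sect_entry *table, size_t n) {
    uint32_t sect_addr = 0; /* assumption: the walk starts at address 0 */
    uint32_t sect_sz = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].va > sect_addr) {
            sect_addr = table[i].va;
            sect_sz = page_align(table[i].size);
        }
        /* else: an equal (or lower) address is silently skipped */
    }
    return sect_addr + sect_sz;
}

/* Bytes that a write of `len` bytes at offset `va` would run past the end of
 * a buffer of `alloc` bytes (0 if the write fits). */
static uint32_t overflow_bytes(uint32_t alloc, uint32_t va, uint32_t len) {
    uint64_t end = (uint64_t)va + len;
    return end > alloc ? (uint32_t)(end - alloc) : 0;
}
```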

Patch analysis: There are quite a few changes to the function CAsprotectDLLAndVersion::RetrieveVersionInfoAndCreateObjects between version 1.1.17600.5 (vulnerable) and 1.1.17700.4 (patched). The change directly related to this bug adds an else branch to the comparison. If any entry in the section array has an address less than or equal to the previous entry, the code now errors out and exits rather than continuing to decompress.
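A minimal C sketch of the patched behavior follows. validate_section_table, sect_entry, and the -1 return value are illustrative names, not taken from mpengine.dll, and the real function does far more. Any entry whose address is less than or equal to the previous entry's address makes the function bail out, which is the behavior the added else branch introduces. The exploit sample's table is rejected at its second entry, before anything is allocated or decompressed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one section-table entry: (virtual address, size). */
typedef struct {
    uint32_t va;
    uint32_t size;
} sect_entry;

/* Returns 0 if every entry's address is strictly greater than the previous
 * entry's address, or -1 (error out, stop unpacking) otherwise. This mirrors
 * the behavior of the added else branch, not its actual code. */
static int validate_section_table(const sect_entry *table, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (table[i].va <= table[i - 1].va) {
            return -1; /* patched: an equal or lower address is rejected */
        }
    }
    return 0;
}
```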

Thoughts on how this vuln might have been found (fuzzing, code auditing, variant analysis, etc.):

It seems possible that this vulnerability was found through fuzzing or manual code review. If the ASProtect unpacking code was included from an external library, that would have made the process of finding this vulnerability even more straightforward for both fuzzing & review.

(Historical/present/future) context of bug:

The Exploit

(The terms exploit primitive, exploit strategy, exploit technique, and exploit flow are defined here.)

Exploit strategy (or strategies):

1. The heap buffer overflow is used to overwrite the data in an object stored as the first field of the lfind_switch object, which is allocated in the lfind_switch::switch_out function.
2. The two fields that were overwritten in the object pointed to by the lfind_switch object are used as indices in lfind_switch::switch_in. Because these indices are not bounds-checked, another out-of-bounds write can occur.
3. The out-of-bounds write in step 2 performs OR operations on the field in the VMM_context_t struct (the virtual memory manager within Windows Defender) that stores the length of a table tracking the mapped virtual pages. This field usually equals the number of pages mapped * 2. The OR operations increase the value in that field (for example, from 0x0000000C to 0x0003030C). The enlarged length allows an additional out-of-bounds read and write, which is used to modify the memory management struct and obtain arbitrary read/write.
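The arithmetic in step 3 can be illustrated with a small C model. Everything here is hypothetical. vmm_context_model stands in for the relevant VMM_context_t field. The OR operands 0x300 and 0x30000 are chosen only because they turn 0x0000000C into 0x0003030C; the actual values written are not given in this analysis. index_in_bounds assumes the engine checks indices against this length with a simple less-than comparison. Starting from 6 mapped pages (length 0xC), an index of 0x100 is rejected. After the two ORs it is accepted, which is the kind of additional out-of-bounds access the exploit relies on.

```c
#include <stdint.h>

/* Hypothetical stand-in for the VMM_context_t field that stores the length
 * of the table tracking mapped pages (normally pages_mapped * 2). */
typedef struct {
    uint32_t page_table_len;
} vmm_context_model;

/* Model of the corrupting out-of-bounds write: OR a value into the field. */
static void or_into_len(vmm_context_model *ctx, uint32_t value) {
    ctx->page_table_len |= value;
}

/* Assumed bounds check that trusts the stored length: once the length has
 * been enlarged, indices far past the real table are accepted. */
static int index_in_bounds(const vmm_context_model *ctx, uint32_t idx) {
    return idx < ctx->page_table_len;
}
```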

The intermediate step through the lfind_switch object is likely used because the VMM_context_t struct is very far from the buffer that is originally overflowed (0x3C0000+ bytes away in my test). Overflowing directly across that much memory would likely make the exploit less stable.

Exploit flow:

The exploit uses "primitive bootstrapping": the original buffer overflow is used to cause two additional out-of-bounds writes, which ultimately yield arbitrary read/write.

Known cases of the same exploit flow: Unknown.

Part of an exploit chain? Unknown.

The Next Steps

Variant analysis

Areas/approach for variant analysis (and why):

Review the ASProtect unpacker for additional parsing bugs.
Review and/or fuzz other unpacking code for parsing and memory issues.

Found variants: N/A

Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

Ideas to kill the bug class:

Building mpengine.dll with ASAN (AddressSanitizer) enabled should allow this bug class to be caught.
Rust: a memory-safe language could potentially protect against these types of memory corruption vulnerabilities.

Ideas to mitigate the exploit flow:

Where possible, add bounds checks wherever indices are used. For example, if a bounds check could be added where indices are used in lfind_switch::switch_in, it might have prevented the second out-of-bounds write, which allowed this exploit to modify the VMM_context_t structure. This would only be effective if the attacker could not also overwrite the bounds.
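A minimal sketch of what such a check could look like, assuming a hypothetical table of 32-bit values and a separately passed length. checked_or_at is an illustrative name, not a function in mpengine.dll. As noted above, the check only helps if table_len lives somewhere the attacker cannot overwrite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked replacement for an unchecked indexed OR, such
 * as the one the exploit reaches through lfind_switch::switch_in. If the
 * attacker can also corrupt `table_len`, this check no longer helps. */
static int checked_or_at(uint32_t *table, size_t table_len, size_t idx,
                         uint32_t value) {
    if (table == NULL || idx >= table_len) {
        return -1; /* reject instead of writing out of bounds */
    }
    table[idx] |= value;
    return 0;
}
```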

Other potential improvements:

It appears that by default the Windows Defender emulator runs outside of a sandbox. In 2018, an article announced that Windows Defender Antivirus can now run in a sandbox. The article states that when sandboxing is enabled, you will see a content process, MsMpEngCp.exe, running in addition to MsMpEng.exe. By default, on Windows 10 machines, I only see MsMpEng.exe running as SYSTEM. Sandboxing the anti-malware emulator by default would make this vulnerability more difficult to exploit, because a sandbox escape would then be required in addition to this vulnerability.
Open-sourcing the unpackers could allow more folks to review this code, making issues like this one more likely to be found.
It did not appear that this code had been extensively fuzzed. If this is the case, incorporating fuzz-testing into the software development lifecycle could help catch these types of issues.
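As a starting point for fuzzing, a minimal libFuzzer-style harness might look like the following. Only the LLVMFuzzerTestOneInput entry point is real libFuzzer API. parse_section_table and the input layout (a flat array of little-endian 32-bit address/size pairs) are hypothetical and far simpler than the real ASProtect format. Built with clang -fsanitize=fuzzer,address, AddressSanitizer would report any out-of-bounds access the wrapped parser makes on fuzzer-generated input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a little-endian uint32 from p (assumed on-disk encoding). */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical parser under test: treats the input as (va, size) pairs,
 * rejects non-increasing addresses, allocates va + size for the last
 * section, and writes `size` bytes of filler at offset `va`. */
static int parse_section_table(const uint8_t *data, size_t size) {
    size_t n = size / 8;
    uint32_t prev_va = 0, va = 0, sz = 0;
    if (n == 0 || n > 64) return -1;               /* arbitrary entry cap */
    for (size_t i = 0; i < n; i++) {
        va = read_le32(data + 8 * i);
        sz = read_le32(data + 8 * i + 4);
        if (i > 0 && va <= prev_va) return -1;     /* patched-style check */
        prev_va = va;
    }
    if (va > 0x100000 || sz > 0x100000) return -1; /* keep allocations small */
    uint8_t *buf = malloc((size_t)va + sz);
    if (buf == NULL) return -1;
    memset(buf + va, 0x41, sz);  /* stands in for decompression output */
    free(buf);
    return 0;
}

/* libFuzzer entry point; libFuzzer supplies main() when linked with
 * -fsanitize=fuzzer. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)parse_section_table(data, size);
    return 0;
}
```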

0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected as a 0-day?

Detecting these types of 0-days will be difficult because the sample simply drops a new file with the characteristics needed to trigger the vulnerability, such as a section table that includes the same virtual address twice. The exploit method also did not require anything that especially stands out.
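To show how narrow a signature for this trigger would be, here is a hypothetical C heuristic that flags a section table containing the same virtual address twice. It assumes the scanner has already extracted (address, size) pairs from the ASProtect-packed file, which is itself the hard part. It would only catch this exact trigger, not tables that use a lower address instead of an equal one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one extracted section-table entry. */
typedef struct {
    uint32_t va;
    uint32_t size;
} sect_entry;

/* Heuristic: returns 1 if any two entries share the same virtual address.
 * Quadratic, which is acceptable for the handful of entries in a section
 * table. */
static int has_duplicate_va(const sect_entry *table, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (table[i].va == table[j].va) {
                return 1;
            }
        }
    }
    return 0;
}
```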

Other References

February 2021: 浅析 CVE-2021-1647 的漏洞利用技巧("Analysis of CVE-2021-1647 vulnerability exploitation techniques") by Threatbook