Windows’ PsSetLoadImageNotifyRoutine Callbacks: the Good, the Bad and the Unclear (Part 1)
tl;dr: Security vendors and kernel developers beware – a programming error in the Windows kernel could prevent you from identifying which modules have been loaded at runtime.
During research into the Windows kernel, we came across an interesting issue with PsSetLoadImageNotifyRoutine which as its name implies, notifies of module loading.
The thing is, after registering a notification routine for loaded PE images with the kernel the callback may receive invalid image names.
After digging into the matter, what started as a seemingly random issue proved to originate from a coding error in the Windows kernel itself.
This flaw exists in the most recent Windows 10 release and past versions of the OS, dating back to Windows 2000.
The Good: Notification of Module Loading
Here’s where Microsoft introduced PsSetLoadImageNotifyRoutine, in Windows 2000. This mechanism, notifies registered drivers, from various parts in the kernel, when a PE image file has been loaded to virtual memory (kernel\user space).
Behind the Scenes:
There are several cases that will cause the notification routine to be invoked:
- Loading drivers
- Starting new processes
- Process executable image
- System DLL: ntdll.dll (2 different binaries for WoW64 processes)
- Runtime loaded PE images – import table, LoadLibrary, LoadLibraryEx, NtMapViewOfSection
Figure 1: All calls to PsCallImageNotifyRoutines in ntoskrnl.exe
When invoking the registered notification routines, the kernel provides them with a number of parameters in order to properly identify the PE image that is being loaded. These parameters can be seen in the prototype definition of the callback function:
VOID (*PLOAD_IMAGE_NOTIFY_ROUTINE)( _In_opt_ PUNICODE_STRING FullImageName, // The image name _In_ HANDLE ProcessId, // A handle to the process the PE has been loaded to _In_ PIMAGE_INFO ImageInfo // Information describing the loaded image (base address, size, kernel/user-mode image, etc) );
The Only Way to Go
In essence, this is the only documented method in the WDK to actually monitor PEs that are loaded to memory as executable code.
A different method, recommended by Microsoft, is to use a file-system mini-filter callback (IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION). In order to tell that a section object is part of a loaded executable image, one must check for the existence of the SEC_IMAGE flag passed to NtCreateSection. However, the file-system mini-filter callback does not receive this flag, and it is therefore impossible to determine whether the section object is being created for the loading of a PE image or not.
The Bad: Wrong Module Parameter
The only parameter that can effectively identify the loaded PE file is the FullImageName parameter.
However, in each of the scenarios described earlier the kernel uses a different format for FullImageName.
At first glance, we noticed that while we do get the full path of the process executable file and constant values for system DLLs (that are missing the volume name), for the rest of the dynamically loaded user-mode PEs the paths provided are missing the volume name.
What’s more alarming is that not only does that path come without the volume name, sometimes the path is completely malformed, and could point to a different or non-existing file.
So as every researcher\developer does, the first thing we did was to go back to the documentation and make sure we understood it properly.
According to MSDN, the description of FullImageName implies it is the path of the file on disk since it “identifies the executable image file”. There is no mention of these invalid or non-existing paths.
The documentation does state that it may be NULL: “(The FullImageName parameter can be NULL in cases in which the operating system is unable to obtain the full name of the image at process creation time.)”. But clearly, if the parameter is not NULL, it means the kernel was able to successfully retrieve the correct image name.
There’s More than Just Typos in the Documentation
Another thing that caught our attention while perusing the documentation was that the function prototype as shown on MSDN is wrong. The Create parameter, which according to its description doesn’t even seem to be related to this mechanism, doesn’t exist in the function prototype from the WDK. Ironically, using the prototype specified on MSDN causes a crash due to stack corruption.
Under the Hood
nt!PsCallImageNotifyRoutines is in charge of invoking the registered callbacks. It merely passes along the UNICODE_STRING pointer it receives from its own caller to the callbacks as the FullImageName parameter. When nt!MiMapViewOfImageSection maps a section as an image this UNICODE_STRING is the FileName field of the FILE_OBJECT represented by that section.
Figure 2: FullImageName passed to the notification routine is actually the FILE_OBJECT’s FileName field
The FILE_OBJECT is obtained by going through the SECTION -> SEGMENT -> CONTROL_AREA. These are internal and undocumented kernel structures. The Memory Manager creates these structures when mapping a file into memory, and uses these structures internally as long as the file is mapped.
Figure 3: nt!MiMapViewOfImageSection obtaining the FILE_OBJECT before calling nt!PsCallImageNotifyRoutines
There’s a single SEGMENT structure per mapped image. This means that multiple sections of the same image that exists simultaneously, within the same process or across processes, use the same SEGMENT and CONTROL_AREA. This explains why the argument FullImageName was identical when the same PE file as loaded into different processes at the same time.
Figure 4: File mapping internal structures (simplified)
In order to understand how the FileName field is set and managed we went back to the documentation and according to MSDN using it is forbidden! “[The value] in this string is valid only during the initial processing of an IRP_MJ_CREATE request. This file name should not be considered valid after the file system starts to process the IRP_MJ_CREATE request” and at this point the FILE_OBJECT is clearly used after the file-system completed the IRP_MJ_CREATE request.
Now it’s obvious that the NTFS driver takes ownership of this UNICODE_STRING (FILE_OBJECT.FileName).
Using a kernel debugger, we found that ntfs!NtfsUpdateCcbsForLcbMove is the function responsible for the renaming operation. While looking at this function we inferred that during the IRP_MJ_CREATE request the file-system driver simply creates a shallow copy of FILE_OBJECT.FileName and maintains it separately. This means that only the address of the buffer is copied, not the buffer itself.
Figure 5: ntfs!NtfsUpdateCcbsForLcbMove updating the file name value
Root Cause Analysis
As long as the new path length won’t exceed the MaximumLength, the shared buffer will be overwritten without updating the Length field of FILE_OBJECT.FileName, which is where the kernel gets the string for the notification routine. If the new path length exceeds the MaximumLength, a new buffer will be allocated and the notification routine will get a completely outdated value.
Even though we finally figured out the cause for this bug something still didn’t add up. Why is it that even after all the handles to the image (from SECTIONs and FILE_OBJECTs) were closed we are still seeing these malformed paths? If all handles to the file were indeed closed, the next time the PE image will be opened and loaded a new FILE_OBJECT should be created without references and with the most up to date path.
Instead, the FullImageName still pointed to the old UNICODE_STRING. This proved that the FILE_OBJECT wasn’t closed although its handle count was 0, which means the reference count must have been higher than 0. We were also able to confirm this using a debugger.
As a ref count leak in the kernel isn’t very likely we are left with one immediate suspect: The Cache Manager. What seems to be caching behavior, along with the way the file-system driver maintains the file name and a severe coding error is what ultimately causes the invalid name issue.
Pausing to reflect
At this point we were sure we figured out what causes the problem though what eluded us was how can it be that this bug still exists? And there’s no obvious solution for it?
In our next post, we’ll cover our endeavors to find good answers for these questions.
Update 9/9 4:50 PM ET:
Given the recent attention to this post, we’ll release the 2nd part very soon. It details a workaround for this bug, again, not a vulnerability.
Note: majority of the analysis was done on a Windows 7 SP1 x86 fully patched and updated machine.
The findings were also verified to be present on Windows XP SP3, Windows 7 SP1 x64, Windows 10 Anniversary Update (Redstone) both x86 and x64 all fully patched and updated as well.