Thursday, 29 June 2017

Practical Reverse Engineering Exercise Solutions: ObFastDereferenceObject

First of all, a quick reminder: this series of blog posts covers exercises from the book Practical Reverse Engineering by Dang et al. Although the title suggests reverse engineering in general, the book is mostly concerned with Microsoft Windows operating systems. The reason is simply that Windows is closed source, so its internals can only be studied by reverse engineering. The source code of the Linux/Unix families, in contrast, is largely publicly available, so no reverse engineering endeavours are necessary there.

Our next task is to decompile the ObFastDereferenceObject routine, paying special attention to its calling convention.
The most common calling conventions for functions on 32-bit x86 are:

  • stdcall (arguments are pushed onto the stack from right to left; the called function cleans up the stack at the end of the routine)
  • cdecl (arguments are pushed onto the stack from right to left; the calling function cleans up the stack after the call returns)
  • fastcall (the first two arguments are passed in the registers ecx and edx; the remaining arguments are pushed onto the stack from right to left and are cleaned up by the called function)
MSDN describes these and further conventions in detail: https://msdn.microsoft.com/en-us/library/984x0h58.aspx

Microsoft seems, however, not to have documented the function in question publicly. Yet, there is an unofficial resource at http://gate.upm.ro/os/LABs/Windows_OS_Internals_Curriculum_Resource_Kit-ACADEMIC/WindowsResearchKernel-WRK/WRK-v1.2/base/ntos/ob/fastref.c

It states:
NTKERNELAPI
VOID
FASTCALL
ObFastDereferenceObject (
    IN PEX_FAST_REF FastRef,
    IN PVOID Object
    )
/*++

Routine Description:
    This routine does a fast dereference if possible.

Arguments:
    FastRef - Rundown block to be used to dereference the object

Return Value:
    None.


We show the disassembly of the function of interest:


We notice two indicators that the function uses the fastcall convention. Firstly, the register edx is read without prior initialization, which means that an argument has been passed in this register, as the fastcall convention prescribes. Secondly, the function name contains the keyword fast, although this is only a weak hint, since the name may just as well refer to the fast reference mechanism itself.
Moreover, the last instruction of the routine removes 4 bytes from the stack (ret 4). This means that one of the two function arguments is passed via the stack rather than via a register. A quick glimpse into the disassembly shows that the value at [ebp+8] is read at the beginning of the block at +0x21. With fastcall, the first parameters are passed in registers while the remaining ones are pushed onto the stack, so we can infer that edx holds the FastRef variable, while Object is stored at ebp+8. Note that textbook Microsoft fastcall would pass the first argument in ecx and the second in edx, and a two-argument function would then return with a plain ret; the observed code therefore appears to use a compiler-specific variation of the convention.
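The stack reasoning can be double-checked with a little arithmetic. The following is a minimal sketch that assumes 32-bit x86 and that every argument occupies exactly one 4-byte stack slot; the helper names are made up for illustration and do not come from any Windows header.

```c
/* Bytes a callee removes with `ret N` under stdcall:
   every argument lives on the stack. */
static unsigned stdcall_ret_bytes(unsigned argc) {
    return 4u * argc;
}

/* Bytes a callee removes with `ret N` under textbook Microsoft fastcall:
   the first two arguments travel in ecx and edx, only the rest is pushed. */
static unsigned fastcall_ret_bytes(unsigned argc) {
    return argc > 2 ? 4u * (argc - 2) : 0u;
}
```

For a two-argument routine, textbook fastcall predicts 0 bytes (a plain ret), whereas the observed ret 4 corresponds to exactly one stack argument. This supports the reading that one argument arrives in edx and the other on the stack.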

According to https://www.nirsoft.net/kernel_struct/vista/EX_FAST_REF.html the EX_FAST_REF data type (PEX_FAST_REF is a pointer to it) is defined as follows.

typedef struct _EX_FAST_REF
{
     union
     {
          PVOID Object;
          ULONG RefCnt: 3;
          ULONG Value;
     };
} EX_FAST_REF, *PEX_FAST_REF;

Notice that this data structure contains a union. As I learned C a couple of years ago and am no longer completely familiar with this type, it is worthwhile to recapitulate its meaning. A union stores several members at the same memory location, so the interpretation of the memory content depends on the member through which it is accessed. While this can be memory-efficient, a program normally has to read the member that was written last, as the content could otherwise be meaningless or even dangerous. In EX_FAST_REF, however, the overlap is intentional: Object, RefCnt and Value are three views of the same 32-bit word, with RefCnt occupying its three least significant bits.
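To make the overlapping views more tangible, here is a minimal portable sketch. It assumes that the referenced object is at least 8-byte aligned, so that the three low bits of its address are free to carry a counter. The type and function names are hypothetical stand-ins, and masks are used instead of a bit-field because bit-field layout is implementation-defined.

```c
#include <stdint.h>

/* Simplified stand-in for EX_FAST_REF (hypothetical name). */
typedef union fast_ref {
    void     *object;  /* the whole word viewed as a pointer  */
    uintptr_t value;   /* the same word viewed as an integer  */
} fast_ref;

/* Low three bits, corresponding to "RefCnt : 3" in the original. */
#define FAST_REF_MASK ((uintptr_t)7)

/* Extract the counter stored in the low three bits. */
static unsigned fast_ref_count(fast_ref r) {
    return (unsigned)(r.value & FAST_REF_MASK);
}

/* Recover the aligned object pointer by clearing the low three bits. */
static void *fast_ref_object(fast_ref r) {
    return (void *)(r.value & ~FAST_REF_MASK);
}
```

Storing a value such as 0x1000 | 5 would thus describe an object at address 0x1000 together with a counter of 5, all within a single machine word.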

The disassembly contains a somewhat peculiar instruction, namely lock cmpxchg dword ptr [edi],esi.

As its name suggests, cmpxchg (compare and exchange) conditionally exchanges a value; its semantics are explained in detail at http://x86.renejeschke.de/html/file_module_x86_id_41.html.

Translating the instruction from above into C-like pseudo code yields:
if (eax == [edi]) {
 [edi] = esi
}
else {
 eax = [edi]
}
The LOCK prefix ensures that the instruction is executed atomically, i.e. the processor has exclusive access to the memory location for the duration of the instruction. It is thus a basic primitive for synchronization in multi-processor and multi-threaded environments (see also http://x86.renejeschke.de/html/file_module_x86_id_159.html).
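In portable C, the closest counterpart to this instruction is the compare-and-exchange operation from C11's stdatomic.h. The sketch below is only a model of the semantics described above, not the kernel's code: the wrapper name locked_cmpxchg is made up, and the parameter expected plays the role of eax.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of `lock cmpxchg [dest], src`:
 *  - if *dest == *expected, *dest becomes src and true is returned;
 *  - otherwise *expected is overwritten with the current *dest
 *    (just like eax is) and false is returned.
 */
static bool locked_cmpxchg(_Atomic uint32_t *dest, uint32_t *expected,
                           uint32_t src) {
    return atomic_compare_exchange_strong(dest, expected, src);
}
```

The return value corresponds to the zero flag that cmpxchg sets on success, which is what a retry loop typically branches on.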


The first attempt to translate the function to C:

ObFastDereferenceObject (
    IN PEX_FAST_REF FastRef,
    IN PVOID Object
    )


edx = FastRef
[ebp+8] = Object

// Loop initialization

ecx = *FastRef
eax = *FastRef

goto loopcheck

loopbody:
esi = ecx + 1 // old value with the low three bits (RefCnt) incremented by one
edi = edx // FastRef
eax = ecx
// lock cmpxchg dword ptr [edi],esi
if (eax == [edi]) {
 [edi] = esi // eax is not modified, i.e. afterwards eax == ecx still holds
}
else {
 eax = [edi]
}

if (eax == ecx)
 goto finish // the exchange operation was successful
else
 {
 ecx = eax // current value of *FastRef
 goto loopcheck
 }

loopcheck:
eax = eax XOR [ebp+8]
if (eax < 7) // all higher bits match Object (they cancel out to zero) and the least significant three bits are not all 1 (111b = 7), so the counter can still be incremented
 goto loopbody
else
 {
 ObDereferenceObject(Object)
 goto finish
 }

finish:
ret 4
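Removing the gotos, the control flow above boils down to a classic compare-and-swap retry loop. The following is a minimal sketch under stated assumptions: C11 atomics stand in for lock cmpxchg, a plain uintptr_t word stands in for EX_FAST_REF, and slow_dereference is a hypothetical placeholder for ObDereferenceObject that merely counts its calls. None of these names come from the Windows sources.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FAST_REFS 7u  /* threshold used in the comparison above */

/* Hypothetical placeholder for the slow path (ObDereferenceObject). */
static int slow_derefs = 0;
static void slow_dereference(void *object) {
    (void)object;
    slow_derefs++;
}

/* Try to increment the low-bit counter; false means "use the slow path". */
static bool fast_ref_dereference(_Atomic uintptr_t *fast_ref, void *object) {
    uintptr_t old = atomic_load(fast_ref);
    for (;;) {
        /* Either the word refers to a different object (high bits differ)
           or the low three bits are already 7 and cannot be incremented. */
        if ((old ^ (uintptr_t)object) >= MAX_FAST_REFS)
            return false;
        /* On failure `old` is refreshed with the current value, then retry. */
        if (atomic_compare_exchange_strong(fast_ref, &old, old + 1))
            return true;
    }
}

static void fast_dereference_object(_Atomic uintptr_t *fast_ref, void *object) {
    if (!fast_ref_dereference(fast_ref, object))
        slow_dereference(object);
}
```

The XOR trick checks two conditions at once: if the pointer bits match, the XOR leaves only the counter bits, so a result below 7 means "same object and room left in the counter".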

Monday, 19 June 2017

Practical Reverse Engineering Exercise Solutions: KeInitializeApc Routine

To keep myself motivated and to document my progress, I will create a series of blog posts with answers to some of the exercises from the book "Practical Reverse Engineering" by Dang, Gazet and Bachaalany.

In the last post, we introduced the Windows Kernel Debugger (KD) and some of its commands. I have since learned that rather than using KD directly, we can use the more user-friendly WinDbg interface.
When calling livekd, simply append the "-w" parameter and WinDbg will start up:


Let us now proceed with the task of decompiling the Windows Kernel routine KeInitializeApc.


The first oddity that caught my attention was the instruction mov edi, edi right at the beginning of the function. What is its purpose?
This instruction is common in Windows binaries and acts as a two-byte NOP, which is inserted to allow for hot-patching. In a nutshell, the mov edi, edi instruction can be replaced at run time with a two-byte jump instruction that redirects execution to patched code.
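To illustrate the idea, the patching step can be simulated on a plain byte buffer. This is only a sketch under assumptions that go beyond what is shown here: it presumes the commonly described layout in which five padding bytes directly precede the function entry, and it assumes a little-endian host. The function name hotpatch and its parameters are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PAD = 5 };  /* assumed padding bytes in front of the entry point */

/* Simulate a hot patch inside `code`. `entry` is the index of the first
   byte of the function, `target_rel32` the displacement of the long jump.
   Returns 0 on success, -1 if the prologue is not `mov edi, edi`. */
static int hotpatch(uint8_t *code, size_t entry, int32_t target_rel32) {
    if (entry < PAD || code[entry] != 0x8B || code[entry + 1] != 0xFF)
        return -1;
    /* Step 1: place `jmp rel32` (E9 xx xx xx xx) into the padding. */
    code[entry - PAD] = 0xE9;
    memcpy(&code[entry - PAD + 1], &target_rel32, sizeof target_rel32);
    /* Step 2: overwrite `mov edi, edi` with `jmp short -7` (EB F9).
       Relative to the end of this two-byte jump, -7 lands on the padding. */
    code[entry]     = 0xEB;
    code[entry + 1] = 0xF9;
    return 0;
}
```

Because the replaced instruction is exactly two bytes long, a thread can only ever observe either the old mov edi, edi or the complete short jump, never half an instruction. This is presumably why a two-byte NOP is used instead of two one-byte nop instructions.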

Returning to the KeInitializeApc function, we should examine its signature first. Unfortunately, the function is not officially documented on MSDN. Nevertheless, its prototype can be found in several forums (e.g., https://forum.sysinternals.com/howto-capture-kernel-stack-traces_topic19356.html):

NTKERNELAPI VOID KeInitializeApc(
    PKAPC Apc,
    PKTHREAD Thread,
    KAPC_ENVIRONMENT Environment,
    PKKERNEL_ROUTINE KernelRoutine,
    PKRUNDOWN_ROUTINE RundownRoutine,
    PKNORMAL_ROUTINE NormalRoutine,
    KPROCESSOR_MODE ProcessorMode,
    PVOID NormalContext
    );

The KAPC structure (PKAPC is a pointer to it) is defined as follows:


The KTHREAD structure is defined as follows (the only field the routine references is ApcStateIndex at offset 0x134):
0: kd> dt nt!_kthread
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 CycleTime        : Uint8B
   +0x018 HighCycleTime    : Uint4B
   +0x020 QuantumTarget    : Uint8B
   +0x028 InitialStack     : Ptr32 Void
   +0x02c StackLimit       : Ptr32 Void
   +0x030 KernelStack      : Ptr32 Void
   +0x034 ThreadLock       : Uint4B
   +0x038 WaitRegister     : _KWAIT_STATUS_REGISTER
   +0x039 Running          : UChar
   +0x03a Alerted          : [2] UChar
   +0x03c KernelStackResident : Pos 0, 1 Bit
   +0x03c ReadyTransition  : Pos 1, 1 Bit
   +0x03c ProcessReadyQueue : Pos 2, 1 Bit
   +0x03c WaitNext         : Pos 3, 1 Bit
   +0x03c SystemAffinityActive : Pos 4, 1 Bit
   +0x03c Alertable        : Pos 5, 1 Bit
   +0x03c GdiFlushActive   : Pos 6, 1 Bit
   +0x03c UserStackWalkActive : Pos 7, 1 Bit
   +0x03c ApcInterruptRequest : Pos 8, 1 Bit
   +0x03c ForceDeferSchedule : Pos 9, 1 Bit
   +0x03c QuantumEndMigrate : Pos 10, 1 Bit
   +0x03c UmsDirectedSwitchEnable : Pos 11, 1 Bit
   +0x03c TimerActive      : Pos 12, 1 Bit
   +0x03c SystemThread     : Pos 13, 1 Bit
   +0x03c Reserved         : Pos 14, 18 Bits
   +0x03c MiscFlags        : Int4B
   +0x040 ApcState         : _KAPC_STATE
   +0x040 ApcStateFill     : [23] UChar
   +0x057 Priority         : Char
   +0x058 NextProcessor    : Uint4B
   +0x05c DeferredProcessor : Uint4B
   +0x060 ApcQueueLock     : Uint4B
   +0x064 ContextSwitches  : Uint4B
   +0x068 State            : UChar
   +0x069 NpxState         : Char
   +0x06a WaitIrql         : UChar
   +0x06b WaitMode         : Char
   +0x06c WaitStatus       : Int4B
   +0x070 WaitBlockList    : Ptr32 _KWAIT_BLOCK
   +0x074 WaitListEntry    : _LIST_ENTRY
   +0x074 SwapListEntry    : _SINGLE_LIST_ENTRY
   +0x07c Queue            : Ptr32 _KQUEUE
   +0x080 WaitTime         : Uint4B
   +0x084 KernelApcDisable : Int2B
   +0x086 SpecialApcDisable : Int2B
   +0x084 CombinedApcDisable : Uint4B
   +0x088 Teb              : Ptr32 Void
   +0x090 Timer            : _KTIMER
   +0x0b8 AutoAlignment    : Pos 0, 1 Bit
   +0x0b8 DisableBoost     : Pos 1, 1 Bit
   +0x0b8 EtwStackTraceApc1Inserted : Pos 2, 1 Bit
   +0x0b8 EtwStackTraceApc2Inserted : Pos 3, 1 Bit
   +0x0b8 CalloutActive    : Pos 4, 1 Bit
   +0x0b8 ApcQueueable     : Pos 5, 1 Bit
   +0x0b8 EnableStackSwap  : Pos 6, 1 Bit
   +0x0b8 GuiThread        : Pos 7, 1 Bit
   +0x0b8 UmsPerformingSyscall : Pos 8, 1 Bit
   +0x0b8 VdmSafe          : Pos 9, 1 Bit
   +0x0b8 UmsDispatched    : Pos 10, 1 Bit
   +0x0b8 ReservedFlags    : Pos 11, 21 Bits
   +0x0b8 ThreadFlags      : Int4B
   +0x0bc ServiceTable     : Ptr32 Void
   +0x0c0 WaitBlock        : [4] _KWAIT_BLOCK
   +0x120 QueueListEntry   : _LIST_ENTRY
   +0x128 TrapFrame        : Ptr32 _KTRAP_FRAME
   +0x12c FirstArgument    : Ptr32 Void
   +0x130 CallbackStack    : Ptr32 Void
   +0x130 CallbackDepth    : Uint4B
   +0x134 ApcStateIndex    : UChar
   +0x135 BasePriority     : Char
   +0x136 PriorityDecrement : Char
   +0x136 ForegroundBoost  : Pos 0, 4 Bits
   +0x136 UnusualBoost     : Pos 4, 4 Bits
   +0x137 Preempted        : UChar
   +0x138 AdjustReason     : UChar
   +0x139 AdjustIncrement  : Char
   +0x13a PreviousMode     : Char
   +0x13b Saturation       : Char
   +0x13c SystemCallNumber : Uint4B
   +0x140 FreezeCount      : Uint4B
   +0x144 UserAffinity     : _GROUP_AFFINITY
   +0x150 Process          : Ptr32 _KPROCESS
   +0x154 Affinity         : _GROUP_AFFINITY
   +0x160 IdealProcessor   : Uint4B
   +0x164 UserIdealProcessor : Uint4B
   +0x168 ApcStatePointer  : [2] Ptr32 _KAPC_STATE
   +0x170 SavedApcState    : _KAPC_STATE
   +0x170 SavedApcStateFill : [23] UChar
   +0x187 WaitReason       : UChar
   +0x188 SuspendCount     : Char
   +0x189 Spare1           : Char
   +0x18a OtherPlatformFill : UChar
   +0x18c Win32Thread      : Ptr32 Void
   +0x190 StackBase        : Ptr32 Void
   +0x194 SuspendApc       : _KAPC
   +0x194 SuspendApcFill0  : [1] UChar
   +0x195 ResourceIndex    : UChar
   +0x194 SuspendApcFill1  : [3] UChar
   +0x197 QuantumReset     : UChar
   +0x194 SuspendApcFill2  : [4] UChar
   +0x198 KernelTime       : Uint4B
   +0x194 SuspendApcFill3  : [36] UChar
   +0x1b8 WaitPrcb         : Ptr32 _KPRCB
   +0x194 SuspendApcFill4  : [40] UChar
   +0x1bc LegoData         : Ptr32 Void
   +0x194 SuspendApcFill5  : [47] UChar
   +0x1c3 LargeStack       : UChar
   +0x1c4 UserTime         : Uint4B
   +0x1c8 SuspendSemaphore : _KSEMAPHORE
   +0x1c8 SuspendSemaphorefill : [20] UChar
   +0x1dc SListFaultCount  : Uint4B
   +0x1e0 ThreadListEntry  : _LIST_ENTRY
   +0x1e8 MutantListHead   : _LIST_ENTRY
   +0x1f0 SListFaultAddress : Ptr32 Void
   +0x1f4 ThreadCounters   : Ptr32 _KTHREAD_COUNTERS
   +0x1f8 XStateSave       : Ptr32 _XSTATE_SAVE


Lastly, the _KAPC_ENVIRONMENT data type is an enum type:

typedef enum _KAPC_ENVIRONMENT
{
    OriginalApcEnvironment,
    AttachedApcEnvironment,
    CurrentApcEnvironment,
    InsertApcEnvironment
} KAPC_ENVIRONMENT, *PKAPC_ENVIRONMENT;

The initial attempt to translate the routine from above plainly into C/C++ code results in the following pseudo code:

eax = apc (ebp+0x8)
edx = environment (ebp+0x10)
ecx = thread (ebp+0xC)
eax->Type = 0x12
eax->Size = 0x30

if (edx != 2) goto 0x20

edx = ecx ->ApcStateIndex (offset 0x134)

0x20: 
eax->Thread = ecx
ecx = KernelRoutine (ebp+0x14)
eax->KernelRoutine = KernelRoutine
ecx = RundownRoutine (ebp+0x18)
eax->RundownRoutine = RundownRoutine
eax->ApcStateIndex = edx
ecx = NormalRoutine (ebp+0x1C)
eax->NormalRoutine = NormalRoutine
edx = 0
if (NormalRoutine == 0) goto 0x4C

ecx = ProcessorMode (ebp+0x20)
eax->ApcMode = ecx
ecx = NormalContext (ebp+0x24)
eax->NormalContext = ecx
goto 0x52

0x4C:
eax->ApcMode = 0
eax->NormalContext = 0

0x52:
eax->Inserted = 0
return

Obviously, this can be improved. The first if-statement compares the environment value to the integer 2. According to the _KAPC_ENVIRONMENT data type, the integer 2 corresponds to CurrentApcEnvironment (enumerators are numbered starting from 0). Replacing the registers and offsets with the parameter and field names yields:

NTKERNELAPI VOID KeInitializeApc(
    PKAPC Apc,
    PKTHREAD Thread,
    KAPC_ENVIRONMENT Environment,
    PKKERNEL_ROUTINE KernelRoutine,
    PKRUNDOWN_ROUTINE RundownRoutine,
    PKNORMAL_ROUTINE NormalRoutine,
    KPROCESSOR_MODE ProcessorMode,
    PVOID NormalContext
    ) {
    Apc->Type = 0x12;
    Apc->Size = 0x30;

    // CurrentApcEnvironment is resolved to the thread's current APC state index
    if (Environment == CurrentApcEnvironment) {
        Environment = Thread->ApcStateIndex;
    }

    Apc->Thread = Thread;
    Apc->KernelRoutine = KernelRoutine;
    Apc->RundownRoutine = RundownRoutine;
    Apc->ApcStateIndex = Environment;
    Apc->NormalRoutine = NormalRoutine;

    if (NormalRoutine == NULL) {
        // Without a normal routine, mode and context are zeroed
        Apc->ApcMode = 0;
        Apc->NormalContext = NULL;
    }
    else {
        Apc->ApcMode = ProcessorMode;
        Apc->NormalContext = NormalContext;
    }
    Apc->Inserted = 0;
}

If you find any mistakes in the decompilation, I would really appreciate your feedback.
So far we have translated the routine to C/C++ more or less mechanically, without understanding any of the internal mechanics, let alone what APC actually stands for. As reverse engineering on Windows requires deep knowledge of Windows internals, let us consult MSDN (https://msdn.microsoft.com/de-de/library/windows/desktop/ms681951(v=vs.85).aspx):

It states:
An asynchronous procedure call (APC) is a function that executes asynchronously in the context of a particular thread. When an APC is queued to a thread, the system issues a software interrupt. The next time the thread is scheduled, it will run the APC function. An APC generated by the system is called a kernel-mode APC. An APC generated by an application is called a user-mode APC. A thread must be in an alertable state to run a user-mode APC.

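Thus, our function initializes an asynchronous procedure call (APC) object based on the data submitted in the parameters; it does not queue the APC yet. If no NormalRoutine is given, ApcMode is set to 0, which corresponds to kernel mode.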

Saturday, 17 June 2017

Practical Reverse Engineering Exercise Solutions: Windows Kernel Routines

I am currently developing my reverse engineering skills and want to document some important parts of this journey in this blog.

The first step of this series is about disassembling Windows kernel routines, in my case on Windows 7.

What are the prerequisites for this exercise?
  • Ideally, install Windows inside a virtual machine
  • From Windows Vista onwards, kernel debugging mode has to be enabled with: bcdedit /debug on
  • Install the Debugging Tools for Windows, which contain the Kernel Debugger (KD); for Windows 7 they are available, for example, as part of the Windows SDK (https://www.microsoft.com/en-us/download/details.aspx?id=3138)
  • Install LiveKD from the SysInternals Suite
    • IMPORTANT: the livekd.exe file should be placed in the system32 folder
Notice that since we use LiveKD, we are essentially debugging the kernel locally without a second system. With this approach, code cannot be debugged interactively (no breakpoints or single-stepping), as LiveKD works on a read-only memory dump of the kernel.

If you have any questions about the usage of KD, the best resource I know of is the Windows help file included with the Windows SDK, debugger.chm. It contains a plethora of information and is especially helpful for command line usage. You can search it from within the debugger with the .hh command, passing the command you are interested in as an argument. For example, .hh uf displays the help page for the uf command.

In order to start the Kernel Debugger, open a command line prompt with administrative privileges and start the LiveKd executable, which will invoke the KD.exe included in the Windows SDK:


LiveKd v5.40 - Execute kd/windbg on a live system
Sysinternals - www.sysinternals.com
Copyright (C) 2000-2015 Mark Russinovich and Ken Johnson

Launching c:\Program Files\Debugging Tools for Windows (x86)\kd.exe:

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\livekd.dmp]
Kernel Complete Dump File: Full address space is available

Comment: 'LiveKD live system view'
Symbol search path is: srv*c:\Symbols*http://msdl.microsoft.com/download/symbo

Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (4 procs) Free x86 compatibl
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.19110.x86fre.win7sp1_gdr.151230-0600
Machine Name:
Kernel base = 0x8283c000 PsLoadedModuleList = 0x82987e30
Debug session time: Fri Jun 16 23:49:45.112 2017 (UTC - 7:00)
System Uptime: 0 days 2:01:39.545
WARNING: Process directory table base 00185000 doesn't match CR3 DF654620
WARNING: Process directory table base 00185000 doesn't match CR3 DF654620
Loading Kernel Symbols
...............................................................
................................................................
...................
Loading User Symbols

Loading unloaded module list
............
0: kd>

The Kernel Debugger has started successfully and we can proceed with our experiments.
First, we will disassemble the KeInitializeDpc Windows kernel routine, which is described in detail at: https://msdn.microsoft.com/en-us/library/windows/hardware/ff552130(v=vs.85).aspx

To obtain the disassembly of this function, we use the uf (unassemble function) command of the Kernel Debugger:


0: kd> uf keinitializedpc
nt!KeInitializeDpc:
828ddc4e 8bff            mov     edi,edi
828ddc50 55              push    ebp
828ddc51 8bec            mov     ebp,esp
828ddc53 8b4508          mov     eax,dword ptr [ebp+8]
828ddc56 33c9            xor     ecx,ecx
828ddc58 83601c00        and     dword ptr [eax+1Ch],0
828ddc5c c60013          mov     byte ptr [eax],13h
828ddc5f c6400101        mov     byte ptr [eax+1],1
828ddc63 66894802        mov     word ptr [eax+2],cx
828ddc67 8b4d0c          mov     ecx,dword ptr [ebp+0Ch]
828ddc6a 89480c          mov     dword ptr [eax+0Ch],ecx
828ddc6d 8b4d10          mov     ecx,dword ptr [ebp+10h]
828ddc70 894810          mov     dword ptr [eax+10h],ecx
828ddc73 5d              pop     ebp
828ddc74 c20c00          ret     0Ch

As mentioned on MSDN, the function initializes a (K)DPC object, which is defined in the Windows kernel. The KD command for inspecting data types is dt (display type).

0: kd> dt nt!_kdpc
   +0x000 Type             : UChar
   +0x001 Importance       : UChar
   +0x002 Number           : Uint2B
   +0x004 DpcListEntry     : _LIST_ENTRY
   +0x00c DeferredRoutine  : Ptr32     void
   +0x010 DeferredContext  : Ptr32 Void
   +0x014 SystemArgument1  : Ptr32 Void
   +0x018 SystemArgument2  : Ptr32 Void
   +0x01c DpcData          : Ptr32 Void
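
Matching the offsets in the uf output against the fields in the dt output, the routine can be translated almost line by line. The following is a minimal sketch rather than the actual kernel source: MOCK_KDPC is a simplified stand-in that reuses the field names from the dt output but relies on the host's natural alignment, and the routine is declared with plain void pointers for brevity.

```c
#include <stddef.h>

/* Simplified stand-ins; offsets on the host may differ from the kernel's. */
typedef struct MOCK_LIST_ENTRY {
    struct MOCK_LIST_ENTRY *Flink, *Blink;
} MOCK_LIST_ENTRY;

typedef struct MOCK_KDPC {
    unsigned char   Type;
    unsigned char   Importance;
    unsigned short  Number;
    MOCK_LIST_ENTRY DpcListEntry;
    void           *DeferredRoutine;
    void           *DeferredContext;
    void           *SystemArgument1;
    void           *SystemArgument2;
    void           *DpcData;
} MOCK_KDPC;

/* Translation of the disassembly; comments name the source instruction. */
static void MockKeInitializeDpc(MOCK_KDPC *Dpc, void *DeferredRoutine,
                                void *DeferredContext) {
    Dpc->DpcData         = NULL;            /* and  dword ptr [eax+1Ch],0 */
    Dpc->Type            = 0x13;            /* mov  byte ptr [eax],13h    */
    Dpc->Importance      = 1;               /* mov  byte ptr [eax+1],1    */
    Dpc->Number          = 0;               /* mov  word ptr [eax+2],cx   */
    Dpc->DeferredRoutine = DeferredRoutine; /* [ebp+0Ch] -> [eax+0Ch]     */
    Dpc->DeferredContext = DeferredContext; /* [ebp+10h] -> [eax+10h]     */
}
```

The final ret 0Ch removes 12 bytes, i.e. three 4-byte arguments, from the stack, which is consistent with a three-parameter stdcall routine. DpcListEntry, SystemArgument1 and SystemArgument2 are not touched by the disassembled code.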

I found two other blogs on the Internet that have published their solutions as well. You should definitely have a look at them, too:

https://zerosum0x0.blogspot.de
https://johannesbader.ch/