Wednesday, 20 September 2017

Practical Reverse Engineering Exercise Solutions: Page 35 / Exercise 11

Read the Virtual Memory chapter in Intel Software Developer Manual, Volume 3 and AMD64 Architecture Programmer’s Manual, Volume 2: System Programming. Perform a few virtual address to physical address translations yourself and verify the result with a kernel debugger. Explain how data execution prevention (DEP) works.

For this exercise, we first have to set up a remote kernel debugging session (see https://codemetrix.net/windows-kernel-debugging-setup/, https://securityblog.gr/3253/debug-user-mode-processes-using-a-kernel-debugger/ and http://securityblog.gr/3023/windows-kernel-debugging/ for excellent explanations).
Local kernel debugging is not an option here, since examining register contents requires a remote kernel debugging session.
As a reminder, WinDbg has two families of commands for inspecting memory contents:

d* commands (e.g., db): Display memory data at a specified virtual address
!d* commands (e.g., !db): Display data at a specified physical address

We perform a couple of translations from virtual addresses to physical addresses:

1) Virtual address: 0x8283c054

kd> db 8283c054 L8
8283c054  72 6f 67 72 61 6d 20 63 rogram c

The binary representation of this address yields:

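10000010100000111100000001010100

Under PAE paging, a 32-bit virtual address consists of a 2-bit index into the page directory pointer table, a 9-bit index into the page directory, a 9-bit index into the page table and a 12-bit page offset. Splitting the binary representation into these groups yields:

10 000010100 000111100 000001010100

Index into page directory pointer table (PDPT): 10 (0x2)
Index into page directory (PD): 000010100 (0x14)
Index into page table (PT): 000111100 (0x3C)
Page offset: 000001010100 (0x54)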
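The same split can be computed with shifts and masks. This is a minimal C sketch, assuming 32-bit PAE paging with 4 KiB pages; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Index fields of a 32-bit virtual address under PAE paging (4 KiB pages). */
struct pae_indices {
    uint32_t pdpt;    /* bits 31:30, 2 bits  */
    uint32_t pd;      /* bits 29:21, 9 bits  */
    uint32_t pt;      /* bits 20:12, 9 bits  */
    uint32_t offset;  /* bits 11:0,  12 bits */
};

/* Splits a virtual address into its PDPT, PD and PT indices and page offset. */
static struct pae_indices pae_split(uint32_t va) {
    struct pae_indices idx;
    idx.pdpt   = (va >> 30) & 0x3u;
    idx.pd     = (va >> 21) & 0x1FFu;
    idx.pt     = (va >> 12) & 0x1FFu;
    idx.offset = va & 0xFFFu;
    return idx;
}
```

For 0x8283c054 it yields 0x2, 0x14, 0x3C and 0x54, the same indices as derived by hand.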

kd> r @cr3
cr3=00185000

The CR3 register contains the physical base address of the page directory pointer table (PDPT).
Since every PDPT entry is 8 bytes long, the entry for index 2 is located at CR3 + 2*8:

kd> !dq @cr3+2*8 L1
#  185010 00000000`00188001

Of course, we could use another program to convert between hexadecimal and binary representations, but WinDbg already provides a command for this very purpose, namely .formats.

kd> .formats 00000000`00188001
Evaluate expression:
  Hex:     00188001
  Decimal: 1605633
  Octal:   00006100001
  Binary:  00000000 00011000 10000000 00000001
  Chars:   ....
  Time:    Mon Jan 19 15:00:33 1970
  Float:   low 2.24997e-039 high 0
  Double:  7.93288e-318
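For reference, the "Binary:" line of .formats can be reproduced with a few shifts. This is a minimal C sketch; format_binary32 is a hypothetical helper that only handles 32-bit values:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes v as 32 binary digits in four space-separated groups of eight,
   mimicking the "Binary:" line of .formats. out must hold 36 bytes
   (32 digits, 3 spaces and the terminating null byte). */
static void format_binary32(uint32_t v, char out[36]) {
    size_t pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        out[pos++] = ((v >> bit) & 1u) ? '1' : '0';
        if (bit % 8 == 0 && bit != 0)
            out[pos++] = ' ';  /* separator after each full byte except the last */
    }
    out[pos] = '\0';
}
```

For 0x00188001 it produces "00000000 00011000 10000000 00000001", as in the output above.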

The lowest 12 bits of the PDPT entry hold flags (bit 0, for instance, is the Present bit), so we clear them to obtain the physical base address of the page directory:

00000000 00011000 10000000 00000000 

Converted to hex: 0x188000
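Clearing the flag bits amounts to a bitwise AND with a mask. This is a sketch, assuming the physical address field of a PAE entry occupies bits 51:12 (so the mask also clears bit 63, which is not part of the address); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the physical address field of a PAE paging entry is bits 51:12. */
#define PAE_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* Physical base address of the next paging structure (or page frame). */
static uint64_t pae_entry_base(uint64_t entry) {
    return entry & PAE_ADDR_MASK;
}

/* Bit 0 (Present) must be set for the entry to be used at all. */
static bool pae_entry_present(uint64_t entry) {
    return (entry & 1ULL) != 0;
}
```

Applied to the three entries of the first walk, pae_entry_base returns 0x188000, 0x1d0000 and 0x283c000.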

The page directory entry is in turn located by adding the PD index, multiplied by the entry size of 8 bytes, to the page directory base:

kd> !dq 188000+0x14*8 L1
#  1880a0 00000000`001d0063

Converted to binary: 00000000 00011101 00000000 01100011

As in the previous step, we clear the lowest 12 bits to obtain the base of the page table (PT):

00000000 00011101 00000000 00000000

Converted to hex: 0x1D0000

The page table entry is calculated as follows:

kd> !dq 1D0000+0x3C*8 L1
#  1d01e0 00000000`0283c963

Converted to binary: 00000010 10000011 11001001 01100011

Once again, we clear the lowest 12 bits and obtain the base of the physical page frame:

00000010 10000011 11000000 00000000

Converted to hex: 0x283C000

Finally, we add the page offset to the base of the page frame to obtain the physical address:

kd> !db 0x283C000+0x54 L8
# 283c054 72 6f 67 72 61 6d 20 63 rogram c .L.....

We can confirm that the contents of physical address 0x283c054 and virtual address 0x8283c054 are identical:

kd> db 0x8283c054 L8
8283c054  72 6f 67 72 61 6d 20 63 rogram c

As the manual calculation is rather cumbersome, WinDbg provides the !vtop extension command to translate a virtual address into a physical address. In addition to the virtual address, it takes the directory base as input, which under PAE is the base address of the page directory pointer table held in CR3 (see https://zerosum0x0.blogspot.de/2015/01/practical-reverse-engineering-p-36-11.html):

kd> !vtop 00185000 8283c054
X86VtoP: Virt 8283c054, pagedir 185000
X86VtoP: PAE PDPE 185010 - 0000000000188001
X86VtoP: PAE PDE 1880a0 - 00000000001d0063
X86VtoP: PAE PTE 1d01e0 - 000000000283c963
X86VtoP: PAE Mapped phys 283c054
Virtual address 8283c054 translates to physical address 283c054.

Nevertheless, we perform another translation from virtual to physical address manually:

2) Virtual address: 0x83e31738 

kd> db 83e31738 L8
83e31738  41 db e8 59 e9 0c 00 00

Binary representation of the virtual address, split into its index fields:

10 000011111 000110001 011100111000

PDPT Index: 0x2
PD Index: 0x1F
PT Index: 0x31
Page Offset: 0x738

kd> r @cr3
cr3=00185000

PDPT base: 
kd> !dq @cr3+0x2*8 L1
#  185010 00000000`00188001

Binary: 110001000000000000001
Truncated: 110001000000000000000 (0x188000)

PD Base: 
kd> !dq 188000+0x1F*8 L1
#  1880f8 00000000`1ff80863

Binary: 11111111110000000100001100011
Truncated: 11111111110000000000000000000 (0x1FF80000)

PT Base:
kd> !dq 0x1FF80000+0x31*8 L1
#1ff80188 00000000`1fc31963

Binary: 11111110000110001100101100011
Truncated: 11111110000110001000000000000 (0x1FC31000)

Physical address and contents:
kd> !db 1FC31000+0x738 L8
#1fc31738 41 db e8 59 e9 0c 00
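Both manual walks can be replayed in a few lines of C. This is a sketch under stated assumptions: physical memory is replaced by a hypothetical lookup table holding only the qwords dumped with !dq above, only 32-bit PAE paging is modelled, and read_phys_qword and pae_translate are made-up names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAE_ADDR_MASK  0x000FFFFFFFFFF000ULL  /* assumed address field: bits 51:12 */
#define PAE_LARGE_MASK 0x000FFFFFFFE00000ULL  /* 2 MiB frame base in a PDE with PS set */
#define PAE_PRESENT    0x1ULL                 /* bit 0 */
#define PAE_PAGE_SIZE  0x80ULL                /* bit 7 (PS) in a PDE */

/* Hypothetical stand-in for physical memory: only the qwords dumped with !dq. */
struct phys_qword {
    uint64_t addr;
    uint64_t value;
};

static const struct phys_qword dumped[] = {
    { 0x185010,   0x188001   },  /* PDPTE, index 2 (both examples) */
    { 0x1880a0,   0x1d0063   },  /* PDE, example 1 */
    { 0x1d01e0,   0x283c963  },  /* PTE, example 1 */
    { 0x1880f8,   0x1ff80863 },  /* PDE, example 2 */
    { 0x1ff80188, 0x1fc31963 },  /* PTE, example 2 */
};

/* Returns the dumped qword at addr, or 0 (a non-present entry) otherwise. */
static uint64_t read_phys_qword(uint64_t addr) {
    for (size_t i = 0; i < sizeof dumped / sizeof dumped[0]; i++) {
        if (dumped[i].addr == addr)
            return dumped[i].value;
    }
    return 0;
}

/* Translates a 32-bit virtual address under PAE paging; returns 0 if any
   entry on the way is not present. */
static uint64_t pae_translate(uint32_t cr3, uint32_t va) {
    uint64_t pdpt_base = cr3 & 0xFFFFFFE0u;      /* CR3 bits 31:5 under PAE */
    uint64_t pdpte = read_phys_qword(pdpt_base + ((va >> 30) & 0x3u) * 8);
    if (!(pdpte & PAE_PRESENT))
        return 0;

    uint64_t pde = read_phys_qword((pdpte & PAE_ADDR_MASK) + ((va >> 21) & 0x1FFu) * 8);
    if (!(pde & PAE_PRESENT))
        return 0;
    if (pde & PAE_PAGE_SIZE)                      /* 2 MiB page: no page table */
        return (pde & PAE_LARGE_MASK) + (va & 0x1FFFFFu);

    uint64_t pte = read_phys_qword((pde & PAE_ADDR_MASK) + ((va >> 12) & 0x1FFu) * 8);
    if (!(pte & PAE_PRESENT))
        return 0;
    return (pte & PAE_ADDR_MASK) + (va & 0xFFFu);
}
```

With CR3 = 0x185000, the sketch yields 0x283c054 for 0x8283c054 and 0x1fc31738 for 0x83e31738, matching the results obtained by hand and with !vtop.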


How does DEP work?

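Processors that support PAE (Physical Address Extension) paging can mark memory as non-executable once this feature is enabled via the NXE bit of the IA32_EFER model-specific register. More precisely, bit 63 of a page directory entry or page table entry is the NX (no-execute) flag, which Intel calls XD (execute-disable). If it is set in any paging-structure entry used to translate an address, instruction fetches from the memory region controlled by that entry are not allowed, and an attempt to execute code there raises a page fault. DEP builds on this mechanism: the operating system marks data pages, such as the stack and the heap, as non-executable, so that code injected into them cannot be run. In both translations above, the upper 32 bits of the PDE and PTE are zero, so bit 63 is clear and the mapped pages are executable.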
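Checking the NX flag of an entry is a single bit test. This is a minimal C sketch with hypothetical helper names; it only looks at bit 63 and ignores everything else that decides whether an instruction fetch succeeds (the present bit, the privilege level, and whether NXE is enabled at all):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_NX_BIT (1ULL << 63)  /* NX/XD flag of a PAE PDE or PTE */

/* True if the entry forbids instruction fetches from the region it maps.
   The CPU only honors this bit when IA32_EFER.NXE is set. */
static bool entry_is_no_execute(uint64_t entry) {
    return (entry & PAE_NX_BIT) != 0;
}

/* A 4 KiB page passes the NX check only if neither the PDE nor the PTE
   that map it has bit 63 set. */
static bool nx_allows_fetch(uint64_t pde, uint64_t pte) {
    return !entry_is_no_execute(pde) && !entry_is_no_execute(pte);
}
```

For the PDE 0x1d0063 and PTE 0x283c963 from the first translation, nx_allows_fetch returns true; a hypothetical PTE with bit 63 set would make it return false.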

Sunday, 17 September 2017

Practical Reverse Engineering Exercise Solutions: Page 35 / Exercise 10

Our task:

If the current privilege level is encoded in CS, which is modifiable by user-mode code, why can’t user-mode code modify CS to change CPL?

For a change, this challenge is theoretical rather than hands-on.
In order to address the exercise appropriately, we have to make sure we understand it correctly.
CS (code segment) is the CPU segment register whose two lowest bits (bits 0 and 1) hold the current ring level. This encoded level is commonly referred to as the CPL (current privilege level).
More information is provided in the Intel 64 and IA-32 Architectures Software Developer's Manual at https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf.
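The value in CS is a segment selector, and the CPL sits in its lowest two bits. This is a minimal C sketch that decodes a selector; the names are hypothetical. As example values, 32-bit Windows typically runs user-mode code with CS = 0x1B and kernel-mode code with CS = 0x08:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a segment selector such as the value held in CS. */
struct selector_fields {
    unsigned index;  /* bits 15:3, descriptor index in the GDT or LDT */
    unsigned ti;     /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1:0, privilege level; for CS this is the CPL */
};

static struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

Decoding 0x1B gives GDT index 3 with privilege level 3, while 0x08 gives GDT index 1 with privilege level 0.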

We learn that the processor fetches instructions from the code segment by using a logical address consisting of both the value in the CS register and the value in the EIP register. EIP contains the offset within the code segment of the next instruction to be executed.
Furthermore, the Intel manual states:

"The CS register cannot be loaded explicitly by an application program. Instead, it is loaded implicitly by instructions or internal processor operations that change program control (such as, procedure calls, interrupt handling, or task switching)."

This is very similar to the EIP register, which is only modified indirectly when executing instructions such as JMP, RET or CALL.

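Therefore, CS can only be changed indirectly, by control-transfer instructions such as far JMP, far CALL, far RET, INT/IRET, SYSCALL/SYSRET (on Intel processors only available in 64-bit mode) or SYSENTER/SYSEXIT; an attempt to load CS with MOV raises an invalid-opcode exception (#UD). Crucially, the processor performs privilege checks on each of these transfers: a switch to a more privileged level is only possible through entry points that the operating system has set up at CPL 0, such as interrupt and call gates or the SYSENTER/SYSCALL target addresses stored in model-specific registers, while returns such as far RET and IRET can only go to an equal or less privileged level. User-mode code thus cannot choose its new CPL itself.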

With SYSENTER, for example, user-mode code (i.e. CPL 3) can transfer control to a fixed operating system (kernel-mode) entry point running at privilege level 0. Only at CPL 0 can privileged instructions be executed, such as LGDT, which loads the GDTR register.

Friday, 15 September 2017

Practical Reverse Engineering Exercise Solutions: Page 35 / Exercise 9

Our task:
Sample L. Explain what function sub_1000CEA0 does and then decompile it back to C.

Here we have the function's disassembly:

                push    ebp
                mov     ebp, esp
                push    edi
                mov     edi, [ebp+8]
                xor     eax, eax
                or      ecx, 0FFFFFFFFh
                repne scasb
                add     ecx, 1
                neg     ecx
                sub     edi, 1
                mov     al, [ebp+0Ch]
                std
                repne scasb
                add     edi, 1
                cmp     [edi], al
                jz      short loc_1000CEC7
                xor     eax, eax
                jmp     short loc_1000CEC9

loc_1000CEC7:                       
                mov     eax, edi

loc_1000CEC9:              
                cld
                pop     edi
                leave
                retn
endp


Firstly, the function takes two arguments, at ebp+0x8 (arg1) and ebp+0x0C (arg2) respectively. The arguments are pushed onto the stack from right to left, and since the function returns with a plain retn rather than retn 8, the caller cleans up the stack. This points to the cdecl calling convention rather than stdcall.


Presumably, arg1 is a string, as the function makes use of the scasb instruction.
As I keep forgetting the meaning of scasb in conjunction with repne, here is an excellent refresher from Reverse Engineering Stack Exchange, which I shamelessly copy/paste (https://reverseengineering.stackexchange.com/questions/2774/what-does-the-assembly-instruction-repne-scas-byte-ptr-esedi, thanks, Peter Ferrie):

The SCAS instruction is used to scan a string (SCAS = SCan A String). It compares the content of the accumulator (AL, AX, or EAX) against the current value pointed at by ES:[EDI].
When used together with the REPNE prefix (REPeat while Not Equal), SCAS scans the string searching for the first string element which is equal to the value in the accumulator.
Since eax has been zeroed with xor eax, eax, al holds 0x0, so the function searches for the first null byte in arg1. Meanwhile, it increments edi and decrements ecx for every compared character. Because ecx starts at -1 (or ecx, 0FFFFFFFFh) and the null byte itself is counted as well, ecx ends up as -(length + 2). The function then increments ecx by one and negates it with neg, which computes the two's complement and turns -(length + 1) into length + 1.
In other words, we thereby obtain the length of the string stored in arg1 (including the trailing null byte).
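The ecx arithmetic is easy to get wrong by one, so it can help to replay it. This is a minimal C sketch with a hypothetical function name; it emulates the first repne scasb (direction flag clear, al = 0) with 32-bit wrap-around arithmetic and returns the final value of ecx:

```c
#include <assert.h>
#include <stdint.h>

/* Emulates: or ecx, -1 / repne scasb / add ecx, 1 / neg ecx. */
static uint32_t scasb_length(const char *s) {
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx, 0FFFFFFFFh */
    const char *edi = s;
    /* repne scasb: every iteration decrements ecx, compares al (0) with the
       byte at edi and advances edi; it stops once the byte is equal.
       The "ecx reaches 0" exit is omitted, as no realistic string triggers it. */
    for (;;) {
        ecx--;
        char byte = *edi++;
        if (byte == '\0')
            break;
    }
    ecx += 1;                     /* add ecx, 1 */
    ecx = 0u - ecx;               /* neg ecx (two's complement) */
    return ecx;
}
```

For "abc", ecx goes from 0xFFFFFFFF down to 0xFFFFFFFB (-5) over four comparisons, add ecx, 1 makes it -4, and neg yields 4, the length including the null byte.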

The function then loads the byte value of arg2 (i.e. a char) into al and executes the interesting instruction std, which I hadn't come across before. std sets the direction flag (DF), which reverses the way string instructions such as scasb work: instead of incrementing edi for every processed character, edi is decremented, so the string is processed from end to start.

Beforehand, edi is decremented: after the first scan it points one byte past the null terminator, so it now points at the terminator itself. Afterwards, repne scasb is performed again, this time backwards, to search for the last occurrence of the character arg2 in the string arg1. Note that it is crucial for ecx to hold the length of the string (including the null byte) at the start of this second repne scasb, as otherwise the scan would not know where the string in edi begins and would run past its first character.

When the backward repne scasb completes, edi points one byte before the last compared character, so it is incremented and the byte it now points to is compared with arg2. If it matches, we have found the last occurrence of arg2 in the string arg1 and the function returns a pointer to it.
Otherwise, NULL is returned. Furthermore, it is worth mentioning that cld is executed before returning to clear the previously set direction flag, since the calling conventions expect DF to be clear when a function returns. Overall, the function behaves like the C standard library function strrchr.

Finally, we provide a C-decompilation of the function with more comprehensible variable names:

char* getLastOccurrenceOfCharacter(char* string, char key) {
    int countChars = 0;
    while (*string) {
        countChars++;
        string++;
    }

    while (countChars) {
        if (key == *string) {
            return string;
        }
        countChars--;
        string--;
    }

    return 0;
}

UPDATE:

Unfortunately, there is a bug in the decompilation above: when the only occurrence of key in string is the very first character, the function does not return a pointer to it, because the second while loop stops before examining position 0. Therefore, we have to adjust the second while loop to take position 0 into consideration as well:

char* getLastOccurrenceOfCharacter(char* string, char key) {
    int countChars = 0;
    while (*string) {
        countChars++;
        string++;
    }

    while (countChars >= 0) {
        if (key == *string) {
            return string;
        }
        countChars--;
        string--;
    }

    return 0;
}
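The corrected decompilation can be checked against strrchr from the C standard library, which the function appears to mirror. The sketch below is a hypothetical, index-based variant of the corrected loop; using an index avoids forming a pointer before the start of the string, which the pointer-based version does once the loop finishes without a match (harmless in practice, but formally undefined behaviour in C):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scans from the terminating null byte (index len) down to index 0, like the
   backward repne scasb, and returns a pointer to the last match or NULL. */
static char *last_occurrence(char *string, char key) {
    size_t i = strlen(string) + 1;   /* bytes to scan, terminator included */
    while (i-- > 0) {
        if (string[i] == key)
            return &string[i];
    }
    return NULL;
}
```

As with the assembly, searching for '\0' returns a pointer to the terminator, since the backward scan starts there.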

Thursday, 14 September 2017

Practical Reverse Engineering Exercise Solutions: Page 35 / Exercise 8

Our task as formulated in exercise 8:

Sample H. Decompile sub_11732 and explain the most likely programming construct used in the original code.

The function's disassembly:

sub_1172E:
                push    esi
                mov     esi, [esp+8]
                dec     esi
                jz      short loc_1175F
                dec     esi
                jz      short loc_11755
                dec     esi
                jz      short loc_1174B
                sub     esi, 9
                jnz     short loc_1176B
                mov     esi, [eax+8]
                shr     esi, 1
                add     eax, 0Ch
                jmp     short loc_11767
; ---------------------------------------------------------------------------

loc_1174B:
                mov     esi, [eax+3Ch]
                shr     esi, 1
                add     eax, 5Eh
                jmp     short loc_11767
; ---------------------------------------------------------------------------

loc_11755:
                mov     esi, [eax+3Ch]
                shr     esi, 1
                add     eax, 44h
                jmp     short loc_11767
; ---------------------------------------------------------------------------

loc_1175F:
                mov     esi, [eax+3Ch]
                shr     esi, 1
                add     eax, 40h

loc_11767:
                mov     [ecx], esi
                mov     [edx], eax

loc_1176B:
                pop     esi
                retn    4

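The sought-after programming construct is most likely a switch...case statement: the single stack argument (read from [esp+8] and removed by retn 4) is tested against the case values 1, 2 and 3 by successive dec/jz pairs and against 12 by the final sub esi, 9 (3 + 9 = 12), while any other value jumps straight to loc_1176B. eax, ecx and edx are used without being initialized in the function, so they are presumably passed in registers. Translating the assembly code above into C yields: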

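#include <stdint.h>

/* eax, ecx and edx are register inputs; selector is the stack argument
   at [esp+8]. The parameter types are guesses. */
uint8_t *sub_1172E(uint8_t *eax, uint32_t *ecx, uint8_t **edx, uint32_t selector)
{
    uint32_t var;

    switch (selector) {
    case 1:                                      /* loc_1175F */
        var = *(uint32_t *)(eax + 0x3C) >> 1;    /* shr esi, 1: var / 2 */
        eax = eax + 0x40;
        break;
    case 2:                                      /* loc_11755 */
        var = *(uint32_t *)(eax + 0x3C) >> 1;
        eax = eax + 0x44;
        break;
    case 3:                                      /* loc_1174B */
        var = *(uint32_t *)(eax + 0x3C) >> 1;
        eax = eax + 0x5E;
        break;
    case 12:                                     /* block after sub esi, 9 / jnz */
        var = *(uint32_t *)(eax + 8) >> 1;
        eax = eax + 0x0C;
        break;
    default:                                     /* loc_1176B: nothing is stored */
        return eax;
    }

    /* loc_11767 */
    *ecx = var;
    *edx = eax;

    /* loc_1176B */
    return eax;
}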
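The dispatch at the top of the function can be replayed to double-check which argument values reach which block. This is a minimal C sketch with a hypothetical function name; it uses unsigned 32-bit arithmetic like the dec/sub instructions, so values such as 0 wrap around and end up on the default path:

```c
#include <assert.h>
#include <stdint.h>

/* Returns the case whose block would run (1, 2, 3 or 12), or 0 for the
   default path that jumps straight to loc_1176B. */
static int dispatched_case(uint32_t arg) {
    uint32_t esi = arg;
    if (--esi == 0) return 1;     /* dec esi / jz loc_1175F */
    if (--esi == 0) return 2;     /* dec esi / jz loc_11755 */
    if (--esi == 0) return 3;     /* dec esi / jz loc_1174B */
    esi -= 9;                     /* sub esi, 9 */
    if (esi != 0) return 0;       /* jnz loc_1176B */
    return 12;                    /* falls through to mov esi, [eax+8] */
}
```

Only 1, 2, 3 and 12 reach a case block; for example, 4 and 13 both leave a nonzero esi after sub esi, 9 and take the default path.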