## Tuesday, 5 December 2017

### Practical Reverse Engineering Exercise Solutions: Page 79 / Exercise 8

Exercise 8 on page 79 of the book Practical Reverse Engineering specifies the following ARM disassembly of a function called mystery8:

01:             mystery8
02: 2D E9 78 48   PUSH.W   {R3–R6,R11,LR}
03: 0D F2 10 0B   ADDW     R11, SP, #0x10
04: 0C 4E         LDR      R6, =byteArray
05: 09 E0         B        loc_100E34C

06:      loc_100E338
07: 05 78         LDRB     R5, [R0]
08: 01 3A         SUBS     R2, #1
09: 4D B1         CBZ      R5, loc_100E352
10: 0B 78         LDRB     R3, [R1]
11: 9C 5D         LDRB     R4, [R3,R6]
12: AB 5D         LDRB     R3, [R5,R6]
13: A3 42         CMP      R3, R4
14: 04 D1         BNE      loc_100E352
15: 01 30         ADDS     R0, #1
16: 01 31         ADDS     R1, #1

17:      loc_100E34C
18: 00 2A         CMP      R2, #0
19: F3 DC         BGT      loc_100E338
20: 01 3A         SUBS     R2, #1

21:      loc_100E352
22: 00 2A         CMP      R2, #0
23: 01 DA         BGE      loc_100E35A
24: 00 20         MOVS     R0, #0
25: 04 E0         B        locret_100E364

26:      loc_100E35A
27: 0B 78         LDRB     R3, [R1]
28: 9A 5D         LDRB     R2, [R3,R6]
29: 03 78         LDRB     R3, [R0]
30: 9B 5D         LDRB     R3, [R3,R6]
31: 98 1A         SUBS     R0, R3, R2

32:      locret_100E364
33: BD E8 78 88   POP.W    {R3–R6,R11,PC}
34:             ; End of function mystery8

The function was compiled in Thumb mode, as we can see from the presence of 16 bit instructions, PUSH and POP instructions and Thumb-specific instructions, e.g. CBZ and instructions with the .W suffix.

It apparently takes three input arguments in R0, R1 and R3. Both R0 and R1 are pointers to a structure from which successive single bytes are read (lines 7, 10, 27 and 29), so we assume type char* for these two arguments (i.e. a string). R2, on the other hand, is a numerical value being decreased (lines 8, 20) and compared to 0. Thus, we assume it is a signed 32 bit integer value and used for counting.

The function either returns 0 (lines 24 and 25) or performs a subtraction involving R3 and R2 (line 31).  Therefore, we assume the return value to also be a 32 bit integer value.

Our provisional function prototype is as follows:

sint32 mystery8 (char* arg1, char* arg2, sint32 arg3);

As mentioned in the description, the variable byteArray is an array of 256 byte values, i.e. byteArray[] = {0,1,..,0xFE, 0xFF}

Due to the multitude of conditionals and load operations, it is in our opinion easier to first decompile it and afterwards analyze in detail what its purpose could potentially be. While the following C program can certainly be optimized, the resemblance to the disassembly could make it easier to understand:

sint32 mystery8 (char* arg1, char* arg2, sint32 arg3) {
while (arg3 > 0) { //lines 18 and 19
arg3 --; // line 8

if (*arg1 == 0) { // line 9
break;
}

if (byteArray[*arg1] != byteArray[*arg2]) { // lines 10-14
break;
}

arg1++; // line 15
arg2++; // line 16
}

if ((arg3-1) < 0) { // lines 8,20,22,23
return 0;
}

return byteArray[*arg1] - byteArray[*arg2]; // lines 27-31

}

The function takes two string values and inputs and uses their single characters for indexing into the array byteArray. However, for each possible index in the range 0, 1, ... , 0xFF, the value of byteArray[index] is exactly the index itself. So it is not entirely clear why the byteArray is actually used. We feel it can be completely omitted for simplification:

sint32 mystery8 (char* arg1, char* arg2, sint32 arg3) {
while (arg3 > 0) { //lines 18 and 19
arg3 --; // line 8

if (*arg1 == 0) { // line 9
break;
}

if (*arg1 != *arg2) { // lines 10-14
break;
}

arg1++; // line 15
arg2++; // line 16
}

if ((arg3-1) < 0) { // lines 8,20,22,23
return 0;
}

return *arg1 - *arg2; // lines 27-31

}

The central loop always compares the same positions of arg1 and arg2 and exits if and only if one of the following conditions is satisfied:

1. The counter variable arg3 reaches 0 OR
2. We have reached the last character of the first string arg1, i.e. the null character OR
3. We have found a character in the strings arg1 and arg2 that do not match.
Before exiting the function, the counter variable is decremented and checked for non-negativity. If the value is negative, we know that either arg3 was originally <= 0 or we have exited the loop because arg3 was decremented until it reached 0. The return value in this case is 0.

If it is not negative, we know the counter variable was not the limiting factor, but rather the input strings did not match at a certain character position or the end of the first string argument was reached. The return value in this case is the difference between the character values of the input strings arg1 and arg2 at the lastly changed position. Thus, the possible return values are:
• 0 when *arg1 == *arg2 (i.e. the characters are equal)
• < 0 when *arg1 < * arg2 (i.e. the character of arg1 is numerically smaller than the character of arg2)
• >0 when *arg1 > *arg2 (i.e. the character of arg1 is numerically greater than the character of arg2)
Lastly, let us consider a few examples
1. mystery8("abc","abc",5) returns: 0 (because *arg1 - *arg2 = 0 -0 = 0)
2. mystery8("abc","abc",2) returns: 0 (because the counter variable reached the limit)
3. mystery8("abb","aba",5) returns: 1 (because *arg1 - *arg2 = 'b' - 'a' = 0x62 - 0x61 = 1)
4. mystery8("abb","aba",2) returns: 0 (because the counter variable reached the limit)
5. mystery8("aba","abb",5): returns: -1 (because *arg1 - *arg2 = 'a' - 'b' = 0x61 - 0x62 = -1)
6. mystery8("aba","abb",2): returns: 0 (because the counter variable reached the limit)
To conclude, we can summarize the function behaviour in more descriptive words. The function takes two strings and a limiting position variable as inputs and compares the input strings for equality from the first character up to the position imposed by the limit variable. If no difference in the strings is found up to the limiting position, it returns 0. When a difference is found, it returns the numerical difference between the first mismatching character of the input strings.

Lastly, we provide a version of mystery8 with more descriptive names:

sint32 mystery8 (char* str1, char* str2, sint32 limit) {
while (limit > 0) { //lines 18 and 19
limit --; // line 8

if (*str1 == 0) { // line 9
break;
}

if (*str1 != *str2) { // lines 10-14
break;
}

str1++; // line 15
str2++; // line 16
}

if ((limit-1) < 0) { // lines 8,20,22,23
return 0;
}

return *str1 - *str2; // lines 27-31

}