0x05 - CSAW 2019 beleaf
Today’s sample was one of the RE challenges during CSAW 2019. CSAW is a beginner-friendly CTF, but I found this challenge was not so simple.
| binary | beleaf |
|---|---|
| sha256 | bb28a152966bed0a369f30149a912982ea33b408794bfbd82e73c87ff4e184ff |
Running file reveals that this is a 64-bit position-independent ELF executable,and that it is stripped. This means debugging information has been removed, as well as symbols and other stuff that would help us reverse this sample :(
file beleaf
beleaf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=6d305eed7c9bebbaa60b67403a6c6f2b36de3ca4, stripped
As we can see from readelf, the symbol table has been removed:
❯ readelf -S beleaf | grep '.symtab'
That means we won’t find any helpful function and variable names, and most importantly I didn’t find any function named main when I opened up the file in Ghidra. So, where to start?
Sometimes if it’s not immediately clear where I should start looking, I will check the strings using radare2. Most programs display some sort of output to the user, especially usage output, and strings in the .rodata section is a pretty common place for that stuff to be found. Ghidra can also search for strings, but I’ve found that radare2 sometimes does a better job for quick searches:
❯ rabin2 -z beleaf
[Strings]
nth paddr vaddr len size section type string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x00000a64 0x00000a64 19 20 .rodata ascii Enter the flag\n>>>
1 0x00000a7b 0x00000a7b 10 11 .rodata ascii Incorrect!
2 0x00000a86 0x00000a86 8 9 .rodata ascii Correct!
3 0x00001020 0x00201020 7 32 .data utf32le wf{_ny}
From the output above, I think good place to start would be to figure out which function is printing the “Enter the flag” string. It will likely lead us to the function that takes input and checks our key validation algorithm.
The address 00000a64 in the rabin2 output is the file offset of the string. When the file is loaded into memory by the OS, it’ll reside at the image base address + that offset. Since this is a PIE, the image base isn’t static at runtime. None of this is really important for static analysis, but if we were debugging this program in GDB, we’d need to figure out the image base address in order to jump to the offset of this string.
Ghidra uses 0x01000000 by default as the image base address, so the string in question is located at 0x01000000 + 0xa46, and we can use that information to jump to the string in Ghidra. Following the references to that string leads to a function called FUN_001008a1, which seems to contain most of the functionality for the key validation, so I rename it to main. The first important part is this:
|
|
What’s happening here is two library calls, one to scanf, and the other to strlen, and then a check to see if the value of sVar1 is less than 0x21, which is 33 in decimal.
The call to scanf takes two parameters. If we follow DAT_00100a78 in Ghidra, we find that it’s a hard-coded format string stored in memory, which contains the string %s, null terminated, so I’ll call that format_string.
The local_98 variable is declared earlier in the function as a char array of length 136. Since scanf is an unsafe function, this means that any input larger than 136 will crash the program. I thought that maybe this was a potential issue worth digging into, however, the program has been compiled with stack smashing protection, which we can verify using checksec:
❯ checksec --file=./beleaf --format=json | json_pp
{
"./beleaf" : {
"canary" : "yes",
"fortified" : "0",
"fortify-able" : "1",
"fortify_source" : "no",
"nx" : "yes",
"pie" : "yes",
"relro" : "full",
"rpath" : "no",
"runpath" : "no",
"symbols" : "no"
}
}
This is a reversing, not an exploitation challenge, and so the goal here is probably not to overwrite this buffer. The rest of the function uses the strlen function to get the length of the user submitted string, which I’ll call input_length
The next part of the function does a little bit more:
|
|
This code loops through the input buffer starting at 0, and continuing to the end. It sets a local variable lVar2 using another functioned called FUN_001007fa.
It then compares the result of FUN_001007fa call to a 64-bit value which is stored, along with some others, at an array in DAT_003014e0. It uses the same index (local_b0) for indexing the input buffer, and the DAT_ data. By deductive reasoning, we can guess some more things:
FUN_001007famust have to return a 64-bit value which it somehow derives from a single byte (character)DAT_003014e0contains a sequence of 33 “target” values, which I’ll callexpected_values
So, if we can take these 33, 64-bit values from the area that DAT_003014e0 points to, and perform whatever transformation FUN_001007fa is doing but in reverse and check them against these values, we’ll be able to get our key.
|
|
So what about FUN_001007fa? What this function is actually doing is searching a binary tree that exists in memory, looking for a given character, and returning the index. We can tell, because it’s using the input character to iterate over some data at DAT_00301020based on whether the input character is higher or lower than the character at the given index, and it’s stopping when the input character is matched. This might seem convoluted, but it’s really just a way to store data in a way that it’s easily retrievable… and we don’t have to really work any of this out for ourselves because the program gives us the algorithm.
Now that we sort of understand what’s happening, we can rename this function and variables to make it clearer:
|
|
The total set of each characters we can use for our flag can be called our “character set”, which is stored as a binary tree. We need to supply set of 33 characters that, when looked up in our binary tree, results in the same set of 33 indexes stored in expected_values.
Everything in the expected_values array is stored as 8-byte (64-bit) values, and to make sure we understand what’s going on, we can print them out using this go function to confirm they show up in the same little-endian format as they do in Ghidra:
expected_values := []uint64{0x01, 0x09, 0x11, 0x27, 0x02, 0x00, 0x12, 0x03, 0x08, 0x12, 0x09, 0x12, 0x11, 0x01, 0x03, 0x13, 0x04, 0x03, 0x05, 0x15, 0x2E, 0x0A, 0x03, 0x0A, 0x12, 0x03, 0x01, 0x2E, 0x16, 0x2E, 0x0A, 0x12, 0x06}
for _, val := range expected_values {
buf := make([]byte, 8)
binary.LittleEndian.PutUint64(buf, val)
fmt.Printf("0x%x\n", buf)
}
The binary tree is stored as a linear sequence of 32-bit values, which is why the search function is able to iterate over them by using the i = i * 2 + 1 and i = (i + 1) * 2 indexes. Each branch in the tree is terminated by 0xFFFFFFFF, which just means that that particular node has no children. I created some more go code to print all the characters that are stored in the tree:
binary_tree := []uint32 {0x0000077, 0x00000066, 0x0000007B, 0x0000005F, 0x0000006E, 0x00000079, 0x0000007D, 0xFFFFFFFF, 0x00000062, 0x0000006C,
0x00000072, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000061, 0x00000065, 0x00000069, 0xFFFFFFFF, 0x0000006F,
0x00000074, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000067,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF}
for _, val := range binary_tree {
fmt.Printf("%q\n", rune(val))
}
The key to this challenge is the binary search function, because it tells us exactly how to look up a given character in the binary tree and retrieve an index which we can check against the expected values.
We have enough information to brute force the solution one character at a time:
- Generate a list of the allowed characters in the flag
- For each character in the provided flag, 0 to 33, search the binary_tree array for the character using the algorithm provided
- If the lookup for the provided character results in the expected value, print the character
I wrote the logic for this up as a Go package:
package solver
import (
"fmt"
)
var binary_tree = []uint32{
0x00000077, 0x00000066, 0x0000007B, 0x0000005F, 0x0000006E,
0x00000079, 0x0000007D, 0xFFFFFFFF, 0x00000062, 0x0000006C,
0x00000072, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0x00000061, 0x00000065, 0x00000069,
0xFFFFFFFF, 0x0000006F, 0x00000074, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000067,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0x00000075}
var expected_values = []uint64{
0x01, 0x09, 0x11, 0x27, 0x02, 0x00, 0x12, 0x03, 0x08, 0x12,
0x09, 0x12, 0x11, 0x01, 0x03, 0x13, 0x04, 0x03, 0x05, 0x15,
0x2E, 0x0A, 0x03, 0x0A, 0x12, 0x03, 0x01, 0x2E, 0x16, 0x2E,
0x0A, 0x12, 0x06}
func generate_characters() string {
res := []rune{}
for _, val := range binary_tree {
if val != 0xFFFFFFFF {
res = append(res, rune(val))
}
}
return string(res)
}
func search_bintree(c rune) int {
i := 0
for c != rune(binary_tree[i]) {
if c < rune(binary_tree[i]) {
i = (i * 2) + 1
} else {
i = (i + 1) * 2
}
}
return i
}
func Solve() {
for i := range 33 {
for _, c := range generate_characters() {
a := search_bintree(c)
if a == int(expected_values[i]) {
fmt.Printf("%c", c)
break
}
}
}
fmt.Println()
}
When we call solver.Solve() from a main, the program executes, and we get the key:
❯ go run ./cmd/main.go
flag{we_beleaf_in_your_re_future}