InCTF 2018 also consisted of an individual CTF apart from the main Attack and Defense CTF. There were 5 total challenges in this CTF. So in this post i am going to discuss the “vm” challenge that I solved during the CTF. There were multiple approaches that could have been used for this challenge since the source code of the vm was given. During the CTF i didn’t use this approach and i directly went after the flag by ignoring the VM since it was a very simple VM as source code was given but I decided to solve the challenge again and this time without using the source code because I usually have some trouble solving complex VMs (that 10th flare-on level this year with a kvm was a pain) and this looks like a good challenge to learn something new.

What is a VM?

In the context of this challenge, VM(Virtual Machine) is an application/machine that reads and executes the custom bytecode. We need to write a parser/disassembler so as to convert this bytecode to readable format/pseudo assembly.

Challenge

We are given 3 files in this challenge – net.bsd, vm, vm.c. We are not gonna look at the source in this method and will try to reverse engineer the VM. Also, “net.bsd” contains the custom bytecode which is executed by the VM. Taking a look at the .bss section it is pretty clear that there are 4 virtual registers. Analyzing the Switch Jump Table at 0x2024 and the different functions (handle_….) it is clear that there are 7 different virtual opcodes in the VM and there is a default function which is called when VM encounters an invalid opcode. Now onto the main function, the vm opens net.bsd files and reads it line by line. Each line is converted to integer and then different parts of the instruction are extracted from this integer.

Code Snippet of VM converting the integer to the different parts of the virtual instruction:

.text:0000000000001214                 mov     eax, [rbp+var_C]
.text:0000000000001217                 shr     eax, 18h
.text:000000000000121A                 mov     [rbp+var_10], eax
.text:000000000000121D                 mov     eax, [rbp+var_C]
.text:0000000000001220                 shr     eax, 14h
.text:0000000000001223                 and     eax, 0Fh
.text:0000000000001226                 mov     [rbp+var_14], eax
.text:0000000000001229                 mov     eax, [rbp+var_C]
.text:000000000000122C                 shr     eax, 10h
.text:000000000000122F                 and     eax, 0Fh
.text:0000000000001232                 mov     [rbp+var_18], eax
.text:0000000000001235                 mov     eax, [rbp+var_C]
.text:0000000000001238                 shr     eax, 0Ch
.text:000000000000123B                 and     eax, 0Fh
.text:000000000000123E                 mov     [rbp+var_1C], eax
.text:0000000000001241                 mov     eax, [rbp+var_C]
.text:0000000000001244                 and     eax, 0FFFFh
.text:0000000000001249                 mov     [rbp+var_20], eax
.text:000000000000124C                 cmp     [rbp+var_10], 6
.text:0000000000001250                 ja      loc_12E7

Some important points we can derive from our knowledge till now:

  • Each line in net.bsd contains an integer which is actually an virtual instruction.
  • Different parts of the integer are different parts of the instruction.

With the further analysis of the above code snippet we can see that the integer is divided into various parts:

The last byte that is 24th to 32nd bit of the integer are saved separately and we will refer this as 1st part further in the post. The bits from 20th to 24th bit form the 2nd part. The bits from 16th to 20th bit form the 3rd part and 12th to 16th bit form the 4th part of the instruction. 0 to 16 bit are also saved and lets say this is the 5th part of the instruction. Now lets see what these parts are and where they are used in the VM.

So now we have almost reached to the core of the VM. The 1st part that was found earlier is now compared with 6 and if its greater then handle_default is called and VM exits. So it is the 1st part which decides which function will be called and rest all the parts are actually the custom instructions or the parameters to the functions. Now since we know how the functions are called from the integer lets take a look at all the 7 functions that is the 7 opcodes supported by the VM. We will analyze all the 7 functions and will try to parse the code at last.

Functions

Analyzing all the functions will help us write a parser for this vm :

handle_add – On analysis we can see that this function expects three parameters, edi and esi contain first 2 parameters which are actually the 2nd part (20th to 24th bit) and 3rd part (16th to 20th bit) of the integer which we got in the earlier part. The third parameter is actually the 16 bit integer (5th part). If 2nd or 3rd parameter are greater than 3 then again the VM exits. And this made me remember that there are just 4 virtual registers which are stored in continuous memory as array. First two parameters are actually the indexes of 2 registers which can be further confirmed by looking at the code.

.text:000000000000136C                 mov     eax, [rbp+var_8]
.text:000000000000136F                 cdqe
.text:0000000000001371                 lea     rdx, ds:0[rax*4]
.text:0000000000001379                 lea     rax, reg
.text:0000000000001380                 mov     edx, [rdx+rax]
.text:0000000000001383                 mov     eax, [rbp+var_C]
.text:0000000000001386                 lea     ecx, [rdx+rax]
.text:0000000000001389                 mov     eax, [rbp+var_4]
.text:000000000000138C                 cdqe
.text:000000000000138E                 lea     rdx, ds:0[rax*4]
.text:0000000000001396                 lea     rax, reg
.text:000000000000139D                 mov     [rdx+rax], ecx
.text:00000000000013A0                 nop
.text:00000000000013A1                 leave
.text:00000000000013A2                 retn

So what happens in above code is the value in the virtual register at the index (2nd param) is added to the 3rd param and is stored in vreg (1st param). So its kind of like reg[1st param] = reg[2nd param] + 3rd param.

 

handle_xor – This function also has 3 parameters but this time the 2nd, 3rd and 4th parts of the integer are passed as parameters. what happens in this function is the register at first param is set equal to the xor of the data at 2nd and 3rd register.

.text:00000000000013D0                 mov     eax, [rbp+var_8]
.text:00000000000013D3                 cdqe
.text:00000000000013D5                 lea     rdx, ds:0[rax*4]
.text:00000000000013DD                 lea     rax, reg
.text:00000000000013E4                 mov     edx, [rdx+rax]
.text:00000000000013E7                 mov     eax, [rbp+var_C]
.text:00000000000013EA                 cdqe
.text:00000000000013EC                 lea     rcx, ds:0[rax*4]
.text:00000000000013F4                 lea     rax, reg
.text:00000000000013FB                 mov     eax, [rcx+rax]
.text:00000000000013FE                 mov     ecx, edx
.text:0000000000001400                 xor     ecx, eax
.text:0000000000001402                 mov     eax, [rbp+var_4]
.text:0000000000001405                 cdqe
.text:0000000000001407                 lea     rdx, ds:0[rax*4]
.text:000000000000140F                 lea     rax, reg
.text:0000000000001416                 mov     [rdx+rax], ecx
.text:0000000000001419                 nop
.text:000000000000141A                 leave
.text:000000000000141B                 retn

 

handle_out – This function has a single parameter. It expects the index of the virtual register whose data has to be printed. Only a single character is printed at a time.

handle_in This function reads a value into a virtual register. Only a single character is read at a time.

handle_mov – This function moves a 16 bit value to a register. The index of the register and the value are passed as parameter.

handle_chk – This function checks if 2 virtual registers are equal or not.

handle_hlt – Exits the VM.

This was just some brief info about the VM functions, Now as to parse the custom bytecode set we would need to convert the custom bytecode to the corresponding opcodes. So here is the format of the different opcodes which we will get after parsing the code.

  • ADD R1, R2, val – Sets R1 equal to sum of R2 and val.
  • XOR R1, R2, R3 – Sets R1 equal to xor of R2 and R3.
  • IN R1 Reads a character from the stdin to a virtual register R1.
  • OUT R1 Prints data from virtual register R1 to stdout.
  • MOV R1, val – Moves val to R1.
  • CHK R1, R2 – Checks if R1 is equal to R2 or not, Sets flag = 1 if they are unequal.
  • HLT – VM exit

So from the knowledge of all the above opcodes and their working it is pretty easy to write a parser, so here is the parser i wrote for this VM :

# virtual registers - R0, R1, R2, R3
opcodes = ["ADD", "XOR", "CHK", "OUT", "IN", "MOV", "HLT"]	# virtual opcodes

netbsd = open("net.bsd", "r")
virt_code = []

for x in netbsd:
    virt_code.append(int(x))

netbsd.close()

for inst in virt_code:
    instr_str = ""
    opcode = inst >> 24
    p2 = inst >> 20 & 0xf
    p3 = inst >> 16 & 0xf
    p4 = inst >> 12 & 0xf
    p5 = inst & 0xffff
    if opcode == 0:
        print opcodes[0] + " R" + str(p2) + ", " + "R" + str(p3) + ", " + str(hex(p5))
    elif opcode == 1:
        print opcodes[1] + " R" + str(p2) + ", " + "R" + str(p3) + ", " + "R" + str(p4)
    elif opcode == 2:
        print opcodes[2] + " R" + str(p2) + ", " + "R" + str(p3)
    elif opcode == 3:
        print opcodes[3] + " R" + str(p2)
    elif opcode == 4:
        print opcodes[4] + " R" + str(p2)
    elif opcode == 5:
        print opcodes[5] + " R" + str(p2) + ", " + str(hex(p5))
    elif opcode == 6:
        print opcodes[6]
    else:
        print "ERROR!!!!"

Code executed by the VM

By using the above script we get the code executed by the VM:

MOV R1, 0x45                    ; moves 0x45 ('E') to R1
OUT R1                          ; prints 'E'
MOV R1, 0x6e                    ; moves 0x6e ('n') to R1
OUT R1                          ; prints 'n'
MOV R1, 0x74                    ; moves 0x74 ('t') to R1
OUT R1                          ; prints 't'
MOV R1, 0x65                    ; moves 0x65 ('e') to R1
OUT R1                          ; prints e
MOV R1, 0x72                    ; moves 0x72 ('r') to R1
OUT R1                          ; prints 'r'
MOV R1, 0x20                    ; moves 0x20 (' ') to R1
OUT R1                          ; prints ' '
MOV R1, 0x70		
OUT R1                          ; prints 'P'
MOV R1, 0x61		
OUT R1                          ; prints 'a'
MOV R1, 0x73		
OUT R1                          ; prints 's'
MOV R1, 0x73		
OUT R1                          ; prints 's'
MOV R1, 0x20
OUT R1                          ; prints ' '
MOV R1, 0x3a
OUT R1                          ; prints ':'
MOV R2, 0x20
OUT R2                          ; prints ' '
MOV R0, 0x69                    ; Moves 0x69 to R0
IN R1                           ; Reads a char from stdin to R1
CHK R0, R1                      ; Checks if input char is equal to 0x69 ('i')
MOV R1, 0x6e
IN R2
CHK R1, R2                      ; Checks if input is equal to 0x6e ('n')
MOV R2, 0x63
IN R1
CHK R1, R2                      ; Checks if input == 'c'
MOV R0, 0x74
IN R2
CHK R0, R2                      ; Checks if input == 't'
MOV R0, 0x66
IN R2
CHK R0, R2                      ; Checks if input == 'f'
MOV R0, 0x7b
IN R1
CHK R0, R1                      ; Checks if input == '{'
MOV R0, 0x26                    ; moves 0x26 to R0
IN R1                           ; input char to R1
MOV R2, 0x17                    ; moves 0x17 to R2
XOR R2, R2, R1                  ; XOR R2 and R1 and stores result in R2
CHK R0, R2                      ; checks R0 == R2, this gives us the char  0x26 ^ 0x17 == 0x31 ('1') 
MOV R0, 0x15
IN R1
MOV R2, 0x7b
XOR R2, R2, R1
CHK R0, R2                      ; 0x15 ^ 0x7b == 0x6e ('n')
MOV R0, 0x7b
IN R1
MOV R2, 0x38
XOR R2, R2, R1
CHK R0, R2                      ; 0x7b ^ 0x38 == 0x43 ('C')
MOV R0, 0x55
IN R1
MOV R2, 0x62
XOR R2, R2, R1
CHK R0, R2                      ; 0x55 ^ 0x62 == 0x37 ('7')
MOV R0, 0x1b
IN R1
MOV R2, 0x5d
XOR R2, R2, R1
CHK R0, R2                      ; 0x1b ^ 0x5d == 0x46 ('F')
MOV R0, 0x72
IN R1
MOV R2, 0x2d
XOR R2, R2, R1
CHK R0, R2                      ; 0x72 ^ 0x2d == 0x5f ('_')
MOV R0, 0x7a
IN R1
MOV R2, 0x38
XOR R2, R2, R1
CHK R0, R2                      ; 0x7a ^ 0x38 == 0x42 ('B')
MOV R0, 0x7a
IN R1
MOV R2, 0x49
XOR R2, R2, R1
CHK R0, R2                      ; 0x7a ^ 0x49 == 0x33 ('3')
MOV R0, 0x7
IN R1
MOV R2, 0x54
XOR R2, R2, R1
CHK R0, R2                      ;  0x7 ^ 0x54 == 0x53 ('S')
MOV R0, 0x23
IN R1
MOV R2, 0x57
XOR R2, R2, R1
CHK R0, R2                      ; 0x23 ^ 0x57 == 0x74 ('t')
MOV R1, 0x7d
IN R2
CHK R1, R2                      ; checks R2 == '}' ?
HLT                             ; VM exits

Since the VM only supports 7 virtual opcodes and 4 virtual registers and in the above instructions the ADD opcode and 4th register are not even used, So the analysis of the code is really easy. The comments in the parsed code can help us to understand what the VM is doing.

The flag that we get from code is – inctf{1nC7F_B3St}

This is pretty good challenge to play with if you want to get started with VMs.

Categories: CTFsWriteups