Monday, November 18, 2013

Binaries and Process Tracing

A little bit about Linux Programs

The Linux ABI (Application Binary Interface) defines how an executable is bound to its imported functions at runtime; the work itself is done by the dynamic linker (ld.so) together with routines in the libc sysdeps. For example, when a programmer writes code that calls “printf”, the dynamic linker looks up the address of printf in libc.so and writes it into the executable's import table, so that the executable can call it through that table. The program interpreter is the component that performs this binding, and an executable can name a custom one. Every dynamically linked Linux application has what is called an INTERP header (or .interp), which you can see using the command-line utility readelf, like so:

user@host $ grep interpreter <(readelf -a $(which ls))
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

Because I am using a 64-bit system for this demonstration, all of the dynamically linked binaries in my testing environment specify /lib64/ld-linux-x86-64.so.2. On a 32-bit system, executables specify a 32-bit counterpart.
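
The same lookup can be done without readelf. What follows is a minimal sketch in C, assuming a 64-bit ELF file and the <elf.h> header shipped with glibc; read_interp is a hypothetical helper name, not part of any library. It walks the program headers until it finds the PT_INTERP entry and copies the path stored there:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP string of a 64-bit ELF file into buf.
 * Returns 0 on success, -1 if the file is unreadable, is not ELF64,
 * or names no interpreter (for example a statically linked binary). */
static int read_interp(const char *path, char *buf, size_t buflen)
{
    assert(path != NULL && buf != NULL && buflen > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;                       /* not the entry we want */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen)
            goto out;                       /* would not fit in buf */
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';        /* enforce NUL termination */
        rc = 0;
        break;
    }
out:
    fclose(f);
    return rc;
}
```

Called as read_interp("/proc/self/exe", buf, sizeof buf) from a dynamically linked program, it should fill buf with that program's own interpreter path; on the test host above that would be expected to be the same path readelf printed.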

A little about process tracing

Process tracing in a Linux environment can be performed with several different debugging tools, namely strace, ltrace, ftrace, and interactive debuggers (such as gdb). While strace is an excellent tool for monitoring I/O and certain system calls, it falls short when it comes to monitoring calls into shared objects. That is why ltrace and ftrace were born: they show the actual function calls as they occur from a process to the shared objects (*.so files) imported by the executable. This lets administrators and programmers who are debugging an application determine where in its calls to shared objects things begin to go wrong. Process tracing and debuggers can also be helpful for malware analysis and detection. As such, attackers frequently target these utilities, looking for evasion methods as well as other bugs (debugger exploits, anyone?).

Self-linking code

When I wrote the dynamic engine for shellcodecs, I implemented my own version of program interpretation. Why? Because there is no guarantee that a given executable will have required functions in its import table for shellcode to run properly. So, I wrote a piece of assembly code capable of parsing an ELF64 shared object to isolate pointers to the functions I wanted to call, similar to dlsym() from libdl. Recently I was entertaining the idea of writing an all-assembly rootkit, so I checked into how calls made by the shellcodecs engine were handled by different tracing methods. I put together a couple of programs to see how things got handled and what information actually got revealed by tracing the processes. I got some pretty interesting results.
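
To make the idea concrete, here is a rough C sketch of a dlsym()-like lookup that never calls dlsym(). It is not the shellcodecs engine: it compares names instead of hashes, and it relies on glibc's dl_iterate_phdr() to find the loaded objects. The names find_function, scan_object and fixup are hypothetical, and the sketch assumes that each object's dynamic symbol table ends where its dynamic string table begins, which is the usual layout but is not guaranteed:

```c
#define _GNU_SOURCE            /* dl_iterate_phdr() is a GNU extension */
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <string.h>

struct lookup {
    const char *name;          /* symbol to find */
    void *addr;                /* result, NULL until found */
};

/* Dynamic entries may hold offsets or already-relocated addresses. */
static ElfW(Addr) fixup(ElfW(Addr) base, ElfW(Addr) p)
{
    return p < base ? p + base : p;
}

/* Callback for dl_iterate_phdr(): search one loaded object. */
static int scan_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *want = data;
    const ElfW(Dyn) *dyn = NULL;
    const ElfW(Sym) *sym = NULL;
    const char *str = NULL;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
            dyn = (const ElfW(Dyn) *)(info->dlpi_addr +
                                      info->dlpi_phdr[i].p_vaddr);

    for (; dyn != NULL && dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_SYMTAB)
            sym = (const ElfW(Sym) *)fixup(info->dlpi_addr, dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_STRTAB)
            str = (const char *)fixup(info->dlpi_addr, dyn->d_un.d_ptr);
    }
    if (sym == NULL || str == NULL || (const char *)sym >= str)
        return 0;              /* nothing usable here, try the next object */

    /* ASSUMPTION: .dynsym is immediately followed by .dynstr. */
    for (; (const char *)(sym + 1) <= str; sym++) {
        /* Skip imports and anything that is not a plain function
         * (this also skips IFUNC symbols, which need a resolver call). */
        if (sym->st_shndx == SHN_UNDEF ||
            ELF64_ST_TYPE(sym->st_info) != STT_FUNC)
            continue;
        if (strcmp(str + sym->st_name, want->name) == 0) {
            want->addr = (void *)(info->dlpi_addr + sym->st_value);
            return 1;          /* non-zero stops the iteration */
        }
    }
    return 0;
}

/* Return the address of the first exported function called name. */
static void *find_function(const char *name)
{
    assert(name != NULL);
    struct lookup want = { name, NULL };
    dl_iterate_phdr(scan_object, &want);
    return want.addr;
}
```

For instance, casting the result of find_function("atoi") to int (*)(const char *) gives a callable pointer to atoi() even though the caller never imported it by name.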

Test programs and results

My test programs were relatively simple. Here is an ordinary C program that prints “ohai” and then calls exit(2):

#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>

int main(void) {
    printf("ohai");
    exit(2);
}

And its ltrace output:

user@host $ ltrace ./ltrace-test
__libc_start_main(0x400544, 1, 0x7fff70b45d88, 0x400570, 0x400600 
printf("ohai")                                                                               = 4
exit(2ohai 
+++ exited (status 2) +++

Notice that the tracer caught the call to printf as well as the call to exit: it shows both exit(2) and "exited (status 2)". This distinction matters for the next test:

#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>

// Compile: gcc ltraced.c -o ltraced -ldl

int main(void)
{
    void *libc;
    int (*putstr)(char *);
    int (*exitp)(int);
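    // dlopen() fails for this 32-bit path on the 64-bit test host and
    // returns NULL (see the trace below); glibc's dlsym() treats a NULL
    // handle as RTLD_DEFAULT, so the two lookups still succeed.
    libc = dlopen("/lib/i386-linux-gnu/i686/cmov/libc.so.6",RTLD_LAZY);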
    *(void **)&putstr = dlsym(libc,"puts");
    *(void **)&exitp  = dlsym(libc,"exit");
    putstr("ohai");
    exitp(2);
}

And its ltrace results:

user@host $ ltrace ./ltraced
__libc_start_main(0x400594, 1, 0x7fff36ae94b8, 0x400610, 0x4006a0 
dlopen("/lib/i386-linux-gnu/i686/cmov/li"..., ) = NULL
dlsym(NULL,"puts")                              = 0x7f400a7e0ce0
dlsym(NULL,"exit")                              = 0x7f400a7ab970
ohai
+++ exited (status 2) +++

Notice that this time ltrace caught only the calls to dlopen and dlsym; it did not catch the calls to puts() or exit() themselves. The reason is that puts() and exit() never appear in the binary's import table, as you can see with the following:

user@host $ objdump -R ./ltraced

./ltraced:     file format elf64-x86-64

DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE 
0000000000600fe0 R_X86_64_GLOB_DAT  __gmon_start__
0000000000601000 R_X86_64_JUMP_SLOT  __libc_start_main
0000000000601008 R_X86_64_JUMP_SLOT  dlopen
0000000000601010 R_X86_64_JUMP_SLOT  dlsym
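
A program can inspect its own import table in much the same way. The sketch below is only an illustration, assuming x86-64 and glibc; is_imported, scan_main and fixup are hypothetical names. It reads the main program's DT_JMPREL relocations (the JUMP_SLOT records objdump lists above) and reports whether a given function name is among them:

```c
#define _GNU_SOURCE            /* dl_iterate_phdr() is a GNU extension */
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <string.h>

struct query {
    const char *name;          /* function name to look for */
    int found;                 /* set to 1 if a JUMP_SLOT names it */
};

/* Dynamic entries may hold offsets or already-relocated addresses. */
static ElfW(Addr) fixup(ElfW(Addr) base, ElfW(Addr) p)
{
    return p < base ? p + base : p;
}

/* Callback for dl_iterate_phdr(); the first object it reports is the
 * main program, which is the only one inspected here. */
static int scan_main(struct dl_phdr_info *info, size_t size, void *data)
{
    struct query *q = data;
    const ElfW(Dyn) *dyn = NULL;
    const ElfW(Sym) *sym = NULL;
    const ElfW(Rela) *rel = NULL;
    const char *str = NULL;
    size_t relsz = 0;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
            dyn = (const ElfW(Dyn) *)(info->dlpi_addr +
                                      info->dlpi_phdr[i].p_vaddr);

    for (; dyn != NULL && dyn->d_tag != DT_NULL; dyn++) {
        ElfW(Addr) p = fixup(info->dlpi_addr, dyn->d_un.d_ptr);
        switch (dyn->d_tag) {
        case DT_SYMTAB:   sym = (const ElfW(Sym) *)p;  break;
        case DT_STRTAB:   str = (const char *)p;       break;
        case DT_JMPREL:   rel = (const ElfW(Rela) *)p; break;
        case DT_PLTRELSZ: relsz = dyn->d_un.d_val;     break;
        default:          break;
        }
    }

    if (sym != NULL && str != NULL && rel != NULL) {
        for (size_t i = 0; i < relsz / sizeof *rel; i++) {
            const ElfW(Sym) *s = &sym[ELF64_R_SYM(rel[i].r_info)];
            if (ELF64_R_TYPE(rel[i].r_info) == R_X86_64_JUMP_SLOT &&
                strcmp(str + s->st_name, q->name) == 0)
                q->found = 1;
        }
    }
    return 1;                  /* stop after the main program */
}

/* Return 1 if the running executable has a JUMP_SLOT for name. */
static int is_imported(const char *name)
{
    assert(name != NULL);
    struct query q = { name, 0 };
    dl_iterate_phdr(scan_main, &q);
    return q.found;
}
```

Built into the ltraced program above, is_imported("dlsym") would be expected to return 1 and is_imported("puts") to return 0, which matches what ltrace was able to see.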

Implications and further testing

Having realized that ltrace can only trace functions in the executable's import table, I wondered whether an assembly application could evade ltrace completely for the functions it calls. The results were phenomenal.

user@host $ ltrace ./full_import_test 
__libc_start_main(0x400554, 1, 0x7fff92666938, 0x400690, 0x400720Successfully called puts without import
 
+++ exited (status 2) +++

I was able to get these results with the following assembly program:

.global main
.section .data
.section .bss

# MUST BE COMPILED:
# gcc full_import_test.s -ldl -Wl,-z,relro,-z,now -o full_import_test
    .align 8
libc_base:
    .skip 8                # reserve 8 bytes for the libc base address
libdl_base:
    .skip 8                # reserve 8 bytes for the libdl base address

.section .text

main:
  xor %rdi, %rdi
  mov $0x400130, %rbx
  mov (%rbx), %rcx
  add 0x10(%rbx), %rcx
  mov 0x20(%rcx, %rdi, 2), %rbx     # grab pointer to dlclose()

find_base:
  dec %rbx
  cmpl $0x464c457f, (%rbx)          # "\x7fELF" magic marks the base of libdl
  jne find_base

save_libdl:
  mov $libdl_base, %rdi
  mov %rbx, (%rdi)
  xor %rdi, %rdi

dlopen_libc:
  push $0x25764b07       # Function hash for dlopen()
  pop %rbp              

  mov $libc, %rdi        # libc.so.6

  push $0x01             
  pop %rsi               # RTLD_LAZY
  call invoke_function   # (%rax) = dlopen('libc.so.6',RTLD_LAZY);

save_libc:
  mov (%rax), %rcx
  mov $libc_base, %rax
  mov %rcx, (%rax)
 
jmp _world

 
################
#
#  Takes a function hash in %rbp and base pointer in %rbx
#  >Parses the dynamic section headers of the ELF64 image
#  >Uses ROP to invoke the function on the way back to the
#  -normal return location
#
#  Returns results of function to invoke.
#
invoke_function:
  push %rbp
  push %rbp
  push %rdx
  xor %rdx, %rdx
  push %rdi
  push %rax
  push %rbx      
  push %rsi
  push %rbp
  pop %rdi
 
  read_dynamic_section:
    push %rbx
    pop %rbp
 
   push $0x4c
   pop %rax
   add (%rbx, %rax, 4), %rbx
 
  check_dynamic_type:
    add $0x10, %rbx
    cmpb $0x5, (%rbx)
  jne check_dynamic_type
 
  string_table_found:
    mov 0x8(%rbx), %rax       # %rax is now location of dynamic string table
    mov 0x18(%rbx), %rbx      # %rbx is now a pointer to the symbol table.
 
  check_next_hash:
    add $0x18, %rbx
    push %rdx
    pop %rsi
    xorw (%rbx), %si
    add %rax, %rsi
 
    calc_hash:
      push %rax
      push %rdx
 
      initialize_regs:
        push %rdx
        pop %rax
        cld
 
        calc_hash_loop:
          lodsb
          rol $0xc, %edx
          add %eax, %edx
          test %al, %al
          jnz calc_hash_loop
 
      calc_done:
        push %rdx
        pop %rsi
 
      pop %rdx 
      pop %rax
 
  cmp %esi, %edi
 
  jne check_next_hash
 
  found_hash:
    add 0x8(%rbx,%rdx,4), %rbp
    mov %rbp, 0x30(%rsp)
    pop %rsi
    pop %rbx
    pop %rax
    pop %rdi
    pop %rdx
    pop %rbp
ret

# push hashes_array_index
# call fast_invoke
fast_invoke:
  push %rbp
  push %rbx
  push %rcx

  mov 0x20(%rsp), %ecx

  mov $libc_base, %rax
  mov (%rax), %rbx

  mov $hashes, %rax
  mov (%rax, %rcx, 4), %ebp

  # Registers required for link to work:
  # rbp - function hash
  # rbx - base pointer to lib
  call invoke_function

  mov 0x18(%rsp), %rcx # grab retptr
  mov %rcx, 0x20(%rsp) # kill the function argument
  pop %rcx
  pop %rbx
  pop %rbp
  add $0x8, %rsp
  ret

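# freed developer registers: 
# rax rbp rbx rcx r11 r12 r13 r14 r15
#
# a libc call (System V AMD64 calling convention):
# function(%rdi,  %rsi,  %rdx,  %rcx,  %r8,  %r9)
# (%r10 replaces %rcx only for raw syscalls; fast_invoke clobbers %rcx,
#  so a fourth argument cannot be passed through it as written)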
_world:
  mov $hiddenmsg, %rdi  # arg1
  push $0x1             # function array index in hashes label for puts()
  call fast_invoke      # puts("Successfully called puts without import")


  push $0x02            #
  pop %rdi              # arg1

  push $0x00            # array index in hashes label for exit()
  call fast_invoke      # exit(2);

  ret                   # Exit normally from libc, exit(0)
  #  after execution, echo $? shows 2 and not 0 ;)

force_import:
    call dlclose

libc: 
    .asciz "libc.so.6"

hashes:
    .long 0x696c4780, 0x74773750

hiddenmsg:
    .asciz "Successfully called puts without import"

The import table of full_import_test does not contain dlopen(), puts(), or exit():

user@host $ objdump -R full_import_test

full_import_test:     file format elf64-x86-64

DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE 
0000000000600ff8 R_X86_64_GLOB_DAT  __gmon_start__
0000000000600fe8 R_X86_64_JUMP_SLOT  __libc_start_main
0000000000600ff0 R_X86_64_JUMP_SLOT  dlclose

In this example, we call dlopen() on libc, then use the shellcodecs implementation of dlsym() to call functions. The tricky bit was getting dlopen() to work without showing up in the ltrace output. For this, I ended up putting a "call dlclose" at the end of the application under the force_import label (though it is never actually executed). Because the binary is compiled with full RELRO, the GOT already holds the resolved address of dlclose when main starts, so I was able to use that pointer to pivot back to the base of libdl, then parse its export table to find dlopen(). As a result, none of the shared objects opened by dlopen() are noticed by ltrace or ftrace. Depending on your runtime environment and your compiler, the offset of the dlclose entry may change. The following line is responsible for extracting the dlclose pointer:

  mov 0x20(%rcx, %rdi, 2), %rbx     # grab pointer to dlclose()

If for some reason this code isn't working on your system, you can probably achieve the desired result by changing the offset from 0x20 to either 0x18 or 0x28. It is a static offset, fixed when the binary is built. We could also iterate over the string table to verify that we are grabbing the right pointer (that is, the pointer to dlclose), but that was not the purpose of these tests. So when it comes to binaries like this, strace is (for now) the only non-interactive tracing option available, and it won't show you the shared-object calls that might be vital to your research.
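
The pivot from a function pointer back to the base of its library, which the find_base loop in the listing performs one byte at a time, can also be sketched in C. This version is a simplified illustration rather than a translation: find_elf_base is a hypothetical name, and it steps a whole page at a time on the assumption that a loaded ELF image starts on a page boundary and is mapped contiguously below the pointer. Like the assembly, it has no bounds check and will fault if no ELF header is mapped below the starting address:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Walk backwards from an address inside a loaded ELF image until the
 * "\x7fELF" magic at the start of the image is found. */
static const void *find_elf_base(const void *inside)
{
    assert(inside != NULL);

    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t p = (uintptr_t)inside & ~(page - 1);   /* round down to a page */

    while (memcmp((const void *)p, "\177ELF", 4) != 0)
        p -= page;                                   /* try the previous page */
    return (const void *)p;
}
```

Passing it the address of any function in the running program should return the load address of the image that contains the function; with the dlclose pointer taken from the GOT, that would be the base of libdl that the assembly stores in libdl_base.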

