Thanks to the von Neumann architecture [2] (basically what all our processors use), code and data live in the same memory and are indistinguishable once they're there. A pretty great consequence is that you can intersperse code with data, and vice versa. What does that mean? It means that what would normally be an input to your program can actually become part of the program itself.
The concept of injecting code isn't new, and the way we'll do it here isn't unique either, but it is quite interesting. I found that seeing how this works at the assembly level helped me understand how things like JIT compilation and metaprogramming could be implemented under the hood. Once you can write code that modifies the code it will execute next (or even later), it doesn't seem like a stretch to add a few conditions that generate new, optimized code at runtime (essentially JIT compilation, at a very high level).
Quickstart on the Vagrant setup for this:
> vagrant init ubuntu/bionic64
> vagrant up
> vagrant ssh
A couple of Linux tools that we'll be using (most come with major distros anyway): gcc, objcopy, and hd.
For this next part, we'll write a bit of assembly that reads a file and prints it out. Pretty simple, right? First, create the file (put a short line of text in it) and open up its permissions:
> sudo vi /file1
> sudo chmod 777 /file1
Even if you're not too comfortable with assembly, this should be reasonably understandable. All we'll be doing is moving some values into the appropriate registers and invoking syscalls. I'm not too great with assembly, but here's a shot at it:
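.data
_somefile:
    .string "/file1"
.text
.global _start
_start:
    .intel_syntax noprefix
    push 2
    pop rax                  # rax = 2 (open)
    push _somefile           # pushes the 8 bytes stored at _somefile (the path string), not its address
    mov rdi, rsp             # rdi = pointer to the path, which now sits on the stack
    mov rsi, 0               # flags = O_RDONLY
    mov rdx, 0               # mode (ignored without O_CREAT)
    call _sysc               # open returns the file descriptor in rax
sendfile:
    mov rdi, 1               # out_fd = stdout
    mov rsi, rax             # in_fd = descriptor returned by open
    mov rdx, 0               # offset = NULL (use the current file offset)
    mov r10, 100             # count = 100 bytes
    mov rax, 40              # 40 = sendfile
    call _sysc
closefd:
    mov rdi, rsi             # descriptor to close
    mov rax, 3               # 3 = close
    call _sysc
exit:
    mov rax, 60              # 60 = exit
    call _sysc
_sysc:
    syscall
    ret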
All we did here was open the file /file1, use the sendfile syscall to send up to 100 bytes of its content to stdout (fd 1), and then do some cleanup. We'll see in a bit why I didn't invoke syscall directly and instead put it behind a function call. Let's save the above in a file called normal.s.
A great resource on syscall usage is [1].
Let's now generate an ELF (Executable and Linkable Format) file from this, so that we can actually run it.
> gcc -static -nostdlib normal.s -o normal-elf
> ./normal-elf
hello this is file1
The 'code' part of an ELF file lives in the .text section. It can be extracted with:
> objcopy --dump-section .text=normal.raw normal-elf
Running cat on the file will yield some gibberish. However, if we take a hexdump of it, we can start looking at essentially what the computer sees of this file.
> hd normal.raw
00000000 6a 02 58 ff 34 25 36 01 40 00 48 89 e7 48 c7 c6 |j.X.4%6.@.H..H..|
00000010 00 00 00 00 48 c7 c2 00 00 00 00 e8 3f 00 00 00 |....H.......?...|
00000020 48 c7 c7 01 00 00 00 48 89 c6 48 c7 c2 00 00 00 |H......H..H.....|
00000030 00 49 c7 c2 64 00 00 00 48 c7 c0 28 00 00 00 e8 |.I..d...H..(....|
00000040 1b 00 00 00 48 89 f7 48 c7 c0 03 00 00 00 e8 0c |....H..H........|
00000050 00 00 00 48 c7 c0 3c 00 00 00 e8 00 00 00 00 0f |...H..<.........|
00000060 05 c3 |..|
00000062
The thing to note here is that all the instructions we wrote in assembly have been converted into bytes. Let's use this to reverse engineer a bit ('cause who has time to look through documentation, right?).
What if we want to find out which bytes correspond to the syscall instruction? An easy way to figure it out is to remove that instruction and then diff the hexdumps (this is why we kept syscall inside a function call :) ).
Simply commenting out syscall (saved here as nosyscall.s) and following the previous steps (gcc, objcopy, hd), we get:
> gcc -static -nostdlib nosyscall.s -o nosyscall-elf
> objcopy --dump-section .text=nosyscall.raw nosyscall-elf
> hd nosyscall.raw
00000000 6a 02 58 ff 34 25 34 01 40 00 48 89 e7 48 c7 c6 |j.X.4%4.@.H..H..|
00000010 00 00 00 00 48 c7 c2 00 00 00 00 e8 3f 00 00 00 |....H.......?...|
00000020 48 c7 c7 01 00 00 00 48 89 c6 48 c7 c2 00 00 00 |H......H..H.....|
00000030 00 49 c7 c2 64 00 00 00 48 c7 c0 28 00 00 00 e8 |.I..d...H..(....|
00000040 1b 00 00 00 48 89 f7 48 c7 c0 03 00 00 00 e8 0c |....H..H........|
00000050 00 00 00 48 c7 c0 3c 00 00 00 e8 00 00 00 00 c3 |...H..<.........|
00000060
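The comparison can also be done mechanically. The sketch below is only an illustration: it copies the hex bytes from the two dumps above into scratch files (the names normal.hex and nosyscall.hex are made up for this example), puts one byte on each line, and lets diff report what changed. It uses nothing beyond tr and diff, so it runs without the ELF files themselves.

```shell
#!/bin/sh
# Hex bytes copied from the two dumps above (offset and ASCII columns dropped).
# The scratch file names are hypothetical, chosen only for this example.
tmp=$(mktemp -d)

# tr puts one byte on each line so diff can report individual bytes.
tr ' ' '\n' > "$tmp/normal.hex" <<'EOF'
6a 02 58 ff 34 25 36 01 40 00 48 89 e7 48 c7 c6
00 00 00 00 48 c7 c2 00 00 00 00 e8 3f 00 00 00
48 c7 c7 01 00 00 00 48 89 c6 48 c7 c2 00 00 00
00 49 c7 c2 64 00 00 00 48 c7 c0 28 00 00 00 e8
1b 00 00 00 48 89 f7 48 c7 c0 03 00 00 00 e8 0c
00 00 00 48 c7 c0 3c 00 00 00 e8 00 00 00 00 0f
05 c3
EOF

tr ' ' '\n' > "$tmp/nosyscall.hex" <<'EOF'
6a 02 58 ff 34 25 34 01 40 00 48 89 e7 48 c7 c6
00 00 00 00 48 c7 c2 00 00 00 00 e8 3f 00 00 00
48 c7 c7 01 00 00 00 48 89 c6 48 c7 c2 00 00 00
00 49 c7 c2 64 00 00 00 48 c7 c0 28 00 00 00 e8
1b 00 00 00 48 89 f7 48 c7 c0 03 00 00 00 e8 0c
00 00 00 48 c7 c0 3c 00 00 00 e8 00 00 00 00 c3
EOF

# "<" lines exist only in normal.raw, ">" lines only in nosyscall.raw.
# diff exits non-zero when the files differ, hence the "|| true".
changes=$(diff "$tmp/normal.hex" "$tmp/nosyscall.hex" || true)
printf '%s\n' "$changes"

rm -r "$tmp"
```

Besides the two bytes that disappear near the end (0f and 05), the report also flags one changed byte in the first row (36 versus 34). That byte looks like part of the address of _somefile, which presumably sits right after the code and so moves down by two when the code shrinks by two bytes.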
A quick diff shows us that the bytes corresponding to the syscall instruction are 0f 05. But so what? How is this useful to us?
Now let's go back to our code and replace the syscall instruction with these bytes. The part that you'd change is:
// before
// _sysc:
//     syscall
//     ret
_sysc:
    .byte 0x0f, 0x05
    ret
Running gcc, you'll see that this generates an ELF file identical to the one that uses the syscall instruction itself (you can run hd as well to confirm). This is expected, though, right? After all, instructions are just a thin semantic abstraction over bytes.
We can go on figuring out the instruction-byte mapping by doing this a couple times (or, you know, go read documentation), but more than a single instance isn’t really required for our purposes now.
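One more mapping can be read straight off the first hexdump without any diffing. The byte e8 is the opcode for a near call, and the four bytes after it are a little-endian offset relative to the start of the next instruction. As a rough check (the helper name call_target is made up for this sketch), the arithmetic below takes the position of each e8 byte in the dump and its offset operand, and works out where the call lands:

```shell
#!/bin/sh
# Target of a near call (opcode e8) = offset of the next instruction + rel32.
# The next instruction starts 5 bytes after the e8 byte (1 opcode + 4 operand bytes).
# call_target is a hypothetical helper used only for this illustration.
call_target() {
    # $1 = offset of the e8 byte in the dump, $2 = value of the 4-byte operand
    printf '0x%x\n' $(( $1 + 5 + $2 ))
}

call_target 0x1b 0x3f   # call made after setting up open
call_target 0x3f 0x1b   # call made after setting up sendfile
call_target 0x4e 0x0c   # call made after setting up close
call_target 0x5a 0x00   # call made after setting up exit
```

All four lines print 0x5f, which is exactly where the 0f 05 bytes sit in the dump, so every call really does funnel into _sysc.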
Once in byte form, code is indistinguishable from data. Which also means that we should be able to modify our code, right? The instruction to be executed is stored somewhere, and if our code can modify that, then we have self-modifying code.
Normally, the text portion of an ELF file is not writable. This restriction can be bypassed by compiling with an additional flag (-Wl,-N, which gcc passes through to the linker):
> gcc -static -nostdlib -Wl,-N modifying.s -o modifying-elf
What’s left then? Nothing.
Say that instead of writing .byte 0x0f, 0x05, we wrote .byte 0x00, 0x05. Try it.
If you run it, you'll get a segmentation fault. Clearly this code doesn't work, right? Now let's add some code that modifies this line while the program runs. Somewhere near the top, add: mov byte ptr [rip+_sysc], 0x0f.
For clarity our file now looks like this:
.data
_somefile:
    .string "/file1"
.text
.global _start
_start:
    .intel_syntax noprefix
    push 2
    pop rax
    # Runtime modification: overwrite the first byte at _sysc with 0x0f
    mov byte ptr [rip+_sysc], 0x0f
    push _somefile
    mov rdi, rsp
    mov rsi, 0
    mov rdx, 0
    call _sysc
sendfile:
    mov rdi, 1
    mov rsi, rax
    mov rdx, 0
    mov r10, 100
    mov rax, 40
    call _sysc
closefd:
    mov rdi, rsi
    mov rax, 3
    call _sysc
exit:
    mov rax, 60
    call _sysc
_sysc:
    .byte 0x00, 0x05         # broken on disk; becomes 0f 05 (syscall) once patched
    ret
Compile (using the aforementioned flags), and run it. No more segfault :) In fact, go a step further and look at the hexdump of the file again. You'll see your bytes: 0x00 and 0x05. They were definitely part of your program… and yet they never ran, because they were overwritten before execution reached them.
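To make the patch itself concrete, here is the same one-byte overwrite applied to a scratch file instead of to a running program. This is only an analogy, not self-modifying code: the file stands in for the three bytes at _sysc (00 05 c3), and dd plays the role of the mov instruction. The helper name hex is made up for this sketch.

```shell
#!/bin/sh
# Stand-in for the bytes at _sysc: 00 05 (the broken instruction) followed by c3 (ret).
f=$(mktemp)
printf '\000\005\303' > "$f"

# hex is a hypothetical helper: print a file as a run of hex digits.
hex() { od -An -v -tx1 "$1" | tr -d ' \n'; }

before=$(hex "$f")

# Overwrite only the first byte with 0x0f (octal 017).
# conv=notrunc leaves the remaining bytes of the file in place.
printf '\017' | dd of="$f" bs=1 seek=0 conv=notrunc 2>/dev/null

after=$(hex "$f")

echo "before: $before"   # before: 0005c3
echo "after:  $after"    # after:  0f05c3

rm -f "$f"
```

The difference in the real program is that the overwrite happens in memory, which is why the file on disk still shows 00 05 afterwards.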
What we've just seen here is the basis for larger concepts like metaprogramming and JIT compilation. At runtime, you can convert what is 'data' into 'code'. Implementing some naive JIT optimizations doesn't seem too difficult then, either. All you would need to do is add a few lines that check for some condition (e.g., whether a loop has been hit 100 times), after which you could overwrite the main branch by moving in new machine code.