You can find the git repo with the finished submission here: BootNoodle

A few days ago (as of writing), @netspooky announced the first (hopefully annual!) Binary Golf Grand Prix on Twitter. The objective was to create a binary of any sort that is the same forwards as it is byte-reversed, but with an emphasis on creating as small a binary as possible, hence golfing.

This was one of those challenges where I thought that I had no chance of even creating a qualifying submission, so it’s better to just wait for the results and admire the work of everyone else. However, I did start thinking about how it would even be possible to create such a binary (netspooky already pointed out that COM files and like don’t count for fairly obvious reasons).

I decided that if I was going to attempt something like this, I’d have to first settle on a file format. In the end, I think I took the easy option and chose to create a palindromic x86 bootloader. The main reasons for this were that bootloaders are essentially format-less - the only requirement for a valid bootloader is bytes 511 and 512 are 0x55 and 0xaa respectively - the rest of it is just raw x86 instructions. That brings us to the second reason: technically (as far as I am aware) the absolute minimum size of an x86 bootloader is 512 bytes. This felt like a bit of a double-edged sword - just enough space to do something interesting, but still fairly limited - especially given that it has to read the same backwards!

Workflow

I knew that the first thing I had to get right was actually generating a palindromic file - whether or not anything actually executed. The bootloader itself was going to be written in NASM, so I could then just use dd to snip off the first 256 bytes, reverse it with a bit of Perl from stackoverflow and cat the two halves together. I stuck all this into a shell script called build.sh and started to get to work.

mkdir bin 2>/dev/null

nasm -f bin -o bin/bootnoodle.bin src/bootnoodle.asm
dd if=bin/bootnoodle.bin of=bin/tmp.bin bs=1 count=256
rm bin/bootnoodle.bin
perl -0777pe '$_=reverse $_'  bin/tmp.bin > bin/tmp2.bin
cat bin/tmp.bin bin/tmp2.bin > bin/bootnoodle.bin
rm bin/tmp*

When this completes, then we can simply use QEMU to run the bootloader with qemu-system-x86_64 bin/bootnoodle.bin.

x86 Bootloaders

Creating an x86 bootloader is actually a surprising easy task. After fighting with a few different ideas for what to do, I settled on printing a nice bit of “BGGP” ascii art and a link to this blog. You might be wondering how on earth you can fit a printing function into just 256 bytes - it turns out that a huge amount of functionality, from printing characters to graphics primitives, are built into the BIOS. Which of these built-in functions we call is controlled by placing a byte into the AH register and invoking a particular interrupt.

For example, we can call the “Teletype Output” routine (“printing a character” to you and me) by placing 0x0e into AH, the ascii character you want to print into AL and using interrupt 0x10. There are couple of extra options to this function (like page number and foreground colour) that we put in to the BX register. In general, this is the workflow of a bootloader; load registers, interrupt, load registers, interrupt…

The infinitely useful x86 Interrupt Table is part of what makes x86 bootloader programming so easy!

The rough flow of execution of my bootloader is as follows:

  • Clear the screen with the scroll up window routine
  • Set the cursor to the position we want to start printing from with the set cursor position routine
  • Load the memory address of our null-terminated string into the SI register
  • Call the string printing routine
  • Halt

The reason we have to clear the screen is that otherwise we’re left with fragments of information about the BIOS cluttering the screen up (in QEMU’s case, you get the SeaBIOS copyright string stuck at the top of the screen). Also, one of the extra options we get with the scroll up window is that we can change the background/foreground colour by setting BH - in my case 0x03 which keeps the background black but makes the foreground cyan.

The only thing left as far as the actual programming goes is the printString function. The BIOS only provides us with a character printing function (teletype out), so we have to handle the looping logic and checking for a NULL byte ourselves. The way we do this is illustrated below (sadly Hugo doesn’t seem to have proper NASM highlighting build in yet):

printString:
	pusha
	.loop:
		lodsb
		test al, al
		jz .end
		call printChar
		call delay
	jmp .loop
	.end:
		popa
		ret

First, we push all the registers onto the stack with pusha. Entering the .loop, we use lodsb which loads a single byte from the SI register into AL, and incremements SI. We check for a NULL byte with test al, al, and if found, jump to .end where we restore the saved registers with popa and return. If we don’t have a NULL byte in AL, then we call the printChar and delay routines. These are less interesting and very similar to the clearScreen and setCursor routines: set some registers, interrupt.

The last thing worth mentioning is why we have the call to delay, which uses the wait BIOS function. The reason is actually very simple: introducing a 20ms delay between printing each character results in a poor man’s animation effect!

There is still one thing that we’re forgetting! Earlier, I mentioned that a bootloader, while being exactly 512 bytes long, must finish with bytes 0x55 0xaa. Because we’re creating a palindrome, this means that our binary must start with 0xaa 0x55. Execution starts at offset 0, so we cannot avoid these two bytes being executed as instructions. They disassemble to:

aa ;	stosb es:di
55 ;	push bp

The stosb instruction is similar to lodsb; store whatever is in AL in DI (ignoring segment registers as they aren’t really relevant to this discussion). We don’t care about this because, (1) we aren’t using DI and (2) we’re about to load AL with the address of our string, so whatever happens to be loaded in AL beforehand is irrelevant to us (it’s probably just 0x0, but might vary from BIOS to BIOS).

Clearly, push bp also doesn’t matter to us. In theory, we should clean up the stack later by poping bp, but seeing as we’re halting into infinite loop with the jmp $ instruction after printing our string, it really doesn’t matter either.

So, thankfully we don’t actually have to worry about these two extra instructions. Merely starting the source file with db 0xaa, 0x55 before going straight into the _start entrypoint is enough to get us out of trouble.

Palindrome Time

So far, we’ve only used up 0xf5 bytes of the 0x1ff available to us. This means that when we run build.sh, we end up with a 512-byte binary that’s reflected about the 256-byte boundary - with a patch of 20 NULL bytes positioned neatly in the middle. While we technically have a palindrome, netspooky is way ahead of us - as can be seen by the stipulation in the rules on the contest page:

An easy solution would be to just have the binary end, and append the binary backwards at the end of the original file. Because of this, in order to qualify for entry, your binary must at a minimum execute > 50% of the bytes in your binary, and must execute past the halfway mark in your binary as well.

So far, we just about meet the 50%+ byte execution mark thanks to the 0xaa 0x55 bytes at the very beginning. Unfortunately, we don’t yet execute past the halfway mark though, so we’ve got to do some thinking.

We’ve still got to do something a little more interesting than just producing a bootloader in under 256 bytes and flipping it back on itself. There’s not a huge amount we can do about the data part of the binary (which makes up about 63% of all the bytes) unless the text itself is symmetric (which it isn’t) - and besides, the rule above specifically mentions that execution has to pass the 50% mark. That leaves us to look at what can be done with the code.

My idea was to purposefully reverse portions of the code in the “upper half”, so that they are re-reversed in the “lower half”. This means that I also need to fix the call offsets manually because NASM won’t be able to calculate them for me.

There’s probably an easier way to disassemble a bootloader than with Ghidra, but Ghidra worked pretty well for me. It also makes it very easy to compare the NASM source with the disassembled bytes. Because my routines are all very short, I just wrote in the bytes next to the functions that I wanted to reverse (chosen alternately so the execution jumps around as much as possible - and lives up to it’s noodly name). For example, clearScreen looks like this:

clearScreen:
	pusha                  ;	60
	mov ah, 0x06           ;	b4 06
	xor al, al             ;	30 c0
	mov bh, 0x03           ;	b7 03
	xor cx, cx             ;	31 c9
	mov dx, 0x184f         ;	ba 4f 18
	int 0x10               ;	cd 10
	popa                   ;	61
	ret                    ;	c3

We could have worked this out without Ghidra and just used a hexeditor, but hey, Ghidra is faster and takes out the guesswork. We can replace this with the raw bytes, but in reverse order:

clearScreen db 0xc3, 0x61, 0x10, 0xcd, 0x18, 0x4f, 0xba, 0xc9, 0x31, 0x03, 0xb7, 0xc0, 0x30, 0x06, 0xb4, 0x60

But we also have to take a look at the _start entrypoint where we call clearScreen by name. This clearly will no longer work once we comment out clearScreen in favour of the reversed bytes above - you can try running build.sh but NASM will dump out with a load of errors.

The solution here is that we need to replace call clearScreen with a raw “short call”. As explained here, a short (or “near”) call is a call to a memory address relative to the next instruction, where a ret is expected to be encountered eventually. This means that we can replace the line call clearScreen with a simple db 0xe8, 0x00, 0x00. This won’t actually work yet, but it will let us assemble the binary.

After running build.sh, we get a binary in bin/ that we can disassemble with Ghidra again. Even after picking “x86 Real Mode” from the list of languages in Ghidra, we’re left with not very much. However, simply clicking on the first byte 0xaa and pressing “D” gives us the beginning of the disassembly.

After the two bogus instructions caused by the reversed 0x55 0xaa signature, we immediately get two calls. These are to clearScreen and setCursor which appear at the top of the source file! In particular, note that the first call is to 0x00 0x00 - this is what we need to change.

In order to know which offset to set this to, we need the address of the instruction after this call, which is the call to setCursor at 0x5, and the address of the re-reversed clearScreen routine. Scrolling down the disassembly, once we cross the half-way point, Ghidra doesn’t know what it’s looking at. Keep going, and eventually, towards the end, you’ll find another bit of disassembled code. Checking it carefully, you’ll see that it matches perfectly with the disassembly of clearScreen above! The address of this routine is 0x1e0. Subtracting 0x5 from 0x1e0 gives 0x1db, which is the relative offset that we need to set our hand-made call instruction to!

Going back to the source file, we change db 0xe8, 0x00, 0x00 to db 0xe8, 0xdb, 0x01 (remember x86 is little-endian!). Rebuild.shing gives us a working bootloader again!

I repeated this trick for the printChar routine using the exact same steps as for clearScreen: reverse the bytes by hand, replace any calls to printChar with e8 00 00, fire up Ghidra and find calculate the correct relative memory address, and fix the call by hand again.

For fun I also decided to reverse the data in the binary too. This meant that I had to fix line in _start that loads the address of the data into SI. This was done by just reversing the data first and rebuilding (it builds fine because the code is unaffected, running it will just print the string backwards). Then, using a hexeditor, I looked for the start of the re-reversed string, and found it at 0x10a.

The instruction for moving into SI is 0xbe followed by a memory location or register, as described here. However, there is one caveat and it involves the very first line of the source file. Here, I’ve put org 0x7c00, which tells NASM that we are expecting our bootloader to be loaded to memory address 0x7c00 before being executed.

This is beyond our control and it is how the BIOS manages bootloaders. The reason for this seemingly arcane choice of address is that 512 after 0x7c00 is 0x8000 which is where the kernel is normally loaded. The usual purpose of a bootloader is to simply load the kernel from a hard disk into memory address 0x8000 and then jmp to it. Seeing as a bootloader has to be 512 bytes in size, it makes sense to always load it in the memory region immediately prior to where the kernel will be copied to. Why the kernel is loaded at 0x8000, I’m not sure. I imagine it has something to do with the fact that 0x8000 in binary only has the 16th bit set.

For a nicely commented example of this and how it works, check out my ThugBoot project. If you’ve been following this post, then you shouldn’t have any issue reading the NASM source there.

Anyway, all this means for us is that we have to add 0x7c00 to the address within the file of the re-reversed string we want to print, so end up with a final address of 0x7d0a, so our manual mov instruction becomes db 0xbe, 0x0a, 0x7d.

And with that, the palindromic bootloader is done!

I’d like to thank the good folks at ThugCrowd and in the ThugCrowd matrix server for being so encouraging and helpful. It was there that I first got an interest in exploring x86 bootloaders and lead to the ThugBoot project, which gave me the starting point for this submission!