You can find the git repo with the finished submission here: BootNoodle
A few days ago (as of writing), @netspooky announced the first (hopefully annual!) Binary Golf Grand Prix on Twitter. The objective was to create a binary of any sort that is the same forwards as it is byte-reversed, but with an emphasis on creating as small a binary as possible, hence golfing.
This was one of those challenges where I thought that I had no chance of even creating a qualifying submission, so it’s better to just wait for the results and admire the work of everyone else. However, I did start thinking about how it would even be possible to create such a binary (netspooky already pointed out that COM files and like don’t count for fairly obvious reasons).
I decided that if I was going to attempt something like this, I’d have to first settle on a file format. In the end, I think I took the easy option and chose to create a palindromic x86 bootloader. The main reasons for this were that bootloaders are essentially format-less - the only requirement for a valid bootloader is bytes
0xaa respectively - the rest of it is just raw x86 instructions. That brings us to the second reason: technically (as far as I am aware) the absolute minimum size of an x86 bootloader is 512 bytes. This felt like a bit of a double-edged sword - just enough space to do something interesting, but still fairly limited - especially given that it has to read the same backwards!
I knew that the first thing I had to get right was actually generating a palindromic file - whether or not anything actually executed. The bootloader itself was going to be written in NASM, so I could then just use
dd to snip off the first
256 bytes, reverse it with a bit of Perl from stackoverflow and
cat the two halves together. I stuck all this into a shell script called
build.sh and started to get to work.
mkdir bin 2>/dev/null nasm -f bin -o bin/bootnoodle.bin src/bootnoodle.asm dd if=bin/bootnoodle.bin of=bin/tmp.bin bs=1 count=256 rm bin/bootnoodle.bin perl -0777pe '$_=reverse $_' bin/tmp.bin > bin/tmp2.bin cat bin/tmp.bin bin/tmp2.bin > bin/bootnoodle.bin rm bin/tmp*
When this completes, then we can simply use QEMU to run the bootloader with
Creating an x86 bootloader is actually a surprising easy task. After fighting with a few different ideas for what to do, I settled on printing a nice bit of “BGGP” ascii art and a link to this blog. You might be wondering how on earth you can fit a printing function into just 256 bytes - it turns out that a huge amount of functionality, from printing characters to graphics primitives, are built into the BIOS. Which of these built-in functions we call is controlled by placing a byte into the
AH register and invoking a particular interrupt.
For example, we can call the “Teletype Output” routine (“printing a character” to you and me) by placing
AH, the ascii character you want to print into
AL and using interrupt
0x10. There are couple of extra options to this function (like page number and foreground colour) that we put in to the
BX register. In general, this is the workflow of a bootloader; load registers, interrupt, load registers, interrupt…
The infinitely useful x86 Interrupt Table is part of what makes x86 bootloader programming so easy!
The rough flow of execution of my bootloader is as follows:
- Clear the screen with the scroll up window routine
- Set the cursor to the position we want to start printing from with the set cursor position routine
- Load the memory address of our null-terminated string into the
- Call the string printing routine
The reason we have to clear the screen is that otherwise we’re left with fragments of information about the BIOS cluttering the screen up (in QEMU’s case, you get the SeaBIOS copyright string stuck at the top of the screen). Also, one of the extra options we get with the scroll up window is that we can change the background/foreground colour by setting
BH - in my case
0x03 which keeps the background black but makes the foreground cyan.
The only thing left as far as the actual programming goes is the
printString function. The BIOS only provides us with a character printing function (teletype out), so we have to handle the looping logic and checking for a NULL byte ourselves. The way we do this is illustrated below (sadly Hugo doesn’t seem to have proper NASM highlighting build in yet):
printString: pusha .loop: lodsb test al, al jz .end call printChar call delay jmp .loop .end: popa ret
First, we push all the registers onto the stack with
pusha. Entering the
.loop, we use
lodsb which loads a single byte from the
SI register into
AL, and incremements
SI. We check for a NULL byte with
test al, al, and if found, jump to
.end where we restore the saved registers with
return. If we don’t have a NULL byte in
AL, then we call the
delay routines. These are less interesting and very similar to the
setCursor routines: set some registers, interrupt.
The last thing worth mentioning is why we have the call to
delay, which uses the wait BIOS function. The reason is actually very simple: introducing a 20ms delay between printing each character results in a poor man’s animation effect!
There is still one thing that we’re forgetting! Earlier, I mentioned that a bootloader, while being exactly 512 bytes long, must finish with bytes
0x55 0xaa. Because we’re creating a palindrome, this means that our binary must start with
0xaa 0x55. Execution starts at offset
0, so we cannot avoid these two bytes being executed as instructions. They disassemble to:
aa ; stosb es:di 55 ; push bp
stosb instruction is similar to
lodsb; store whatever is in
DI (ignoring segment registers as they aren’t really relevant to this discussion). We don’t care about this because, (1) we aren’t using
DI and (2) we’re about to load
AL with the address of our string, so whatever happens to be loaded in
AL beforehand is irrelevant to us (it’s probably just
0x0, but might vary from BIOS to BIOS).
push bp also doesn’t matter to us. In theory, we should clean up the stack later by
bp, but seeing as we’re halting into infinite loop with the
jmp $ instruction after printing our string, it really doesn’t matter either.
So, thankfully we don’t actually have to worry about these two extra instructions. Merely starting the source file with
db 0xaa, 0x55 before going straight into the
_start entrypoint is enough to get us out of trouble.
So far, we’ve only used up
0xf5 bytes of the
0x1ff available to us. This means that when we run
build.sh, we end up with a 512-byte binary that’s reflected about the 256-byte boundary - with a patch of 20 NULL bytes positioned neatly in the middle. While we technically have a palindrome, netspooky is way ahead of us - as can be seen by the stipulation in the rules on the contest page:
An easy solution would be to just have the binary end, and append the binary backwards at the end of the original file. Because of this, in order to qualify for entry, your binary must at a minimum execute > 50% of the bytes in your binary, and must execute past the halfway mark in your binary as well.
So far, we just about meet the 50%+ byte execution mark thanks to the
0xaa 0x55 bytes at the very beginning. Unfortunately, we don’t yet execute past the halfway mark though, so we’ve got to do some thinking.
We’ve still got to do something a little more interesting than just producing a bootloader in under 256 bytes and flipping it back on itself. There’s not a huge amount we can do about the data part of the binary (which makes up about 63% of all the bytes) unless the text itself is symmetric (which it isn’t) - and besides, the rule above specifically mentions that execution has to pass the 50% mark. That leaves us to look at what can be done with the code.
My idea was to purposefully reverse portions of the code in the “upper half”, so that they are re-reversed in the “lower half”. This means that I also need to fix the
call offsets manually because NASM won’t be able to calculate them for me.
There’s probably an easier way to disassemble a bootloader than with Ghidra, but Ghidra worked pretty well for me. It also makes it very easy to compare the NASM source with the disassembled bytes. Because my routines are all very short, I just wrote in the bytes next to the functions that I wanted to reverse (chosen alternately so the execution jumps around as much as possible - and lives up to it’s noodly name).
clearScreen looks like this:
clearScreen: pusha ; 60 mov ah, 0x06 ; b4 06 xor al, al ; 30 c0 mov bh, 0x03 ; b7 03 xor cx, cx ; 31 c9 mov dx, 0x184f ; ba 4f 18 int 0x10 ; cd 10 popa ; 61 ret ; c3
We could have worked this out without Ghidra and just used a hexeditor, but hey, Ghidra is faster and takes out the guesswork. We can replace this with the raw bytes, but in reverse order:
clearScreen db 0xc3, 0x61, 0x10, 0xcd, 0x18, 0x4f, 0xba, 0xc9, 0x31, 0x03, 0xb7, 0xc0, 0x30, 0x06, 0xb4, 0x60
But we also have to take a look at the
_start entrypoint where we call
clearScreen by name. This clearly will no longer work once we comment out
clearScreen in favour of the reversed bytes above - you can try running
build.sh but NASM will dump out with a load of errors.
The solution here is that we need to replace
call clearScreen with a raw “short call”. As explained here, a short (or “near”) call is a call to a memory address relative to the next instruction, where a
ret is expected to be encountered eventually. This means that we can replace the line
call clearScreen with a simple
db 0xe8, 0x00, 0x00. This won’t actually work yet, but it will let us assemble the binary.
build.sh, we get a binary in
bin/ that we can disassemble with Ghidra again. Even after picking “x86 Real Mode” from the list of languages in Ghidra, we’re left with not very much. However, simply clicking on the first byte
0xaa and pressing “D” gives us the beginning of the disassembly.
After the two bogus instructions caused by the reversed
0x55 0xaa signature, we immediately get two
calls. These are to
setCursor which appear at the top of the source file! In particular, note that the first
call is to
0x00 0x00 - this is what we need to change.
In order to know which offset to set this to, we need the address of the instruction after this
call, which is the
0x5, and the address of the re-reversed
clearScreen routine. Scrolling down the disassembly, once we cross the half-way point, Ghidra doesn’t know what it’s looking at. Keep going, and eventually, towards the end, you’ll find another bit of disassembled code. Checking it carefully, you’ll see that it matches perfectly with the disassembly of
clearScreen above! The address of this routine is
0x1db, which is the relative offset that we need to set our hand-made
call instruction to!
Going back to the source file, we change
db 0xe8, 0x00, 0x00 to
db 0xe8, 0xdb, 0x01 (remember x86 is little-endian!). Re
build.shing gives us a working bootloader again!
I repeated this trick for the
printChar routine using the exact same steps as for
clearScreen: reverse the bytes by hand, replace any
e8 00 00, fire up Ghidra and find calculate the correct relative memory address, and fix the
call by hand again.
For fun I also decided to reverse the data in the binary too. This meant that I had to fix line in
_start that loads the address of the data into
SI. This was done by just reversing the data first and rebuilding (it builds fine because the code is unaffected, running it will just print the string backwards). Then, using a hexeditor, I looked for the start of the re-reversed string, and found it at
The instruction for
0xbe followed by a memory location or register, as described here. However, there is one caveat and it involves the very first line of the source file. Here, I’ve put
org 0x7c00, which tells NASM that we are expecting our bootloader to be loaded to memory address
0x7c00 before being executed.
This is beyond our control and it is how the BIOS manages bootloaders. The reason for this seemingly arcane choice of address is that 512 after
0x8000 which is where the kernel is normally loaded. The usual purpose of a bootloader is to simply load the kernel from a hard disk into memory address
0x8000 and then
jmp to it. Seeing as a bootloader has to be 512 bytes in size, it makes sense to always load it in the memory region immediately prior to where the kernel will be copied to. Why the kernel is loaded at
0x8000, I’m not sure. I imagine it has something to do with the fact that
0x8000 in binary only has the 16th bit set.
For a nicely commented example of this and how it works, check out my ThugBoot project. If you’ve been following this post, then you shouldn’t have any issue reading the NASM source there.
Anyway, all this means for us is that we have to add
0x7c00 to the address within the file of the re-reversed string we want to print, so end up with a final address of
0x7d0a, so our manual
mov instruction becomes
db 0xbe, 0x0a, 0x7d.
And with that, the palindromic bootloader is done!