Guess who’s back?

The Binary Golf Grand Prix is back for it’s fourth annual outing! Sadly, after getting nerdsniped last year by the one and only @netspooky, I ended up not submitting an entry for last year’s competition (you can see what I got up to instead here).

This year, the theme was self replication. To avoid being sent active virii, the Binary Golf Association required that the entry created only a single copy of itself, and did not re-execute itself.

Simply creating a copy of yourself isn’t by itself very hard, but doing so in very few bytes poses as interesting challenge. In the end, I came up with three different entry ideas. You might not be interested in all three, so here are some quick links:

All the code can be found on my GitHub here.

Idea 1: Commodore 64 PRG

My initial idea was for a C64 BASIC program that looked something like the following:

1 PRINT 4
2 SAVE "4",8

Clearly this would both print a 4 (as required by the rules), and save a copy of itself (titled 4) to diskette. This already seems pretty small, but is there any room for golfing once we get down to the bytes?

In order to verify the checksums of the original and copied file, I had to figure out how to get a copy of the “file” as the C64 sees it. To do that, I used the VICE emulator and created a C64 virtual disk image to attach to the running machine. I entered the above BASIC into the C64 and then ran SAVE "BGGP",8. After shutting the machine off and looking at the resulting d64 file, I was presented with another issue - it was 171KB in size.

Now, I don’t want to golf the disk image itself, just the program that’s on the disk. I’d read before about how floppy disks were laid out physically in terms of sectors and tracks, but now was the time to dive in as I need to be able to prove that the two files on the disk (the original BGGP and subsequent 4 that’s created) are identical.

Detour: D64 Image Files

The most common file format for representing 5.25" floppy disks seems to be D64 and is indeed what the VICE emulator defaults to. Looking at the VICE documentation we can start to get an idea of how they’re laid out.

The first thing to understand is that information is encoded on floppy disks in a circular pattern. Why does this matter? Well, if each bit takes up the same amount of physical space (roughly speaking), then you can clearly fit more bits into a ring that’s closer to the edge of the floppy than one that’s closer to the center.

You may already be aware that data on a floppy is broken up into 256-byte sectors (the term boot sector is still used today!). We group sectors into “tracks”, which are arranged in a spiral from the outer edge of the disk. In order to solve the spacing problem I mentioned above, we have to adjust the number of sectors per track. As the tracks get closer to the center of the disk, they are literally shorter, so we can fit fewer sectors on them.

Putting it all together, a standard 5.25" floppy disk has 683 sectors for a total of 174,848 bytes. The sectors are arranged into tracks in the following way:

Track Sectors Per Track Cumulative Sector Count
1 - 17 21 357
18 - 24 19 490
25 - 30 18 598
31 - 35 17 683

One awkward wrinkle we have to keep in mind when parsing a D64 file: track numbering starts at 1, whereas sector numbering starts at 0, i.e. the first sector on the disk is track 1, sector 0.

With the physical layout of sectors out the way, how do we go about finding what’s actually on the disk? Things like file names and sizes? Well, Commodore decided that track 18 is special - it’s the directory track.

The directory track begins with a sector dedicated to the Block Availability Map (BAM) with subsequent sectors being individual directory entries. An important point here is that we only have 18 sectors available for directory entries. A single entry is 32 bytes long, which means we get 8 entries per sector, leading us to a limit of 18*8 = 144 entries per disk. By today’s standards, that’s not much - but it was surely a different story back in 1982!

This is the format of the BAM. Note that track/sector pairs are often written as x/y, e.g. 18/1 refers to the track 18, sector 1.

struct bam {
  // Location of first directory sector.
  // This is usually ignored and first entry is is assumed to be 18/1
  // Offset $00
  uint8_t first_track;
  uint8_t first_sector;

  // DOS version type. Anything other than $00 or $41 implies write protection
  // Offset $02
  uint8_t dos_version;

  // Unused
  // Offset $03
  uint8_t unused1;

  // BAM entries for each track (see further down).
  // There are 35 tracks with each BAM entry is 4 bytes long, totalling 140 bytes.
  // Offset $04
  struct dir_entry_map entries[35];

  // Disk Name (padding byte is $a0)
  // Offset $90
  char disk_name[16];

  // Unused (set to $a0 on real disks)
  // Offset $a0
  uint8_t unused2[2];

  // Disk ID
  // Offset $a2
  uint16_t disk_id;

  // Unused (usually set to $a0 on real disks)
  // Offset $a4
  uint8_t unused3;

  // DOS Type (usually $32 $41 == "2A")
  // Offset $a5
  char dos_type[2];

  // Unused (usually set to $a0 on real disks)
  // Offset $a7
  uint8_t unused4[4];

  // Unused (usually set to $00 on real disks)
  // Offset $ab
  uint8_t unused5[85];
}

struct dir_entry_map {
  uint8_t free_sectors;

  // Free Sector Bitmasks
  uint8_t sector00: 1;
  uint8_t sector01: 1;
  uint8_t sector02: 1;
  uint8_t sector03: 1;
  uint8_t sector04: 1;
  uint8_t sector05: 1;
  uint8_t sector06: 1;
  uint8_t sector07: 1;

  uint8_t sector08: 1;
  uint8_t sector09: 1;
  uint8_t sector10: 1;
  uint8_t sector11: 1;
  uint8_t sector12: 1;
  uint8_t sector13: 1;
  uint8_t sector14: 1;
  uint8_t sector15: 1;

  uint8_t sector16: 1;
  uint8_t sector17: 1;
  uint8_t sector18: 1;  // Only for tracks <31
  uint8_t sector19: 1;  // Only for tracks <25
  uint8_t sector20: 1;  // Only for tracks <18
  uint8_t sector21: 1;  // Only for tracks <18
}

Oof, that’s quite a lot. Those bitmasks are pretty fiddly in Python too, but fortunately the VICE documentation gives an example BAM and BAM entry so we can make sure that everything gets parsed as we expect.

Let’s gather our thoughts. As mentioned, we have a disk with a bunch of tracks. Each track has some number of 256-byte sectors. Track 18 is special because it’s first sector is the block availability map (BAM) which (for the most part) contains a bitmap of the free sectors on each track.

The rest of the directory track (track 18) is composed of directory entries. This is where the actual files and their sizes are specified. The directory entry is fairly simple:

struct dir_entry {
  // Track and sector of next directory entry
  // Offset $00
  uint16_t next_dir_entry;

  // File Type
  // 0 => DEL   "Deleted File"
  // 1 => SEQ   "Sequential File"
  // 2 => PRG   "Program File"
  // 3 => USR   "User File"
  // 4 => REL   "Relative File"
  // Offset $02
  uint8_t file_type :4;
  uint8_t unused    :1;
  uint8_t xxx       :1;
  uint8_t locked    :1;
  uint8_t closed    :1;

  // Track and sector of first sector of the file data
  // Offset $03
  uint16_t first_data;

  // Filename (PETSCII), padded with $a0
  // Offset $05
  char filename[16];

  // Track and sector of first side-sector block
  // Only for REL files
  // Offset $15
  uint16_t first_side_sector;

  // File record length
  // Only for REL files
  // Offset $17
  uint8_t file_record_length;

  // Unused
  // Offset $18
  uint8_t unused[6];

  // File size in sectors
  // Offset $1e
  uint16_t file_sector_size;
}

Fortunately, I’m not interested in a fair amount of these fields (remember - I only want to parse this file enough to read out filenames and calculate sha256sums of the contents).

At this point, we know what to do:

  1. Break out the disk image into tracks, with each track broken up into sectors
  2. Parse track 18/sector 0 as the block availability map to determine which sectors are being used
  3. Parse the directory entires to know what files are on the disk and what the filenames are
  4. Resolve each file to their relevant sector(s)
  5. Compute sha256 hashes of each file

If we give it a go with the BGGP file currently on the disk, we see:

$ ./d64.py disk.d64
+----------------------------------------------------------------------------------------------------------+
|                                            Files on Disk " "                                             |
+----------+-----------+-------+--------+------------------------------------------------------------------+
| Filename | Sector(s) | Track | Sector |                            Sha256Sum                             |
+----------+-----------+-------+--------+------------------------------------------------------------------+
|   BGGP   |     1     |   17  |   0    | 419fa20ffd1f6baf878525252a8d255b8695ba2ebf2b76b8cfab03a53309c9bd |
+----------+-----------+-------+--------+------------------------------------------------------------------+

Back to BASIC

Now that we can parse the d64 image to see the file(s) inside, we can get back to golfing the PRG file - so, what’s the structure?

  • The first 2 bytes form a little endian pointer to the location that the PRG expects to be loaded at. This can be ignored by the C64 LOAD instruction if we set the (optional) second argument to 1. I made use of this option two years ago for BGGP 2, but seeing as we can’t avoid having these two bytes in the file, we might as well as set them to the default value of $8001.

  • The next two bytes are another little endian pointer, but this time to the second line of BASIC. The first line of BASIC immediately follows these two bytes.

  • Now, we have a line of BASIC.

    • Two bytes for a line number, e.g. 0a 00 for line 10.
    • A BASIC token identifying which instruction this is, e.g. PRINT, SAVE, etc
    • The arguments to the BASIC instruction in PETSCII (PETSCII has enough overlap with ASCII that we can usually pretend it’s just ASCII, as you’ll see)

Let’s see an example. The first line of BASIC in our PRG so far is 1 PRINT 4, this ends up being:

01 00 99 34 00
┃     ┃  ┃
┃     ┃  ┗━━━━━ "4\0": Null-terminated 4 in PETSCII
┃     ┗━━━━━━━━ BASIC Token $99 = PRINT
┗━━━━━━━━━━━━━━ Line Number $0001 = 1

Fairly simple, right? Likewise, the second (and final) line of BASIC is 2 SAVE "4",8 which is just:

02 00 94 22 34 22 2c 38 00
┃     ┃  ┃
┃     ┃  ┗━━━━━━━━━━━━━━━━━ '"4",8\0': Null-terminated "4",8 in PETSCII
┃     ┗━━━━━━━━━━━━━━━━━━━━ BASIC Token $94 = SAVE
┗━━━━━━━━━━━━━━━━━━━━━━━━━━ Line Number $0002 = 2

So, now our PRG looks like:

01 08 08 08 01 00 99 34 00 13 08 02 00 94 22 34 22 2c 38 00 00 00 00 00
┃     ┃     ┃              ┃     ┃                          ┃     ┃
A     B     C              D     E                          F     G
Label Description
A $0801 is the pointer to the start of BASIC RAM (the default load location for PRGs)
B $0808 is the pointer to the second line of BASIC
C Basic Line 1: 1 PRINT 4
D $0813 is the pointer to the third line of BASIC (which doesn’t exist!)
E Basic Line 2: 2 SAVE "4",8
F $0000 is the null pointer indicating that there are no more lines of BASIC
G We end with two null bytes to show this “third” line of BASIC is empty

Sadly, it seems that there really isn’t much to golf in the PRG format (though I’d love to be proven wrong!). The only thing I changed was removing a few unnecessary spaces from the BASIC arguments. However, it was a fun learning exercise and gave me an excuse to figure out how tracks and sectors really worked on floppy disks.

Once we run the PRG and re-examine the D64 image file, we see:

$ ./d64.py disk.d64
+----------------------------------------------------------------------------------------------------------+
|                                            Files on Disk " "                                             |
+----------+-----------+-------+--------+------------------------------------------------------------------+
| Filename | Sector(s) | Track | Sector |                            Sha256Sum                             |
+----------+-----------+-------+--------+------------------------------------------------------------------+
|   BGGP   |     1     |   17  |   0    | 419fa20ffd1f6baf878525252a8d255b8695ba2ebf2b76b8cfab03a53309c9bd |
|    4     |     1     |   17  |   1    | 419fa20ffd1f6baf878525252a8d255b8695ba2ebf2b76b8cfab03a53309c9bd |
+----------+-----------+-------+--------+------------------------------------------------------------------+

Below is the final BGGP file, totalling 20 bytes.

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 01 08 08 08 01 00 99 34 ┊ 00 13 08 02 00 94 22 34 │•••••⋄×4┊⋄•••⋄×"4│
│00000010│ 22 2c 38 00             ┊                         │",8⋄    ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Idea 2: Python ZIP

Did you know that Python can execute ZIP files directly? I certainly didn’t until a friend told me. This gave me an idea to try golfing a Python-executable ZIP file.

ZIP files are fairly simple beasts, apart from the oft-quoted fact that they’re “parsed backwards”. What this means is that a ZIP candidate is scanned from the end of the file backwards until the parser finds the magic of the “End Of Central Directory” structure - from there it can continue to parse the rest of the file via file offsets.

This “end of central directory” structure looks like this:

struct PKZipEndOfCentralDirectory {
  // 0x50 0x4b 0x05 0x06
  char magic[4];

  // Unused
  uint8_t unused1[6];

  // Number of file entries
  uint16_t file_entries;

  // Size of the Central Directory
  uint32_t central_directory_size;

  // File offset to the Central Directory
  uint32_t central_directory_offset;

  // Unused
  uint16_t unused2[2];
}

So, once we’ve got the end of the central directory, we can follow the central_directory_offset to the start of the central directory. This can be anywhere in the file and it details all the files that are in this ZIP.

Similarly, the central directory itself is fairly straightforward:

struct PKZipStartOfCentralDirectory {
  // 0x50 0x4b 0x01 0x02
  char magic[4];

  // Unused
  uint8_t unused1[6];

  // Compression Method (0 = NONE)
  uint16_t compression_method;

  // Unused
  uint8_t unused2[8];

  // Compressed File Size
  uint32_t compressed_size;

  // Uncompressed File Size
  uint32_t uncompressed_size;

  // Filename Size
  uint16_t filename_size;

  // Extra Field Size
  uint16_t extra_field_size;

  // File Comment Size
  uint16_t file_comment_size;

  // Unused
  uint8_t unused3[8];

  // File offset to the File Header
  uint32_t file_header_offset;

  // Filename
  char filename[filename_size];
}

There may be several file entries, but for a Python-ZIP, we only need one, but the filename must be __main__.py so that Python knows to begin execution here.

Lastly, you’ll see that there’s a file offset to something called the “File Header”. This is the last header we care about, and looks like this:

struct PKZipFileHeader {
  // 0x50 0x4b 0x03 0x04
  char magic[4];

  // Unused
  uint8_t unused[22];

  // Filename Size
  uint32_t filename_size;

  // Filename
  char filename[filename_size];

  // File Contents
  uint8_t file_contents[...];
}

As you can probably see, the size of file_contents will come from the compressed_size field of PKZipStartOfCentralDirectory. One handy trick here (and the only real golfing trick I could find!) is that you don’t have to re-define the filename in the file header. You can simply set filename_size to 0 in PKZipFileHeader and omit the filename field altogether, going straight to the file_contents.

An interesting point here as well is that the Python required to both print 4 and make a copy of ourself is below the threshold of compression gains for the methods supported by PKZIP. This means that we actually save bytes by NOT compressing the file contents.

So, what are we going to use for our file contents? Admittedly, I didn’t try very hard to make this super small as this file format doesn’t seem to offer many avenues for golfing (I’m sure another BGGP entry this year will come up with some smaller Python).

The string I settled on is simply:

import os;print('4');os.system('cp a 4')

which has the obvious downside that the file must be named a in order for this to work (importing sys just to get argv[0] seemed like too many bytes to justify).

Putting it all together, this is the resulting file:

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 50 4b 03 04 58 58 58 58 ┊ 58 58 58 58 58 58 58 58 │PK••XXXX┊XXXXXXXX│
│00000010│ 58 58 58 58 58 58 58 58 ┊ 58 58 00 00 00 00 69 6d │XXXXXXXX┊XX⋄⋄⋄⋄im│
│00000020│ 70 6f 72 74 20 6f 73 3b ┊ 70 72 69 6e 74 28 27 34 │port os;┊print('4│
│00000030│ 27 29 3b 6f 73 2e 73 79 ┊ 73 74 65 6d 28 27 63 70 │');os.sy┊stem('cp│
│00000040│ 20 61 20 34 27 29 50 4b ┊ 01 02 58 58 58 58 58 58 │ a 4')PK┊••XXXXXX│
│00000050│ 00 00 58 58 58 58 58 58 ┊ 58 58 28 00 00 00 28 00 │⋄⋄XXXXXX┊XX(⋄⋄⋄(⋄│
│00000060│ 00 00 0b 00 00 00 00 00 ┊ 58 58 58 58 58 58 58 58 │⋄⋄•⋄⋄⋄⋄⋄┊XXXXXXXX│
│00000070│ 00 00 00 00 5f 5f 6d 61 ┊ 69 6e 5f 5f 2e 70 79 50 │⋄⋄⋄⋄__ma┊in__.pyP│
│00000080│ 4b 05 06 58 58 58 58 58 ┊ 58 01 00 39 00 00 00 46 │K••XXXXX┊X•⋄9⋄⋄⋄F│
│00000090│ 00 00 00 58 58          ┊                         │⋄⋄⋄XX   ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Idea 3: RISC-V Elf

ELF golfing is not new. In fact, BGGP Chief Organizer @netspooky is an authority on the subject. However, I’d been spending some time with RISC-V and it got me wondering about how the ISA change from x86_64 could impact golf-ability.

To start off, I needed a convenient way to run RISC-V ELFs and Ubuntu conveniently offer a preinstalled image ready for QEMU. Once that was up and running, I used the same technique as described in tmp0ut 1:1 to create a minimum viable ELF. That article also describes which entries in the ELF and program headers are ignored by the kernel’s ELF loader.

To make editing easier, I ended up transcribing this ELF into a nasm file. @netspooky’s excellent series on ELF Binary Mangling is a great place to start if you’ve never done that before.

With the rough ELF outline done, I had to actually write some RISC-V assembly to meet the BGGP 4 entry requirements; namely to print or return 4 and create a copy of ourself.

Clearly, the way to achieve these goals in the least amount of bytes is with syscalls, which means we need to know how to call them on RISC-V. If you’ve ever done syscall programming in the past on another ISA, this will likely feel very familiar to you.

RISC-V’s registers can be referred to either by their “real” name (which runs x0 to x31) or by their ABI name (which is far more convenient). The ABI registers for function arguments are a0 through a7 with the return value being put in a0. All the a* registers are saved by the caller.

On Linux, the syscall number is put in register a7 and the arguments going into a0-a6. We jump into the kernel to perform the syscall using the ecall instruction.

With that, the order of syscalls is what we’d expect:

  1. fd = SYS_OPENAT( AT_FWCD, "4", O_WRONLY|O_CREAT, O_WRONLY)
  2. SYS_WRITE(fd, LOADADDR, BINSIZE)
  3. SYS_CLOSE(fd)
  4. SYS_EXIT(4)

Where LOADADDR is the load address of the ELF in virtual memory and BINSIZE is the size of the file.

In order to calculate the LOADADDR value, we can just subtract a precalculated value from pc. However, this requires that the entire ELF is loaded into memory. The way to do this is to set the p_offset field of the program header to 0x0 rather than to the offset of the start of the text segment. This is also reflected in the value of the e_entry field of the ELF header being set to the file offset of the first instruction.

However, we don’t actually have to call SYS_CLOSE because the kernel will handle that for us once we call SYS_EXIT automatically, so let’s forget about closing the file.

Looking up the syscalls, we find that they are:

  • SYS_OPENAT = 56
  • SYS_WRITE = 64
  • SYS_EXIT = 93

Let’s start writing some RISC-V assembly but stay aware of possible savings as we go. To do that, we need to know a couple of things about RISC-V compressed instructions.

RISC-V Compressed Instructions

RISC-V is an ISA with many possible extensions. One of the most common (and in our case, most useful) is the C extension which stands for “Compression”. If you’re familar with (or even just heard of) ARM’s Thumb mode, this will be familiar to you. Ordinarily, RISC-V instructions are 32 bits wide, but in certain circumstances there are 16-bit instructions that achieve the same thing if your CPU has the C extension. The really cool thing about this extension is that it’s implicit from the instruction whether it’s 32-bit or 16-bit! This means that you don’t have to jump to an address with the LSB set or anything similar like you have to with Thumb mode on ARM. This is also really useful from a golfing standpoint because it means we can mix compressed and non-compressed instructions however we like!

Before we start writing our RISC-V assembly, we should know when the assembler will emit compressed instructions over non-compressed ones. To better understand this, we can use the specification. Of particular interest will be add immediate and load immediate instructions because that’s what our code will mostly be comprised of.

Jumping to page 10 of the spec, we find that the c.li instruction will load a sign-extended 6-bit immediate into a register. This means that we will only get a compressed load immediate if the immediate is <= 31. Reading on towards the end of page 10, we see that the same restriction applies to the c.addi instruction as well.

The other compressed instruction we will want to optimise for is the c.j compressed jump. On page 9 of the spec, we discover that 11-bits of the c.j instruction are used as another sign-extended integer which is added to pc. It also conveniently informs us that this corresponds to a range of ±2KiB, which is enough that we actually shouldn’t have to worry (however, we’ll see that we really do have to worry but for different reasons).

Back to Assembly

Armed with this newfound knowledge, we can start writing some RISC-V assembly. It’ll be useful to have the RISC-V Specification open as we go.

The first thing we have to do is set up the SYS_OPENAT syscall to open the file for writing. The manpage (man 2 openat) helpfully informs us that it has the prototype:

int openat(int dirfd, const char *pathname, int flags, mode_t mode);
  • The first argument (in a0) dirfd needs to be set to AT_FDCWD (defined in fcntl.h as -100) to indicate that we’re using the CWD and that pathname is a relative path
  • The second argument (in a1) pathname needs to be a pointer to the filename "4\0"
  • The third argument (in a2) flags must be set to O_CREAT | O_WRONLY (defined in fcntl.h as 64 and 1 respectively)
  • The fourth argument (in a3) mode just has to be O_WRONLY (1)

Most of these are easy to set, except for the pathname pointer in a1. It’s easy enough to put the string 4\0 in our binary somewhere, but how do we get a pointer to it? Fortunately, RISC-V provides an instruction to help us: auipc, “Add Upper Immediate to PC”. This instruction takes a register and a 20-bit immediate. It then shifts the 20-bit immediate left 12 bits and adds this resulting number to pc before storing it in the register provided. In our case, auipc a1, 0 will simply put pc in the register a1 (this is the only RISC-V way of obtaining the value in pc!). Once we know where execution is in our binary, we can add/subtract a relative offset to get a pointer to 4\0 in the register a1.

So, our SYS_OPENAT call is going to look something like this:

li a0, -100             ; Put AT_FDCWD into a0
auipc a1, 0             ; Put PC into a1
addiw a1, a1, +??       ; Add ?? to a1                      ; 16-bits if ?? <= 31
li a2, 65               ; Put O_CREAT | O_WRONLY into a2
li a3, 1                ; Put O_WRONLY into a3              ; 16-bits
li a7, 56               ; Put SYS_OPENAT into a7
ecall                   ; Do syscall

Fairly simple, right? Well, that’s the hard one out the way already. Before we continue, let’s think about what values are in the registers after the ecall instruction.

  • a0 will contain the return value of SYS_OPENAT, i.e. the file descriptor
  • a1 will still contain a pointer to the pathname
  • a2 will still contain 65 (O_CREAT | O_WRONLY)
  • a4 will still contain 1 (O_WRONLY)

The prototype for SYS_WRITE is:

ssize_t write(int fd, const void buf[.count], size_t count);

We don’t need to touch a0 at all - it already contains the file descriptor! The buf argument in a1 needs to be set to the start of the ELF in virtual memory. Luckily, we already have a pointer to somewhere in the ELF in a1 (specifically to the pathname), so we can just pre-calculate an offset to the start of the ELF header and subtract it from a1. Lastly, we need the total size of the binary in a2. This is a little tricky.

We want to make as much use of compressed instructions as possible, but the total size of the binary is going to be more than 31 bytes more than the 65 we already have in a2. Is there anything we can do avoid using a full-width instruction?

So, it turns out that the SYS_OPENAT syscall checks the value in a2 by ORing it against the various flags it knows about. This means that we can put a value other than 65 into a2 and, as long as it ORs true with both O_CREAT and O_WRONLY, the syscall will still work! We can use this to our advantage by putting a larger value into a2 using the full-width li we’re already stuck with, and then get away with using a compressed 16-bit wide instruction in preparation for SYS_WRITE!

Lastly, the jump from 56 (SYS_OPENAT) to 64 (SYS_WRITE) in the a7 register is already small enough to be a compressed instruction.

addi a1, a1, -??        ; Subtract ?? from a1 to get a pointer to the start of the ELF  ; 16-bits if ?? <= 31
addi a2, a2, ??         ; Add ?? to a2 to get the total file size                       ; 16-bits if ?? <= 31
addiw a7, a7, 8         ; Add 8 to a7 to get 64 (SYS_WRITE)                             ; 16-bits
ecall                   ; Do syscall

Okay, we’re gradually getting somewhere, but still leaving a lot of values to be filled in later on. The final thing we need to do is do the SYS_EXIT syscall. We have no choice but to use a full-width load to get 4 into a0, but then we can use a compressed addiw to add 29 to a7 in order to get 93 (SYS_EXIT).

li a0, 4                ; Put 4 into a0
addiw a7, a7, 29        ; Add 29 to a7 to get 93 (SYS_EXIT)     ; 16-bits
ecall                   ; Do syscall

Abusing ELF and Program Header Entries

If you’ve read my article in tmp0ut, Dead Bytes, you’ll be familiar with the entries of the ELF and program headers that are ignored by the Linux ELF loader. We can use these unchecked fields to store some of our instructions - as long as we leave at least 2 bytes free at the end of each run for a compressed c.j instruction!

The first decent run we have begins at offset 4, where we have a full 12 bytes to play with. This is room for the first three instructions, with two spare bytes for the jump.

The next unused chunk is just after e_phoff in the ELF header at offset 0x28 where we have just 8 bytes. Here we can put the next two instructions in our SYS_OPENAT block followed by another compressed jump.

Lastly is room for 6 bytes just after e_phnum at offset 0x3a which is just enough for the full-width li a7, 56 load instruction followed by the compressed jump. At this point, our final jump just continues execution after the end of the program header where the rest of the instructions are.

After putting everything in its rightful place, we have two tasks left:

  1. Insert the three c.j instructions
  2. Update all the ?? values now that we can pre-calculate all the offsets

Compressed Jumps

Up until this point, I had been using the GNU Assembler (gas) to assemble the RISC-V instructions. However, when it came to the compressed jumps, I found that gas simply refused to emit them! It always gave me full-width 32-bit jumps, but I knew from the spec that c.j was a valid instruction.

My solution was to assemble them manually using the awesome capstone engine. The following python will try every possible 16-bit integer, attempt to disassemble it as a compressed RISC-V instruction and print it if matches the c.j mnemonic. Is there a better way of doing this? Almost certainly, but this is simple and it works.

#!/usr/bin/env python3

from capstone import *

md = Cs(CS_ARCH_RISCV, CS_MODE_RISCV64 | CS_MODE_RISCVC)

for i in range(0x10000):
    disas = md.disasm( int.to_bytes(i,2,'little'), 0x1000 )
    for inst in disas:
        if inst.mnemonic == 'c.j':
            print(f'{hex(i)} => {inst.mnemonic} {inst.op_str}')

You will need to install capstone version 5.0.0-rc4 or later to use the CS_ARCH_RISCV machine. With pip it’s just pip install capstone==5.0.0-rc4.

Armed with all these possible instructions, simply greping the output for the precise jump offset required is enough to hardcode all the jumps in place.

Pre-calculating the remaining offsets

The first value we need to update is the offset to the pathname in the addiw a1, a1, ?? instruction. Before we can do this, we need to put the string "4\0" into memory. Very nearby is the e_version field, which also happens to not be checked. Setting this entry to 34 00 gives us our pathname string at offset 0x14. The auipc instruction is now at offset 0x8 which is 12 bytes before the string. Therefore we update the addiw instruction to addiw a1, a1, +12. As 12 <= 31, this remains a 16-bit instruction.

The next value we need to tweak is the flags argument we put into the a2 register. We want this value to be within 31 of the total file size. Well, our total size is now 142 which means we need a2 to be in the range 113-175. The other requirement on this value is that it ORs true with both 64 (O_CREAT) and 1 (O_WRONLY). One solution is simply 125, so the load becomes li a2, 125.

Lastly is the buffer length in the a2 register when we call SYS_WRITE. As we just set a2 to 125 above, we know that we need to add 17 to the total filesize of 142. Therefore the instruction becomes addi a2, a2, 17.

Altogether Now!

At long last we’re done! Here’s is the final ELF:

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 7f 45 4c 46 13 05 c0 f9 ┊ 97 05 00 00 b1 25 29 a8 │•ELF••××┊ו⋄⋄×%)×│
│00000010│ 02 00 f3 00 34 00 58 58 ┊ 04 00 01 00 00 00 00 00 │•⋄×⋄4⋄XX┊•⋄•⋄⋄⋄⋄⋄│
│00000020│ 40 00 00 00 00 00 00 00 ┊ 13 06 d0 07 85 46 31 a0 │@⋄⋄⋄⋄⋄⋄⋄┊••×•×F1×│
│00000030│ 05 00 00 00 40 00 38 00 ┊ 01 00 93 08 80 03 2d a8 │•⋄⋄⋄@⋄8⋄┊•⋄וו-×│
│00000040│ 01 00 00 00 05 00 00 00 ┊ 00 00 00 00 00 00 00 00 │•⋄⋄⋄•⋄⋄⋄┊⋄⋄⋄⋄⋄⋄⋄⋄│
│00000050│ 00 00 01 00 00 00 00 00 ┊ 00 00 01 00 00 00 00 00 │⋄⋄•⋄⋄⋄⋄⋄┊⋄⋄•⋄⋄⋄⋄⋄│
│00000060│ 8a 00 00 00 00 00 00 00 ┊ 8a 00 00 00 00 00 00 00 │×⋄⋄⋄⋄⋄⋄⋄┊×⋄⋄⋄⋄⋄⋄⋄│
│00000070│ 00 10 00 00 00 00 00 00 ┊ 73 00 00 00 b1 15 45 06 │⋄•⋄⋄⋄⋄⋄⋄┊s⋄⋄⋄וE•│
│00000080│ a1 28 73 00 00 00 11 45 ┊ f5 28 73 00 00 00       │×(s⋄⋄⋄•E┊×(s⋄⋄⋄  │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Let’s break it all down:

7f454c46                                    ; ELF Magic \x7fELF
1305c0f9 97050000 b125 29a8                 ; Code Block 1
0200                                        ; e_type:       ET_EXEC
f300                                        ; e_machine:    EM_RISCV
3400                                        ; pathname:     "4\x00"
5858                                        ; unused
0400010000000000                            ; e_entry:      0x10004
4000000000000000                            ; e_phoff:      0x40
1306d007 8546 31a0                          ; Code Block 2
05000000                                    ; e_flags:      EF_RISCV_RVC | EF_RISCV_FLOAT_ABI_DOUBLE
4000                                        ; e_ehsize:     0x40
3800                                        ; e_phentsize:  0x38
0100                                        ; e_phnum:      0x01
93088003 2da8                               ; Code Block 3
01000000                                    ; p_type:       PT_LOAD
05000000                                    ; p_flags:      PF_X | PF_R
0000000000000000                            ; p_offset:     0x0
0000010000000000                            ; p_vaddr:      0x10000
0000010000000000                            ; p_paddr:      0x10000
8a00000000000000                            ; p_filesz:     0x8a = 142 - 4
8a00000000000000                            ; p_memsz:      0x8a = 142 - 4
0010000000000000                            ; p_align:      0x1000
73000000                                    ; Code Block 4
b115 4506 a128 73000000 1145 f528 73000000

The code blocks disassemble to:

; Code Block 1 @ Offset 0x4
li, a0, -100        ; Arg1 (dirfd) = AT_FDCWD
auipc a1, 0x0       ; a1 = pc + 0
addiw a1, a1, +12   ; Arg2 (pathname) = pc + 12
c.j +0x1a           ; Jump to file offset 0x28

; Code Block 2 @ Offset 0x28
li a2, 125          ; Arg3 (flags) = O_CREAT | O_WRONLY | 61
li a3, 1            ; Arg4 (mode) = O_WRONLY
c.j +0xc            : Jump to file offset 0x3a

; Code Block 3 @ Offset 0x3a
li a7, 56           ; Syscall = 56 (SYS_OPENAT)
c.j +0x3a           ; Jump to file offset 0x78

; Code Block 4 @ Offset 0x78
ecall               ; Do the syscall
addi a1, a1, -20    ; Arg2 (buf) = 0x10000
addi a2, a2, 17     ; Arg3 (len) = 142
addiw a7, a7, 8     ; Syscall = 64 (SYS_WRITE)
ecall               ; Do the syscall
li a0, 4            ; Arg1 (ret) = 4
addiw a7, a7, 29    ; Syscall = 93 (SYS_EXIT)
ecall               ; Do the syscall

And here’s the final proof:

$ make
rm -f 4
nasm -f bin -o elf elf.asm
./elf || /bin/true

sha256sum elf 4
001709c49a83c14cdad71d0182d62bdb33a3f43647d867299208a6a899deb422  elf
001709c49a83c14cdad71d0182d62bdb33a3f43647d867299208a6a899deb422  4

ls -la elf
-rwxrwxr-x 1 ubuntu ubuntu 142 Aug 19 17:27 elf

Conclusion

This year’s theme was a lot of fun to get stuck into. If you made it this far then you can see I had quite a few ideas for how best to go about constructing an entry, but the most important part is that I learnt a lot fleshing out each idea.

Similar to previous years, I didn’t actually submit any of these entries for scoring as I actually help out with scoring other people’s entries with the Binary Golf Association.

If you didn’t submit an entry this year, keep an eye out for BGGP5 next year! I really can’t recommend it enough :)

Until next time…