What’s Going On?

Back in February 2020, there were some stirrings on the LKML about unexporting kallsyms_lookup_name() from the kernel. The main reason is that unscrupulous module developers will often simply add MODULE_LICENSE("GPL") to their code (without actually licensing their module as such). Then, using kallsyms_lookup_name(), they can look up and call any kernel function to their heart’s content, exported or not. The kernel developers don’t like this because it lets out-of-tree modules call non-exported functions.

Clearly, this is a problem for us! In particular, ftrace_helper.h uses kallsyms_lookup_name() to get the address of functions that we want to hook. As of kernel version 5.7, the function is no longer exported, so we can’t call it from a module anymore (diff).

Along with this change, there are a couple of other changes to the procfs system that came with the release of version 5.6. While these fixes are relatively minor, there’s a bit more work to be done in order to get ftrace_helper.h working on newer kernels.

It’s worth noting that, as of writing, the latest kernel available for Ubuntu 20.04 is 5.4.0-60-generic, so these changes won’t actually affect you yet if you’re on an LTS. But it’s nice to be ahead of the curve!

ProcFS Changes

Let’s deal with the changes to proc_create() first. This is a pretty simple fix, but illustrates a nice method for handling such changes, without breaking existing support. Looking at the declaration in v5.5.19, we see:

struct proc_dir_entry *proc_create(const char *name, 
        umode_t mode, 
        struct proc_dir_entry *parent, 
        const struct file_operations *proc_fops);

This declaration is what we made use of in escape.c from Privileged Container Escapes with Kernel Modules. However, as of version 5.6, proc_create() now looks like this:

struct proc_dir_entry *proc_create(const char *name, 
        umode_t mode, 
        struct proc_dir_entry *parent, 
        const struct proc_ops *proc_ops);

Notice that the final argument has changed from a file_operations struct to a proc_ops struct? We need to account for this change in our code. There are two main differences between these structs that we care about:

  • There is no longer a .owner field in proc_ops
  • The .read / .write fields for the IO handlers are now called .proc_read / .proc_write respectively

So, what’s the best way to handle these changes? With the preprocessor! In particular, <linux/version.h> provides us with the LINUX_VERSION_CODE and KERNEL_VERSION macros. These let us implement these changes very simply:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,6,0)
// proc_ops version
static const struct proc_ops proc_file_fops_escape = {
    .proc_write = escape_write,
};

static const struct proc_ops proc_file_fops_output = {
    .proc_write = output_write,
    .proc_read = output_read,
};
#else
// file_operations version
static const struct file_operations proc_file_fops_escape = {
    .owner = THIS_MODULE,
    .write = escape_write,
};

static const struct file_operations proc_file_fops_output = {
    .owner = THIS_MODULE,
    .write = output_write,
    .read = output_read,
};
#endif

And with that, the first problem is solved! The Docker escape now compiles on 5.10.6-arch1-1 and works as expected. The same code still compiles on 5.4.0-60-generic on Ubuntu 20.04.

This fix has been merged into the repo. You can see the change mentioned above here.
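
As an aside, the version check above works because KERNEL_VERSION() packs the three version components into a single integer, so versions compare numerically. The userspace sketch below mirrors the classic definition from <linux/version.h> under a hypothetical name (DEMO_KERNEL_VERSION) so that it compiles outside the kernel tree. Newer kernels additionally clamp the patch level to 255, which this sketch leaves out. The two helper functions are made up for illustration.

```c
/*
 * Userspace sketch: DEMO_KERNEL_VERSION is a hypothetical stand-in that
 * mirrors the classic KERNEL_VERSION(a, b, c) packing.
 */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 if a kernel with this version code expects struct proc_ops (5.6+) */
int uses_proc_ops(int version_code)
{
    return version_code >= DEMO_KERNEL_VERSION(5, 6, 0);
}

/* Returns 1 if kallsyms_lookup_name() is no longer exported (5.7+) */
int needs_kprobe_lookup(int version_code)
{
    return version_code >= DEMO_KERNEL_VERSION(5, 7, 0);
}
```

Passing DEMO_KERNEL_VERSION(5, 10, 6) to either helper selects the newer code path, while DEMO_KERNEL_VERSION(5, 4, 0) selects the older one, which is the same decision the preprocessor makes at compile time.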

The Kallsyms Problem

Now, we come to a slightly more difficult problem. Without kallsyms_lookup_name(), we can’t easily resolve a symbol name to a memory address, which means we can’t hook functions with ftrace (recall that we use ftrace to register callbacks that are triggered when $rip equals the memory address of the function we want to hook).

My original idea was to look for a different kernel function (one that is still exported) that could be repurposed to resolve symbol names. I settled on sprint_symbol(), which does the opposite of kallsyms_lookup_name(), i.e. given a memory address, it returns the name of the function at that address.

Using this, I decided to just loop over addresses from the base address up, calling sprint_symbol() each time and strncmp()ing until I found the function I wanted. While slightly inelegant, it worked surprisingly well. It looked something like this:

/*
 * kaddr is an unsigned long which holds the memory address being looped over
 * fname_lookup is a kernel buffer which stores the name of the function at kaddr
 * fname is a kernel buffer storing the function we're searching for
 */

/*
 * Trick to get the kernel base address
 * sprint_symbol() is less than 0x1000000 bytes from the base address, so
 * we can just AND-out the last 3 bytes from its address to obtain the address
 * of startup_64 (the kernel load address)
 */
kaddr = (unsigned long) &sprint_symbol;
kaddr &= 0xffffffffff000000;

/* During testing, all the interesting functions were found below this limit */
for ( i = 0x0 ; i < 0x100000 ; i++ )
{
    sprint_symbol(fname_lookup, kaddr);

    if (strncmp(fname_lookup, fname, strlen(fname)) == 0)
    {
        /* Match! Clean up and exit */
        kfree(fname_lookup);
        return kaddr;
    }

    /* Kernel function addresses are all aligned, so we skip 0x10 bytes */
    kaddr += 0x10;
}
kfree(fname_lookup);
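
One caveat with the strncmp() call above: it only compares a prefix, so searching for one name can also hit a longer name that starts the same way. Assuming sprint_symbol() writes its result in the form name+0xoffset/0xsize (the format seen in kernel stack traces), a stricter check could also require the character after the name to be '+'. The helper name symbol_matches() and the sample strings in the note below are made up for illustration; this is a userspace sketch, not code from the repo.

```c
#include <string.h>

/*
 * Returns 1 if `formatted` (assumed to look like "name+0xoff/0xsize")
 * refers to exactly the symbol `name`, and 0 otherwise.
 */
int symbol_matches(const char *formatted, const char *name)
{
    size_t len = strlen(name);

    /* The formatted string must start with the name we want... */
    if (strncmp(formatted, name, len) != 0)
        return 0;

    /* ...and the name must end there, not continue into a longer symbol */
    return formatted[len] == '+';
}
```

With this check, a lookup for a hypothetical "sys_read" would accept "sys_read+0x0/0x20" but reject "sys_readv+0x0/0x20", which the plain prefix comparison would have matched.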

If I didn’t end up using this technique, why have I bothered to tell you about it? For two reasons. The first is to illustrate that there is always more than one way to skin a cat. The second is the trick I used above to get the kernel base address. The problem that I faced was knowing where to start brute-forcing from. The address that the kernel is loaded from is called startup_64 (you can find it in /proc/kallsyms), but kernel address space layout randomization means that this address will change at every boot. However, even though we can’t use kallsyms_lookup_name(), we can still get the address of any exported kernel function by using the & operator.

If you check the addresses of sprint_symbol and startup_64 on your system, you’ll notice that only the last 3 bytes are different. This is because sprint_symbol is less than 0x1000000 bytes from the beginning of the kernel. That difference does not change between reboots. Therefore, we can just drop those last three bytes and we get the base address! Although it’s already in the snippet above, I’ll lay it out here again because I think it’s pretty cool:

/* Get the address of sprint_symbol() */
kaddr = (unsigned long) &sprint_symbol;

/* Set the last 3 bytes of the address to 0x00 */
kaddr &= 0xffffffffff000000;
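
To make the arithmetic concrete, here is the same masking step as a standalone userspace function. The name mask_to_base() and the sample address in the note below are invented; real values differ on every boot because of KASLR. Note that the mask clears 24 bits, so the trick assumes both that the exported function lives within 0x1000000 bytes of the base and that the base itself is aligned to that boundary, which may not hold on every kernel configuration.

```c
#include <stdint.h>

/* Clear the low 24 bits (the last 3 bytes) of a 64-bit address */
uint64_t mask_to_base(uint64_t addr)
{
    return addr & 0xffffffffff000000ULL;
}
```

For a made-up sprint_symbol address of 0xffffffff8115a2b0, this gives 0xffffffff81000000. An address that is already aligned comes back unchanged.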

While I was working on refining this technique, @f0lg0 opened an issue on GitHub bringing up exactly this problem, and proposed a cool technique that used kprobes instead.

The Kprobe system lets you dynamically insert breakpoints into a running kernel. All we’re going to use it for is to do the job of kallsyms_lookup_name() and look up kallsyms_lookup_name() itself!

After a bit of back-and-forth, they came up with a very neat solution to the problem. Their code in that comment illustrates the main idea really well. We simply declare a kprobe struct with the .symbol_name field preset to kallsyms_lookup_name. Once the kprobe is registered, we can read the .addr field to obtain the memory address!

In order to implement this technique effectively and neatly, I wanted all the changes to be in ftrace_helper.h only. The trick here is to use the macros provided by <linux/version.h> as mentioned above, to check the kernel version, and then resolve kallsyms_lookup_name() manually before using it as we would normally.

First, we include <linux/kprobes.h> and declare the kprobe struct (see it in place here):

#if LINUX_VERSION_CODE >= KERNEL_VERSION(5,7,0)
#define KPROBE_LOOKUP 1
#include <linux/kprobes.h>
static struct kprobe kp = {
    .symbol_name = "kallsyms_lookup_name"
};
#endif

With that in place, before we attempt to use kallsyms_lookup_name(), we just add the following snippet. All that needs to be done is to register the kprobe, assign the .addr field to a symbol called kallsyms_lookup_name (after appropriately casting it), and then unregister the kprobe once we’re done (see it in place here).

#ifdef KPROBE_LOOKUP
    /* typedef for kallsyms_lookup_name() so we can easily cast kp.addr */
    typedef unsigned long (*kallsyms_lookup_name_t)(const char *name);
    kallsyms_lookup_name_t kallsyms_lookup_name;

    /* register the kprobe */
    register_kprobe(&kp);

    /* assign kallsyms_lookup_name symbol to kp.addr */
    kallsyms_lookup_name = (kallsyms_lookup_name_t) kp.addr;

    /* done with the kprobe, so unregister it */
    unregister_kprobe(&kp);
#endif

Of course, if we’re not compiling on kernel 5.7+, then none of this will trigger and kallsyms_lookup_name() will be resolved by the kernel headers (as has been the case before now). This way, we don’t have to make any changes to existing code in ftrace_helper.h - and kernel versions prior to 5.7 are unaffected!

The Syscall Name Problem

Finally, there is another small patch which fixes something that’s been bothering me. Despite sharing the same name, there were actually two slightly different ftrace_helper.h files in the repo. The reason is that I was using a macro to add __x64_ to syscall names, but the problem is that there isn’t an easy way (that I know of) to only add __x64_ to strings that start with sys_. To solve this, I had simply removed the corresponding macro from ftrace_helper.h when I wasn’t hooking a syscall.

This is very inelegant, so I instead decided to remove the macro altogether, and simply manually add __x64_ in any rootkit.c that hooks syscalls. The downside is that 32-bit kernels are no longer supported automatically (you’d have to remove __x64_ from the HOOK() macro in rootkit.c and recompile), but 32-bit isn’t too much of a concern nowadays (I haven’t actually tested anything on 32-bit, so I don’t even know which modules are broken and which work!).
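
For what it’s worth, if the prefix were applied at runtime rather than by a macro, the check for sys_ would be simple. The sketch below is a userspace illustration only: syscall_symbol_name() is a hypothetical helper, not something in ftrace_helper.h, and it is not how the repo handles this.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes the symbol to look up for `name` into `out`.
 * Names starting with "sys_" get the "__x64_" prefix; others are copied as-is.
 * Returns 0 on success, or -1 if `out` is too small to hold the result.
 */
int syscall_symbol_name(const char *name, char *out, size_t out_len)
{
    const char *prefix = (strncmp(name, "sys_", 4) == 0) ? "__x64_" : "";
    int n = snprintf(out, out_len, "%s%s", prefix, name);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Given "sys_mkdir" this produces "__x64_sys_mkdir", while a non-syscall name such as "tcp4_seq_show" passes through untouched. The trade-off is an extra buffer and a string copy at hook-resolution time, compared with the compile-time macro.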

And with that…

Now the rootkit techniques on the repo work with the latest kernel! Thanks again to @f0lg0 for their idea to use kprobes to resolve kallsyms_lookup_name() - definitely neater than brute-forcing the address.

Until next time…