What’s In A Name?

Back in August, the NSA and FBI jointly issued a Cybersecurity Advisory on a previously undisclosed piece of malware developed by the Russian GRU called “Drovorub” - a name that comes from the Russian words “дрово” and “руб”, which together translate to “woodcutter” or, as I’m taking it, “lumberjack”.

What made this particular malware more interesting than usual is that it included a kernel module rootkit! In this post, I want to go through some of the techniques that this kernel module uses and how it relates to the techniques that we’ve already covered in other posts.

Somewhat frustratingly, the report is fairly sparse on details. It tells us what the malware is capable of, but not precisely how it accomplishes it. For example, we are told that Drovorub hooks kernel functions “either by patching functions directly, or by overwriting function pointers that point to the functions”. What method is used to do this? The phrase “overwriting function pointers that point to the functions” could refer to modifying the syscall table (stored in the sys_call_table kernel object) - a rather old and messy technique that only works for hooking syscalls. However, the report appears to indicate that the Drovorub authors favour hooking regular kernel functions, as no mention of syscalls can be found.

As far as “patching functions directly” goes - I’m not sure. I highly doubt that Drovorub patches the function code directly in kernel memory. While this is certainly a feasible attack in general (take a look at this example for FreeBSD 12), I can’t see it working well under Linux. Instead, these words may refer to the Ftrace method that we’ve been using throughout this series. Sadly, this is just guesswork on my part. If anyone has any better ideas or insight, I’d be very interested to hear them!

As we go through some of the techniques mentioned by the report, I’ll link to any relevant earlier posts in this series, but here is the full list if you’ve missed any.

The privileged container escape I wrote will also be relevant:

If you’re reading through the official report, then the interesting parts are in the “Host-based Communications” and “Evasion” sections.

A quick note: I’m not going to attempt to recreate any of the code that went into the Drovorub malware. There is more than enough information contained within this series if you want to do that, but I neither endorse, nor encourage it. Please learn what you can in order to better understand the threats posed by these kinds of APTs.

Communication with Userspace

First off, let’s look at the way that the Drovorub kernel module communicates with userspace. The method outlined by the report is very clever and involves a combination of two techniques we’ve already looked at.

The main technique is a variation on the method we used in the privileged docker escape, but instead of a procfs file, Drovorub communicates via one of the pseudo-devices under /dev/. In our example, we implemented a brand new /proc/escape “file” by writing a custom read/write handler and registering the procfile with the kernel.

What makes the method used by Drovorub so clever is that they are hijacking an existing device file instead of creating a new one. The report states that /dev/zero is the device that’s used, but later we’ll see why that’s a strange choice.

You may recall from Part 4 that we’ve already implemented this functionality ourselves when we hijacked the random_read() kernel function to return sequences of 0x00 instead of random bytes when /dev/random was read from.

By combining these two methods, Drovorub is able to communicate between its kernelspace and userspace components in the following way:

Userland -> Kernel:

  • Userland process writes a command to /dev/zero
  • Kernel intercepts write to /dev/zero by hooking the corresponding write handler
  • Kernel carries out any functionality associated with the command that was sent

Kernel -> Userland:

  • Kernel sends a SIGUSR1 signal to the userland process to indicate that there is something for it to read back from /dev/zero
  • Userland process reads the contents of /dev/zero
  • Kernel returns a buffer of data to the userland process by hooking the read handler (after this transaction, subsequent reads from /dev/zero will behave normally)

Exactly how the rootkit determines which PID to send a signal to isn’t explained. Table XIII of the report details the command format used by Drovorub. It’s quite possible that the PID of the userland process is passed as part of the data appended to any command that requires some kind of output (at least that would make sense in my mind, but I’m totally guessing here).

By looking at drivers/char/mem.c, we find the zero_fops file_operations struct which stores the handlers for the /dev/zero pseudo-device. In particular, we can see that the read handler and write handler are set to read_zero and write_zero respectively.

The read handler can be found further up at line 729. This would be a very easy function to write a hook for - a simple if statement to check for some condition would decide whether we should fill the supplied buffer with some secret data for the user, or just go ahead to fill it with 0x0 by calling the real function.
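
As a defensive aside, the normal behaviour of the read handler is easy to check from userspace. The sketch below is my own and has nothing to do with the report: the helper names buffer_is_all_zero() and dev_zero_looks_normal() are made up for illustration. It reads a block from /dev/zero and confirms that only 0x00 bytes come back. Bear in mind that a hook which only answers one particular process, and only after signalling it, would sail straight past a check like this.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if every byte in buf is 0x00, otherwise 0. */
int buffer_is_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}

/*
 * Reads up to 4096 bytes from /dev/zero and checks them.
 * Returns 1 if only zeroes came back, 0 if anything else did,
 * and -1 if the device could not be opened or read.
 */
int dev_zero_looks_normal(void)
{
    unsigned char buf[4096];
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    /* Pre-fill with a non-zero pattern so stale zeroes can't fool us. */
    for (size_t i = 0; i < sizeof(buf); i++)
        buf[i] = 0xAA;

    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);
    if (n <= 0)
        return -1;

    return buffer_is_all_zero(buf, (size_t)n);
}
```

Calling dev_zero_looks_normal() from a small main() and printing the result is all that’s needed to try it out.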

The write handler is a little different. Searching for write_zero gives us a define on line 902 which identifies write_zero with write_null. This function is as simple as you might expect: it just returns the count argument to indicate to userland that the buffer was written (even though it was discarded). The problem is that write_null is also the write handler under null_fops, which is the file_operations struct for /dev/null. It seems a little messy to hijack write_null because it interferes with two different devices at once, when only a single device is necessary.

Looking at all the entries under /dev that are readable and writable by world, I first thought that the most logical candidates for this functionality would have been either /dev/random or /dev/urandom. Then I realised that they share a write handler too (see here and here), so it looks like either way you’ll end up hijacking writes to two devices at once whether you like it or not.

Process Hiding

The first kernel-based capability that the report talks about is hiding processes. We covered this in Part 7 where we just masked the directory listing of /proc/ to not show the PID we wanted to hide.
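
To make “the directory listing of /proc/” concrete, here is a rough userspace sketch of how a tool like ps discovers processes: it lists /proc and keeps the entries whose names are purely numeric. The helper names are mine and real tools do a lot more than this, but it shows exactly which view of the system a masked listing would alter.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/* Returns 1 if name consists only of decimal digits (e.g. "1234"). */
int is_pid_dirname(const char *name)
{
    if (name == NULL || *name == '\0')
        return 0;
    for (const char *p = name; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;
    }
    return 1;
}

/*
 * Counts the PID-like entries that a directory listing of proc_path
 * exposes. Returns -1 if the directory cannot be opened.
 */
long count_listed_pids(const char *proc_path)
{
    DIR *dir = opendir(proc_path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (is_pid_dirname(entry->d_name))
            count++;
    }
    closedir(dir);
    return count;
}
```

A defender could compare count_listed_pids("/proc") against a process count obtained some other way, although a rootkit that controls the kernel can of course lie to both.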

Drovorub doubles down and uses another approach in conjunction. Depending on kernel version, it hooks either find_pid_ns() or find_pid() and find_task_by_pid_type(). These last two are no longer present in the kernel - they got removed some time during the 2.6.27 cycle - which was back in October 2008! (see the end for some ideas as to why this might be).

Let’s take a look at find_pid_ns():

struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
{
    return idr_find(&ns->idr, nr);
}
EXPORT_SYMBOL_GPL(find_pid_ns);

This is prime function-hooking stuff. The description of idr_find() gives us a better idea of what this function is doing, but a better explanation is in the kernel documentation here.

Essentially, IDR is the general ID allocation system in the kernel - whether those IDs are file descriptors, PIDs, device numbers or even more arcane stuff like SCSI tags. The idr_find() function takes an ID number (in our case a PID) and a pid_namespace and looks up the pointer corresponding to the ID number within that namespace.

For the diehards, this is a bit more complicated than it sounds. idr_find() is a wrapper around radix_tree_lookup(), which is in turn a wrapper around __radix_tree_lookup(). This is where the magic happens. Radix trees are one of those computer-science-heavy concepts which borrow a lot from graph theory. What matters here is that there is a data structure that we use __radix_tree_lookup() to get the entries of.

The clue to how a function hook for find_pid_ns() might be written can be found in the description of idr_find(). Here, we see that a NULL pointer being returned indicates either that the ID is not allocated, or that the NULL pointer itself is associated to the ID (which would be rather bizarre).

This must be what Drovorub is doing! A simple check against the nr argument to find_pid_ns() to see if it matches one of the PIDs we want to hide and it can either return the pointer from idr_find(), or just return NULL to indicate that the PID isn’t associated to any process in memory.

The thing that is so clever here is that, by hooking find_pid_ns() and not idr_find() directly, PID allocation is completely unaffected! The kernel will use the various idr_ and radix_tree_ functions to check which PIDs are allocated, and what the lowest unallocated PID is before allocating a new one. This is important because Drovorub communicates with userland by first sending a SIGUSR1 signal (described here) to the process. Indeed, if we follow the chain of function calls from sys_kill() we eventually get to pid_nr_ns():

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
{
    struct upid *upid;
    pid_t nr = 0;

    if (pid && ns->level <= pid->level) {
        upid = &pid->numbers[ns->level];
        if (upid->ns == ns)
            nr = upid->nr;
    }
    return nr;
}
EXPORT_SYMBOL_GPL(pid_nr_ns);

For the interested, the chain is: sys_kill(), prepare_kill_siginfo(), task_tgid_vnr(), __task_pid_nr_ns(), pid_nr_ns().

For whatever reason (there is likely a very good reason), sys_kill() doesn’t go anywhere near the IDR subsystem. If it did, then Drovorub would probably have a lot of problems being able to hide processes this way and still send signals to them. I imagine that this is a broad, kernel-wide decision that would likely have a lot of ramifications if done differently (if you know the precise reason, please let me know!).

File Hiding

Despite the techniques outlined above, all the active PIDs on the system will still show up under /proc/. On top of this, the Drovorub malware also hides the userland executable component of itself, as explained in the report. As mentioned earlier, it takes a similar approach to what was done in Part 6 except, while we hooked sys_getdents64() directly, Drovorub instead hooks d_lookup(), iterate_dir() and, for kernel versions prior to 4.1, vfs_readdir().

Exactly why the Drovorub authors decided to hook iterate_dir() instead of sys_getdents64() is unclear, especially seeing as sys_getdents64() uses iterate_dir(), as you can see on line 366. Perhaps they opted to save on overhead - if you don’t hook syscalls then you don’t have to worry about multiple calling conventions brought about by the whole pt_regs change in kernel version 4.17 (see Part 2 for more on that).

Looking at sys_getdents64(), we see that the syscall starts off by calling fdget_pos() with the supplied file descriptor. This returns an fd struct which contains a file struct as a subfield. This file struct is now passed on to iterate_dir(). Taking a closer look at the file struct, we see that it has a path struct field called f_path. Continuing down the rabbit hole, we see a dentry struct, which we know all about from Part 6! A dentry struct contains a d_name object which is the name of the file!

In all likelihood, Drovorub’s iterate_dir() hook is first comparing file->f_path.dentry->d_name to either a pre-configured string or one instructed by the userspace component (see above). If it gets a match, it likely just returns 0, otherwise it can call the real iterate_dir().
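
From the defender’s side, the two views that matter here are “what does a directory listing show?” and “does a direct lookup of the path succeed?”. The sketch below (my own helper names, plain userspace C) asks both questions so that they can be compared. I’d stress that this is only a rough check: since the report says d_lookup() is hooked as well, it’s entirely possible that both views are filtered and agree with each other.

```c
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Returns 1 if a directory listing of dir_path contains an entry
 * called name, 0 if it does not, and -1 if dir_path can't be opened.
 */
int name_is_listed(const char *dir_path, const char *name)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}

/* Returns 1 if a direct lookup of full_path succeeds, otherwise 0. */
int path_resolves(const char *full_path)
{
    struct stat st;

    return lstat(full_path, &st) == 0;
}
```

A path for which path_resolves() returns 1 while name_is_listed() returns 0 for its parent directory would be worth a much closer look.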

There’s one final caveat to all of this, as explained here. There is a function pointer called lookup() which is found deep within every path struct (f_path.dentry->d_inode->i_op->lookup). Drovorub also manages to hook this function, but to what end and precisely how this edge-case arises, I am not sure.

Socket Hiding

The final technique I want to discuss is the one that’s perhaps open to the most conjecture. The official report states that “the Drovorub-kernel module hooks the appropriate kernel function and filters out the hidden sockets. It determines the function to hook by opening up the appropriate interface in the /proc/net directory in the proc file system”. It goes on to explain the difference between the tcp, tcp6, udp and udp6 “files” under /proc/net/.

This doesn’t make a whole lot of sense. We’ve explored the /proc/net/tcp file before in Part 8 and it doesn’t contain any functions or function pointers. I suspect what the NSA/FBI meant by the statement above is that Drovorub hooks tcp4_seq_show(), tcp6_seq_show(), udp4_seq_show(), and udp6_seq_show() (we only hooked tcp4_seq_show() back in Part 8). Clearly there is no “appropriate interface” to open from the /proc/net directory.

I think it’s very likely that Drovorub is using almost the exact same technique as was used in Part 8 - except it has probably extended it to include the other 3 “files” tcp6, udp, and udp6 - “Yara Rule #4” seems to indicate that this is indeed the case. One important difference is that the report specifies that Drovorub is capable of filtering connections not only based on source port (as we did), but also by destination port (i.e. it filters by skc_dport as well as skc_num).
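
To see what those two ports look like from userspace, here is a small sketch that pulls the local and remote port out of a single line of /proc/net/tcp (the same layout is used by tcp6, udp and udp6, just with longer addresses for the IPv6 variants). The function name is my own. Both ports are printed in hexadecimal in these files, so the parser reads them with %x.

```c
#include <stdio.h>

/*
 * Parses one data line from /proc/net/tcp, tcp6, udp or udp6 and
 * extracts the local and remote port numbers (printed there as hex).
 * Returns 0 on success, -1 if the line doesn't match (e.g. the header).
 */
int parse_proc_net_ports(const char *line,
                         unsigned int *local_port,
                         unsigned int *remote_port)
{
    /* "%*[...]" skips the hex address, whatever its length. */
    int matched = sscanf(line, " %*d: %*[0-9A-Fa-f]:%x %*[0-9A-Fa-f]:%x",
                         local_port, remote_port);

    return matched == 2 ? 0 : -1;
}
```

Feeding it each line read from /proc/net/tcp with fgets() gives the pair of ports that a filter on skc_num and skc_dport would be deciding on.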

Another interesting ability is that Drovorub can hide any connections owned by a hidden process. Using our trusty strace, we can take (yet another) look at netstat. If we run netstat as root, we get to see all the processes assigned to each connection (otherwise we can only see the processes owned by our user). Checking the output of sudo strace -u root netstat -tunelp, we see that it loops through /proc/x/fd/y for each PID x and file descriptor y in order to identify which process owns which entry in each of tcp, tcp6, udp, and udp6. This means that the ability to hide connections owned by a process is already covered by hiding its PID under /proc/!
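
Here is a rough sketch of that ownership lookup, as I understand it. Each /proc/x/fd/y entry is a symlink, and for sockets the link target looks like “socket:[12345]”, where the number is an inode that can be matched against the inode column of the /proc/net/ files. The function names below are my own and this is not netstat’s actual code.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Extracts the inode number from a link target of the form
 * "socket:[12345]". Returns 0 on success, -1 otherwise.
 */
int parse_socket_link(const char *target, unsigned long *inode)
{
    return sscanf(target, "socket:[%lu]", inode) == 1 ? 0 : -1;
}

/*
 * Reads the symlink at fd_path (e.g. "/proc/1234/fd/3") and, if it
 * points at a socket, stores the socket's inode number.
 * Returns 0 on success, -1 if the link can't be read or isn't a socket.
 */
int socket_inode_of_fd(const char *fd_path, unsigned long *inode)
{
    char target[64];
    ssize_t len = readlink(fd_path, target, sizeof(target) - 1);

    if (len < 0)
        return -1;
    target[len] = '\0'; /* readlink() does not NUL-terminate */
    return parse_socket_link(target, inode);
}
```

If the PID directory never shows up in the listing of /proc/, this loop simply never visits its file descriptors, which is why the connection ends up with no visible owner.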

Things Left Unsaid

One last thing that I’d like to know more about, but which the report fails to deliver on, is the method by which Drovorub hides kernel modules. Table XIV confirms that it can be instructed to hide modules (I assume by name), but that’s all we get. Given what we already know, I think we can make a decent guess as to what they did.

Taking a closer look at the wording of Table XIV, we learn that the hm command will “hide a module”. The fact that it’s a module and not the module indicates that Drovorub is capable of hiding more than just itself from module listings.

In Part 5, we developed a method of hiding the rootkit module by fiddling with the linked list via the THIS_MODULE object. Practically, there are no reasons why this could not be extended to hide other modules - all the module would need to do is loop through the loaded kernel modules (easy enough thanks to the linked list!) and call list_del() on the ones that match some criteria - presumably a successful strcmp() against the .name field of the module.

The only slightly complicated bit would be keeping track of the pointers to the modules that you’ve hidden because, as Table XIV informs us, the um command will unhide a module. Whatever internal book-keeping device Drovorub uses, it needs to keep track of the names of the modules associated to the saved pointers so that the right modules go back in the right place as necessary. I find it surprising (as well as frustrating!) that such a significant part of Drovorub’s functionality, with some potentially very interesting design choices, didn’t make it into the report.
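
To illustrate just the book-keeping problem (and nothing more), here is a toy model in plain userspace C. It is not kernel code and it says nothing about how Drovorub actually does it: struct toy_module, struct hidden_record and every function name are invented for this example. The point is simply that unlinking a node from a circular doubly linked list is only reversible if you remember where it used to sit.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a list entry; NOT the kernel's struct module. */
struct toy_module {
    const char *name;
    struct toy_module *prev, *next;
};

/* Book-keeping for one hidden entry: the node and its old neighbour. */
struct hidden_record {
    struct toy_module *node;
    struct toy_module *old_prev;
};

/* Insert node directly after pos in a circular doubly linked list. */
void toy_list_add(struct toy_module *node, struct toy_module *pos)
{
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

/* Unlink node from the list, remembering where it used to live. */
void toy_hide(struct toy_module *node, struct hidden_record *rec)
{
    rec->node = node;
    rec->old_prev = node->prev;
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = NULL;
}

/* Put a previously hidden node back after its old neighbour. */
void toy_unhide(struct hidden_record *rec)
{
    toy_list_add(rec->node, rec->old_prev);
    rec->node = NULL;
    rec->old_prev = NULL;
}

/* Walk the list from head and report whether name is present. */
int toy_is_listed(const struct toy_module *head, const char *name)
{
    for (const struct toy_module *m = head->next; m != head; m = m->next) {
        if (strcmp(m->name, name) == 0)
            return 1;
    }
    return 0;
}
```

Even this toy version shows where things get awkward: if the remembered neighbour is itself hidden before the node is restored, old_prev no longer points into the live list, so the order in which entries are hidden and unhidden suddenly matters.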

Closing Remarks

One of the things that stands out to me from reading through the report is the focus on compatibility - beyond even what we attempted through the other posts in this series. Drovorub even goes as far as hooking kernel functions that are only present in kernel versions 2.6 and below (more than 12 years old at this point!). Does this tell us something about the intended targets?

Where do we still see such early kernel versions? Collective wisdom tells us that the IoT and embedded world still sees frequent use of such antiquated kernels. While researching this post, I found a report from Fraunhofer released in June 2020 entitled “Home Router Security Report”. On Page 8 is a pie chart which indicates that 31.4% of the routers they surveyed are running Linux Kernel 2.6.36! Especially worrying is that kernel module signing wasn’t implemented until kernel 3.7 - which would make mitigations against Drovorub extremely difficult.
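
If you want to check where a particular device falls relative to that 3.7 cut-off, the comparison is easy to sketch. The helper below is my own; it only compares the version numbers at the front of a release string, so it says nothing about whether module signing was actually enabled when a newer kernel was built.

```c
#include <stdio.h>

/*
 * Returns 1 if a kernel release string such as "2.6.36" or
 * "5.4.0-42-generic" is older than major.minor, 0 if it is not,
 * and -1 if the string can't be parsed.
 */
int kernel_older_than(const char *release, int major, int minor)
{
    int rel_major, rel_minor;

    if (sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
        return -1;
    if (rel_major != major)
        return rel_major < major;
    return rel_minor < minor;
}
```

On a live system, the release string can be taken from the release field that uname() fills in (see sys/utsname.h) and passed in as kernel_older_than(u.release, 3, 7).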

Something which may indicate that routers are not the sole intended target is the delivery mechanism for the kernel module component. The bottom of page 5 of the report explains the usual kernel module loading methods for both Debian and Red Hat systems (/etc/modules.conf, etc). I find this interesting because it means that the kernel module itself must exist as a .ko file somewhere on the filesystem (even if it is hidden from directory listings once the module is loaded). The alternative to this would be loading the kernel module directly into memory (as was done in my privileged docker escape example) - although I hardcoded the kernel object as an array in the executable, there’s no reason why this couldn’t be delivered over the internet instead, thus leaving no remnant of the kernel module anywhere on the filesystem. The job of the forensic analyst is certainly made much easier by the approach taken by the Drovorub authors.

Why does this indicate that embedded devices aren’t the only target? In general, persistence on these kinds of devices is unnecessary (when was the last time you rebooted your router?). This still leaves servers as a possibility (sadly desktop Linux is still too small a demographic to take seriously as a target), which I think makes the most sense. Perhaps both were the targets, or maybe the authors were just hedging their bets.

I hope you enjoyed this run through as much as I enjoyed writing it. I was surprised at how similar so many of the techniques employed by Drovorub were to those explored by previous posts in this series. It seems that, in many cases, being able to load a kernel module allows an attacker to run rampant on a Linux system with very little being able to stop them.

Until next time…

Disclaimer

This post is totally educated guesswork. I have no affiliation with the NSA, the FBI, or the GRU and I have not had the opportunity to examine the Drovorub malware. It was very deliberate that I didn’t attempt to recreate any of the source code that I suspect the GRU may have used in their development of Drovorub. Please do not attempt this yourself either. My hope is that anyone who reads this post (or any other post on this blog) uses the information gained to better defend themselves and others from this kind of malware.