2023-05-17
The eBPF subsystem provides a set of built-in helper functions that programs can use to interact with the system they run on. For example, there are helpers to print debugging messages, get the time since the system was booted, interact with eBPF maps, and manipulate network packets. However, because the various eBPF program types run in different contexts, each program type can only call a subset of these helpers.
Let's create a simple program that gets the name and PID of every process that enters a system call (via the raw_syscalls/sys_enter tracepoint) and prints them to the trace log in DebugFS:
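#include "vmlinux.h"
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/raw_syscalls/sys_enter")
int sys_enter(struct trace_event_raw_sys_enter *args) {
    /* The helper returns a u64, so cast it to a task_struct pointer. */
    struct task_struct *task = (void *)bpf_get_current_task();
    char proc_name[TASK_COMM_LEN];
    u32 pid;

    /* CO-RE relocatable reads of the two fields we want to print. */
    bpf_core_read(&proc_name, TASK_COMM_LEN, &task->comm);
    bpf_core_read(&pid, sizeof(pid), &task->pid);

    /* Written to the tracing buffer exposed through DebugFS. */
    bpf_printk("PROC: %s, PID: %d", proc_name, pid);
    return 0;
}

/* A GPL-compatible license is required to call GPL-only helpers. */
char LICENSE[] SEC("license") = "Dual MIT/GPL";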
Now that we have a base program, let's dissect our use of some of the aforementioned helper functions.
bpf_get_current_task()
The first helper we use is relatively self-explanatory: it returns a pointer to the current task_struct (as a u64, which is why our program casts the result):
u64 bpf_get_current_task(void)
In our program we explicitly use the pid and comm fields of the returned task_struct:
struct task_struct {
    ...
    pid_t pid;
    /*
     * executable name, excluding path.
     *
     * - normally initialized setup_new_exec()
     * - access it with [gs]et_task_comm()
     * - lock it with task_lock()
     */
    char comm[TASK_COMM_LEN];
}
There are dedicated helpers for the two fields we use, bpf_get_current_pid_tgid() and bpf_get_current_comm(), but this is just an example program. Larger programs will likely read more fields and therefore benefit from access to the full task_struct, although the dedicated helpers will likely be more efficient for these two fields.
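One detail worth knowing before switching to the dedicated helper: bpf_get_current_pid_tgid() returns a single u64 with the thread group ID (what user space usually calls the PID) in the upper 32 bits and the kernel's per-thread pid (the same value as task->pid) in the lower 32 bits. The following is a minimal user-space sketch of that packing, in plain C so it can be compiled anywhere. The function names are hypothetical and are not part of any BPF API.

```c
#include <assert.h>
#include <stdint.h>

/* The packed value is exactly two 32-bit halves. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "two 32-bit IDs must fill the 64-bit value");

/* Hypothetical helper: build a value laid out like the helper's return. */
static inline uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return ((uint64_t)tgid << 32) | tid;
}

/* Upper 32 bits: thread group ID (the user-space notion of PID). */
static inline uint32_t unpack_tgid(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid of the calling thread (task->pid). */
static inline uint32_t unpack_tid(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

In a BPF program the same shift and truncation would be applied to the value returned by the helper. For a single-threaded process both halves are equal, which is why the distinction is easy to miss.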
bpf_core_read()
BPF CO-RE (Compile Once - Run Everywhere) is a modern approach to writing portable BPF programs that can run on multiple kernel versions without target-specific changes before compilation. Because struct definitions and field offsets can change between kernel versions, programs that access CO-RE relocatable fields through the helpers provided by libbpf have the correct offsets resolved automatically, without needing to make said changes.
The simplest of these helpers is bpf_core_read(), which reads N bytes from a source to a given destination. As mentioned above, and stated in the source, it does this in a CO-RE relocatable manner:
/*
* This relocation allows libbpf to adjust BPF instruction to use correct
* actual field offset, based on target kernel BTF type that matches original
* (local) BTF, used to record relocation.
*/
In our sys_enter program, we use this helper to read the pid and comm fields from the current task_struct. It is worth noting that this function can return an error code, so real-world programs should check and handle this accordingly.
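To make the relocation idea and the error check more concrete, here is a minimal user-space analogy in plain C. It is not how libbpf works internally: libbpf adjusts field offsets in the BPF instructions using BTF, as the comment quoted above describes. The two struct layouts and the read_field() function are hypothetical names invented for this sketch, standing in for task_struct on two different kernels and for a checked read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts of the same logical struct, standing in for
 * task_struct as built by two different kernel versions. */
struct task_v1 {
    int32_t pid;
    char comm[16];
};

struct task_v2 {
    uint64_t flags;   /* a new field shifts everything after it */
    char comm[16];
    int32_t pid;
};

/* The whole point of the analogy: the same field lives at different offsets. */
static_assert(offsetof(struct task_v1, pid) != offsetof(struct task_v2, pid),
              "layouts must place pid at different offsets");

/*
 * Hypothetical checked read: copy `size` bytes found at `base + offset`
 * into `dst`. Returns 0 on success and a negative value if the request
 * falls outside the object, mirroring the "check the return code" advice.
 */
static int read_field(void *dst, size_t size,
                      const void *base, size_t base_size, size_t offset)
{
    if (dst == NULL || base == NULL)
        return -1;
    if (offset > base_size || size > base_size - offset)
        return -1;
    memcpy(dst, (const char *)base + offset, size);
    return 0;
}
```

Reading pid from a task_v2 with the offset taken from task_v1 succeeds but copies bytes of flags instead, which is the kind of silent breakage that resolving the offset against the target layout avoids. A read that runs past the end of the object returns a negative value, and the caller is expected to bail out rather than use the destination buffer.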
bpf_printk()
The Linux kernel provides bpf_trace_printk(), a printf-like helper function for debugging. It prints a message defined by the format string fmt (of size fmt_size) to the /sys/kernel/debug/tracing/trace file from DebugFS, if available. It can take up to three additional u64 arguments (like all eBPF helpers, the total number of arguments is limited to five):
long bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
If we were to use this helper, the code would look like this:
const char fmt_str[] = "PROC: %s, PID: %d";
bpf_trace_printk(fmt_str, sizeof(fmt_str), proc_name, pid);
Fortunately, libbpf provides a wrapper around this in the form of bpf_printk(), which automatically calculates the size of fmt_str, allowing us to pass it directly:
bpf_printk("PROC: %s, PID: %d", proc_name, pid);
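The size calculation can be illustrated with a small user-space sketch in plain C. It is a simplified model, not libbpf's actual macro, which is more involved. Both fake_trace_printk() and my_printk() are hypothetical names: the first mimics the (fmt, fmt_size, ...) shape of the kernel helper but formats into a local buffer, and the second copies the string literal into an array so that sizeof yields its length including the terminating NUL. The macro relies on the ##__VA_ARGS__ extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* State recorded by the stand-in so a caller can inspect what was passed. */
static char last_line[128];
static unsigned int last_fmt_size;

/*
 * Hypothetical user-space stand-in with the same shape as
 * bpf_trace_printk(fmt, fmt_size, ...). It formats into last_line
 * instead of the kernel trace buffer.
 */
static long fake_trace_printk(const char *fmt, unsigned int fmt_size, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    /* Assumption of this sketch: a format with no NUL terminator inside
     * fmt_size bytes is rejected. */
    if (memchr(fmt, '\0', fmt_size) == NULL)
        return -1;

    last_fmt_size = fmt_size;
    va_start(ap, fmt_size);
    n = vsnprintf(last_line, sizeof(last_line), fmt, ap);
    va_end(ap);
    return n;
}

/* Copy the literal into an array so sizeof gives its full size, then
 * forward it together with any extra arguments. */
#define my_printk(fmt, ...)                                           \
    do {                                                              \
        static const char fmt_buf_[] = fmt;                           \
        fake_trace_printk(fmt_buf_, sizeof(fmt_buf_), ##__VA_ARGS__); \
    } while (0)
```

Calling my_printk("PROC: %s, PID: %d", "cat", 772884) passes a size of 18 (17 characters plus the NUL) without the caller ever writing sizeof by hand. Because the format must be a string literal for the array initializer to compile, a format held in a pointer variable will not work with this style of macro.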
Putting it all together
Now that we have an overview of what our program does and how it works, let's build and run it:
# Generate BTF headers used by libbpf for CO-RE relocations.
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
# Compile BPF program.
$ clang -c -g -O2 -target bpf -o probe.bpf.o probe.bpf.c
# Load the BPF program into the kernel.
$ sudo bpftool prog load probe.bpf.o /sys/fs/bpf/probe autoattach
# Show the loaded program info.
$ sudo bpftool prog show name sys_enter
78: tracepoint  name sys_enter  tag 5da1f289631da6eb  gpl
    loaded_at 2023-05-16T17:28:03+0100  uid 0
    xlated 200B  jited 133B  memlock 4096B  map_ids 26
    btf_id 233
Follow the debug trace pipe to see our program output:
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
...
<...>-1583 [003] ...21 23539.222075: bpf_trace_printk: PROC: Xorg:cs0, PID: 1583
<...>-772513 [000] ...21 23539.223686: bpf_trace_printk: PROC: alacritty, PID: 772513
<...>-772884 [008] ...21 23539.223718: bpf_trace_printk: PROC: cat, PID: 772884
The first five fields come from the kernel's tracing infrastructure: the process name (sometimes shortened to <...>), the PID, the CPU number in brackets, a set of context flags, and the timestamp since last boot. We can then see the helper function that printed the line (don't forget, bpf_printk() is a wrapper around the kernel implementation). Lastly, we see the output we defined in our program.
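If you want to post-process these lines, the fields can be split apart with a few lines of user-space C. The sketch below is one possible approach under stated assumptions, not an official parser: struct trace_line and parse_trace_line() are hypothetical names, and it assumes the layout shown above (task-PID, CPU in brackets, flags, timestamp, function name, message) with no '[' in the task name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of one trace_pipe line. */
struct trace_line {
    char task[64];
    int pid;
    int cpu;
    char flags[16];
    double timestamp;
    char func[64];
    char msg[256];
};

/* Parse one line; returns 0 on success, -1 if the layout does not match. */
static int parse_trace_line(const char *line, struct trace_line *out)
{
    char task_pid[64];
    char *dash;

    assert(line != NULL && out != NULL);

    /* "task-pid [cpu] flags timestamp: func: message" */
    if (sscanf(line, " %63[^[][%d] %15s %lf: %63[^:]: %255[^\n]",
               task_pid, &out->cpu, out->flags, &out->timestamp,
               out->func, out->msg) != 6)
        return -1;

    /* Task names may contain '-', so split on the last one. */
    dash = strrchr(task_pid, '-');
    if (dash == NULL)
        return -1;
    *dash = '\0';
    out->pid = atoi(dash + 1);
    snprintf(out->task, sizeof(out->task), "%s", task_pid);
    return 0;
}
```

Feeding it the first sample line above yields task "<...>", pid 1583, cpu 3, and the message "PROC: Xorg:cs0, PID: 1583". The message is taken as everything after the second colon-terminated field, so colons inside the program's own output (as in Xorg:cs0) are preserved.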
Finally, we can unload the BPF program:
$ sudo rm /sys/fs/bpf/probe