QEMU - virtual machine
apt install qemu-system-x86
Disk - Ubuntu 22.04 Server Cloud Image + Seed
Boot
qemu-system-x86_64 -hda ubuntu-22.04-server-cloudimg-amd64.img -m 2G -enable-kvm -nographic -net nic -net user,hostfwd=tcp::2222-:22
Build
wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.8.9.tar.xz # stable at the time of writing
tar -xf linux-6.8.9.tar.xz
cd linux-6.8.9
make defconfig
make -j$(nproc)
Boot
qemu-system-x86_64 -hda ubuntu-22.04-server-cloudimg-amd64.img -m 2G -enable-kvm -nographic -net nic -net user,hostfwd=tcp::2222-:22 -kernel arch/x86/boot/bzImage -append "root=/dev/sda1 console=ttyS0"
Various analysis tools use the LLVM bitcode representation of the source. Compiling a single source file into bitcode is straightforward.
clang -S -emit-llvm <file.c>
to get the readable bitcode.clang -c -emit-llvm <file.c>
to get the binary bitcode.However, compiling large projects like the Linux kernel into bitcode is not as straightforward.
At a high level, each *.c
source file is compiled into an object file. Object
files .o
in a sub-system are linked together to into an intermediate
built-in.a
file. Finally, all the built-in.a
files are linked together to
get the final kernel image vmlinux
.
To get the bitcode representation of the whole kernel, each source is compiled to bitcode and then linked in the same order the kernel build system does. One way to do it us a build system instrumentation tool like rizsotto/Bear to capture the build commands and execute the modified command or use a tool like my personal favorite trailofbits/blight which instruments the build system with pre and post build hooks to capture the build commands and modify them on the fly.
There are tools that already do something similar like travitch/whole-program-llvm and SRI-CSL/gllvm and work for all large projects not only the kernel. But these tools did not work for the kernel for reasons I'll have to investigate later.
For now, I use a simple python script not that generates the whole program bitcode for the linux kernel. The script is available at akshithg/whole-kernel-bitcode
Andersen's pointer analysis is a flow-insensitive, context-insensitive, and field-sensitive pointer analysis algorithm. It is a data-flow analysis that computes inclusion constraints between points-to sets of pointers.
Statement | Inclusion Constraint |
---|---|
a = b | PointsToSet(b) ⊆ PointsToSet(a) |
a = &b | b ∈ PointsToSet(a) |
a = *b | ∀c ∈ PointsToSet(b): PointsToSet(c) ⊆ PointsToSet(a) |
*a = b | ∀c ∈ PointsToSet(a): PointsToSet(b) ⊆ PointsToSet(c) |
Inputs | Description |
---|---|
A | a set of n pointers |
S | a set of m statements, where m ≤ 4*n^2 unique statements |
Output | Description |
---|---|
Exhaustive | Return PointsToSet(a) for all pointers a ∈ A |
On-demand | Return if b ∈ PointsToSet(a) for a specific pair a, b ∈ A |