How To Debug A Syscall

Written by Nick Otter.

Contents

Introduction

“But what’s actually happening when I run that command..?” I’ve heard that a number of times. The answer is sometimes, a syscall. System calls are how a program enters the kernel to perform a task. It’s widely covered. Here’s some great blocks explaining syscalls from The Definitive Guide to Linux System Calls by Package Cloud.

When you run a program which calls open, fork, read, write (and many others) you are making a system call.
User programs (like your editor, terminal, ssh daemon, etc) need to interact with the Linux kernel so that the kernel can perform a set of operations on behalf of your user programs that they can’t perform themselves.

For example, if a user program needs to do some sort of IO (open, read, write, etc) or modify its address space (mmap, sbrk, etc) it must trigger the kernel to run to complete those actions on its behalf.

Requirements

Updated 04/2020
Linux Kernel 5.4 RHEL 8 4.18

Strace

strace allows you to see the actual system calls to the kernel and their results when a command is run.

Let’s take a look with strace at what happens when calling cat with a new file foo:

$ strace -tT cat foo
[root@rhel-8-1 ~]# strace -tT cat foo
08:42:01 execve("/usr/bin/cat", ["cat", "foo"], 0x7ffc470097b0 /* 44 vars */) = 0 <0.000510>
08:42:01 brk(NULL)                      = 0x557016302000 <0.000060>
08:42:01 arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcb7452a40) = -1 EINVAL (Invalid argument) <0.000058>
08:42:01 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000067>
08:42:01 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000067>
08:42:01 fstat(3, {st_mode=S_IFREG|0644, st_size=79333, ...}) = 0 <0.000059>
08:42:01 mmap(NULL, 79333, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3aa0873000 <0.000067>
08:42:01 close(3)                       = 0 <0.000376>
08:42:01 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000029>
08:42:01 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2009\2\0\0\0\0\0"..., 832) = 832 <0.000018>
08:42:01 fstat(3, {st_mode=S_IFREG|0755, st_size=5993088, ...}) = 0 <0.000016>
08:42:01 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3aa0871000 <0.000021>
08:42:01 mmap(NULL, 3942432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3aa029c000 <0.000024>
08:42:01 mprotect(0x7f3aa0455000, 2097152, PROT_NONE) = 0 <0.000045>
08:42:01 mmap(0x7f3aa0655000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b9000) = 0x7f3aa0655000 <0.000032>
08:42:01 mmap(0x7f3aa065b000, 14368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3aa065b000 <0.000022>
08:42:01 close(3)                       = 0 <0.000016>
08:42:01 arch_prctl(ARCH_SET_FS, 0x7f3aa0872540) = 0 <0.000016>
08:42:01 mprotect(0x7f3aa0655000, 16384, PROT_READ) = 0 <0.000028>
08:42:01 mprotect(0x5570153ca000, 4096, PROT_READ) = 0 <0.000024>
08:42:01 mprotect(0x7f3aa0887000, 4096, PROT_READ) = 0 <0.000026>
08:42:01 munmap(0x7f3aa0873000, 79333)  = 0 <0.000043>
08:42:01 brk(NULL)                      = 0x557016302000 <0.000017>
08:42:01 brk(0x557016323000)            = 0x557016323000 <0.000019>
08:42:01 brk(NULL)                      = 0x557016323000 <0.000015>
08:42:01 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000032>
08:42:01 fstat(3, {st_mode=S_IFREG|0644, st_size=217796096, ...}) = 0 <0.000017>
08:42:01 mmap(NULL, 217796096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3a932e7000 <0.000023>
08:42:01 close(3)                       = 0 <0.000016>
08:42:01 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000018>
08:42:01 openat(AT_FDCWD, "foo", O_RDONLY) = 3 <0.000024>
08:42:01 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000016>
08:42:01 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000017>
08:42:01 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3aa084f000 <0.000023>
08:42:01 read(3, "", 131072)            = 0 <0.000076>
08:42:01 munmap(0x7f3aa084f000, 139264) = 0 <0.000047>
08:42:01 close(3)                       = 0 <0.000026>
08:42:01 close(1)                       = 0 <0.000019>
08:42:01 close(2)                       = 0 <0.000033>
08:42:01 exit_group(0)                  = ?
08:42:01 +++ exited with 0 +++

Not for the faint of heart. But it’s not too complicated.

Strace output explained

Let’s take a look at the key parts of the ouput.

08:42:01 execve ( ... ) = 0 <0.000510>
Absolute time stamp of syscall.
(Shown by the -t flag).
System call name (see execve man page here). Arguments being passed to that system call (see execve man page here). Return value from system call script executed. 0 means success. Time spent in each syscall.
(Shown by the -T flag).

Where to see the actual execve script? tapset of course. (Note the timestamp from when the syscall script was last run).

/usr/share/systemtap/tapset/linux/sysc_execve.stp
[root@rhel-8-1 linux]# ls -lu sysc_execve.stp
-rw-r--r--. 1 root root 6257 May 13 08:42 sysc_execve.stp

Common syscall script return values explained

Common posix syscalls

(Abbreviated from webwurst on stackoverflow).

What you can do with strace

Trace specific system calls by using trace=. Like network calls in this example (memory calls would be trace=memory):

[root@rhel-8-1 ~]# strace -e trace=network nc localhost 2399
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(2399), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

Output to file with -o:

$ strace -e trace=network nc localhost 2399 -o log.txt

Monitor a process that’s already running:

(In this case a netcat session, which is pid 6359. accept() is the system call to accept a connection on a socket (read more ).

[root@rhel-8-1 ~]# strace -p 6359
strace: Process 6359 attached
select(5, [3 4], [], NULL, NULL)        = 1 (in [4])
accept(4, {sa_family=AF_INET, sin_port=htons(42444), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 5
close(3)                                = 0
close(4)                                = 0
fcntl(5, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
select(6, [0 5], [], NULL, NULL

Thanks. The ultimate strace cheatsheet, The Definitive Guide to Linux System Calls and other articles were useful to write this. This written by Nick Otter.