Lessons in debugging: observe how programs interact with the Linux kernel with STRACE

GDB is the number one debugger and it can be used to debug all sorts of problems. There are however many other tools out there that can only debug a subset of issues, but they do it much more effectively than GDB.

Today we are going to talk about one such tool called STRACE. STRACE is a tool used to trace Linux system calls and signals. Before moving on to explain what kind of problems can STRACE detect, a short introduction to system calls and signals.

What are system calls and signals?

If you are already familiar with system calls and signals you can skip the next two sections. Otherwise carry on.

System calls

When developers write their programs, many times they will use functionalities provided by the standard C/C++ library or other libraries. But for some functionalities their program (or the library function) must ask the operating system to do it. When we say a program performs a system call, what we mean is that the program is asking the operating system to do one such service. Here are a few examples of system calls:

  • Working with files or directories. Programs generally are not aware of how files and directories are physically organized on storage. Instead, the operating system exposes everything a program needs to work with files and directories using a few system calls. On Linux these calls are: open, close, read, write, stat (working with files), getdents (list files in a directory).
  • Working with devices. A program might want to work with a device, e.g. a terminal or a keyboard. The program is not interested in how the device works internally, instead, the operating systems expose devices in a standard way to make programmer’s life easier. On Linux, system calls that are used to work with devices are: ioctl (control the device), read (read from the device), write (write to the device) etc.
  • Managing processes. A program might want to get information about or control the running processes, including itself. On Linux this is achieved using system calls: fork (create an identical copy of the calling process), clone (create a thread), execve (execute a program), exit (exit the program), wait (wait for the child process to exit).
  • Memory management. Programs can request or relinquish memory from the operating system using system calls. On Linux these are calls brk and mmap (allocate memory), munmap (deallocate memory) etc.
  • Communication between processes. Programs that want to communicate with other processes can achieve this using system calls. On Linux these are calls: pipe (creates a communication pipe), shmget and mmap (creates a shared memory for two processes to communicate), futex (used to implement users-pace mutexes) etc.
  • Network communication. The operating system hides all the complexities of network communication and exposes the network through a socket abstraction. System calls used to work with sockets on Linux are: socket (creates a socket), accept (accept new connections from a remote server), sendto (send data), rcvfrom (receive data) etc.

These are just the most common system calls and there are many others. Linux has more than 300 system calls. A full list can be found here.

A few notes on system calls

Rarely do the programmers work directly with system calls, instead, the functions in the standard library (and other libraries as well) expose system calls to developers. For example, working with files is done with fopen/fread/fwrite API in C. In C++ the same is achieved using classes in fstream header.

System calls are more expensive than regular function calls. When a system call is made, the program control flow leaves the user space and enters kernel space. The kernel (operating system) does some work for us and then the control returns to our program. Switch from user space to kernel space is not cheap like a regular function call and many unnecessary system calls can make your program slow.

Signals

Signals are notifications that are sent to your program to let it know something important has happened. Operating system can send a signal SIGBUS to your process to let it know that it has accessed an inaccessible memory. Another program (e.g. Out-Of-Memory Killer) can send a signal SIGKILL to your program when it detects that your program is using too much memory. GNU Debugger sends signals SIGSTOP to pause your program and SIGCONT to continue the execution of your program.

You now get the idea of what signals are. Signals are in their essence means of interprocess communication. But I have never seen two processes have a meaningful exchange of information using processes; normally, signals are used to inform the process that something extraordinary has happened.

Every time a program receives a signal, a default action will be performed automatically unless the program has explicitly overwritten it. Default action depends on the signal type and can be: Core (create the core dump and terminate the receiving process), Term (terminate the receiving process), Ign (ignore the received signal), Stop (pause the program) and Cont (resume the stopped program).

The developer can override the default action for most signals. We call these signal maskable. Some signals are maskable like SIGINT (SIGINT is sent to the program when the user presses Ctrl + C. The default action is to terminate the running program, but the programmer can override this so the program does something else). There are also non-maskable signals where the default action cannot be overridden. Signals SIGKILL (kill the process no-matter what) and SIGSTOP (pause the execution of the program) are the only two non-maskable signals.

Here is a short list of a few most important signals taken from edspresso:

SignalDescriptionDefault Action
SIGINTInterrupt from keyboard (Ctrl + C)Term
SIGKILLForce the program to terminateTerm
SIGTERMPolitely ask the program to terminateTerm
SIGBUSAccess to memory that doesn’t existCore
SIGSEGVAccess to forbidden part of the memoryCore
SIGSTOPPauses the execution of the programStop
SIGCONTContinues the execution of the paused programCont
SIGILLProgram executed illegal or non-existing instructionCore
SIGFPEArithmetic instruction caused the exception (e.g. division by zero etc.)Core
SIGTRAPProgram executed the trap instructionCore
Most common signals on Linux

Signals are sent to your program asynchronously, i.e. when a signal arrives, your program will stop doing whatever it is it was doing and start processing the signal.

You can overwrite the default signal handler. For example, you can overwrite SIGTERM in order to delete temporary files your program created before exiting the program.

So now you know what are system calls and signals, moving on to STRACE utility.

Like what you are reading? Follow us on LinkedIn or Twitter and get notified as soon as new content becomes available.

STRACE

As we already said in the introduction, STRACE is a Linux tool that is used to trace system calls made by your program and signals received by your program. Since programs generally communicate with the kernel a lot, this can prove to be an invaluable source of information. Also, you don’t need the source code of the program you’re investigating, and you can trace almost any program!

What kind of problem can you debug with STRACE?

Here is a list of problems that are typically debugged using STRACE:

  • Check if the program is loading the correct library by monitoring open/mmap system calls.
  • Check the name of the log file the program is using by monitoring open/write system calls.
  • Check if the program is running other programs by monitoring execve system call.
  • Debug messages like Connection refused. You can check what is the remote server your program is trying to connect to by monitoring socket call.
  • Check how your program is allocating memory by monitoring mmap and brk.
  • Debug messages like File not found. You can check what is the file your program is trying to open by monitoring open or stat system calls.
  • Debug other weird messages without context produced by your programs.
  • Check if the compiler is loading the correct headers by monitoring the open system call and filtering by header name.
  • Check if make has missed a hidden dependency by monitoring open system call and then filtering only header files.

Please excuse the verbosity, but some of the ideas are simply brilliant. If you have other good ideas, feel free to leave a comment.

Now when we know what kind of information one can collect with STRACE, let’s move on to using this utility.

How to use STRACE?

Using STRACE is simple. Just prefix your command with strace and you are ready to go. Let’s trace ls command with strace.

$ strace ls
execve("/bin/ls", ["ls"], 0x7ffd909bfb40 /* 56 vars */) = 0
brk(NULL)                               = 0x559b54d55000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=105864, ...}) = 0
mmap(NULL, 105864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a30a58000
close(3)                                = 0
...
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 57 entries */, 32768)    = 1840
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, " badblocks.list   Documents   Mu"..., 67 badblocks.list   Documents   Music	 Projects   Templates   Videos
) = 67
write(1, " Desktop\t  Downloads   Pictures\t"..., 72 Desktop	  Downloads   Pictures	 Public     usr        'VirtualBox VMs'
) = 72
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

In the output above I removed a lot of the output and replaced it with …

The first time you run strace you will notice that the output is very verbose. All system calls and signals are traced and everything is dumped on the screen. The output can really look intimidating, even for a simple program. This is a time when tools for filtering output like grep really come in handy.

Let’s have a look at few interesting lines from the example above:

execve("/bin/ls", ["ls"], 0x7ffd909bfb40 /* 56 vars */) = 0

This system call execve starts the program with appropriate parameters.

brk(NULL)                               = 0x559b54d55000

The system call brk allocates memory for the heap.

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=105864, ...}) = 0
mmap(NULL, 105864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a30a58000
close(3)                                = 0

The above code maps a library into the memory space so the program can call functions from the library. First, it opens the library file /etc/ld.so.cache. The openatsystem call returns a file descriptor 3 which used in the later system calls. System call ffstat checks if /etc/ld.so.cache is a file by specifying its file descriptor. Then it makes this file available in the memory using mmap and starting from address: 0x7f0a30a58000. And finally it closes the file handle.

openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 57 entries */, 32768)    = 1840
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0

The few above system calls read the content of the current directory. System call fstat makes sure that the thing we want to list is actually a directory. System call getdents (short for get directory entries) lists the content of the directory.

write(1, " badblocks.list   Documents   Mu"..., 67 badblocks.list   Documents   Music	 Projects   Templates   Videos
) = 67
write(1, " Desktop\t  Downloads   Pictures\t"..., 72 Desktop	  Downloads   Pictures	 Public     usr        'VirtualBox VMs'
) = 72
close(1)                                = 0

System call write prints the files in the directory on the screen (file descriptor 1 is the output to the terminal).

As you can see, all sorts of information are available using strace. It is for you, the interpreter of this information, to figure out what is actually useful and what should be discarded.

We suggest you get introduced to system calls on need by need basis; if you try to grasp all system calls at once, you might get overwhelmed.

Advanced STRACE

STRACE comes with a few useful switches, that either limit the amount of information STRACE is producing or provide more interesting information. Here are a few:

Limit the tracing

STRACE comes with a switch -e to specify which system calls to trace. Here is an example on how to trace only open and stat system calls:

$ strace -e trace=open,stat ./my_program

You can trace a group of related calls. You can specify keywords file, network, process, signal, ipc or memory to trace system calls related to working with files, network, signals, inter-process communication or memory.

$ strace -e trace=file ./my_program

When you trace your program without any parameters, strace will trace signals as well. If you only want to trace signals, just remove tracing of all the system calls, like this:

$ strace -e trace=!all ./my_program

You will see all the received signals, but you won’t be able to see who generated those signals. STRACE cannot tell you that, but there are other tools that can.

Follow child processes

If you run your program through strace, it might happen that it starts children processes. By default, strace will not trace children processes, only the parent. If you want to trace children processes as well, use -f switch.

$ strace -f -e trace=network ./my_program
Process 87283 attached
...
[pid 87283] socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3

Next to each system call in the log you will see the pid (process identifier) of the process that made the call.

Output everything to a file

By default, strace prints all the information to the screen. You can use the -o switch to print the information to a file, like this:

$ strace -o strace_output.txt ./my_program

Print time when the system call happened

If you want to print wallclock time (e.g. 15:59:04) when the system call or signal happened, use -t switch:

$ strace -t ls
15:59:04 execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0
...

If just seconds is not precise enough for you, you can also print milliseconds with -tt switch:

$ strace -tt ls
16:01:00.965998 execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0
...

Measure how much time did the system call take

To measure how much time did the system call take, use the -T switch:

$ strace -T ls
execve("/bin/ls", ["ls"], [/* 42 vars */]) = 0 <0.000187>
...

You can also print the cumulative statistics of all system calls, you can use the -c switch like this:

$ strace -c ./my_program
...
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.89    0.203417       20342        10           wait4
 16.24    0.042960           3     14896       265 open
  3.93    0.010394           1     13217        60 stat
  1.15    0.003043           2      1397           getdents
  1.15    0.003034           4       703           select

Print lines in code that made the system call

You can use the -k switch to figure out where did the system call originate from. For this to work, strace needs to be linked with libunwind library. An example:

$ strace -k ./my_program
...
write(1, "\tInstruction per cycle: 1.01006\n", 32) = 32
 > /lib/x86_64-linux-gnu/libc-2.27.so(__write+0x14) [0x110154]
 > /lib/x86_64-linux-gnu/libc-2.27.so(_IO_file_write+0x2d) [0x8b1bd]
 > /lib/x86_64-linux-gnu/libc-2.27.so(_IO_do_write+0xb1) [0x8cf51]
 > /lib/x86_64-linux-gnu/libc-2.27.so(_IO_file_overflow+0x103) [0x8d403]
 > /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25(std::ostream::put(char)+0x8a) [0x11368a]
 > /home/ivica/Projects/johnysswlab/2020-07-branches/counting(measure_time::~measure_time()+0x467) [0x10c57]
 > /home/ivica/Projects/johnysswlab/2020-07-branches/counting(main+0x11d9) [0x3859]
 > /lib/x86_64-linux-gnu/libc-2.27.so(__libc_start_main+0xe7) [0x21b97]
 > /home/ivica/Projects/johnysswlab/2020-07-branches/counting(_start+0x2a) [0x491a]
...

You can see in the above example the stack backtrace which caused the system call.

Final Word

To really appreciate STRACE you will first need to understand what are the important system calls and how applications are structured around them. When you do, STRACE will become your best friend when it comes to debugging system calls and signals.

Like what you are reading? Follow us on LinkedIn or Twitter and get notified as soon as new content becomes available.

Further Read

signal(7) — Linux manual page

10 Strace Commands for Troubleshooting and Debugging Linux Processes

Leave a Reply

Your email address will not be published. Required fields are marked *