Programming Comments - The problem with POSIX semaphores

Summary

It has been a number of years since I've used named semaphores. It could be that the last time I used a named semaphore was back in my OS/2 days. But I recently needed to coordinate some work between several applications running in a Linux environment, and named semaphores were the right solution.

Or so I thought...

Here is what I discovered, which hopefully explains why POSIX named semaphores as implemented in Linux are not working for me:

IPC

I was a bit confused when I started looking into inter-process communication on Linux systems. It took me a while to understand that we have two different IPC systems: the old traditional SysV-based IPC, and the new POSIX-based IPC.

The SysV semaphores are declared in <sys/sem.h>. This includes:

int semctl(int, int, int, ...);
int semget(key_t, int, int);
int semop(int, struct sembuf *, size_t);

Meanwhile, the new POSIX semaphores are declared in <semaphore.h>. A partial list of this API includes:

int sem_close(sem_t *);
sem_t *sem_open(const char *, int, ...);
int sem_post(sem_t *);
int sem_wait(sem_t *);

Note that the SysV IPC command-line tools such as ipcmk, ipcs and ipcrm do not work with POSIX semaphores, though this matters little when writing self-contained C/C++ applications.

Using semaphores

Opening a POSIX named semaphore is very simple. Note the following source code extract:

#include <semaphore.h>  // POSIX semaphores
#include <fcntl.h>      // semaphore flags: O_* constants
#include <sys/stat.h>   // semaphore modes: S_* constants

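// sem_open() only defines O_CREAT and O_EXCL for its flags argument;
// named semaphores are closed automatically on exec, so O_CLOEXEC is not needed
sem_t *sem = sem_open( "/MyTestSemaphore",
    O_CREAT     ,   // create if it does not already exist
    S_IRUSR     |   // user permission: read
    S_IWUSR     ,   // user permission: write
    1           );  // initial value of the semaphore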

To wait on the semaphore, you'd call sem_wait( sem ) and to release it sem_post( sem ).
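
Putting the open/wait/post sequence together, here is a minimal sketch of a guarded critical section. The helper name locked_increment and the counter it protects are hypothetical, chosen only for illustration; the sem_*() calls are the standard POSIX ones listed above.

```c
#include <errno.h>      // errno, EINTR
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_wait, sem_post, sem_close
#include <sys/stat.h>   // S_IRUSR, S_IWUSR

// Hypothetical helper: increment *counter while holding the named semaphore.
// Returns 0 on success, -1 if any semaphore call fails.
int locked_increment(const char *name, int *counter)
{
    sem_t *sem = sem_open(name, O_CREAT, S_IRUSR | S_IWUSR, 1);
    if (sem == SEM_FAILED)
        return -1;

    int rc;
    // sem_wait() can return early with EINTR when a signal handler runs
    while ((rc = sem_wait(sem)) == -1 && errno == EINTR)
        continue;

    if (rc == 0) {
        ++*counter;             // the protected work
        rc = sem_post(sem);     // give the count back
    }

    sem_close(sem);             // closes our handle; the name persists
    return rc;
}
```

Keep in mind that a named semaphore outlives the processes that use it: the name, and whatever count it currently holds, stays around until someone calls sem_unlink().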

The problem

In a simple scenario, all of this works very well. Client applications open/create the semaphore, call sem_wait() when it is needed, and then sem_post() when finished. But in non-trivial, embedded and/or commercial software, where you have to be prepared for and recover from external signals, there is a problem. There are at least two signals that cannot be caught: SIGKILL and SIGSTOP.

Unfortunately, if a client application receives SIGKILL between the call to sem_wait() and sem_post(), the count it took is never returned, so you're leaking one count every time this happens. (SIGSTOP is less severe: the count is held only until the process is resumed with SIGCONT, but everyone else waits in the meantime.) If your initial semaphore count is 1, then a single instance of someone sending SIGKILL to a client application will starve all other clients waiting for that semaphore.
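
The leak is easy to reproduce. The following is a rough sketch, not production code: the function name count_after_killed_holder is made up for this example, and the child kills itself with raise(SIGKILL) to stand in for an external kill. It assumes Linux, where sem_getvalue() reports 0 for an exhausted semaphore.

```c
#include <fcntl.h>      // O_CREAT, O_EXCL
#include <semaphore.h>  // sem_open, sem_wait, sem_getvalue, sem_close, sem_unlink
#include <signal.h>     // raise, SIGKILL
#include <sys/stat.h>   // S_IRUSR, S_IWUSR
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit

// Hypothetical demo: returns the semaphore count the parent sees after a
// child that held the semaphore was killed, or -1 on a setup error.
int count_after_killed_holder(const char *name)
{
    sem_unlink(name);   // start from a clean slate; failure is fine
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, S_IRUSR | S_IWUSR, 1);
    if (sem == SEM_FAILED)
        return -1;

    int value = -1;
    pid_t pid = fork(); // the child inherits the open semaphore
    if (pid == 0) {
        sem_wait(sem);      // take the only count
        raise(SIGKILL);     // die before reaching sem_post()
        _exit(1);           // not reached
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);      // child is gone at this point
        sem_getvalue(sem, &value);  // nobody posted on its behalf
    }

    sem_close(sem);
    sem_unlink(name);
    return value;
}
```

On a Linux system this should return 0: the count taken by the dead child is simply gone, and any further sem_wait() on that semaphore would block forever.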

What I was expecting is for either the semaphore to be auto-posted back when the application gets cleaned up -- somewhat like open files get automatically closed and memory freed -- or for an additional sem_*() API that a watcher can query to determine when this situation has occurred. If there was a reliable way to determine that an application had been killed between the two calls, the watcher itself could decide to sem_post() and "recover" the lost semaphore.

Note that old SysV-style semaphores do have a way to clean up after themselves to prevent this type of problem. The SEM_UNDO flag can be specified on a semop() call, which causes that process's changes to the semaphore to be reverted when the process terminates. Sadly, this is specific to SysV and does not apply to POSIX semaphores.
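
For comparison, here is a sketch of the same kill-the-holder experiment using a SysV semaphore with SEM_UNDO. The function name is hypothetical, and the code assumes Linux, where the caller has to define union semun itself.

```c
#include <signal.h>     // raise, SIGKILL
#include <sys/ipc.h>    // IPC_PRIVATE, IPC_CREAT, IPC_RMID
#include <sys/sem.h>    // semget, semctl, semop, SEM_UNDO
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit

// Assumption: on Linux the application must define this union itself.
union semun { int val; struct semid_ds *buf; unsigned short *array; };

// Hypothetical demo: returns the semaphore value the parent sees after a
// child that decremented it with SEM_UNDO was killed, or -1 on error.
int sysv_count_after_killed_holder(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    int value = -1;
    union semun arg = { .val = 1 };
    if (semctl(id, 0, SETVAL, arg) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            struct sembuf take = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
            semop(id, &take, 1);    // take the only count, with undo tracking
            raise(SIGKILL);         // die without releasing it
            _exit(1);               // not reached
        }
        if (pid > 0) {
            waitpid(pid, NULL, 0);
            value = semctl(id, 0, GETVAL);  // kernel has undone the -1
        }
    }

    semctl(id, 0, IPC_RMID);    // remove the semaphore set
    return value;
}
```

Here the value should be back at 1 once the child has been reaped, because the kernel applies the recorded undo adjustment as part of process exit.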

Alternatives

Depending on how the semaphore is used, there may be alternatives to POSIX semaphores:

SysV semaphores (though this feels like a step backwards...!)
other resources which are automatically cleaned up when a process is terminated:
- sockets bound to a particular port
- file locks

Without adding much more complexity, the socket and file solutions only work when the semaphore is boolean, used like a "named mutex". The file lock solution proved to be adequate for what I was working on. This alternative solution caused me to replace all of my calls to sem_wait() and sem_post() with calls to lockf(...).

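#include <fcntl.h>      // open() and O_* constants
#include <sys/stat.h>   // file modes: S_* constants
#include <unistd.h>     // lockf(), close()

int fd = open( filename,
    O_RDWR      |   // open the file for both read and write access
    O_CREAT     |   // create file if it does not already exist
    O_CLOEXEC   ,   // close on execute
    S_IRUSR     |   // user permission: read
    S_IWUSR     );  // user permission: write

// F_TLOCK returns -1 immediately if another process holds the lock;
// use F_LOCK instead to block like sem_wait() does
lockf( fd, F_TLOCK, 0 ); // lock the "semaphore"

// do some work here

// note that close() automatically releases the file lock
// so technically the call with F_ULOCK is not necessary
lockf( fd, F_ULOCK, 0 );
close( fd );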

Remember to check return values for errors, especially after open() and the first lockf(), to ensure you actually hold the "semaphore" before doing any work.

This is still not a perfect solution since the lock can be circumvented by manually deleting the lock file. But in my case, the likelihood of an uncaught signal was higher and more devastating than someone removing the lock file.