Semaphores: A synchronization mechnism



When we write programs that use threads operating in multiuser systems, multiprocessing systems, or a combination of the two, we often discover that we have critical sections of code, where we need to ensure that a single process (or a single thread of execution) has exclusive access to a resource. Semaphores have a complex programming interface. Fortunately, we can easily provide ourselves with a much-simplified interface that is sufficient for most semaphore-programming problems. a semaphore is a special variable that takes only whole positive numbers and upon which only two operations are allowed: wait and signal. Since “wait” and “signal” already have special meanings in Linux and UNIX programming, we’ll use the original notation:

❑ P(semaphore variable) for wait
❑ V(semaphore variable) for signal

These letters come from the Dutch words for wait (passeren: to pass, as in a checkpoint before the critical section) and signal (vrijgeven: to give or release, as in giving up control of the critical section). You may also come across the terms “up” and “down” used in relation to semaphores, taken from the use of signaling flags.

Semaphore Definition

The simplest semaphore is a variable that can take only the values 0 and 1, a binary semaphore. This is the most common form. Semaphores that can take many positive values are called general semaphores. For the remainder of this chapter, we’ll concentrate on binary semaphores.
The definitions of P and V are surprisingly simple. Suppose we have a semaphore variable sv. The two operations are then defined as follows:
P(sv) If sv is greater than zero, decrement sv. If sv is zero, suspend execution of this process.
V(sv) If some other process has been suspended waiting for sv, make it resume execution. If no process is suspended waiting for sv, increment sv.
Another way of thinking about semaphores is that the semaphore variable, sv, is true when the critical section is available, is decremented by P(sv) so it’s false when the critical section is busy, and is incremented by V(sv) when the critical section is again available. Be aware that simply having a normal variable that you decrement and increment is not good enough, because you can’t express in C, C++, or almost any conventional programming language the need to make a single, atomic operation of the test to see whether the variable is true, and if so change the variable to make it false. This is what makes the semaphore operations special.

Linux Semaphore Facilities

Now that we’ve seen what semaphores are and how they work in theory, we can look at how the features are implemented in Linux. The interface is rather elaborate and offers far more facilities than are generally required. All the Linux semaphore functions operate on arrays of general semaphores rather than a single binary semaphore. At first sight, this just seems to make things more complicated, but in complex cases where a process needs to lock multiple resources, the ability to operate on an array of semaphores is a big advantage. In this chapter, we will concentrate on using single semaphores, since in most cases that’s all you will need to use.
The semaphore function definitions are

#include <sys/sem.h>
int semctl(int sem_id, int sem_num, int command, ...);
int semget(key_t key, int num_sems, int sem_flags);
int semop(int sem_id, struct sembuf *sem_ops, size_t num_sem_ops);

As we work through each function in turn, remember that these functions were designed to work for arrays of semaphore values, which makes their operation significantly more complex than would have been required for a single semaphore.
Notice that key acts very much like a filename in that it represents a resource that programs may use and cooperate in using if they agree on a common name. Similarly, the identifier returned by semget and used by the other shared memory functions is very much like the FILE * file stream returned by fopen in that it’s a value used by the process to access the shared file. And, just as with files, different processes will have different semaphore identifiers, though they refer to the same semaphore. This use of a key and identifiers is common to all of the IPC facilities discussed here, although each facility uses independent keys and identifiers.

Semget

The semget function creates a new semaphore or obtains the semaphore key of an existing semaphore.

int semget(key_t key, int num_sems, int sem_flags);

The first parameter, key, is an integral value used to allow unrelated processes to access the same semaphore. All semaphores are accessed indirectly by the program supplying a key, for which the system then generates a semaphore identifier. The semaphore key is used only with semget. All other semaphore functions use the semaphore identifier returned from semget.
There is a special semaphore key value, IPC_PRIVATE (usually 0), that is intended to create a semaphore that only the creating process could access. This rarely has any useful purpose, which is fortunate, because on some Linux systems the manual pages list as a bug the fact that IPC_PRIVATE may not prevent other processes from accessing the semaphore. The num_sems parameter is the number of semaphores required. This is almost always 1. The sem_flags parameter is a set of flags, very much like the flags to the open function. The lower nine bits are the permissions for the semaphore, which behave like file permissions. In addition, these can be bitwise ORed with the value IPC_CREAT to create a new semaphore. It’s not an error to have the IPC_CREAT flag set and give the key of an existing semaphore. The IPC_CREAT flag is silently ignored if it is not required. We can use IPC_CREAT and IPC_EXCL together to ensure that we obtain a new, unique semaphore. It will return an error if the semaphore already exists.
The semget function returns a positive (nonzero) value on success; this is the semaphore identifier used in the other semaphore functions. On error, it returns –1.

semop

The function semop is used for changing the value of the semaphore:

int semop(int sem_id, struct sembuf *sem_ops, size_t num_sem_ops);

The first parameter, sem_id, is the semaphore identifier, as returned from semget. The second parameter,
sem_ops, is a pointer to an array of structures, each of which will have at least the following members:

struct sembuf {
short sem_num;
short sem_op;
short sem_flg;
}

The first member, sem_num, is the semaphore number, usually 0 unless you’re working with an array of semaphores. The sem_op member is the value by which the semaphore should be changed. (You can change a semaphore by amounts other than 1.) In general, only two values are used, –1, which is our P operation to wait for a semaphore to become available, and +1, which is our V operation to signal that a semaphore is now available. The final member, sem_flg, is usually set to SEM_UNDO. This causes the operating system to track the changes made to the semaphore by the current process and, if the process terminates without releasing the semaphore, allows the operating system to automatically release the semaphore if it was held by this process. It’s good practice to set sem_flg to SEM_UNDO, unless you specifically require different behavior. If you do decide you need a value other than SEM_UNDO, it’s important to be consistent, or you can get very confused as to whether the kernel is attempting to “tidy up” your semaphores when your process exits. All actions called for by semop are taken together to avoid a race condition implied by the use of multiple semaphores. You can find full details of the processing of semop in the manual pages.

Semctl

The semctl function allows direct control of semaphore information:
int semctl(int sem_id, int sem_num, int command, ...);
The first parameter, sem_id, is a semaphore identifier, obtained from semget. The sem_num parameter is the semaphore number. You use this when you’re working with arrays of semaphores. Usually, this is 0, the first and only semaphore. The command parameter is the action to take, and a fourth parameter, if present, is a union semun, which according to the X/OPEN specification, must have at least the following members:

union semun {
int val;
struct semid_ds *buf;
unsigned short *array;
}
Many versions of Linux have a definition of the semun union in a header file (usually sem.h), though X/Open does say that you have to declare your own. If you do find that you need to declare your own, check the manual pages for semctl to see if there is a definition given. If there is, we suggest you use exactly the definition given in your manual, even if it differs from that above. There are many different possible values of command allowed for semctl. Only the two that we describe here are commonly used. For full details of the semctl function, you should consult the manual page. The two common values of command are
❑ SETVAL: Used for initializing a semaphore to a known value. The value required is passed as
the val member of the union semun. This is required to set the semaphore up before it’s used for the first time.
❑ IPC_RMID: Used for deleting a semaphore identifier when it’s no longer required. The semctl function returns different values depending on the command parameter. For SETVAL and IPC_RMID it returns 0 for success and –1 on error.

File I/O : system calls

File I/O

Opening Files

The most basic method of accessing a file is via the read( ) and write( ) system calls. Before a file can be accessed, however, it must be opened via an open( ) or creat( ) system call. Once done using the file, it should be closed using the system call close( ).

The open( ) System Call

A file is opened, and a file descriptor is obtained with the open( ) system call:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int   open (const char *name, int flags);
int   open (const char *name, int flags, mode_t mode);


The open( ) system call maps the file given by the pathname name to a file descriptor, which it returns on success. The file position is set to zero, and the file is opened for access according to the flags given by flags.

Flags for open( )

The flags argument must be one of O_RDONLY, O_WRONLY, or O_RDWR. Respectively, these arguments request that the file be opened only for reading, only for writing, or for both reading and writing.

For example, the following code opens /home/vineet/test  for reading:

Int  fd;
fd = open ("/home/vineet/test", O_RDONLY);
if (fd == -1)
/* error */
 A file opened only for writing cannot  also be read, and vice versa. The process issuing the open( )  system call must have sufficient permissions to obtain the access requested.

The flags  argument can be bitwise-ORed with one or more of the following values, modifying the behavior of the open request:

O_APPEND
 The file will be opened in append mode. That is, before each write, the file position will be updated to point to the end of the file. This occurs even if another process has written to the file after the issuing process’ last write, thereby changing the file position.

O_ASYNC
A signal (SIGIO by default) will be generated when the specified file becomes readable or writable. This flag is available only for terminals and sockets, not for regular files.

O_CREAT
If the file denoted by name does not exist, the kernel will create it. If the file already exists, this flag has no effect unless O_EXCL is also given.

O_DIRECT
The file will be opened for direct I/O .

O_DIRECTORY
If name is not a directory, the call to open( ) will fail. This flag is used internally by the opendir( ) library call.

O_EXCL
When given with O_CREAT, this flag will cause the call to open( ) to fail if the file given by name already exists. This is used to prevent race conditions on file creation.

O_LARGEFILE
The given file will be opened using 64-bit offsets, allowing files larger than two gigabytes to be opened. This is implied on 64-bit architectures.

O_NOCTTY
If the given name refers to a terminal device (say, /dev/tty), it will not become the process controlling terminal, even if the process does not currently have a controlling terminal. This flag is not frequently used.

O_NOFOLLOW
If name is a symbolic link, the call to open( ) will fail. Normally, the link is resolved, and the target file is opened. If other components in the given path are links, the call will still succeed. For example, if name is /etc/ship/plank.txt, the call will fail if plank.txt is a symbolic link. It will succeed, however, if etc or ship is a symbolic link, so long as plank.txt is not.

O_NONBLOCK
If possible, the file will be opened in nonblocking mode. Neither the open( ) call, nor any other operation will cause the process to block (sleep) on the I/O. This behavior may be defined only for FIFOs.

O_SYNC
The file will be opened for synchronous I/O. No write operation will complete until the data has been physically written to disk; normal read operations are already synchronous, so this flag has no effect on reads. POSIX additionally defines O_DSYNC and O_RSYNC; on Linux, these flags are synonymous with O_SYNC.

O_TRUNC
If the file exists, it is a regular file, and the given flags allow for writing, the file will be truncated to zero length. Use of O_TRUNC on a FIFO or terminal device is ignored. Use on other file types is undefined. Specifying O_TRUNC with O_RDONLY is also undefined, as you need write access to the file in order to truncate it.

For example, the following code opens for writing the file /home/teach/pearl. If the file already exists, it will be truncated to a length of zero. Because the O_CREAT flag is not specified, if the file does not exist, the call will fail:

int   fd;
fd = open ("/home/vineet/test1", O_WRONLY | O_TRUNC);
if (fd == -1)
/* error */



Owners of New Files

S_IRWXU
Owner has read, write, and execute permission.
S_IRUSR
Owner has read permission.
S_IWUSR
Owner has write permission.
S_IXUSR
Owner has execute permission.
S_IRWXG
Group has read, write, and execute permission.
S_IRGRP
Group has read permission.
S_IWGRP
Group has write permission.
S_IXGRP
Group has execute permission.
S_IRWXO
Everyone else has read, write, and execute permission.
S_IROTH
Everyone else has read permission.

S_IWOTH
Everyone else has write permission.
S_IXOTH
Everyone else has execute permission.



The creat( ) Function

The combination of O_WRONLY | O_CREAT | O_TRUNC is so common that a system call exists to provide just that behavior:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int  creat (const char *name, mode_t mode);


Return Values and Error Codes
Both open( ) and creat( ) return a file descriptor on success. On error, both return -1, and set errno to an appropriate error value (Chapter 1 discussed errno and listed the potential error values). Handling an error on file open is not complicated, as generally there will have been few or no steps performed prior to the open that need to be undone. A typical response would be prompting the user for a different filename or
simply terminating the program.


Reading via read( )

The most basic—and common—mechanism used for reading is the read( ) system call.

#include <unistd.h>
ssize_t read (int  fd, void *buf, size_t len);

Each call reads up to len bytes into buf from the current file offset of the file referenced by fd. On success, the number of bytes written into buf is returned. On error, the call returns -1, and errno is set. The file position is advanced by the number of bytes read from fd. If the object represented by fd is not capable of seeking (for example, a character device file), the read always occurs from the “current” position.
Basic usage is simple. This example reads from the file descriptor fd  into word . The number of bytes read is equal to the size of the unsigned long  type, which is four bytes on 32-bit Linux systems, and eight bytes on 64-bit systems. On return, nr  contains the number of bytes read, or -1  on error:


unsigned long word;
ssize_t  nr;

/* read a couple bytes into 'word' from 'fd' */
nr = read (fd, &word, sizeof (unsigned long));
if (nr == -1)
/* error */

 There are two problems with this -ve implementation: the call might return without reading all len  bytes, and it could produce certain errors that this code does not check for and handle.

Writing with write( )

The most basic and common system call used for writing is write( ). write( ) is the counterpart of read( ).

#include <unistd.h>
ssize_t  write (int fd, const void *buf, size_t count);

A call to write( ) writes up to count bytes starting at buf to the current file position of the file referenced by the file descriptor fd. Files backed by objects that do not support seeking (for example, character devices) always write starting at the “head.”

On success, the number of bytes written is returned, and the file position is updated in kind. On error, -1 is returned, and errno is set appropriately. A call to write( ) can return 0, but this return value does not have any special meaning; it simply implies that zero bytes were written.
As with read( ), the most basic usage is simple:

const char *buf = "My ship is solid!";
ssize_t  nr;
/* write the string in 'buf' to 'fd' */
nr = write (fd, buf, strlen (buf));
if (nr == -1)
/* error */

But again, as with read( ) , this usage is not quite right. Callers also need to check for the possible occurrence of a partial write:

unsigned long word = 1720;
size_t  count;
ssize_t  nr;
count = sizeof (word);
nr = write (fd, &word, count);
if (nr == -1)
/* error, check errno */
else if (nr != count)
/* possible error, but 'errno' not set */
   

Closing Files
After a program has finished working with a file descriptor, it can unmap the file descriptor from the associated file via the close( ) system call:

#include <unistd.h>
Int  close (int fd);
A call to close( ) unmaps the open file descriptor fd, and disassociates the process from the file. The given file descriptor is then no longer valid, and the kernel is free to reuse it as the return value to a subsequent open( ) or creat( ) call. A call to close( ) returns 0 on success. On error, it returns -1, and sets errno appropriately. Usage is simple:

if (close (fd) == -1)
perror ("close");

Note that closing a file has no bearing on when the file is flushed to disk. If an application wants to ensure that the file is committed to disk before closing it, it needs to make use of one of the synchronization options discussed earlier in “Synchronized I/O.”
Closing a file does have some side effects, though. When the last open file descriptor referring to a file is closed, the data structure representing the file inside the kernel is freed. When this data structure is freed, it unpins the in-memory copy of the inode associated with the file. If nothing else is pinning the inode, it too may be freed from memory (it may stick around because the kernel caches inodes for performance reasons, but it need not). If a file has been unlinked from the disk, but was kept open before it was unlinked, it is not physically removed until it is closed and its inode is removed from memory. Therefore, calling close( ) may also result in an unlinked file finally being physically removed from the disk.