Embed Your Career: May 2012

Process vs Threads

Both threads and processes are methods of parallelizing an application. However, processes are independent execution units that contain their own state information, use their own address spaces, and only interact with each other via interprocess communication mechanisms (generally managed by the operating system). Applications are typically divided into processes during the design phase, and a master process explicitly spawns sub-processes when it makes sense to logically separate significant application functionality. Processes, in other words, are an architectural construct.

By contrast, a thread is a coding construct that doesn't affect the architecture of an application. A single process might contain multiple threads; all threads within a process share the same state and same memory space, and can communicate with each other directly, because they share the same variables.

Threads typically are spawned for a short-term benefit that is usually visualized as a serial task, but which doesn't have to be performed in a linear manner (such as performing a complex mathematical computation using parallelism, or initializing a large matrix), and then are absorbed when no longer required. The scope of a thread is within a specific code module, which is why we can bolt on threading without affecting the broader application.
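
The difference in memory sharing can be sketched in C. This is a minimal illustration, not a definitive implementation: the names counter, counter_after_child_process and counter_after_thread are hypothetical, and it assumes a POSIX system with pthreads.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One copy per process; every thread inside a process sees the same copy. */
static int counter;

static void *increment_counter(void *unused)
{
    (void)unused;
    counter++;
    return NULL;
}

/* Increment in a forked child; return what the parent sees (-1 on error). */
int counter_after_child_process(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter++;          /* changes only the child's private copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return counter;         /* the parent's copy is untouched */
}

/* Increment in a second thread; return what the first thread sees. */
int counter_after_thread(void)
{
    pthread_t tid;
    counter = 0;
    if (pthread_create(&tid, NULL, increment_counter, NULL) != 0)
        return -1;
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);        /* joining a thread we just created should succeed */
    (void)rc;
    return counter;         /* the thread wrote to the shared variable */
}
```

The first function should report 0 and the second 1, because only the thread writes to the caller's memory. On older toolchains the program may need to be built with -pthread.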

The Kernel Source Tree

The kernel source tree is divided into a number of directories, most of which contain many more subdirectories. The directories in the root of the source tree are listed below, along with their descriptions.

Directory        Description

arch             Architecture-specific source
block            Block I/O layer
crypto           Crypto API
Documentation    Kernel source documentation
drivers          Device drivers
firmware         Device firmware needed to use certain drivers
fs               The VFS and the individual filesystems
include          Kernel headers
init             Kernel boot and initialization
ipc              Interprocess communication code
kernel           Core subsystems, such as the scheduler
lib              Helper routines
mm               Memory management subsystem and the VM
net              Networking subsystem
samples          Sample, demonstrative code
scripts          Scripts used to build the kernel
security         Linux Security Module
sound            Sound subsystem
usr              Early user-space code (called initramfs)
tools            Tools helpful for developing Linux
virt             Virtualization infrastructure

Linux Daemon and services

A daemon should be distinguished from a demon, which is an evil spirit in some religions.

A daemon is a type of program on Unix-like operating systems that runs unobtrusively in the background, rather than under the direct control of a user, waiting to be activated by the occurrence of a specific event or condition.

Unix-like systems typically run numerous daemons, mainly to accommodate requests for services from other computers on a network, but also to respond to other programs and to hardware activity. Examples of actions or conditions that can trigger daemons into activity are a specific time or date, passage of a specified time interval, a file landing in a particular directory, receipt of an e-mail or a Web request made through a particular communication line. It is not necessary that the perpetrator of the action or condition be aware that a daemon is listening, although programs frequently will perform an action only because they are aware that they will implicitly arouse a daemon.
Daemons are usually instantiated as processes. A process is an executing (i.e., running) instance of a program. Processes are managed by the kernel (i.e., the core of the operating system), which assigns each a unique process identification number (PID).
There are three basic types of processes in Linux: interactive, batch and daemon. Interactive processes are run interactively by a user at the command line (i.e., all-text mode). Batch processes are submitted from a queue of processes and are not associated with the command line; they are well suited for performing recurring tasks when system usage is otherwise low.
Daemons are recognized by the system as any processes whose parent process has a PID of one, which always represents the process init. init is always the first process that is started when a Linux computer is booted up (i.e., started), and it remains on the system until the computer is turned off. init adopts any process whose parent process dies (i.e., terminates) without waiting for the child process's status. Thus, the common method for launching a daemon involves forking (i.e., dividing) once or twice, and making the parent (and grandparent) processes die while the child (or grandchild) process begins performing its normal function.
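
The double fork described above can be sketched as follows. This is only an illustration under stated assumptions: the function name adopted_parent_of_grandchild is hypothetical, the pipe exists purely so the original process can observe the result, and on some modern systems the adopting process may be a designated "subreaper" rather than PID 1.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Double-fork sketch. The intermediate child exits at once, so the
 * grandchild (the would-be daemon) is orphaned and adopted.
 * Stores the intermediate child's PID in *intermediate and returns the
 * parent PID the grandchild observes after adoption, or -1 on error.
 */
pid_t adopted_parent_of_grandchild(pid_t *intermediate)
{
    assert(intermediate != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {                       /* intermediate child */
        pid_t my_pid = getpid();
        close(fds[0]);
        setsid();                           /* new session, no controlling terminal */
        pid_t grandchild = fork();
        if (grandchild == 0) {              /* grandchild */
            struct timespec delay = { 0, 1000000 };     /* 1 ms */
            /* Bounded wait until the intermediate parent has exited. */
            for (int i = 0; i < 5000 && getppid() == my_pid; i++)
                nanosleep(&delay, NULL);
            pid_t new_parent = getppid();
            ssize_t w = write(fds[1], &new_parent, sizeof new_parent);
            _exit(w == (ssize_t)sizeof new_parent ? 0 : 1);
        }
        _exit(grandchild == -1 ? 1 : 0);    /* exit at once: orphans the grandchild */
    }

    /* original process */
    close(fds[1]);
    *intermediate = child;
    waitpid(child, NULL, 0);                /* reap the intermediate child */
    pid_t new_parent = -1;
    ssize_t n = read(fds[0], &new_parent, sizeof new_parent);
    close(fds[0]);
    return n == (ssize_t)sizeof new_parent ? new_parent : -1;
}
```

A real daemon would typically also change its working directory, reset its umask and close inherited descriptors in the grandchild before starting its normal work.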
Some daemons are launched via System V init scripts, which are scripts (i.e., short programs) that are run automatically when the system is booting up. They may either survive for the duration of the session or be regenerated at intervals.
Many daemons are now started only as required and by a single daemon, xinetd (which has replaced inetd in newer systems), rather than running continuously. xinetd, which is referred to as a TCP/IP super server, itself is started at boot time, and it listens to the ports assigned to the processes listed in the /etc/inetd.conf or /etc/xinetd.conf configuration file. Examples of daemons that it starts include crond (which runs scheduled tasks), ftpd (file transfer), lpd (line printing), rlogind (remote login), rshd (remote command execution) and telnetd (telnet).
In addition to being launched by the operating system and by application programs, some daemons can also be started manually. Examples of commands that launch daemons include binlogd (which logs binary events to specified files), mysqld (the MySQL database server) and apache (the Apache web server).
In many Unix-like operating systems, including Linux, each daemon has a single script (i.e., short program) with which it can be terminated, restarted or have its status checked. The handling of these scripts is based on runlevels. A runlevel is a configuration or operating state of the system that only allows certain selected processes to exist. Booting into a different runlevel can help solve certain problems, including repairing system errors.
The term daemon is derived from the daemons of Greek mythology, which were supernatural beings that ranked between gods and mortals and which possessed special knowledge and power. For example, Socrates claimed to have a daemon that gave him warnings and advice but never coerced him into following it. He also claimed that his daemon exhibited greater accuracy than any of the forms of divination practiced at the time.
The word daemon was first used in a computer context at the pioneering Project MAC (which later became the MIT Laboratory for Computer Science) using the IBM 7094 in 1963. This usage was inspired by Maxwell's daemon of physics and thermodynamics, which was an imaginary agent that helped sort molecules of different speeds and worked tirelessly in the background. The term was then used to describe background processes which worked tirelessly to perform system chores. The first computer daemon was a program that automatically made tape backups. After the term was adopted for computer use, it was rationalized as an acronym for Disk And Execution MONitor.
On the Microsoft Windows operating systems, programs called services perform the functions of daemons, although the term daemon is now sometimes being used with regard to those systems as well.

acpid	This is a completely flexible, totally extensible daemon for delivering ACPI events. It listens on a file (/proc/acpi/event) and when an event occurs, executes programs to handle the event. ACPI stands for Advanced Configuration and Power Interface.
aep1000	For AEP 1000 coprocessors. It's used for hardware cryptographic acceleration under Linux.
anacron	Anacron is a periodic command scheduler. It executes commands at intervals specified in days. Unlike cron, it does not assume that the system is running continuously. Every time Anacron is run, it reads a configuration file that specifies the jobs Anacron controls, and their periods in days. If a job wasn't executed in the last n days, where n is the period of that job, Anacron executes it. Anacron then records the date in a special timestamp file that it keeps for each job, so it can know when to run it again.
apmd	The apmd package is a set of user-level programs to control the Advanced Power Management system found in all modern laptop computers and most modern desktops. apmd talks to the Linux kernel APM layer, which does all the hardware-dependent stuff.
atd	atd runs jobs queued by at.
autofs	Auto-autofs detects Disks, Partitions, CD-ROMs, Floppies etc. and sets up an automount configuration. So it provides an easy access to the hardware. Auto-autofs is a Perl script that searches the hardware for block devices using the /proc directory. It finds partitions on harddisks via fdisk and tries to detect the filesystems.
bcm5820	Hardware cryptographic accelerator support for Broadcom BCM5820 eCommerce Processor.
chargen	Character Generator Protocol. A useful debugging and measurement tool is a character generator service. A character generator service simply sends data without regard to the input. Listens on port 19 TCP/UDP.
chargen-udp	See chargen.
crond	Daemon to execute scheduled commands.
cups	The Common UNIX Printing System ("CUPS") is a cross-platform printing solution for all UNIX environments. It is based on the "Internet Printing Protocol" and provides complete printing services to most PostScript and raster printers.
cups-lpd	This is the CUPS Line Printer Daemon ("LPD") mini-server that supports legacy client systems that use the LPD protocol.
daytime	The Daytime Protocol (Internet RFC 867) is a simple protocol that allows clients to retrieve the current date and time from a remote server. While useful at a basic level, the Daytime protocol is most often used for debugging purposes rather than to actually acquire the current date and time. The daytime protocol is available on TCP port 13.
daytime-udp	See daytime.
echo	Testing service: everything you send to port 7 (echo) is sent back to you.
echo-udp	See echo.
gpm	General Purpose Mouse Daemon. Necessary only if you want to use your mouse on the console (not xterms).
httpd	The Apache web server.
iptables	Firewall.
irda	IrDA (Infrared Data Association) is an industry standard for infrared wireless communication.
irqbalance	Daemon to balance IRQs across multiple CPUs. Only useful on SMP systems (more than one processor).
isdn	ISDN (Integrated Services Digital Network). Use only with ISDN network interfaces.
ktalk	A graphical talk client for KDE.
kudzu	Detects and configures new and/or changed hardware on a system.
lisa	LISa is a small daemon which is intended to run on end user systems. It provides something like a "network neighborhood", but only relying on the TCP/IP protocol stack, no smb or whatever. The information about the hosts in your "neighborhood" is provided via TCP port 7741. To use it: from a client computer, open konqueror and type lan://targetIP
messagebus	D-BUS is first a library that provides one-to-one communication between any two applications; dbus-daemon-1 is an application that uses this library to implement a message bus daemon. Multiple programs connect to the message bus daemon and can exchange messages with one another.
microcode_ctl	It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors. (Pentium Pro, PII, PIII, Pentium 4, Celeron, Xeon etc - all P6 and above, which does NOT include pentium classics) It signals the kernel driver to release any buffers it may hold. The microcode update is volatile and needs to be uploaded on each system boot i.e. it doesn't reflash your cpu permanently, reboot and it reverts back to the old microcode. This driver is designed for Intel IA32 microprocessors only, it will not work with AMD or any other non-Intel processors as they don't support microcode updates or they support it in a manner different from Intel's specs.
mysqld	MySQL database server.
named	DNS server. Bind.
netfs	Network Filesystem Mounter. Needed for mounting NFS, SMB and NCP shares on boot.
network	Activates all network interfaces at boot time.
nfslock	To help manage file access conflicts and protect NFS sessions during failures, NFS offers a file and record locking service called the network lock manager. The network lock manager is a separate service NFS makes available to user applications. To use the locking service, applications must make calls to standard lock routines.
ntpd	The `ntpd` sets and maintains the system time of day in synchronism with Internet standard time servers. It is a complete implementation of the Network Time Protocol (NTP) version 4. Allows other computers to synchronize system time with your server.
pcmcia	PCMCIA cards.
portmap	The portmap service is a dynamic port assignment daemon for RPC services such as NIS and NFS.
postgresql	PostgreSQL database server.
random	Initialize kernel random number generator
rawdevices	Block devices. Links hardware to devices that store data.
rhnsd	Red Hat Network Service. Informs you about official security and bug updates for your system.
rsync	It's just like rcp, with many more features. Provides a very fast method for bringing remote files into sync.
saslauthd	SASL (Simple Authentication and Security Layer) authentication server. Allows others to authenticate against this server.
sendmail	Mail server; allows sending email using this machine as the mail server.
services	An internal xinetd service that lists active services.
sgi_fam	File Alteration Monitor, provides an API that applications can use to be notified when specific files or directories are changed. For example, consider a graphical file manager: when the user removes a file through the file manager, the change is visible immediately.
smartd	Self-Monitoring, Analysis and Reporting Technology (SMART). Monitors your hard disk for failures.
smb	Samba; allows sharing with and access to MS Windows networks.
snmpd	Simple Network Management protocol. A standard protocol for non-windows networks.
snmptrapd	This is an SNMP application that receives and logs SNMP TRAP and INFORM messages. Uses UDP port 162.
squid	Web proxy cache.
sshd	Secure Shell daemon, allows secure remote logins to this machine.
syslog	Logs all system activities.
time	Retrieves the date and time from a host or hosts on the network and sets the local system time (TCP version).
time-udp	Retrieves the date and time from a host or hosts on the network and sets the local system time (UDP version).
tux	The TUX Web Server is an HTTP daemon for Linux. The TUX Web Server is different from other Web servers in that it runs partially from within the Linux kernel as a module, or kernel subsystem. Given sufficient networking cards, it enables direct scatter-gather direct memory access (DMA) and hardware-based TCP/IP checksums from the page cache (the Linux file data cache) directly to the network, avoiding extra data copies.
vncserver	VNC stands for Virtual Network Computing. It is remote control software which allows you to view and interact with one computer (the "server") using a simple program (the "viewer") on another computer anywhere on the Internet.
vsftpd	Secure FTP daemon.
winbind	Winbind is an nss switch module to map Windows NT Domain databases to Unix. In combination with Samba and pam_ntdom, a Unix box will be able to integrate straight into a full Windows NT Domain environment, without needing a Unix Account database.
xfs	The X font server (`xfs`) provides a standard mechanism for an X server to communicate with a font renderer, frequently running on a remote machine. It usually runs on TCP port 7100. You need to be running `xfs` if you want a remote X terminal to be able to use fonts from your system, or if you want to use fonts that your X server doesn't understand (and the font server does).
xinetd	Service wrapper. xinetd is a replacement for inetd, the internet services daemon. xinetd - eXtended InterNET services daemon - provides a good security against intrusion and reduces the risks of Denial of Services (DoS) attacks. Like the well known couple (inetd+tcpd), it enables the configuration of the access rights for a given machine.
yum	yum is an automatic updater and package installer/remover for rpm systems. It automatically computes dependencies and figures out what things should occur to install packages. It makes it easier to maintain groups of machines without having to manually update each one using rpm.

How to: Compile Linux kernel 2.6

Compiling a custom kernel has its own advantages and disadvantages. However, new Linux users and admins often find it difficult to compile the Linux kernel. Compiling the kernel requires understanding a few things and then typing a couple of commands.

Step # 1 Get Latest Linux kernel code

Visit http://kernel.org/ and download the latest source code. The file name will be linux-x.y.z.tar.bz2, where x.y.z is the actual version number. For example, the file linux-2.6.25.tar.bz2 represents kernel version 2.6.25. Use the wget command to download the kernel source code:

$ cd /tmp
$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-x.y.z.tar.bz2

Note: Replace x.y.z with the actual version number.

Step # 2 Extract tar (.tar.bz2) file

Type the following command:

# tar -xjvf linux-2.6.25.tar.bz2 -C /usr/src
# cd /usr/src/linux-2.6.25

Step # 3 Configure kernel

Before you configure the kernel, make sure the development tools (the gcc compiler and related tools) are installed on your system. If they are not installed, use the apt-get command under Debian Linux to install them:
# apt-get install gcc
Now you can start the kernel configuration by typing any one of the following commands:

$ make menuconfig - Text-based color menus, radiolists and dialogs. This option is also useful on a remote server if you want to compile the kernel remotely.
$ make xconfig - X Window (Qt) based configuration tool; works best under the KDE desktop.
$ make gconfig - X Window (Gtk) based configuration tool; works best under the GNOME desktop.

For example, the make menuconfig command launches the text-based configuration screen:
$ make menuconfig
You have to select different options as per your needs. Each configuration option has a HELP button associated with it, so select the help button to get help.

Step # 4 Compile kernel

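Start compiling to create a compressed kernel image:
$ make
Compile the kernel modules:
$ make modules
Install the kernel modules (become the root user with the su command, then return to the source directory, because su - starts in root's home directory):

$ su -
# cd /usr/src/linux-2.6.25
# make modules_install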

Step # 5 Install kernel

So far we have compiled the kernel and installed the kernel modules. It is time to install the kernel itself.
# make install
It will install three files into the /boot directory and modify your GRUB configuration file:

System.map-2.6.25
config-2.6.25
vmlinuz-2.6.25

Step # 6: Create an initrd image

Type the following command at a shell prompt:

# cd /boot
# mkinitrd -o initrd.img-2.6.25 2.6.25

An initrd image contains the device drivers needed to load the rest of the operating system later on. Not all computers require an initrd, but it is safe to create one.

Step # 7 Modify Grub configuration file - /boot/grub/menu.lst

Open file using vi:
# vi /boot/grub/menu.lst

title           Debian GNU/Linux, kernel 2.6.25 Default
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.25 root=/dev/hdb1 ro
initrd          /boot/initrd.img-2.6.25
savedefault
boot

Remember to set up the correct root=/dev/hdXX device. Save and close the file. If you think editing and writing all lines by hand is too much for you, try the update-grub command to update the lines for each kernel in the /boot/grub/menu.lst file. Just type the command:
# update-grub
Neat. Huh?

Step # 8 : Reboot computer and boot into your new kernel

Just issue the reboot command:
# reboot

........................................................................

Linux kernel and classic Unix systems

The following main differences exist between the Linux kernel and classic Unix systems:

1. Linux supports the dynamic loading of kernel modules. Although the Linux kernel is monolithic, it can dynamically load and unload kernel code on demand.

2. Linux has symmetrical multiprocessor (SMP) support. Although most commercial variants of Unix now support SMP, most traditional Unix implementations did not.

3. The Linux kernel is preemptive. Unlike traditional Unix variants, the Linux kernel can preempt a task even as it executes in the kernel. Of the other commercial Unix implementations, Solaris and IRIX have preemptive kernels, but most Unix kernels are not preemptive.

4. Linux takes an interesting approach to thread support: It does not differentiate between threads and normal processes. To the kernel, all processes are the same; some just happen to share resources.

5. Linux provides an object-oriented device model with device classes, hot-pluggable events, and a user-space device filesystem (sysfs).

6. Linux ignores some common Unix features that the kernel developers consider poorly designed, such as STREAMS, or standards that are impossible to cleanly implement.

7. Linux is free in every sense of the word. The feature set Linux implements is the result of the freedom of Linux’s open development model. If a feature is without merit or poorly thought out, Linux developers are under no obligation to implement it. To the contrary, Linux has adopted an elitist attitude toward changes: Modifications must solve a specific real-world problem, derive from a clean design, and have a solid implementation. Consequently, features of some other modern Unix variants that are more marketing bullet or one-off requests, such as pageable kernel memory, have received no consideration.

Despite these differences, however, Linux remains an operating system with a strong Unix heritage.
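
Point 4 in the list above can be illustrated with the Linux-specific clone() call, which creates a new task and lets the caller choose what it shares with its creator. The sketch below is an assumption-laden illustration rather than how any thread library is actually built: value_after_clone, store_value and shared_value are hypothetical names, and the 64 KB stack size is arbitrary.

```c
#define _GNU_SOURCE             /* clone() is Linux-specific */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile int shared_value;

/* Entry point of the new task: record the value it was given. */
static int store_value(void *arg)
{
    shared_value = *(int *)arg;
    return 0;
}

/*
 * Create a task with clone() and return shared_value as the caller sees it
 * afterwards, or -1 on error. Pass CLONE_VM to share the address space
 * (thread-like) or 0 to give the new task its own copy (process-like).
 */
int value_after_clone(int flags, int value)
{
    assert(value > 0);  /* must differ from the reset state and from -1 */

    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    /* The stack grows downward on common architectures, so pass its top. */
    pid_t pid = clone(store_value, stack + stack_size, flags | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);      /* wait before freeing the task's stack */
    free(stack);
    return shared_value;
}
```

With CLONE_VM the caller should see the value the new task stored; without it, the caller's copy stays at 0. The same call creates both kinds of task, and only the sharing flags differ.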

Kernel Role

The kernel's role can be split into the following parts:

Process management
The kernel is in charge of creating and destroying processes and handling their connection to the outside world (input and output). Communication among different processes (through signals, pipes, or interprocess communication primitives) is basic to the overall system functionality and is also handled by the kernel. In addition, the scheduler, which controls how processes share the CPU, is part of process management. More generally, the kernel’s process management activity implements the abstraction of several processes on top of a single CPU or a few of them.
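
As a small illustration of kernel-managed communication between processes, the following sketch sends a string from a child process to its parent through a pipe. The function name receive_from_child is hypothetical, and the code assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes msg into a pipe, then read it back in the
 * parent. Stores a NUL-terminated copy in buf (at most cap - 1 bytes)
 * and returns the number of bytes received, or -1 on error.
 */
ssize_t receive_from_child(const char *msg, char *buf, size_t cap)
{
    assert(msg != NULL && buf != NULL && cap > 0);

    int fds[2];                         /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                      /* parent: reader */
    ssize_t total = 0;
    ssize_t n;
    while ((size_t)total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);              /* reap the child */
    return total;
}
```

The kernel buffers the data, blocks the reader until bytes arrive, and reports end-of-file once every write end has been closed.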

Memory management
The computer’s memory is a major resource, and the policy used to deal with it is a critical one for system performance. The kernel builds up a virtual addressing space for any and all processes on top of the limited available resources. The different parts of the kernel interact with the memory management subsystem through a set of function calls, ranging from the simple malloc/free pair to much more complex functionalities.

Filesystems
Unix is heavily based on the filesystem concept; almost everything in Unix can be treated as a file. The kernel builds a structured filesystem on top of unstructured hardware, and the resulting file abstraction is heavily used throughout the whole system. In addition, Linux supports multiple filesystem types, that is, different ways of organizing data on the physical medium. For example, disks may be formatted with the Linux-standard ext3 filesystem, the commonly used FAT filesystem or several others.
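
A short sketch of the "everything is a file" idea: the same open/read/close calls work on a device node as on a regular file. The helper name read_some is hypothetical, and the example assumes the system provides the /dev/zero device.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to count bytes from path into buf using the ordinary file API.
 * Works the same way whether path names a regular file or a device node.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_some(const char *path, unsigned char *buf, size_t count)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, buf, count);
    close(fd);
    return n;
}
```

Calling read_some("/dev/zero", buf, 8) should fill buf with zero bytes supplied by a device driver, without the caller doing anything device-specific.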

Device control
Almost every system operation eventually maps to a physical device. With the exception of the processor, memory, and a very few other entities, any and all device control operations are performed by code that is specific to the device being addressed. That code is called a device driver. The kernel must have embedded in it a device driver for every peripheral present on a system, from the hard drive to the keyboard and the tape drive. This aspect of the kernel’s functions is our primary interest in this book.

Networking
Networking must be managed by the operating system, because most network operations are not specific to a process: incoming packets are asynchronous events. The packets must be collected, identified, and dispatched before a process takes care of them. The system is in charge of delivering data packets across program and network interfaces, and it must control the execution of programs according to their network activity. Additionally, all the routing and address resolution issues are implemented within the kernel.

More About Sockets: Protocols and Syntax

Socket Protocols

Where the underlying transport mechanism allows for more than one protocol to provide the requested socket type, you can select a specific protocol for a socket.

Creating a Socket

The socket system call creates a socket and returns a descriptor that can be used for accessing the socket.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);

The socket created is one end point of a communication channel. The domain parameter specifies the address family, the type parameter specifies the type of communication to be used with this socket, and protocol specifies the protocol to be employed.

Domains include the following:

AF_UNIX         UNIX internal (file system sockets)
AF_INET         ARPA Internet protocols (UNIX network sockets)
AF_ISO          ISO standard protocols
AF_NS           Xerox Network Systems protocols
AF_IPX          Novell IPX protocol
AF_APPLETALK    Appletalk DDS

The most common socket domains are AF_UNIX, which is used for local sockets implemented via the UNIX and Linux file systems, and AF_INET, which is used for UNIX network sockets. The AF_INET sockets may be used by programs communicating across a TCP/IP network including the Internet. The Windows Winsock interface also provides access to this socket domain.
The socket parameter type specifies the communication characteristics to be used for the new socket. Possible values include SOCK_STREAM and SOCK_DGRAM.
SOCK_STREAM is a sequenced, reliable, connection-based two-way byte stream. For an AF_INET domain socket, this is provided by default by a TCP connection that is established between the two end points of the stream socket when it’s connected. Data may be passed in both directions along the socket connection. The TCP protocols include facilities to fragment and reassemble long messages and to retransmit any parts that may be lost in the network.
SOCK_DGRAM is a datagram service. You can use this socket to send messages of a fixed (usually small) maximum size, but there’s no guarantee that the message will be delivered or that messages won’t be reordered in the network. For AF_INET sockets, this type of communication is provided by UDP datagrams.

The protocol used for communication is usually determined by the socket type and domain. There is normally no choice. The protocol parameter is used where there is a choice. 0 selects the default protocol, which we’ll use in all our examples.
The socket system call returns a descriptor that is in many ways similar to a low-level file descriptor. When the socket has been connected to another end-point socket, you may use the read and write system calls with the descriptor to send and receive data on the socket. The close system call is used to end a socket connection.
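
A minimal sketch of creating sockets with the default protocol. The wrapper name open_socket is hypothetical, and it deliberately handles only the two socket types described above.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a socket in the given domain and return its descriptor,
 * or -1 on error (after printing the reason).
 */
int open_socket(int domain, int type)
{
    /* This sketch covers only the two types described in the text. */
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    int fd = socket(domain, type, 0);   /* 0 selects the default protocol */
    if (fd == -1)
        perror("socket");
    return fd;
}
```

For example, open_socket(AF_UNIX, SOCK_STREAM) should return a small non-negative integer that is later released with close, just like a file descriptor.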

Socket Addresses

Each socket domain requires its own address format. For an AF_UNIX socket, the address is described by a structure, sockaddr_un, defined in the sys/un.h include file.

struct sockaddr_un {
    sa_family_t sun_family;   /* AF_UNIX */
    char        sun_path[];   /* pathname */
};

So that addresses of different types may be passed to the socket-handling system calls, each address format is described by a similar structure that begins with a field (in this case, sun_family) that specifies the address type (the socket domain). In the AF_UNIX domain, the address is specified by a filename in the sun_path field of the structure.
On current Linux systems, the type sa_family_t, defined by X/Open as being declared in sys/un.h, is taken to be a short. Also, the pathname specified in sun_path is limited in size (Linux specifies 108 characters; others may use a manifest constant such as UNIX_PATH_MAX). Because address structures may vary in size, many socket calls require or provide as an output a length to be used for copying the particular address structure.
In the AF_INET domain, the address is specified using a structure called sockaddr_in, defined in netinet/in.h, which contains at least these members:

struct sockaddr_in {
    short int          sin_family;   /* AF_INET */
    unsigned short int sin_port;     /* Port number */
    struct in_addr     sin_addr;     /* Internet address */
};

The IP address structure, in_addr, is defined as follows:

struct in_addr {
    unsigned long int s_addr;
};

The four bytes of an IP address constitute a single 32-bit value. An AF_INET socket is fully described by its domain, IP address, and port number. From an application’s point of view, all sockets act like file descriptors and are addressed by a unique integer value.
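
A sketch of filling in a sockaddr_in. The helper name make_inet_address is hypothetical. One detail not covered above, but required in practice, is that the port and address are stored in network byte order, which is why the code uses htons and inet_pton.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Build an AF_INET address from a dotted-quad string and a port number.
 * Returns 0 on success, or -1 if ip is not a valid IPv4 address.
 */
int make_inet_address(const char *ip, unsigned short port,
                      struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);

    memset(out, 0, sizeof *out);        /* clear padding and unused fields */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);        /* host to network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

For instance, make_inet_address("127.0.0.1", 8080, &addr) prepares an address that can be passed, cast to struct sockaddr *, to the socket calls.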

Naming a Socket

To make a socket (as created by a call to socket) available for use by other processes, a server program needs to give the socket a name. Thus, AF_UNIX sockets are associated with a file system pathname, and AF_INET sockets are associated with an IP port number.

#include <sys/socket.h>
int bind(int socket, const struct sockaddr *address, socklen_t address_len);

The bind system call assigns the address specified in the parameter, address, to the unnamed socket associated with the file descriptor socket. The length of the address structure is passed as address_len. The length and format of the address depend on the address family. A particular address structure pointer will need to be cast to the generic address type (struct sockaddr *) in the call to bind. On successful completion, bind returns 0. If it fails, it returns -1 and sets errno to one of the following.

EBADF           The file descriptor is invalid.
ENOTSOCK        The file descriptor doesn’t refer to a socket.
EINVAL          The file descriptor refers to an already-named socket.
EADDRNOTAVAIL   The address is unavailable.
EADDRINUSE      The address has a socket bound to it already.

There are some more values for AF_UNIX sockets:

EACCES          Can’t create the file system name due to permissions.
ENOTDIR, ENAMETOOLONG   Indicates a poor choice of filename.
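
The naming step and one of its failure modes can be sketched for an AF_UNIX socket as follows. The function name bind_unix_socket is hypothetical, and the caller is assumed to pick a path in a writable directory.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Create an AF_UNIX stream socket and bind it to path.
 * Returns the descriptor, or -1 with errno set by the failing call.
 */
int bind_unix_socket(const char *path)
{
    assert(path != NULL);

    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;           /* would not fit in sun_path */
        return -1;
    }

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* The specific address type is cast to the generic struct sockaddr. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        int saved = errno;              /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Binding a second socket to the same pathname while the first name still exists should fail with EADDRINUSE. The socket file stays in the file system after close, so a server normally removes it with unlink.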

Creating a Socket Queue

To accept incoming connections on a socket, a server program must create a queue to store pending requests. It does this using the listen system call.

#include <sys/socket.h>
int listen(int socket, int backlog);

A Linux system may limit the maximum number of pending connections that may be held in a queue. Subject to this maximum, listen sets the queue length to backlog. Incoming connections up to this queue length are held pending on the socket; further connections will be refused and the client’s connection will fail. This mechanism is provided by listen to allow incoming connections to be held pending while a server program is busy dealing with a previous client. A value of 5 for backlog is very common. The listen function will return 0 on success or -1 on error. Errors include EBADF, EINVAL, and ENOTSOCK, as for the bind system call.

Accepting Connections

Once a server program has created and named a socket, it can wait for connections to be made to the socket by using the accept system call.

#include <sys/socket.h>
int accept(int socket, struct sockaddr *address, socklen_t *address_len);

The accept system call returns when a client program attempts to connect to the socket specified by the parameter socket. The client is the first pending connection from that socket’s queue. The accept function creates a new socket to communicate with the client and returns its descriptor. The new socket will have the same type as the server listen socket. The socket must have previously been named by a call to bind and had a connection queue allocated by listen. The address of the calling client will be placed in the sockaddr structure pointed to by address. A null pointer may be used here if the client address isn’t of interest.
The address_len parameter specifies the length of the client structure. If the client address is longer than this value, it will be truncated. Before calling accept, address_len must be set to the expected address length. On return, address_len will be set to the actual length of the calling client’s address structure.
If there are no connections pending on the socket’s queue, accept will block (so that the program won’t continue) until a client makes a connection. You may change this behavior by using the O_NONBLOCK flag on the socket file descriptor, using the fcntl function like this:

int flags = fcntl(socket, F_GETFL, 0);
fcntl(socket, F_SETFL, O_NONBLOCK|flags);

The accept function returns a new socket file descriptor when there is a client connection pending or -1 on error. Possible errors are similar to those for bind and listen, with the addition of EWOULDBLOCK, where O_NONBLOCK has been specified and there are no pending connections. The error EINTR will occur if the process is interrupted while blocked in accept.

Requesting Connections

Client programs connect to servers by establishing a connection between an unnamed socket and the server listen socket. They do this by calling connect.

#include <sys/socket.h>
int connect(int socket, const struct sockaddr *address, size_t address_len);

The socket specified by the parameter socket is connected to the server socket specified by the parameter address, which is of length address_len. The socket must be a valid file descriptor obtained by a call to socket.

If it succeeds, connect returns 0, and -1 is returned on error. Possible errors this time include the following:
EBADF An invalid file descriptor was passed in socket.
EALREADY A connection is already in progress for this socket.
ETIMEDOUT A connection timeout has occurred.
ECONNREFUSED The requested connection was refused by the server.
If the connection can’t be set up immediately, connect will block for an unspecified timeout period. Once this timeout has expired, the connection will be aborted and connect will fail. However, if the call to connect is interrupted by a signal that is handled, the connect call will fail (with errno set to EINTR), but the connection attempt won’t be aborted; instead, it will be set up asynchronously. As with accept, the blocking nature of connect may be altered by setting the O_NONBLOCK flag on the file descriptor. In this case, if the connection can’t be made immediately, connect will fail with errno set to EINPROGRESS and the connection will be made asynchronously.
While asynchronous connections can be tricky to handle, you can use a call to select on the socket file descriptor to indicate that the socket is ready for writing.
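
The asynchronous case can be sketched as follows: set O_NONBLOCK, call connect, and if errno is EINPROGRESS wait with select until the socket is writable. The function name connect_with_timeout and its parameters are hypothetical. Reading SO_ERROR with getsockopt after select returns is an assumption added here, a common way to learn whether the connection actually succeeded.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to ipv4:port, waiting at most timeout_seconds.
 * Returns a connected (still non-blocking) descriptor, or -1. */
int connect_with_timeout(const char *ipv4, unsigned short port,
                         int timeout_seconds)
{
    struct sockaddr_in address;
    struct timeval timeout;
    fd_set write_fds;
    int fd, flags, so_error = 0;
    socklen_t len = sizeof(so_error);

    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &address.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, (struct sockaddr *)&address, sizeof(address)) == 0)
        return fd;                    /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;

    /* Wait until the socket becomes writable or the timeout expires. */
    FD_ZERO(&write_fds);
    FD_SET(fd, &write_fds);
    timeout.tv_sec = timeout_seconds;
    timeout.tv_usec = 0;
    if (select(fd + 1, NULL, &write_fds, NULL, &timeout) <= 0)
        goto fail;

    /* Writable does not by itself mean connected: check the result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) == -1 ||
        so_error != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

The descriptor is left in non-blocking mode, so a caller that wants ordinary blocking reads and writes would need to clear O_NONBLOCK again with fcntl.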

Closing a Socket

You can terminate a socket connection at the server and client by calling close, just as you would for low-level file descriptors. You should always close the socket at both ends. For the server, you should do this when read returns zero. Note that the close call could block if the socket has untransmitted data, is of a connection-oriented type, and has the SO_LINGER option set.
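
For reference, the linger behavior is controlled through setsockopt with the SO_LINGER option and a struct linger. The short sketch below shows how it might be enabled; the helper name set_linger is hypothetical.

```c
#include <sys/socket.h>

/* Ask close to wait up to `seconds` for unsent data to be transmitted.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int set_linger(int fd, int seconds)
{
    struct linger option;

    option.l_onoff = 1;          /* enable lingering on close */
    option.l_linger = seconds;   /* maximum time to linger */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &option, sizeof(option));
}
```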

Basics Of SOCKET

What Is a Socket?

A socket is a communication mechanism that allows client/server systems to be developed either locally, on a single machine, or across networks. Linux functions such as printing and network utilities such as rlogin and ftp usually use sockets to communicate. Sockets are created and used differently from pipes because they make a clear distinction between client and server. The socket mechanism can implement multiple clients attached to a single server.

Socket Connections

You can think of socket connections as telephone calls into a busy building. A call comes into an organization and is answered by a receptionist who puts the call through to the correct department (the server process) and from there to the right person (the server socket). Each incoming call (client) is routed to an appropriate end point and the intervening operators are free to deal with further calls. Before you look at the way socket connections are established in Linux systems, you need to understand how they operate for socket applications that maintain a connection.
First of all, a server application creates a socket, which like a file descriptor is a resource assigned to the server process and that process alone. The server creates it using the system call socket, and it can’t be shared with other processes.
Next, the server process gives the socket a name. Local sockets are given a filename in the Linux file system, often to be found in /tmp or /usr/tmp. For network sockets, the name will be a service identifier (port number/access point) relevant to the particular network to which the clients can connect. This identifier allows Linux to route incoming connections specifying a particular port number to the correct server process. A socket is named using the system call bind. The server process then waits for a client to connect to the named socket. The system call, listen, creates a queue for incoming connections. The server can accept them using the system call accept.
When the server calls accept, a new socket is created that is distinct from the named socket. This new socket is used solely for communication with this particular client. The named socket remains for further connections from other clients. If the server is written appropriately, it can take advantage of multiple connections. For a simple server, further clients wait on the listen queue until the server is ready again.
The client side of a socket-based system is more straightforward. The client creates an unnamed socket by calling socket. It then calls connect to establish a connection with the server by using the server’s named socket as an address.
Once established, sockets can be used like low-level file descriptors, providing two-way data communications.
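
The sequence described above can be sketched in a single process: a named listening socket, an unnamed client socket that connects to it, and the separate descriptor that accept returns for talking to that client. The function name loopback_exchange is hypothetical, and running both ends in one process is a simplification for illustration. It relies on the pending connection being held on the listen queue until accept is called.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one byte from a client socket to a server socket in the same
 * process. Returns the byte the server side received, or -1 on failure. */
int loopback_exchange(char byte_to_send)
{
    struct sockaddr_in address;
    socklen_t address_len = sizeof(address);
    int listen_fd = -1, client_fd = -1, server_fd = -1;
    int result = -1;
    char received;

    /* Server: create, name, and queue. Port 0 lets the kernel choose. */
    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd == -1)
        goto done;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    address.sin_port = htons(0);
    if (bind(listen_fd, (struct sockaddr *)&address, sizeof(address)) == -1 ||
        listen(listen_fd, 5) == -1)
        goto done;
    /* Find out which port was assigned so the client can use it. */
    if (getsockname(listen_fd, (struct sockaddr *)&address, &address_len) == -1)
        goto done;

    /* Client: unnamed socket connected to the server's named socket. */
    client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd == -1 ||
        connect(client_fd, (struct sockaddr *)&address, address_len) == -1)
        goto done;

    /* Server: accept returns a new socket, distinct from listen_fd. */
    server_fd = accept(listen_fd, NULL, NULL);
    if (server_fd == -1)
        goto done;

    /* Both ends now behave like low-level file descriptors. */
    if (write(client_fd, &byte_to_send, 1) != 1)
        goto done;
    if (read(server_fd, &received, 1) != 1)
        goto done;
    result = (unsigned char)received;

done:
    if (server_fd != -1) close(server_fd);
    if (client_fd != -1) close(client_fd);
    if (listen_fd != -1) close(listen_fd);
    return result;
}
```

In a real system the client and server would be separate processes, and the server would normally loop on accept to serve further clients.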

Socket Attributes

To fully understand the system calls used in this example, you need to learn a little about UNIX networking.
Sockets are characterized by three attributes: domain, type, and protocol. They also have an address used as their name. The formats of the addresses vary depending on the domain, also known as the protocol family. Each protocol family can use one or more address families to define the address format.

Socket Domains

Domains specify the network medium that the socket communication will use. The most common socket domain is AF_INET, which refers to Internet networking that’s used on many Linux local area networks and, of course, the Internet itself. The underlying protocol, Internet Protocol (IP), which only has one address family, imposes a particular way of specifying computers on a network. This is called the IP address.
Although names almost always refer to networked machines on the Internet, these are translated into lower-level IP addresses. An example IP address is 192.168.1.99. All IP addresses are represented by four numbers, each less than 256, a so-called dotted quad. When a client connects across a network via sockets, it needs the IP address of the server computer.
There may be several services available at the server computer. A client can address a particular service on a networked machine by using an IP port. A port is identified within the system by assigning a unique 16-bit integer and externally by the combination of IP address and port number. The sockets are communication end points that must be bound to ports before communication is possible.
Servers wait for connections on particular ports. Well-known services have allocated port numbers that are used by all Linux and UNIX machines. These are usually, but not always, numbers less than 1024. Examples are the printer spooler (515), rlogin (513), ftp (21), and httpd (80). The last of these is the server for the World Wide Web (WWW). Usually, port numbers less than 1024 are reserved for system services and may only be served by processes with superuser privileges. X/Open defines a constant in netdb.h, IPPORT_RESERVED, to stand for the highest reserved port number. Because there is a standard set of port numbers for standard services, computers can easily connect to each other without having to establish the correct port. Local services may use nonstandard port addresses.
The domain in our first example is the UNIX file system domain, AF_UNIX, which can be used by sockets based on a single computer that perhaps isn’t networked. When this is so, the underlying protocol is file input/output and the addresses are absolute filenames. The address that you used for the server socket was server_socket, which you saw appear in the current directory when you ran the server application.
Other domains that might be used include AF_ISO for networks based on ISO standard protocols and AF_XNS for the Xerox Network System. We won’t cover these here.
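
As a hedged illustration of how an IP address and port number combine into an AF_INET socket address, the sketch below fills in a struct sockaddr_in from a dotted quad and a port. The helper names make_inet_address and is_reserved_port are hypothetical. The conversions with inet_pton and htons (network byte order) are details not covered in the text above, and the comparison assumes IPPORT_RESERVED marks the boundary below which ports are reserved, as it does on typical Linux systems.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>

/* Fill in an AF_INET address from a dotted quad and a port number.
 * Returns 0 on success, -1 if the dotted quad is not valid. */
int make_inet_address(const char *dotted_quad, unsigned short port,
                      struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);   /* ports travel in network byte order */
    if (inet_pton(AF_INET, dotted_quad, &out->sin_addr) != 1)
        return -1;
    return 0;
}

/* Nonzero if the port lies in the range reserved for system services
 * (assumes IPPORT_RESERVED is the first non-reserved port). */
int is_reserved_port(unsigned short port)
{
    return port < IPPORT_RESERVED;
}
```

The resulting structure is what a client would pass, cast to struct sockaddr *, to connect, or a server to bind.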

Socket Types

A socket domain may have a number of different ways of communicating, each of which might have different characteristics. This isn’t an issue with AF_UNIX domain sockets, which provide a reliable two-way communication path. In networked domains, however, you need to be aware of the characteristics of the underlying network. Internet protocols provide two distinct levels of service: streams and datagrams.

Stream Sockets

Stream sockets (in some ways similar to standard input/output streams) provide a connection that is a sequenced and reliable two-way byte stream. Thus, data sent is guaranteed not to be lost, duplicated, or reordered without an indication that an error has occurred. Large messages are fragmented, transmitted, and reassembled. This is like a file stream, as it accepts large amounts of data and writes it to the low-level disk in smaller blocks. Stream sockets have predictable behavior.
Stream sockets, specified by the type SOCK_STREAM, are implemented in the AF_INET domain by TCP/IP connections. They are also the usual type in the AF_UNIX domain. We’ll concentrate on SOCK_STREAM sockets in this chapter because they are more commonly used in programming network applications.
TCP/IP stands for Transmission Control Protocol/Internet Protocol. IP is the low-level protocol for packets that provides routing through the network from one computer to another. TCP provides sequencing, flow control, and retransmission to ensure that large data transfers arrive with all of the data present and correct or with an appropriate error condition reported.

Datagram Sockets

In contrast, a datagram socket, specified by the type SOCK_DGRAM, doesn’t establish and maintain a connection. There is also a limit on the size of a datagram that can be sent. It’s transmitted as a single network message that may get lost, duplicated, or arrive out of sequence—ahead of datagrams sent after it.
Datagram sockets are implemented in the AF_INET domain by UDP/IP and provide an unsequenced, unreliable service. However, they are relatively inexpensive in terms of resources, since network connections need not be maintained. They’re fast because there is no associated connection setup time. UDP stands for User Datagram Protocol. Datagrams are useful for “single-shot” inquiries to information services, for providing regular status information, or for performing low-priority logging. They have the advantage that the death of a server doesn’t necessarily require a client restart. Because datagram-based servers usually retain no connection information, they can be stopped and restarted without unduly disturbing their clients. For now, we leave the topic of datagrams; see the “Datagrams” section near the end of this chapter for more information.
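
To make the contrast with stream sockets concrete, here is a minimal sketch that sends a single datagram between two SOCK_DGRAM sockets on the loopback interface, with no listen, accept, or connect involved. The function name datagram_round_trip is hypothetical. The two-second receive timeout (SO_RCVTIMEO) is an added assumption, there because a lost datagram would never be retransmitted.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send `message` as one datagram to a receiver socket in the same
 * process. Returns the number of bytes received, or -1 on failure
 * or timeout. */
int datagram_round_trip(const char *message, char *buffer, size_t buffer_size)
{
    struct sockaddr_in address;
    socklen_t address_len = sizeof(address);
    struct timeval timeout = { 2, 0 };
    int receiver, sender;
    int received = -1;

    receiver = socket(AF_INET, SOCK_DGRAM, 0);
    sender = socket(AF_INET, SOCK_DGRAM, 0);
    if (receiver == -1 || sender == -1)
        goto done;

    /* Name the receiver; port 0 lets the kernel choose a free port. */
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    address.sin_port = htons(0);
    if (bind(receiver, (struct sockaddr *)&address, sizeof(address)) == -1)
        goto done;
    if (getsockname(receiver, (struct sockaddr *)&address, &address_len) == -1)
        goto done;

    /* Do not wait forever for a datagram that may have been lost. */
    setsockopt(receiver, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));

    /* No connection is set up: each datagram carries its destination. */
    if (sendto(sender, message, strlen(message), 0,
               (struct sockaddr *)&address, address_len) == -1)
        goto done;
    received = (int)recvfrom(receiver, buffer, buffer_size, 0, NULL, NULL);

done:
    if (receiver != -1) close(receiver);
    if (sender != -1) close(sender);
    return received;
}
```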

Shared Memory

Shared memory allows two unrelated processes to access the same logical memory. Shared memory is a very efficient way of transferring data between two running processes. Although the X/Open standard doesn’t require it, it’s probable that most implementations of shared memory arrange for the memory being shared between different processes to be the same physical memory.

Shared memory provides an efficient way of sharing and passing data between multiple processes. Since it provides no synchronization facilities, we usually need to use some other mechanism to synchronize access to the shared memory. Typically, we might use shared memory to provide efficient access to large areas of memory and pass small messages to synchronize access to that memory.

Shared memory is a special range of addresses that is created by IPC for one process and appears in the address space of that process. Other processes can then “attach” the same shared memory segment into their own address space. All processes can access the memory locations just as if the memory had been allocated by malloc. If one process writes to the shared memory, the changes immediately become visible to any other process that has access to the same shared memory.

The functions for shared memory resemble those for semaphores:

#include <sys/shm.h>

void *shmat(int shm_id, const void *shm_addr, int shmflg);

int shmctl(int shm_id, int cmd, struct shmid_ds *buf);

int shmdt(const void *shm_addr);

int shmget(key_t key, size_t size, int shmflg);

As with semaphores, the include files sys/types.h and sys/ipc.h are normally also required before shm.h is included.

shmget

We create shared memory using the shmget function:

int shmget(key_t key, size_t size, int shmflg);

As with semaphores, the program provides key, which effectively names the shared memory segment, and the shmget function returns a shared memory identifier that is used in subsequent shared memory functions. There’s a special key value, IPC_PRIVATE, that creates shared memory private to the process. You wouldn’t normally use this value, and, as with semaphores, you may find the private shared memory is not actually private on many Linux systems.

The second parameter, size, specifies the amount of memory required in bytes.

The third parameter, shmflg, consists of nine permission flags that are used in the same way as the mode flags for creating files. A special bit defined by IPC_CREAT must be bitwise ORed with the permissions to create a new shared memory segment. It’s not an error to have the IPC_CREAT flag set and pass the key of an existing shared memory segment. The IPC_CREAT flag is silently ignored if it is not required.

The permission flags are very useful with shared memory because they allow a process to create shared memory that can be written by processes owned by the creator of the shared memory but only read by processes that other users have created. We can use this to provide efficient read-only access to data by placing it in shared memory without the risk of its being changed by other users.

If the shared memory is successfully created, shmget returns a nonnegative integer, the shared memory identifier. On failure, it returns –1.
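
A minimal sketch of the call described above follows. The helper name get_shared_segment is hypothetical, and the mode 0644 is chosen to illustrate the case where the creator can write and other users can only read.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create the segment named by `key`, or return the identifier of the
 * existing one. Returns -1 on failure. (Hypothetical helper.) */
int get_shared_segment(key_t key, size_t size)
{
    /* 0644: the creator may read and write; others may only read.
     * IPC_CREAT is ignored if the segment already exists. */
    return shmget(key, size, 0644 | IPC_CREAT);
}
```

Calling the function twice with the same key should yield the same identifier, because the second call finds the existing segment instead of creating a new one.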

shmat

When we first create a shared memory segment, it’s not accessible by any process. To enable access to the shared memory, we must attach it to the address space of a process. We do this with the shmat function:

void *shmat(int shm_id, const void *shm_addr, int shmflg);

The first parameter, shm_id, is the shared memory identifier returned from shmget. The second parameter, shm_addr, is the address at which the shared memory is to be attached to the current process. This should almost always be a null pointer, which allows the system to choose the address at which the memory appears. The third parameter, shmflg, is a set of bitwise flags. The two possible values are SHM_RND, which, in conjunction with shm_addr, controls the address at which the shared memory is attached, and SHM_RDONLY, which makes the attached memory read-only. It’s very rare to need to control the address at which shared memory is attached; you should normally allow the system to choose an address for you, as doing otherwise will make the application highly hardware-dependent. If the shmat call is successful, it returns a pointer to the first byte of shared memory. On failure, it returns (void *)-1, not a null pointer.

The shared memory will have read or write access depending on the owner (the creator of the shared memory), the permissions, and the owner of the current process. Permissions on shared memory are similar to the permissions on files. An exception to this rule arises if shmflg & SHM_RDONLY is true. Then the shared memory won’t be writable, even if permissions would have allowed write access.

shmdt

The shmdt function detaches the shared memory from the current process. It takes a pointer to the address returned by shmat. On success, it returns 0, on error –1. Note that detaching the shared memory doesn’t delete it; it just makes that memory unavailable to the current process.

shmctl

The control functions for shared memory are (thankfully) somewhat simpler than the more complex ones for semaphores:

int shmctl(int shm_id, int command, struct shmid_ds *buf);

The shmid_ds structure has at least the following members:

struct shmid_ds {
    uid_t shm_perm.uid;
    gid_t shm_perm.gid;
    mode_t shm_perm.mode;
}

The first parameter, shm_id, is the identifier returned from shmget. The second parameter, command, is the action to take. It can take three values:

Command Description

IPC_STAT Sets the data in the shmid_ds structure to reflect the values associated with the shared memory.

IPC_SET Sets the values associated with the shared memory to those provided in the shmid_ds data structure, if the process has permission to do so.

IPC_RMID Deletes the shared memory segment.

The third parameter, buf, is a pointer to a structure containing the modes and permissions for the shared memory.

On success, shmctl returns 0; on failure, –1. X/Open doesn’t specify what happens if you attempt to delete a shared memory segment while it’s attached. Generally, a shared memory segment that is attached but deleted continues to function until it has been detached from the last process. However, because this behavior isn’t specified, it’s best not to rely on it.