USB Enumeration

The process of identifying a device and assigning it a unique address is referred to as bus enumeration. Enumeration is handled by the system software on the host and by the USB logical layer on the device side. It starts when a device is attached to the host and receives power. The USB specification describes the following steps for enumeration.

  1. The USB device is attached to the host, which receives an event on the hub's status change pipe. The USB device is in the Powered state, and the port it is attached to is disabled.
  2. The host queries the hub to determine the nature of the change.
  3. Once the host determines that a new device is attached, it waits at least 100 ms for the device to become stable after power-up, after which it enables and resets the port.
  4. After a successful reset, the USB device is in the Default state and can draw up to 100 mA from the VBUS pin.
  5. Once the device is in the Default state, the host assigns a unique address to the USB device, which moves the device to the Address state.
  6. The host starts communicating with the USB device over the default control pipe and reads the device descriptor.
  7. Subsequently, the host reads the device configuration information.
  8. The host selects a configuration, which moves the device to the Configured state and makes it ready for use.
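The state changes in the steps above can be sketched as a small shell function. This is only an illustration and no real USB traffic is involved: the function name next_state and the event names are made up for this sketch, and the state names are taken from the steps above.

```shell
# Hypothetical sketch: map (current state, event) to the next device state.
# The event names are invented for illustration only.
next_state() {
    case "$1:$2" in
        powered:port_reset)        echo default ;;
        default:set_address)       echo address ;;
        address:set_configuration) echo configured ;;
        *)                         echo "$1" ;;   # unrecognised event: stay put
    esac
}

# Walk a freshly attached (powered) device through enumeration.
state=powered
for event in port_reset set_address set_configuration; do
    state=$(next_state "$state" "$event")
    echo "after $event: $state"
done
```

An event that does not apply to the current state leaves the state unchanged, which mirrors the fact that a device cannot be configured before it has an address.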

Tasklet v/s Bottom Halves v/s Softirq

I spent a good while reading about the Linux kernel this morning. Here is what I found that explains the difference between bottom halves, tasklets and softirqs.

Softirqs and tasklets replaced bottom halves because bottom halves were a large bottleneck on SMP systems. If a bottom half was running on one CPU, no other bottom half could run on any other CPU. It's obvious how this would not scale. So here is what I understand.
The difference between softirqs and tasklets is that a softirq is guaranteed to run on the CPU it was scheduled on, whereas tasklets don't have that guarantee. Also, the same tasklet cannot run on two separate CPUs at the same time, whereas a softirq can. Don't confuse the tasklet restriction with that of the bottom halves. Two different tasklets can run on two different CPUs, just not the same one.
I can't argue why we have tasklets (I'm trying to get rid of them; if you have an idea, please share it with me), but I'll give the best example of why we have softirqs: the networking code. Say you get a network packet, and processing that packet takes a lot of work. If you do that in the interrupt handler, no other interrupts can happen on that IRQ line. That would cause a large latency for incoming interrupts, and perhaps you'll overflow the buffers and drop packets. So the interrupt handler only moves the data off to a network receive queue and returns. But this packet still needs to be processed right away, before anything else, so it goes off to a softirq for processing. Now you still allow interrupts to come in. Perhaps the network interrupt comes in again on another CPU. The other CPU can start processing that packet with a softirq on that CPU, even before the first packet has finished processing.
See how this can scale well? But the same tasklet can't run on two different CPUs, so it doesn't have this advantage. In fact, if a tasklet is scheduled to run on another CPU but is waiting for other tasklets to finish, and you try to schedule the tasklet on a CPU that's not currently processing tasklets, it will notice that the tasklet is already scheduled to run and not do anything. So tasklets are not so reliable when it comes to latencies. That is why I'm working on getting rid of them, since I don't believe they accomplish what people think they do.
I will discuss bottom halves in more detail some other time, and workqueues too. Till then you can explore these topics in Linux Kernel Development.

USB: A brief tutorial

If you need your embedded application to talk to a PC, then increasingly the way to go is USB. Partly this is because of the performance it can supply, but also for the very practical reason that many PCs and most portables no longer have parallel or serial ports. But unlike the good old parallel or serial cables, these interfaces are far from simple to implement, debug or program. Here is a quick summary of some terms you might encounter during USB driver development.

The Standards

I will not go into the history, but if we take a brief look at USB, the basic standards are as below:

USB 1.1

The original USB standard provides a fast Master/Slave interface using a tiered star topology supporting up to 127 devices with up to 6 tiers (hubs). A PC is normally the master or Host and each of the peripherals linked to it act as slaves or Devices.  One of the aims of the design was to minimise the complexity of the Devices by doing as much in the Host as possible. Data transfer rates are defined in the specification as - Low Speed 1.5 Mbits/sec and Full Speed 12 Mbits/sec and the maximum length of each cable section is 5 metres. The USB specification allows each device to take up to 500mA of power (limited to 100mA during startup).

USB 2.0

There are some minor variations from USB 1.1 within the USB 2.0 specification, and since USB 2.0's inception most interfaces have been designed to conform to the USB 2.0 standard. The 2.0 specification is a superset of 1.1, and the major functional difference is the addition of a High Speed 480 Mbits/sec data transfer mode. Be warned, however, that the Spec does allow a product (e.g. an interface chip) to say that it is "USB 2.0 compatible" without necessarily implying that it actually implements the High Speed mode.

USB 3.0

Released in 2008, with motherboards and products appearing in 2010. It has been designed to be backward compatible with 2.0, with a socket that will fit most combinations of legacy plugs. As well as supplying more power (900mA), it adds a Super Speed data transfer mode with a 5 Gbits/sec signalling rate, so it should be able to deliver around 400 MBytes/sec after encoding and protocol overheads. It is becoming popular for use with external hard disks and other high speed applications.

Wireless USB

A short range, high speed radio communications protocol (480 Mbit/s up to 3 m and 110 Mbit/s up to 10 m) which seems to aim to compete with Bluetooth.

Signals, Throughput & Protocol

USB 1 & 2 cables use 4 lines: Power, Ground and a twisted pair of differential +/- data lines using NRZI encoding. The USB connectors are designed so that power and ground are applied before the signal lines are connected. When the Host powers up, it polls each of the Slave devices in turn (using the reserved address 0), assigns each one a unique address, and finds out from each device what its Speed is and the type of data transfer it wishes to perform. This process is called enumeration, and it also takes place whenever a device is plugged into an active network. The connector design (contact to power, then signal), along with the process of enumeration and a lot of host software, allows devices to be described as "Plug-and-Play".
 A typical transaction will consist of a number of packets  - a token indicating the type of data that the Host is sending or requiring, the data and in some cases an acknowledgement. Each packet is preceded by a sync field and followed by an end of packet marker.
These transactions are used to provide four basic data transfer mechanisms:
Control - used by the Host to send commands or query parameters. Packet lengths are 8 bytes for Low Speed, 8-64 bytes for Full Speed and 64 bytes for High Speed devices.
Interrupt - badly named; it is in fact a polled message from the Host, which has to request specific data from the Device. Used by Devices which will be sending small amounts of data (e.g. mice or keyboards).
Bulk - used by Devices that send or receive data in quantity, such as a printer. Variable length blocks of data are sent or requested by the Host (maximum packet length is 64 bytes at Full Speed and 512 bytes at High Speed), are verified with a CRC, and their receipt is acknowledged. This mechanism is not used by time critical peripherals, as it takes whatever bandwidth is left by the other mechanisms.
Isochronous - used for devices that stream data in real time without any error recovery, such as audio channels. For them, losing some data occasionally is better than the glitch resulting from a re-transmit. Packet sizes can be up to 1024 bytes.

As devices are enumerated, the host keeps track of the total bandwidth that the isochronous and interrupt devices request. They can consume up to 90 percent of the bandwidth that is available. After 90 percent is used up, the host denies access to any other isochronous or interrupt devices. Control packets and Bulk Transfers are guaranteed at least 10 percent with Control taking priority.
The USB Host does this by dividing the available bandwidth into frames containing 1,500 bytes, with a new frame starting every millisecond. During a frame, isochronous and interrupt transfers get a slot so they are guaranteed the bandwidth they need. Control and then Bulk transfers use whatever time is left.
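
The frame arithmetic above can be checked with shell integer arithmetic. This is a rough sketch for Full Speed only; the variable names are chosen for illustration, and the figures ignore protocol overhead such as sync fields and packet headers.

```shell
# Full Speed signalling rate in bits per second (12 Mbits/sec).
bits_per_second=12000000
frames_per_second=1000          # one frame every millisecond

# Raw capacity of one frame, in bytes.
frame_bytes=$(( bits_per_second / frames_per_second / 8 ))

# Up to 90 percent may be reserved by isochronous and interrupt transfers;
# the remainder is left for control and bulk transfers.
periodic_bytes=$(( frame_bytes * 90 / 100 ))
other_bytes=$(( frame_bytes - periodic_bytes ))

echo "bytes per frame:        $frame_bytes"
echo "isochronous/interrupt:  $periodic_bytes"
echo "control/bulk (minimum): $other_bytes"
```

So of the 1,500 bytes in a frame, at most 1,350 can be promised to isochronous and interrupt devices, and at least 150 remain for control and bulk traffic.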

Throughput for 2.0

Maximum Theoretical transfer rates
Transfer Type    Low-speed    Full-Speed    High-speed    (all Kbytes/sec)

But note that in practice the actual throughput achieved also depends on the host's performance, which may only allow 70% of the highest values to be achieved. If, however, a single highest-priority application is using USB, 90% or more of the highest value can be achieved.


When the USB Device is enumerated, as well as getting an address from the Host, it presents the Host with a good deal of information about itself in the form of a series of descriptors.
1. The Device Descriptor tells the Host what the Vendor and Product ID are (which the Host may use to load a driver).
        1.1 The Configuration Descriptors (there may be a number of these - the Host chooses one) offer a power consumption value
                    1.1.1 and a number of Interface Descriptors (a device may be a printer/scanner/fax with separate descriptors).
                    Each of these will define a number of Endpoints, which are the sources and destinations for data transfers.
                    Endpoint Descriptors provide the following detail:
                                   type (Bulk, Interrupt, Isochronous), direction, packet sizes, bandwidth requirement and repeat interval.
There can be 4 Endpoints for Low speed devices, and 16 in and 16 out for Full and High speed devices; these include the mandatory Endpoint 0.
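As a rough illustration of the "type" and "direction" details, here is a sketch that decodes two bytes of an endpoint descriptor. The field names bEndpointAddress and bmAttributes and their bit layout come from the USB specification rather than from the text above; the function name describe_endpoint and the sample values are made up for this example.

```shell
# Decode the two endpoint descriptor bytes that carry direction and type.
# Arguments: bEndpointAddress and bmAttributes, for example 0x81 0x02.
describe_endpoint() {
    addr=$(( $1 ))
    attr=$(( $2 ))
    number=$(( addr & 0x0F ))                 # bits 3..0: endpoint number
    if [ $(( addr & 0x80 )) -ne 0 ]; then     # bit 7: 1 = IN (device to host)
        direction=IN
    else
        direction=OUT
    fi
    case $(( attr & 0x03 )) in                # bits 1..0: transfer type
        0) xfer=Control ;;
        1) xfer=Isochronous ;;
        2) xfer=Bulk ;;
        3) xfer=Interrupt ;;
    esac
    echo "endpoint $number $direction $xfer"
}

describe_endpoint 0x81 0x02     # sample values, not from a real device
describe_endpoint 0x02 0x02
describe_endpoint 0x83 0x03
```

On a real system the same bytes can be seen in the endpoint sections of a verbose device listing, but the sample values here are only illustrative.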
But what is an Endpoint? Wikipedia defines it as "the identifier for the entity on one end of a transport layer connection". In practice it may be a register in the device or a buffer into which data ("addressed" to the Endpoint) is going to be transferred. Those familiar with TCP/IP nomenclature will see that it resembles what is referred to in that software as a Socket.
Logically each USB peripheral sets up a one to one link between endpoints on the device and applications software.  The driver software and the interfaces at each end translate between a software call on the host to a peripheral endpoint and the required message details.  The host's application software simply moves data to or from the endpoint on the peripheral without needing to know the connection details.
The Host software does this via pathways known as Pipes between it and the Endpoints. There are two types of pipe defined within the specification:
Message pipes which are bidirectional, for which the USB standard defines the format and which can only be used by Control Transfers
Stream pipes which can be In or Out and which can be used by Interrupt, Bulk and Isochronous Transfers. The USB standard does not determine the layout of the data in these streams (of course the messages passing the data across the bus are structured and contain endpoint addresses but these fields are stripped out before delivery to the pipe).
All the above are specified by the USB protocol so that devices will operate in a uniform manner both at the Bus level and at the next level up, the Transport Level.
Hopefully your USB interface chip will handle all the detailed operations that make up the actions we have discussed above. However, you will need to know these terms in order to select the appropriate Transfer mechanism, set your Device up correctly and use the Pipe to transfer your data, not to mention having an idea of what is going on when you get an error message.

Devices, Hosts and On-The-Go

USB Device

Most early on-chip USB interfaces and USB interface chips provided support allowing your embedded system to connect to the USB as a Device.

USB Host

The Master for the transaction - may be a PC but your application will have to be Master if you want to plug a USB memory stick into it and read the files off the stick. Increasingly interfaces capable of being Host are being incorporated into Microcontroller chips.

On-The-Go   OTG

The initial USB standard assumed the presence of a Host - the PC. However, it has become important to connect units which may, under some circumstances, be a Device but may be required under others to be a Host. For example, a Printer is normally happy to be a Device when connected to a PC, but may need to be connected to a Camera in the absence of a PC, in which case the Printer will need to be the Host. The OTG protocol provides an arbitration mechanism that allows units to negotiate who is going to be Host. OTG introduces an additional pin, ID, that determines the initial status for the Host/Device negotiation. OTG products may also be known as Dual Role Devices.

Linux Commands

An A-Z index of the Bash command line for Linux. For more detail on any command, type man followed by the command name in a terminal. Commands marked • are Bash built-ins.

   alias    Create an alias •
  apropos  Search Help manual pages (man -k)
  apt-get  Search for and install software packages (Debian/Ubuntu)
  aptitude Search for and install software packages (Debian/Ubuntu)
  aspell   Spell Checker
  awk      Find and Replace text, database sort/validate/index

  basename Strip directory and suffix from filenames
  bash     GNU Bourne-Again SHell 
  bc       Arbitrary precision calculator language 
  bg       Send to background
  break    Exit from a loop •
  builtin  Run a shell builtin
  bzip2    Compress or decompress named file(s)

  cal      Display a calendar
  case     Conditionally perform a command
  cat      Concatenate and print (display) the content of files
  cd       Change Directory
  cfdisk   Partition table manipulator for Linux
  chgrp    Change group ownership
  chmod    Change access permissions
  chown    Change file owner and group
  chroot   Run a command with a different root directory
  chkconfig System services (runlevel)
  cksum    Print CRC checksum and byte counts
  clear    Clear terminal screen
  cmp      Compare two files
  comm     Compare two sorted files line by line
  command  Run a command - ignoring shell functions •
  continue Resume the next iteration of a loop •
  cp       Copy one or more files to another location
  cron     Daemon to execute scheduled commands
  crontab  Schedule a command to run at a later time
  csplit   Split a file into context-determined pieces
  cut      Divide a file into several parts

  date     Display or change the date & time
  dc       Desk Calculator
  dd       Convert and copy a file, write disk headers, boot records
  ddrescue Data recovery tool
  declare  Declare variables and give them attributes •
  df       Display free disk space
  diff     Display the differences between two files
  diff3    Show differences among three files
  dig      DNS lookup
  dir      Briefly list directory contents
  dircolors Colour setup for `ls'
  dirname  Convert a full pathname to just a path
  dirs     Display list of remembered directories
  dmesg    Print kernel & driver messages 
  du       Estimate file space usage

  echo     Display message on screen •
  egrep    Search file(s) for lines that match an extended expression
  eject    Eject removable media
  enable   Enable and disable builtin shell commands •
  env      Environment variables
  ethtool  Ethernet card settings
  eval     Evaluate several commands/arguments
  exec     Execute a command
  exit     Exit the shell
  expect   Automate arbitrary applications accessed over a terminal
  expand   Convert tabs to spaces
  export   Set an environment variable
  expr     Evaluate expressions

  false    Do nothing, unsuccessfully
  fdformat Low-level format a floppy disk
  fdisk    Partition table manipulator for Linux
  fg       Send job to foreground 
  fgrep    Search file(s) for lines that match a fixed string
  file     Determine file type
  find     Search for files that meet a desired criteria
  fmt      Reformat paragraph text
  fold     Wrap text to fit a specified width.
  for      Expand words, and execute commands
  format   Format disks or tapes
  free     Display memory usage
  fsck     File system consistency check and repair
  ftp      File Transfer Protocol
  function Define Function Macros
  fuser    Identify/kill the process that is accessing a file

  gawk     Find and Replace text within file(s)
  getopts  Parse positional parameters
  grep     Search file(s) for lines that match a given pattern
  groupadd Add a user security group
  groupdel Delete a group
  groupmod Modify a group
  groups   Print group names a user is in
  gzip     Compress or decompress named file(s)

  hash     Remember the full pathname of a name argument
  head     Output the first part of file(s)
  help     Display help for a built-in command •
  history  Command History
  hostname Print or set system name

  iconv    Convert the character set of a file
  id       Print user and group id's
  if       Conditionally perform a command
  ifconfig Configure a network interface
  ifdown   Stop a network interface 
  ifup     Start a network interface up
  import   Capture an X server screen and save the image to file
  install  Copy files and set attributes

  jobs     List active jobs •
  join     Join lines on a common field

  kill     Stop a process from running
  killall  Kill processes by name

  less     Display output one screen at a time
  let      Perform arithmetic on shell variables •
  ln       Make links between files
  local    Create variables •
  locate   Find files
  logname  Print current login name
  logout   Exit a login shell •
  look     Display lines beginning with a given string
  lpc      Line printer control program
  lpr      Off line print
  lprint   Print a file
  lprintd  Abort a print job
  lprintq  List the print queue
  lprm     Remove jobs from the print queue
  ls       List information about file(s)
  lsof     List open files

  make     Recompile a group of programs
  man      Help manual
  mkdir    Create new folder(s)
  mkfifo   Make FIFOs (named pipes)
  mkisofs  Create an hybrid ISO9660/JOLIET/HFS filesystem
  mknod    Make block or character special files
  more     Display output one screen at a time
  mount    Mount a file system
  mtools   Manipulate MS-DOS files
  mtr      Network diagnostics (traceroute/ping)
  mv       Move or rename files or directories
  mmv      Mass Move and rename (files)

  netstat  Networking information
  nice     Set the priority of a command or job
  nl       Number lines and write files
  nohup    Run a command immune to hangups
  notify-send  Send desktop notifications
  nslookup Query Internet name servers interactively

  open     Open a file in its default application
  op       Operator access 

  passwd   Modify a user password
  paste    Merge lines of files
  pathchk  Check file name portability
  ping     Test a network connection
  pkill    Stop processes from running
  popd     Restore the previous value of the current directory
  pr       Prepare files for printing
  printcap Printer capability database
  printenv Print environment variables
  printf   Format and print data •
  ps       Process status
  pushd    Save and then change the current directory
  pwd      Print Working Directory

  quota    Display disk usage and limits
  quotacheck Scan a file system for disk usage
  quotactl Set disk quotas

  ram      ram disk device
  rcp      Copy files between two machines
  read     Read a line from standard input •
  readarray Read from stdin into an array variable •
  readonly Mark variables/functions as readonly
  reboot   Reboot the system
  rename   Rename files
  renice   Alter priority of running processes 
  remsync  Synchronize remote files via email
  return   Exit a shell function
  rev      Reverse lines of a file
  rm       Remove files
  rmdir    Remove folder(s)
  rsync    Remote file copy (Synchronize file trees)

  screen   Multiplex terminal, run remote shells via ssh
  scp      Secure copy (remote file copy)
  sdiff    Merge two files interactively
  sed      Stream Editor
  select   Accept keyboard input
  seq      Print numeric sequences
  set      Manipulate shell variables and functions
  sftp     Secure File Transfer Program
  shift    Shift positional parameters
  shopt    Shell Options
  shutdown Shutdown or restart linux
  sleep    Delay for a specified time
  slocate  Find files
  sort     Sort text files
  source   Run commands from a file `.'
  split    Split a file into fixed-size pieces
  ssh      Secure Shell client (remote login program)
  strace   Trace system calls and signals
  su       Substitute user identity
  sudo     Execute a command as another user
  sum      Print a checksum for a file
  suspend  Suspend execution of this shell •
  symlink  Make a new name for a file
  sync     Synchronize data on disk with memory

  tail     Output the last part of file
  tar      Tape ARchiver
  tee      Redirect output to multiple files
  test     Evaluate a conditional expression
  time     Measure Program running time
  times    User and system times
  touch    Change file timestamps
  top      List processes running on the system
  traceroute Trace Route to Host
  trap     Run a command when a signal is set(bourne)
  tr       Translate, squeeze, and/or delete characters
  true     Do nothing, successfully
  tsort    Topological sort
  tty      Print filename of terminal on stdin
  type     Describe a command •

  ulimit   Limit user resources •
  umask    Users file creation mask
  umount   Unmount a device
  unalias  Remove an alias •
  uname    Print system information
  unexpand Convert spaces to tabs
  uniq     Uniquify files
  units    Convert units from one scale to another
  unset    Remove variable or function names
  unshar   Unpack shell archive scripts
  until    Execute commands (until error)
  uptime   Show uptime
  useradd  Create new user account
  userdel  Delete a user account
  usermod  Modify user account
  users    List users currently logged in
  uuencode Encode a binary file 
  uudecode Decode a file created by uuencode

  v        Verbosely list directory contents (`ls -l -b')
  vdir     Verbosely list directory contents (`ls -l -b')
  vi       Text Editor
  vmstat   Report virtual memory statistics

  wait     Wait for a process to complete •
  watch    Execute/display a program periodically
  wc       Print byte, word, and line counts
  whereis  Search the user's $path, man pages and source files for a program
  which    Search the user's $path for a program file
  while    Execute commands
  who      Print all usernames currently logged in
  whoami   Print the current user id and name (`id -un')
  wget     Retrieve web pages or files via HTTP, HTTPS or FTP
  write    Send a message to another user 

  xargs    Execute utility, passing constructed argument list(s)
  xdg-open Open a file or URL in the user's preferred application
  yes      Print a string until interrupted

Redirection in Linux

What are standard input and standard output?

Most Linux commands read input, such as a file or another attribute for the command, and write output. By default, input comes from the keyboard, and output is displayed on your screen. Your keyboard is your standard input (stdin) device, and the screen or a particular terminal window is the standard output (stdout) device.
However, since Linux is a flexible system, these default settings don't necessarily have to be applied. The standard output, for example, on a heavily monitored server in a large environment may be a printer.

The redirection operators

Output redirection with > and |

Sometimes you will want to put output of a command in a file, or you may want to issue another command on the output of one command. This is known as redirecting output. Redirection is done using either the ">"(greater-than symbol), or using the "|" (pipe) operator which sends the standard output of one command to another command as standard input.
As we saw before, the cat command concatenates files and puts them all together to the standard output. By redirecting this output to a file, this file name will be created - or overwritten if it already exists, so take care.

vineet:~> cat test1
Hello to world

vineet:~> cat test2
Bye to world

vineet:~> cat test1 test2 > test3

vineet:~> cat test3
Hello to world
Bye to world

Don't overwrite!
Be careful not to overwrite existing (important) files when redirecting output. Many shells, including Bash, have a built-in feature to protect you from that risk: noclobber. See the Info pages for more information. In Bash, you would want to add the set -o noclobber command to your .bashrc configuration file in order to prevent accidental overwriting of files.
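Here is a small sketch of noclobber in action. It works in a scratch directory created with mktemp so that no real file is at risk; the file names are made up for this example. It also uses the >| operator, which is not covered above but which overrides noclobber for a single command.

```shell
workdir=$(mktemp -d)            # scratch directory so no real file is at risk
target="$workdir/important.txt"
echo "original" > "$target"

set -o noclobber                # refuse to overwrite existing files with >

# The redirection fails and the file keeps its contents.
if { echo "replacement" > "$target"; } 2>/dev/null; then
    echo "file was overwritten"
else
    echo "overwrite refused"
fi

# >| overrides noclobber for a single command.
echo "forced" >| "$workdir/scratch.txt"
echo "forced again" >| "$workdir/scratch.txt"

set +o noclobber                # restore the default behaviour
```

Appending with >> to an existing file is still allowed when noclobber is set, since it does not destroy the existing contents.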
Redirecting "nothing" to an existing file is equal to emptying the file:

vineet:~> ls -l list
-rw-rw-r--    1 root   root     117 Apr  2 18:09 list

vineet:~> > list

vineet:~> ls -l list
-rw-rw-r--    1 root   root       0 Apr  4 12:01 list
This process is called truncating.
The same redirection to a nonexistent file will create a new empty file with the given name:

vineet:~> ls -l newlist
ls: newlist: No such file or directory

vineet:~> > newlist

vineet:~> ls -l newlist
-rw-rw-r--  1 vineet   vineet     0 Apr  4 12:05 newlist
Some examples using piping of commands:
To find a word within some text, display all lines matching "pattern1", and exclude lines also matching "pattern2" from being displayed:
grep pattern1 file | grep -v pattern2
To display output of a directory listing one page at a time:
ls -la | less
To find a file in a directory:
ls -l | grep part_of_file_name

 Combining redirections

The following example combines input and output redirection. The file text.txt is first checked for spelling mistakes, and the output is redirected to an error log file:
spell < text.txt > error.log
The following command lists all commands that you can issue to examine another file when using less:

vineet:~> less --help | grep -i examine
  :e [file]      Examine a new file.
  :n          *  Examine the (N-th) next file from the command line.
  :p          *  Examine the (N-th) previous file from the command line.
  :x          *  Examine the first (or N-th) file from the command line.
The -i option is used for case-insensitive searches - remember that UNIX systems are very case-sensitive.
If you want to save output of this command for future reference, redirect the output to a file:

vineet:~> less --help | grep -i examine > examine-files-in-less

vineet:~> cat examine-files-in-less
  :e [file]      Examine a new file.
  :n          *  Examine the (N-th) next file from the command line.
  :p          *  Examine the (N-th) previous file from the command line.
  :x          *  Examine the first (or N-th) file from the command line.
Output of one command can be piped into another command virtually as many times as you want, just as long as these commands would normally read input from standard input and write output to the standard output. Sometimes they don't, but then there may be special options that instruct these commands to behave according to the standard definitions; so read the documentation (man and Info pages) of the commands you use if you should encounter errors.
Again, make sure you don't use names of existing files that you still need. Redirecting output to existing files will replace the content of those files.

 The >> operator

Instead of overwriting file data, you can also append text to an existing file using two subsequent greater-than signs:

vineet:~> cat wishlist
more money
less work

vineet:~> date >> wishlist

vineet:~> cat wishlist
more money
less work
Tue Oct  9 17:06:55 IST 2012

The date command would normally put the last line on the screen; now it is appended to the file wishlist.

 Use of file descriptors

There are three types of I/O, which each have their own identifier, called a file descriptor:

  • standard input: 0
  • standard output: 1
  • standard error: 2
In the following descriptions, if the file descriptor number is omitted, and the first character of the redirection operator is <, the redirection refers to the standard input (file descriptor 0). If the first character of the redirection operator is >, the redirection refers to the standard output (file descriptor 1).
Some practical examples will make this more clear:
ls > dirlist 2>&1
will direct both standard output and standard error to the file dirlist, while the command
ls 2>&1 > dirlist
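will only direct standard output to dirlist. This can be a useful option for programmers.
Note that the file descriptor number must not be separated from the greater-than sign by a space; otherwise it is treated as a file name. The example below demonstrates this: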

[vineet@vineet]$ ls 2> tmp

[vineet@vineet]$ ls -l tmp
-rw-rw-r--  1 vineet vineet 0 Sept  7 12:58 tmp

[vineet@vineet]$ ls 2 > tmp
ls: 2: No such file or directory
The first command that vineet executes is correct (even though no errors are generated and thus the file to which standard error is redirected is empty). The second command expects that 2 is a file name, which does not exist in this case, so an error is displayed.
All these features are explained in detail in the Bash Info pages.
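
To see why the order of the two redirections matters, here is a small sketch. Since ls may not produce any error output, it uses a made-up helper function called both that writes one line to standard output and one to standard error; the log file names are also just for illustration.

```shell
workdir=$(mktemp -d)

# A helper that writes one line to each stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to the file first, then stderr is pointed at the same place.
both > "$workdir/all.log" 2>&1

# stderr is pointed at the current stdout (the terminal) first,
# and only then is stdout sent to the file.
both 2>&1 > "$workdir/out_only.log"

echo "all.log:";      cat "$workdir/all.log"
echo "out_only.log:"; cat "$workdir/out_only.log"
```

Redirections are processed from left to right, so in the second command the "to stderr" line still appears on the terminal and only "to stdout" ends up in the file.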


  Writing to output and files simultaneously

You can use the tee command to copy input to standard output and one or more output files in one move. Using the -a option to tee results in appending input to the file(s). This command is useful if you want to both see and save output. The > and >> operators do not allow you to perform both actions simultaneously.
This tool is usually called on through a pipe (|), as demonstrated in the example below:

[vineet@vineet ~]$ date | tee file1 file2
Tue Oct  9 17:12:17 IST 2012
[vineet@vineet ~]$ cat file1
Tue Oct  9 17:12:17 IST 2012
[vineet@vineet ~]$ cat file2
Tue Oct  9 17:12:17 IST 2012
[vineet@vineet ~]$ uptime | tee -a file2
17:13:41 up 36 min, 2 users, load average: 0.27, 0.12, 0.14
[vineet@vineet ~]$ cat file2
Tue Oct  9 17:12:17 IST 2012
17:13:41 up 36 min, 2 users, load average: 0.27, 0.12, 0.14

Linux Booting Process

Press the power button on your system, and after a few moments you see the Linux login prompt.
Have you ever wondered what happens behind the scenes from the time you press the power button until the Linux login prompt appears?
The following are the 6 high-level stages of a typical Linux boot process.

1. BIOS

  • BIOS stands for Basic Input/Output System.
  • Performs some system integrity checks.
  • Searches, loads, and executes the boot loader program.
  • It looks for the boot loader on floppy, CD-ROM, or hard drive. You can press a key (typically F12 or F2, but it depends on your system) during the BIOS startup to change the boot sequence.
  • Once the boot loader program is detected and loaded into memory, BIOS gives control to it.
  • So, in simple terms, BIOS loads and executes the MBR boot loader.

2. MBR

  • MBR stands for Master Boot Record.
  • It is located in the 1st sector of the bootable disk. Typically /dev/hda, or /dev/sda
  • MBR is less than 512 bytes in size. This has three components 1) primary boot loader info in 1st 446 bytes 2) partition table info in next 64 bytes 3) mbr validation check in last 2 bytes.
  • It contains information about GRUB (or LILO in old systems).
  • So, in simple terms MBR loads and executes the GRUB boot loader.
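
The 446 + 64 + 2 byte layout described above can be sketched in Python as follows. The function name split_mbr is hypothetical, and the expected check value 0xAA55 (bytes 0x55, 0xAA on disk) is the conventional MBR boot signature, which the list above does not spell out.

```python
import struct

SECTOR_SIZE = 512           # whole MBR
BOOT_CODE_SIZE = 446        # primary boot loader
PARTITION_TABLE_SIZE = 64   # four 16-byte partition entries
BOOT_SIGNATURE = 0xAA55     # assumption: conventional value of the last 2 bytes


def split_mbr(sector):
    """Split a 512-byte MBR into (boot_code, partition_table, signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    table_end = BOOT_CODE_SIZE + PARTITION_TABLE_SIZE
    boot_code = sector[:BOOT_CODE_SIZE]
    partition_table = sector[BOOT_CODE_SIZE:table_end]
    # The check value is stored as a little-endian 16-bit integer.
    (signature,) = struct.unpack("<H", sector[table_end:])
    return boot_code, partition_table, signature
```

To try it on a real disk you would read the first 512 bytes of the device (for example /dev/sda, which normally needs root) and pass them to split_mbr.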


3. GRUB

  • GRUB stands for Grand Unified Bootloader.
  • If you have multiple kernel images installed on your system, you can choose which one to boot.
  • GRUB displays a splash screen and waits for a few seconds; if you don’t enter anything, it loads the default kernel image specified in the GRUB configuration file.
  • GRUB understands filesystems (the older Linux loader LILO did not).
  • The GRUB configuration file is /boot/grub/grub.conf (/etc/grub.conf is a link to it).

4. Kernel

  • Mounts the root file system as specified by “root=” in grub.conf.
  • The kernel executes the /sbin/init program.
  • Since init is the first program executed by the Linux kernel, it has the process ID (PID) of 1. Run ‘ps -ef | grep init’ and check the PID.
  • initrd stands for Initial RAM Disk.
  • The kernel uses initrd as a temporary root file system until it has booted and the real root file system is mounted. initrd also contains the drivers needed to access the hard drive partitions and other hardware.
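
As a small illustration of the “root=” parameter mentioned above, the sketch below pulls the root device out of a kernel command line. The function name root_device and the sample command line are invented for this example; on a running Linux system the actual command line can be read from /proc/cmdline.

```python
def root_device(cmdline):
    """Return the value of the root= parameter, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


# Hypothetical kernel command line, similar to what a grub.conf entry passes.
sample = "ro root=/dev/sda1 quiet"
print(root_device(sample))
```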

5. Init

  • Looks at the /etc/inittab file to decide the Linux run level.
  • Following are the available run levels
    • 0 – halt
    • 1 – Single user mode
    • 2 – Multiuser, without NFS
    • 3 – Full multiuser mode
    • 4 – unused
    • 5 – X11
    • 6 – reboot
  • Init identifies the default run level from /etc/inittab and uses it to load all the appropriate programs.
  • Execute ‘grep initdefault /etc/inittab’ on your system to identify the default run level.
  • If you want to get into trouble, you can set the default run level to 0 or 6. Since you know what 0 and 6 mean, you probably won’t do that.
  • Typically you would set the default run level to either 3 or 5.
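
The grep above can also be done programmatically. The following is a rough sketch that assumes the usual inittab entry format id:runlevels:action:process; the function name default_runlevel and the sample text are made up for illustration.

```python
def default_runlevel(inittab_text):
    """Return the run level of the initdefault entry, or None if missing."""
    for raw_line in inittab_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        fields = line.split(":")          # id:runlevels:action:process
        if len(fields) >= 3 and fields[2] == "initdefault":
            return int(fields[1])
    return None


# Hypothetical inittab fragment.
sample = """
# Default runlevel.
id:5:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
"""
print(default_runlevel(sample))
```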

6. Runlevel programs

  • When the Linux system is booting up, you might see various services getting started. For example, it might say “starting sendmail …. OK”. Those are the runlevel programs, executed from the run level directory as defined by your run level.
  • Depending on your default init level setting, the system will execute the programs from one of the following directories.
    • Run level 0 – /etc/rc.d/rc0.d/
    • Run level 1 – /etc/rc.d/rc1.d/
    • Run level 2 – /etc/rc.d/rc2.d/
    • Run level 3 – /etc/rc.d/rc3.d/
    • Run level 4 – /etc/rc.d/rc4.d/
    • Run level 5 – /etc/rc.d/rc5.d/
    • Run level 6 – /etc/rc.d/rc6.d/
  • Please note that there are also symbolic links for these directories directly under /etc. So, /etc/rc0.d is linked to /etc/rc.d/rc0.d.
  • Under the /etc/rc.d/rc*.d/ directories, you will see programs whose names start with S and K.
  • Programs that start with S are used during startup. S for startup.
  • Programs that start with K are used during shutdown. K for kill.
  • There are numbers right next to S and K in the program names. Those are the sequence numbers in which the programs should be started or killed.
  • For example, S12syslog starts the syslog daemon and has the sequence number 12. S80sendmail starts the sendmail daemon and has the sequence number 80. So, syslog will be started before sendmail.
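
The naming scheme described above can be sketched as a few lines of Python that work out the startup order from the file names. This only illustrates the S/K prefix and sequence number convention and is not how init itself is implemented; the function name start_order and the K20nfs entry are invented for the example.

```python
import re

# S or K, then the sequence number, then the service name.
SCRIPT_NAME = re.compile(r"^([SK])(\d+)(.+)$")


def start_order(names):
    """Return the service names of the S scripts, in startup order."""
    starts = []
    for name in names:
        match = SCRIPT_NAME.match(name)
        if match and match.group(1) == "S":
            starts.append((int(match.group(2)), match.group(3)))
    # Lower sequence numbers are started first.
    return [service for _, service in sorted(starts)]


print(start_order(["S80sendmail", "K20nfs", "S12syslog"]))
```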

What is Zombie Process and Orphan Process?

Zombie Process

On Unix operating systems, a zombie process or defunct process is a process that has completed execution but still has an entry in the process table, allowing the process that started it to read its exit status. In the term's colorful metaphor, the child process has died but has not yet been reaped. 

When a process ends, all of the memory and resources associated with it are deallocated so they can be used by other processes. However, the process's entry in the process table remains. The parent is sent a SIGCHLD signal indicating that a child has died; the handler for this signal will typically execute the wait system call, which reads the exit status and removes the zombie. The zombie's process ID and entry in the process table can then be reused. However, if the parent never calls wait, the zombie will be left in the process table. In some situations this may be desirable: for example, if the parent creates another child process, the lingering zombie ensures that the new child is not allocated the same process ID.

A zombie process is not the same as an orphan process. Orphan processes don't become zombie processes; instead, they are adopted by init (process ID 1), which waits on its children. 

The term zombie process derives from the common definition of a zombie: an undead person.

Zombies can be identified in the output of the Unix ps command by the presence of a "Z" in the STAT column. Zombies that exist for more than a short period of time typically indicate a bug in the parent program. As with other leaks, the presence of a few zombies isn't worrisome in itself, but may indicate a problem that would grow serious under heavier loads.

To remove zombies from a system, the SIGCHLD signal can be sent to the parent manually, using the kill command. If the parent process still refuses to reap the zombie, the next step would be to terminate the parent process. When a process loses its parent, init becomes its new parent. Init periodically executes the wait system call to reap any zombies with init as parent.
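
The life cycle described above can be observed with a short Python sketch. It is Linux-specific, because it reads the process state from /proc/<pid>/stat instead of running ps; the function names process_state and make_and_reap_zombie and the exit code 7 are arbitrary choices for this illustration.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie(exit_code=7):
    """Create a zombie child, observe its state, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)          # child: finish immediately

    # Parent: the child stays a zombie until we call wait on it.
    deadline = time.monotonic() + 5.0
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    # waitpid reads the exit status and removes the process table entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)
```

Calling make_and_reap_zombie() returns the state seen before reaping together with the child's exit status; once waitpid has returned, the /proc entry for the child is gone.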


Orphan Process

An orphan process is a computer process whose parent process has finished or terminated. 

A process can become orphaned during remote invocation when the client process crashes after making a request of the server. 

Orphans waste server resources and can potentially leave a server in trouble. However, there are several solutions to the orphan process problem:
1. Extermination is the most commonly used technique; in this case the orphan process is killed.
2. Reincarnation is a technique in which machines periodically try to locate the parents of any remote computations; computations whose parents cannot be found are orphans and are killed.
3. Expiration is a technique where each process is allotted a certain amount of time to finish before being killed. If need be, a process may "ask" for more time before the allotted time expires.

A process can also be orphaned while running on the same machine as its parent process. In a UNIX-like operating system, any orphaned process is immediately adopted by the special "init" system process. This operation is called re-parenting and occurs automatically. Even though the process technically has "init" as its parent, it is still called an orphan process, since the process that originally created it no longer exists.
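
Re-parenting on a single machine can be watched with a sketch like the one below (Unix only, since it uses os.fork). The function name orphan_demo is made up. Note one assumption beyond the text above: on some modern Linux setups the adopter is a "subreaper" process such as a per-user service manager rather than PID 1, so the sketch only checks that the parent PID changes.

```python
import os
import time


def orphan_demo():
    """Return (middle_pid, parent_before, parent_after) for an orphan."""
    read_end, write_end = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: create a child, then exit to orphan it.
        os.close(read_end)
        middle_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the original parent is gone.
            deadline = time.monotonic() + 5.0
            while os.getppid() == middle_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_end, f"{middle_pid} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)

    # Original process: reap the middle process, then read the report.
    os.close(write_end)
    os.waitpid(middle, 0)
    report = os.read(read_end, 64)
    os.close(read_end)
    before, after = (int(x) for x in report.decode().split())
    return middle, before, after
```

The third value returned is the PID of whichever process adopted the orphan; on a traditional system that would be 1.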

Linux Directory Structure (File System Structure)

The following list describes each directory in more detail and gives some examples of the files and subdirectories found in it:

1. / – Root

  • Every single file and directory starts from the root directory.
  • Only the root user has write privilege under this directory.
  • Please note that /root is the root user’s home directory, which is not the same as /.

2. /bin – User Binaries

  • Contains binary executables.
  • Common Linux commands that you need in single-user mode are located under this directory.
  • Commands used by all the users of the system are located here.
  • For example: ps, ls, ping, grep, cp.

3. /sbin – System Binaries

  • Just like /bin, /sbin also contains binary executables.
  • But the Linux commands located under this directory are typically used by the system administrator, for system maintenance purposes.
  • For example: iptables, reboot, fdisk, ifconfig, swapon

4. /etc – Configuration Files

  • Contains configuration files required by all programs.
  • This also contains startup and shutdown shell scripts used to start/stop individual programs.
  • For example: /etc/resolv.conf, /etc/logrotate.conf

5. /dev – Device Files

  • Contains device files.
  • These include terminal devices, usb, or any device attached to the system.
  • For example: /dev/tty1, /dev/usbmon0

6. /proc – Process Information

  • Contains information about system processes.
  • This is a pseudo filesystem that contains information about running processes. For example, the /proc/{pid} directory contains information about the process with that particular PID.
  • It is also a virtual filesystem with text information about system resources. For example: /proc/uptime
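
Because the files under /proc are plain text, they are easy to read from a program. As a small sketch, /proc/uptime holds two numbers: the seconds since boot and the idle time in seconds summed over all CPUs. The function names parse_uptime and read_uptime are made up for this example.

```python
def parse_uptime(text):
    """Parse the contents of /proc/uptime into (uptime_s, idle_s)."""
    uptime_s, idle_s = (float(field) for field in text.split())
    return uptime_s, idle_s


def read_uptime(path="/proc/uptime"):
    """Read and parse /proc/uptime (Linux only)."""
    with open(path) as f:
        return parse_uptime(f.read())
```

For example, an uptime of 36 minutes would show up as a first value of about 2160.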

7. /var – Variable Files

  • var stands for variable files.
  • Files whose content is expected to grow can be found under this directory.
  • This includes: system log files (/var/log); packages and database files (/var/lib); emails (/var/mail); print queues (/var/spool); lock files (/var/lock); temp files needed across reboots (/var/tmp).

8. /tmp – Temporary Files

  • Directory that contains temporary files created by the system and users.
  • Files under this directory are deleted when the system is rebooted.

9. /usr – User Programs

  • Contains binaries, libraries, documentation, and source code for second-level programs.
  • /usr/bin contains binary files for user programs. If you can’t find a user binary under /bin, look under /usr/bin. For example: at, awk, cc, less, scp
  • /usr/sbin contains binary files for system administrators. If you can’t find a system binary under /sbin, look under /usr/sbin. For example: atd, cron, sshd, useradd, userdel
  • /usr/lib contains libraries for /usr/bin and /usr/sbin.
  • /usr/local contains user programs that you install from source. For example, when you install Apache from source, it goes under /usr/local/apache2

10. /home – Home Directories

  • Home directories for all users to store their personal files.
  • For example: /home/john, /home/nikita

11. /boot – Boot Loader Files

  • Contains boot loader related files.
  • The kernel’s initrd, vmlinuz, and GRUB files are located under /boot.
  • For example: initrd.img-2.6.32-24-generic, vmlinuz-2.6.32-24-generic

12. /lib – System Libraries

  • Contains library files that support the binaries located under /bin and /sbin.
  • Library filenames are either ld* or lib*.so.*
  • For example:,

13. /opt – Optional add-on Applications

  • opt stands for optional.
  • Contains add-on applications from individual vendors.
  • add-on applications should be installed under either /opt/ or /opt/ sub-directory.

14. /mnt – Mount Directory

  • Temporary mount directory where sysadmins can mount filesystems.

15. /media – Removable Media Devices

  • Temporary mount directory for removable devices.
  • For example, /media/cdrom for CD-ROM; /media/floppy for floppy drives; /media/cdrecorder for CD writer

16. /srv – Service Data

  • srv stands for service.
  • Contains data for the services provided by this server.
  • For example, /srv/cvs contains CVS related data.

Risks of using open source software

Today's software development is geared more towards building upon previous work and less about reinventing content from scratch. Resourceful software development organisations and developers use a combination of previously created code, commercial software, open source software, and their own creative content to produce the desired software product or functionality. Outsourced code can also be used, which in itself can contain any of the above combination of software.
There are many good reasons for using off-the-shelf and especially open source software, with the greatest being its ability to speed up development and drive down costs without sacrificing quality. Almost all software groups knowingly, and in many cases unknowingly, use open source software to their advantage. Code reuse is possibly the biggest accelerator of innovation, as long as open source software is adopted and managed in a controlled fashion.
In today's world of open-sourced, out-sourced, easily-searched and easily-copied software it is difficult for companies to know what is in their code. Any time a product containing software changes hands there is a need to understand its composition, pedigree, ownership, and any open source licences or obligations that restrict the rules around its use by new owners.
Given developers' focus on the technical aspects of their work and emphasis on innovation, obligations associated with the use of third party components can easily be overlooked. Ideally, companies track open source and third party code throughout the development lifecycle. If that is not the case then, at the very least, they should know what is in their code before engaging in a transaction that includes a software component.
Examples of transactions involving software are: a launch of a product into the market, merger & acquisition (M&A) of companies with software development operations, and technology transfer between organisations whether they are commercial, academic or public. Any company that produces software as part of a software supply chain must be aware of what is in their code base.
Impact of Code Uncertainties
Any uncertainty around software ownership or licence compliance can deter downstream users, reduce ability to create partnerships, and create litigation risk to the company and their customers. For smaller companies, Intellectual Property (IP) uncertainties can also delay or otherwise threaten closures in funding deals, affect product and company value, and negatively impact M&A activities.
IP uncertainties can affect the competitiveness of small technology companies due to indemnity demands from their clients. Technology companies need to understand the obligations associated with the software that they are acquiring. Any uncertainties around third party content in code can also stretch sales cycles. Lack of internal resources allocated to identification, tracking and maintaining open source and other third party code in a project impacts smaller companies even more.
Along with licensing issues and IP uncertainties, organisations that use open source also need to be aware of security vulnerabilities. A number of public databases, such as the US National Vulnerability Database (NVD) or Carnegie Mellon University's Computer Emergency Response Team (CERT) database, list known vulnerabilities associated with a large number of software packages. Without accurate knowledge of what exists in the code base it is not possible to consult these databases. Aspects such as known deficiencies, vulnerabilities, known security risks, and code pedigree all assume the existence of a software bill of materials. In a number of jurisdictions, another important aspect to consider before a software transaction takes place is whether the code includes encryption content or other content subject to export control; this is important to companies that do business internationally.
The benefits of open source software usage can be realised and the risks can be managed at the same time. Ideally, a company using open source software should have a process in place to ensure that open source software is properly adopted and managed throughout the development cycle. Having such a process in place allows organisations to detect any licensing or IP uncertainties at the earliest possible stage of development, which reduces the time, effort, and cost associated with correcting the problem later down the road.

Process vs Threads

Both threads and processes are methods of parallelizing an application. However, processes are independent execution units that contain their own state information, use their own address spaces, and only interact with each other via interprocess communication mechanisms (generally managed by the operating system). Applications are typically divided into processes during the design phase, and a master process explicitly spawns sub-processes when it makes sense to logically separate significant application functionality. Processes, in other words, are an architectural construct.

By contrast, a thread is a coding construct that doesn't affect the architecture of an application. A single process might contain multiple threads; all threads within a process share the same state and same memory space, and can communicate with each other directly, because they share the same variables.

Threads typically are spawned for a short-term benefit that is usually visualized as a serial task, but which doesn't have to be performed in a linear manner (such as performing a complex mathematical computation using parallelism, or initializing a large matrix), and then are absorbed when no longer required. The scope of a thread is within a specific code module, which is why we can bolt on threading without affecting the broader application.
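
The difference in memory sharing can be seen in a small Python sketch (Unix only, because it uses os.fork). The function name thread_vs_process is made up for this illustration: a thread's change to a list is visible to the code that started it, while a change made by a forked child process is not.

```python
import os
import threading


def thread_vs_process():
    """Return the shared list as seen after a thread and after a child."""
    shared = []

    # A thread runs in the same address space, so its append is visible here.
    worker = threading.Thread(target=shared.append, args=("from thread",))
    worker.start()
    worker.join()
    after_thread = list(shared)

    # A forked child gets its own copy of the address space, so its append
    # changes only the child's copy of the list.
    pid = os.fork()
    if pid == 0:
        shared.append("from child process")
        os._exit(0)
    os.waitpid(pid, 0)
    after_process = list(shared)

    return after_thread, after_process
```

Both returned lists contain only "from thread". To get data back from the child process, you would need an interprocess communication mechanism such as a pipe.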

The Kernel Source Tree

The kernel source tree is divided into a number of directories, most of which contain many more subdirectories. The directories in the root of the source tree are listed here, along with their descriptions.

Directory        Description

arch             Architecture-specific source
block            Block I/O layer
crypto           Crypto API
Documentation    Kernel source documentation
drivers          Device drivers
firmware         Device firmware needed to use certain drivers
fs               The VFS and the individual filesystems
include          Kernel headers
init             Kernel boot and initialization
ipc              Interprocess communication code
kernel           Core subsystems, such as the scheduler
lib              Helper routines
mm               Memory management subsystem and the VM
net              Networking subsystem
samples          Sample, demonstrative code
scripts          Scripts used to build the kernel
security         Linux Security Module
sound            Sound subsystem
usr              Early user-space code (called initramfs)
tools            Tools helpful for developing Linux
virt             Virtualization infrastructure