Saturday, February 22, 2014

System Calls

In this post we will discuss what system calls are, why we need them, and how to implement one.


What is a system call?

To understand this, we should first ask ourselves what the OS (read: the kernel) needs to do:

  • Process Management (starting, running, stopping processes)
  • File Management (creating, opening, closing, reading, writing, renaming files)
  • Memory Management (allocating, deallocating memory)
  • Other stuff (timing, scheduling, network management).
So, a system call is an interface through which user space applications request the kernel to perform the operations listed above.

For example, a user space process requests to open a device (hardware).

In short, we can say that the system call is the interface between user space processes and the kernel, and through the kernel, the hardware.


Why do we need system calls?

  1. They provide an abstraction to the user space process. E.g. for the user, the open call just means "open the device"; the user doesn't need to care about the intricacies behind the call.
  2. They maintain system security and stability, as the kernel first checks the validity and permissions of a request before servicing it.
  3. They help in virtualizing the system for processes, i.e. each process can use the same interface independently of the others.

System call interface and C library.

The system call interface in Linux, as with most Unix systems, is provided in part by the C library.

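We will see how a system call works using the example of a printf() call in user space: printf() is a C library function, and to actually produce output the library invokes the write() system call.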
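As a rough illustration of the last step in that chain, here is a minimal sketch that skips printf() and the write() wrapper and issues the write system call by number through the C library's generic syscall() function. It assumes Linux with a glibc-style C library; the helper name write_via_syscall is made up for this example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): send msg to standard
 * output by invoking the write system call directly. This is roughly
 * where a printf() call ends up after formatting and buffering.
 * Returns the number of bytes written, or -1 on error. */
long write_via_syscall(const char *msg)
{
    return syscall(SYS_write, STDOUT_FILENO, msg, strlen(msg));
}
```

Calling write_via_syscall("hello\n") prints the text just as printf("hello\n") would, only without the C library's formatting and buffering in between.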



Syscalls

  • System calls (syscalls in Linux) are accessed via function calls. They take inputs (arguments) and provide a return value of type long that signifies success or error. A negative value usually denotes an error; 0 generally means success.
  • System calls have a defined behavior.
For example, the system call getpid() is defined to return an integer that is the current process's PID.
The implementation of this syscall in the kernel is very simple:

    asmlinkage long sys_getpid(void)
    {
            return current->tgid;
    }

  • Some important observations from this:
  1. By convention, the kernel-side implementation of a system call is prefixed with sys_: getpid() is implemented by sys_getpid().
  2. The asmlinkage modifier tells the compiler that the function should not expect to find any of its arguments in registers (a common optimization), but only on the CPU's stack.
  • In Linux, each system call is assigned a syscall number. This is a unique number that is used to reference a specific system call.
  • Once a syscall number is assigned, it cannot be changed or recycled; otherwise already compiled applications would break.
  • System calls in Linux are faster than in many other operating systems, partly because of fast context switch times and partly because the system call handler itself is simple.
  • The kernel keeps track of all the registered system calls in the table sys_call_table, which is defined in entry.S (an assembly file) in arch/arch-name/kernel/
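To make the idea of a syscall number concrete, the following sketch calls getpid by its number rather than through the usual wrapper. It assumes Linux, where <sys/syscall.h> provides the SYS_getpid constant for the target architecture; getpid_by_number is a name invented for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke getpid by syscall number, bypassing the
 * getpid() wrapper. SYS_getpid expands to the number assigned to
 * getpid on the architecture being compiled for. */
long getpid_by_number(void)
{
    return syscall(SYS_getpid);
}
```

Both getpid_by_number() and getpid() should report the same PID, since they reach the same kernel function through the same table entry.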


System Call Handler

  • Since the system call code lives in the kernel, the processor must switch to kernel mode to execute it.
  • This is done by issuing a software interrupt.
  • In this mechanism an exception is raised, the processor switches to kernel mode and executes the exception handler, which here is the system call handler.
  • The defined software interrupt on x86 is the int $0x80 instruction. On ARM, the swi (svc) instruction is used, and its exception vector is at offset 0x08 from the exception vector base (0x00000000 or 0xFFFF0000).
  • On x86, int $0x80 triggers a switch to kernel mode and the execution of exception vector 128, which is the system call handler.
  • The system call handler function is system_call().
  • It is architecture dependent and typically implemented in assembly in entry.S.
  • User space first puts the system call number in the eax register (x86) and then causes the trap.
  • The kernel reads the value of the eax register and calls the corresponding system call.
  • The system_call() function checks the validity of the given system call number by comparing it to NR_syscalls.
  • If it is larger than or equal to NR_syscalls, the function returns -ENOSYS. Otherwise, the specified system call is invoked:
         call *sys_call_table(,%eax,4)
  • Because each element in the system call table is 32 bits (four bytes), the kernel multiplies the given system call number by four to arrive at its location in the system call table.
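The range check and table lookup described above can be modelled in plain user-space C. This is only a toy sketch, not kernel code: the toy_ names, the two table entries and the values they return are all invented for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model only: the names echo the kernel's, but this is ordinary
 * user-space C and the table contents are made up. */
typedef long (*syscall_fn)(void);

static long toy_getpid(void) { return 1234; }  /* made-up PID */
static long toy_getuid(void) { return 1000; }  /* made-up UID */

/* The index into this array plays the role of the syscall number. */
static const syscall_fn toy_sys_call_table[] = { toy_getpid, toy_getuid };

#define TOY_NR_SYSCALLS \
    (sizeof(toy_sys_call_table) / sizeof(toy_sys_call_table[0]))

/* Mirrors the check in system_call(): reject numbers that are out of
 * range with -ENOSYS, otherwise call through the table. */
long toy_system_call(unsigned long nr)
{
    if (nr >= TOY_NR_SYSCALLS)
        return -ENOSYS;
    return toy_sys_call_table[nr]();
}
```

The array indexing here does what the scaled addressing in call *sys_call_table(,%eax,4) does in assembly: the compiler multiplies the index by the size of one table entry.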


  • Now, most system calls are called with parameters. On x86, up to five parameters are passed in the registers ebx, ecx, edx, esi, and edi.
  • In the rare case of six or more parameters, a single register holds a pointer to the user space memory where all the parameters are stored.
  • The return value is passed back to user space in a register as well (eax on x86).
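From user space, the value returned in that register is not usually seen directly. On Linux with a glibc-style C library (an assumption for this sketch), the library turns a negative kernel return value into -1 and stores the positive error code in errno. The helper name below is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to close file descriptor -1,
 * which can never be valid, and report what user space observes.
 * The kernel hands back a negative error code in the return register;
 * the C library converts that into a -1 return plus errno.
 * Returns errno on failure, or 0 if the call unexpectedly succeeded. */
int errno_from_bad_close(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);
    return (ret == -1) ? errno : 0;
}
```

The expected result is EBADF, the error the kernel reports for a bad file descriptor.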



How to implement system calls?

Adding a system call to the kernel is an easy task. It is the design and implementation of the call that have to be done carefully.

Now we will see the steps used to implement a system call.



First we must define its purpose. What is the use of this system call? The syscall should have exactly one purpose.
Next, we must define the system call's arguments, return value, and error codes.
The system call should have a clean and simple interface with the smallest number of arguments possible.

  • Final Steps in Binding a System Call
  1. First, add an entry to the end of the system call table.
  2. For each architecture supported, the syscall number needs to be defined in <asm/unistd.h>.
  3. The syscall needs to be compiled into the kernel image.
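Once the kernel side is bound, user space still needs a way to reach the new call. A minimal sketch of such a wrapper follows, assuming Linux and the C library's generic syscall() function. The call name foo and the number 100000 are hypothetical placeholders; a real number would come from <asm/unistd.h> on a kernel that actually contains the new call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Hypothetical syscall number, chosen large so that it should not match
 * any existing system call. A real one would be taken from
 * <asm/unistd.h> on a kernel built with the new call. */
#define HYPOTHETICAL_NR_FOO 100000L

/* Hypothetical user-space wrapper for a new system call named foo. */
long foo(void)
{
    return syscall(HYPOTHETICAL_NR_FOO);
}
```

On a kernel that does not have this table entry, foo() is expected to fail with -1, typically with errno set to ENOSYS, which matches the range check performed by the system call handler.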

How does a system call verify its parameters (arguments)?
  • System calls must carefully verify all of their parameters to ensure that they are valid and legal, including that the caller has permission for the requested access.
  • The system call runs in kernel space, and if the user is able to pass invalid input into the kernel without restraint, the system's security and stability can suffer. In short, the kernel can be hacked!
  • For example, for file I/O syscalls, the syscall must check whether the file descriptor is valid. Process-related functions must check whether the provided PID is valid. Every parameter must be checked to ensure it is not just valid and legal, but correct.
  • One of the most important checks is the validity of any pointers that the user provides. Imagine if a process could pass any pointer into the kernel, unchecked, warts and all, even a pointer to memory for which it did not have read access! Processes could then trick the kernel into copying data for which they did not have access permission, such as data belonging to another process. Before following a pointer into user space, the system must ensure that:

  1. The pointer points to a region of memory in user space. Processes must not be able to trick the kernel into reading data in kernel space on their behalf.
  2. The pointer points to a region of memory in the process's address space. The process must not be able to trick the kernel into reading someone else's data.
  3. If reading, the memory is marked readable. If writing, the memory is marked writable. The process must not be able to bypass memory access restrictions.

  • The kernel provides two methods for performing the requisite checks and the desired copy to and from user space:
  1. For writing into user space, copy_to_user(destination address in the process's address space, source pointer in kernel space, size in bytes of the data to copy) is provided.
  2. For reading from user space, copy_from_user(destination address in kernel space, source pointer in user space, number of bytes to copy) is used; it reads that many bytes from the second parameter into the first.
  • Both of these functions return the number of bytes they failed to copy on error. On success, they return zero. It is standard for the syscall to return -EFAULT in the case of such an error.
  • A final possible check is for valid permission. A call to capable() with a valid capabilities flag returns nonzero if the caller holds the specified capability and zero otherwise. For example, capable(CAP_SYS_NICE) checks whether the caller has the ability to modify nice values of other processes.
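The pointer checks can be observed from user space without writing any kernel code. The sketch below, assuming Linux, hands the write system call a source address that is not mapped in the calling process; the kernel's copy from user space fails and the call is expected to report EFAULT. The helper name and the choice of address 1 are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to write one byte into a pipe from
 * address 1, which is assumed not to be mapped in this process.
 * Returns the errno of the failed call, 0 if the call unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int errno_from_bad_pointer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    long ret = syscall(SYS_write, fds[1], (const void *)1, (size_t)1);
    int err = (ret == -1) ? errno : 0;  /* save before close() can touch errno */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

A pipe is used as the target because writing to it forces the kernel to actually read the user buffer, so the bad pointer cannot go unnoticed.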







Comments:

  1. May I know what the software interrupt instruction is for jumping from user space to kernel space on the ARM architecture? We are using int $0x80 on x86 processors.
