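Background
An operating system has two main functions: providing abstractions to user programs and managing the computer's resources. The abstractions it provides are the basis for all interaction between user programs and the operating system.
System Call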
In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
The system call is the fundamental interface between an application and the Linux kernel.
A Library with Wrapper Functions as an intermediary
System calls are generally not invoked directly, but rather via wrapper functions in glibc (or perhaps some other library). For details of direct invocation of a system call, see intro(2).
Often, but not always, the name of the wrapper function is the same as the name of the system call that it invokes. For example, glibc contains a function chdir() which invokes the underlying "chdir" system call.
Often the glibc wrapper function is quite thin, doing little work other than copying arguments to the right registers before invoking the system call, and then setting errno appropriately after the system call has returned. (These are the same steps that are performed by syscall(), which can be used to invoke system calls for which no wrapper function is provided.) Note: system calls indicate a failure by returning a negative error number to the caller on architectures without a separate error register/flag, as noted in syscall(2); when this happens, the wrapper function negates the returned error number (to make it positive), copies it to errno, and returns -1 to the caller of the wrapper.
glibc
By far the most widely used C library on Linux is the GNU C Library ⟨http://www.gnu.org/software/libc/⟩, often referred to as glibc. This is the C library that is nowadays used in all major Linux distributions.
The pathname /lib/libc.so.6 (or something similar) is normally a symbolic link that points to the location of the glibc library, and executing this pathname will cause glibc to display various information about the version installed on your system.
syscall()
syscall() is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments. Employing syscall() is useful, for example, when invoking a system call that has no wrapper function in the C library.
syscall() saves CPU registers before making the system call, restores the registers upon return from the system call, and stores any error returned by the system call in errno.
Symbolic constants for system call numbers can be found in the header file <sys/syscall.h>.
Calling Relation
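The figure below shows the relationship among the POSIX API, the C library, and system calls:
When a process is running a user program in user mode and that program needs a system service, such as writing a log entry to a file, it calls one or more wrapper functions in the C library. The wrapper internally invokes a system call (usually through an assembly instruction), at which point control enters kernel mode. When the kernel finishes, it returns control to the C library, which finally returns control to the user program.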
The library’s wrapper functions expose an ordinary function calling convention (a subroutine call on the assembly level) for using the system call, as well as making the system call more modular.
The call to the library function itself does not cause a switch to kernel mode and is usually a normal subroutine call (using, for example, a “CALL” assembly instruction in some Instruction set architectures (ISAs)). The actual system call does transfer control to the kernel (and is more implementation-dependent and platform-dependent than the library call abstracting it). For example, in Unix-like systems, fork and execve are C library functions that in turn execute instructions that invoke the fork and exec system calls.
Examples
On Unix, Unix-like and other POSIX-compliant operating systems, popular system calls are open, read, write, close, wait, exec, fork, exit, and kill. Many modern operating systems have hundreds of system calls. For example, Linux and OpenBSD each have over 300 different calls, NetBSD has close to 500, FreeBSD has over 500, Windows 7 has close to 700, while Plan 9 has 51.
Typical implementations
Implementing system calls requires a transfer of control from user space to kernel space, which involves some sort of architecture-specific feature.
A typical way to implement this is to use a software interrupt or trap. Interrupts transfer control to the operating system kernel, so software simply needs to set up some register with the system call number needed, and execute the software interrupt.
Categories of system calls
System calls can be grouped roughly into six major categories:
Process control
A running program needs to be able to stop execution either normally or abnormally. When execution is stopped abnormally, often a dump of memory is taken and can be examined with a debugger.
- create process (for example, fork() on Unix-like systems, or NtCreateProcess() in the Windows NT Native API)
- terminate process
- load, execute
- get/set process attributes
- wait for time, wait event, signal event
- allocate and free memory
File management
Some common system calls are create, delete, read, write, reposition, or close. Also, there is a need to determine the file attributes – get and set file attribute. Many times the OS provides an API to make these system calls.
- create file (open()), delete file
- open (open()), close
- read (read()), write (write()), reposition
- get/set file attributes
Device management
Processes usually require several resources to execute. If these resources are available, they are granted and control is returned to the user process. These resources are also thought of as devices. Some are physical, such as a video card, and others are abstract, such as a file.
User programs request the device, and when finished they release the device. Similar to files, we can read, write, and reposition the device.
- request device, release device
- read, write, reposition
- get/set device attributes
- logically attach or detach devices
Information maintenance
Some system calls exist purely for transferring information between the user program and the operating system. An example of this is time, or date.
The OS also keeps information about all its processes and provides system calls to report this information.
- get/set total system information (including time, date, computer name, enterprise etc.)
- get/set process, file, or device metadata (including author, opener, creation time and date, etc.)
Communication
There are two models of interprocess communication: the message-passing model and the shared-memory model.
- Message passing uses a common mailbox to pass messages between processes.
- Shared memory uses certain system calls to create and gain access to regions of memory owned by other processes. The two processes exchange information by reading and writing the shared data.
- create, delete communication connection
- send, receive messages
- transfer status information
- attach or detach remote devices
Protection
- get/set file permissions
Important System Calls Used in OS
Process control - wait()
In some systems, a process needs to wait for another process to complete its execution. This situation occurs when a parent process creates a child process and the parent's execution remains suspended until the child finishes executing.
The parent process is suspended by the wait() system call. When the child process ends execution, control moves back to the parent process.
Process control - fork()
Processes use this system call to create processes that are a copy of themselves. With this system call, a parent process creates a child process; both then continue executing from the point where fork() returned, and the parent is suspended only if it subsequently calls wait().
Process control - exec()
This system call runs an executable file in the context of an already running process, replacing the previous executable. The original process identifier remains, since no new process is created, but the stack, data, heap, and so on are replaced by those of the new program.
Process control - kill()
The kill() system call is used to send a signal to a process, most commonly a termination signal that urges the process to exit. However, kill() does not necessarily kill the process: the signal sent can have various meanings.
Process control - exit()
The exit() system call is used to terminate program execution. In a multi-threaded environment in particular, this call indicates that thread execution is complete. The OS reclaims the resources that the process used once exit() has been called.
Linux System Calls List
https://man7.org/linux/man-pages/man2/syscalls.2.html
Example
Take the read(fd, buffer, nbytes) system call as an example.
count = read(fd, buffer, nbytes);
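The read call takes three parameters:
- fd: the file to read from
- buffer: a pointer to the buffer
- nbytes: the number of bytes to read
Using strace, we can see that running ls invokes the read system call under the hood: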
$ strace -e read ls
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\254\3\0\0004\0\0\0"..., 512) = 512
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
read(3, "A,\0\0\0aeabi\0\1\"\0\0\0\0056\0\6\6\10\1\t\2\n\3\f\1\22\4\24"..., 45) = 45
Reference
- https://en.wikipedia.org/wiki/System_call
- Modern Operating Systems, 4th Edition
- Chapter 5. System Calls - https://notes.shichao.io/lkd/ch5/
- https://man7.org/linux/man-pages/man2/syscalls.2.html
- https://man7.org/linux/man-pages/man2/syscall.2.html
- https://man7.org/linux/man-pages/man7/libc.7.html
- https://www.guru99.com/system-call-operating-system.html
- https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md
- http://faculty.salina.k-state.edu/tim/ossg/Introduction/sys_calls.html
- https://www.kernel.org/doc/html/latest/process/adding-syscalls.html?highlight=system%20call
- https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html