【Golang】Source Code - syscall

Posted by 西维蜀黍 on 2021-07-17, Last Modified on 2021-10-02

syscall in Golang

SyscallRawSyscall 的有如下区别,通常系统调用使用Syscall,因为它可以防止阻塞同一个P中的其他goroutine的执行:

  • Syscall will call runtime·entersyscall(SB) and runtime·exitsyscall(SB) whereas RawSyscall will not
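    • That is, Syscall notifies the runtime both when it enters and when it leaves the system call.
  • Both runtime·entersyscall and runtime·exitsyscall are implemented in proc.go. When runtime·entersyscall notifies the runtime of the system call, it detaches the P from the M running the G. Another M can then pick up that P and run the remaining Gs, which improves efficiency. Strictly speaking, entersyscall only marks the P as _Psyscall; if the call lasts long enough, the sysmon thread retakes the P and hands it off. So if user code uses RawSyscall for a system call that blocks, it may block other Gs. RawSyscall exists only to save the cost of the two runtime calls for system calls that are guaranteed never to block.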

Syscall

Let's look at the implementation of Syscall (linux/amd64, from syscall/asm_linux_amd64.s):

// func Syscall(trap int64, a1, a2, a3 int64) (r1, r2, err int64);
// Trap # in AX, args in DI SI DX R10 R8 R9, return in AX DX
// Note that this differs from "standard" ABI convention, which
// would pass 4th arg in CX, not R10.

TEXT	·Syscall(SB),NOSPLIT,$0-56
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	$0, R10
	MOVQ	$0, R8
	MOVQ	$0, R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok
	MOVQ	$-1, r1+32(FP)
	MOVQ	$0, r2+40(FP)
	NEGQ	AX
	MOVQ	AX, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok:
	MOVQ	AX, r1+32(FP)
	MOVQ	DX, r2+40(FP)
	MOVQ	$0, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET

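This assembly performs six main steps:

  1. Call runtime.entersyscall to tell the runtime scheduler that this goroutine is entering a system call and gives up its running time.
  2. Load the arguments from the stack frame into the registers the kernel expects.
  3. Execute the SYSCALL instruction to ask the kernel to perform the call.
  4. Check the result of the system call and branch accordingly.
  5. On success, copy the results (AX, DX) into r1 and r2 and set err to 0. On failure, set r1 to -1 and r2 to 0, and store the negated AX (the errno) in err.
  6. Call runtime.exitsyscall to resume running this goroutine.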
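Steps 4 and 5 can be restated in plain Go, which makes the unsigned comparison easier to follow. The sketch below is illustrative only. `decodeSyscallResult` is a hypothetical helper, not a function in the syscall package. It mirrors `CMPQ AX, $0xfffffffffffff001` / `JLS ok`. The kernel reports an error as a small negative number (-errno) in AX. Read as an unsigned value, that number lands above 0xfffffffffffff001, so the error branch negates it back into a positive errno.

```go
package main

import "fmt"

// decodeSyscallResult restates, in plain Go, the branch that follows SYSCALL
// in the assembly above. ax and dx are the raw values the kernel left in the
// AX and DX registers. Hypothetical helper for illustration only.
func decodeSyscallResult(ax, dx uint64) (r1, r2, errno uint64) {
	// CMPQ AX, $0xfffffffffffff001; JLS ok  (JLS = unsigned "lower or same")
	if ax <= 0xfffffffffffff001 {
		// ok: MOVQ AX, r1; MOVQ DX, r2; MOVQ $0, err
		return ax, dx, 0
	}
	// Failure: MOVQ $-1, r1; MOVQ $0, r2; NEGQ AX; MOVQ AX, err
	return ^uint64(0), 0, -ax
}

func main() {
	// Success, e.g. the kernel returned fd 3 in AX.
	r1, r2, errno := decodeSyscallResult(3, 0)
	fmt.Println(r1, r2, errno) // 3 0 0

	// Failure: the kernel returned -9 (-EBADF on Linux) in AX.
	r1, r2, errno = decodeSyscallResult(^uint64(8), 0) // bit pattern of int64(-9)
	fmt.Println(int64(r1), r2, errno) // -1 0 9
}
```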

RawSyscall

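RawSyscall's assembly is the same as Syscall's. The only difference is that it does not call runtime.entersyscall and runtime.exitsyscall. In other words, if a system call issued directly through RawSyscall blocks, the runtime is never told about it, and the P stays tied to the blocked thread.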

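Since blocking has come up, it is worth explaining that system calls fall into two kinds: fast system calls and slow system calls.

- A fast system call is one that does not block, such as getting the pid.
- A slow system call is one that can block, such as reading or writing the disk or the network.

Slow calls may usually feel fast, but they are still far slower than the CPU. In some situations they slow down much further or even hang.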

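Therefore, as a discussion on the golang mailing list puts it: unless you know the specific system call very well and have extremely high performance requirements, avoid RawSyscall whenever you can.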

I would say that Go programs should always call Syscall. RawSyscall exists to make it slightly more efficient to call system calls that never block, such as getpid. But it’s really an internal mechanism.

syscall Package

Package syscall contains an interface to the low-level operating system primitives. The details vary depending on the underlying system.

The primary use of syscall is inside other packages that provide a more portable interface to the system, such as “os”, “time” and “net”. Use those packages rather than this one if you can. For details of the functions and data types in this package consult the manuals for the appropriate operating system. These calls return err == nil to indicate success; otherwise err is an operating system error describing the failure. On most systems, that error has type syscall.Errno.

entersyscall

// /usr/local/Cellar/go/1.16.6/libexec/src/runtime/proc.go

// Standard syscall entry used by the go syscall library and normal cgo calls.
//
// This is exported via linkname to assembly in the syscall package.
//
//go:nosplit
//go:linkname entersyscall
func entersyscall() {
	reentersyscall(getcallerpc(), getcallersp())
}

func reentersyscall(pc, sp uintptr) {
	_g_ := getg()

	// Disable preemption because during this function g is in Gsyscall status,
	// but can have inconsistent g->sched, do not let GC observe it.
	_g_.m.locks++

	// Entersyscall must not call any function that might split/grow the stack.
	// (See details in comment above.)
	// Catch calls that might, by replacing the stack guard with something that
	// will trip any stack check and leaving a flag to tell newstack to die.
	_g_.stackguard0 = stackPreempt
	_g_.throwsplit = true

	// Leave SP around for GC and traceback.
	// Save the execution context; it is used to restore state after the syscall.
	save(pc, sp)
	_g_.syscallsp = sp
	_g_.syscallpc = pc
	casgstatus(_g_, _Grunning, _Gsyscall)
	if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
		systemstack(func() {
			print("entersyscall inconsistent ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
			throw("entersyscall")
		})
	}

	if trace.enabled {
		systemstack(traceGoSysCall)
		// systemstack itself clobbers g.sched.{pc,sp} and we might
		// need them later when the G is genuinely blocked in a
		// syscall
		save(pc, sp)
	}

	if atomic.Load(&sched.sysmonwait) != 0 {
		systemstack(entersyscall_sysmon)
		save(pc, sp)
	}

	if _g_.m.p.ptr().runSafePointFn != 0 {
		// runSafePointFn may stack split if run on this stack
		systemstack(runSafePointFn)
		save(pc, sp)
	}

	_g_.m.syscalltick = _g_.m.p.ptr().syscalltick
	_g_.sysblocktraced = true
	pp := _g_.m.p.ptr()
	pp.m = 0
	_g_.m.oldp.set(pp)
	_g_.m.p = 0
	atomic.Store(&pp.status, _Psyscall)
	if sched.gcwaiting != 0 {
		systemstack(entersyscall_gcwait)
		save(pc, sp)
	}

	_g_.m.locks--
}
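reentersyscall ends by detaching the P from the M (`pp.m = 0`, `_g_.m.p = 0`) and storing `_Psyscall` into `pp.status`. The effect can be observed from user code with this Unix-only sketch. `blockedReadDoesNotStall` is a hypothetical helper, and the 50 ms sleep is an arbitrary choice.

- The program runs with a single P.
- One goroutine blocks in read(2) on an empty pipe through `syscall.Read`, which goes through entersyscall.
- The main goroutine still needs that P to wake up and write to the pipe.

The sketch only shows that the program keeps making progress. The actual handoff happens when sysmon retakes the `_Psyscall` P and gives it to another M.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// blockedReadDoesNotStall runs with a single P. One goroutine blocks in read(2)
// on an empty pipe via syscall.Read; main must still get a P to unblock it.
func blockedReadDoesNotStall() (string, error) {
	prev := runtime.GOMAXPROCS(1) // only one P in the whole program
	defer runtime.GOMAXPROCS(prev)

	fds := make([]int, 2)
	if err := syscall.Pipe(fds); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	done := make(chan string, 1)
	go func() {
		buf := make([]byte, 16)
		n, err := syscall.Read(fds[0], buf) // blocks in the kernel until main writes
		if err != nil {
			done <- "read error: " + err.Error()
			return
		}
		done <- string(buf[:n])
	}()

	// Give the reader time to block. Main can only wake up and write below
	// because the blocked M does not keep the only P to itself.
	time.Sleep(50 * time.Millisecond)
	if _, err := syscall.Write(fds[1], []byte("ping")); err != nil {
		return "", err
	}
	return <-done, nil // received before the deferred Close calls run
}

func main() {
	got, err := blockedReadDoesNotStall()
	fmt.Println(got, err) // ping <nil>
}
```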

exitsyscall

func exitsyscall() {
	_g_ := getg()

	_g_.m.locks++ // see comment in entersyscall
	if getcallersp() > _g_.syscallsp {
		throw("exitsyscall: syscall frame is no longer valid")
	}

	_g_.waitsince = 0
	oldp := _g_.m.oldp.ptr()
	_g_.m.oldp = 0
	if exitsyscallfast(oldp) {
		if trace.enabled {
			if oldp != _g_.m.p.ptr() || _g_.m.syscalltick != _g_.m.p.ptr().syscalltick {
				systemstack(traceGoStart)
			}
		}
		// There's a cpu for us, so we can run.
		_g_.m.p.ptr().syscalltick++
		// We need to cas the status and scan before resuming...
		casgstatus(_g_, _Gsyscall, _Grunning)

		// Garbage collector isn't running (since we are),
		// so okay to clear syscallsp.
		_g_.syscallsp = 0
		_g_.m.locks--
		if _g_.preempt {
			// restore the preemption request in case we've cleared it in newstack
			_g_.stackguard0 = stackPreempt
		} else {
			// otherwise restore the real _StackGuard, we've spoiled it in entersyscall/entersyscallblock
			_g_.stackguard0 = _g_.stack.lo + _StackGuard
		}
		_g_.throwsplit = false

		if sched.disable.user && !schedEnabled(_g_) {
			// Scheduling of this goroutine is disabled.
			Gosched()
		}

		return
	}

	_g_.sysexitticks = 0
	if trace.enabled {
		// Wait till traceGoSysBlock event is emitted.
		// This ensures consistency of the trace (the goroutine is started after it is blocked).
		for oldp != nil && oldp.syscalltick == _g_.m.syscalltick {
			osyield()
		}
		// We can't trace syscall exit right now because we don't have a P.
		// Tracing code can invoke write barriers that cannot run without a P.
		// So instead we remember the syscall exit time and emit the event
		// in execute when we have a P.
		_g_.sysexitticks = cputicks()
	}

	_g_.m.locks--

	// Call the scheduler.
	mcall(exitsyscall0)

	// Scheduler returned, so we're allowed to run now.
	// Delete the syscallsp information that we left for
	// the garbage collector during the system call.
	// Must wait until now because until gosched returns
	// we don't know for sure that the garbage collector
	// is not running.
	_g_.syscallsp = 0
	_g_.m.p.ptr().syscalltick++
	_g_.throwsplit = false
}
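exitsyscall first tries exitsyscallfast, which reacquires `oldp` if its status is still `_Psyscall`. If that fails, it calls `mcall(exitsyscall0)`. That function grabs an idle P if one exists; otherwise it puts the G on the global run queue and stops the M.

The toy model below is a hypothetical simplification of that handshake. The types `p` and `m`, the helper `retake`, and the constants `pIdle`/`pRunning`/`pSyscall` are made up to mirror `_Pidle`/`_Prunning`/`_Psyscall`. The model ignores Gs, tracing, GC, and the idle-P fallback inside exitsyscallfast.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical, simplified P states mirroring the runtime's _Pidle,
// _Prunning and _Psyscall (the real ones live in runtime/runtime2.go).
const (
	pIdle int32 = iota
	pRunning
	pSyscall
)

type p struct{ status int32 }

// m holds the P it owns and, during a syscall, the P it used to own.
type m struct {
	p    *p
	oldp *p
}

// enterSyscall models the end of reentersyscall: detach the P from the M,
// remember it in oldp, and publish the P as "in syscall".
func (mp *m) enterSyscall() {
	pp := mp.p
	mp.oldp = pp
	mp.p = nil
	atomic.StoreInt32(&pp.status, pSyscall)
}

// retake models sysmon stealing a P whose M has stayed in a syscall too long;
// in the runtime the stolen P is then handed to another M (handoffp).
func retake(pp *p) bool {
	return atomic.CompareAndSwapInt32(&pp.status, pSyscall, pIdle)
}

// exitSyscallFast models the first attempt in exitsyscallfast: reacquire
// oldp only if nobody retook it while we were in the kernel.
func (mp *m) exitSyscallFast() bool {
	pp := mp.oldp
	mp.oldp = nil
	if pp != nil && atomic.CompareAndSwapInt32(&pp.status, pSyscall, pRunning) {
		mp.p = pp
		return true
	}
	return false // the runtime would now fall back to mcall(exitsyscall0)
}

func main() {
	// Short syscall: the P is still ours when we come back.
	p1 := &p{status: pRunning}
	m1 := &m{p: p1}
	m1.enterSyscall()
	fmt.Println("fast path:", m1.exitSyscallFast()) // fast path: true

	// Long syscall: sysmon retakes the P first, so the fast path fails.
	p2 := &p{status: pRunning}
	m2 := &m{p: p2}
	m2.enterSyscall()
	retake(p2)
	fmt.Println("fast path:", m2.exitSyscallFast()) // fast path: false
}
```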

macOS

syscall_darwin.go

// Calling any of the following functions makes the runtime call into the C library to trigger a syscall.
// Implemented in the runtime package (runtime/sys_darwin.go)
func syscall(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func syscallPtr(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)

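On Linux, the corresponding functions are written directly in assembly. Following the Linux system call convention, the assembly loads the arguments into registers in order and executes the SYSCALL instruction to enter the kernel. When the call completes, the return value is in RAX. On macOS, however, as the comment above says, these functions are implemented in the runtime (runtime/sys_darwin.go). They call the C library (libSystem) through the libc_*_trampoline functions instead of issuing SYSCALL directly.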

zsyscall_darwin_amd64.go

socket()

// Calling this function makes the runtime call into the C library to trigger a syscall.
// The returned fd is a file descriptor number of the current process.
func socket(domain int, typ int, proto int) (fd int, err error) {
	// Once rawSyscall returns, the fd has already been created at the OS level.
	r0, _, e1 := rawSyscall(funcPC(libc_socket_trampoline), uintptr(domain), uintptr(typ), uintptr(proto))
	fd = int(r0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

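As evidence, running lsof against the process shows the newly created Unix-domain socket as fd 5 (TYPE unix):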

$ lsof -p 4393
COMMAND    PID   USER   FD   TYPE             DEVICE SIZE/OFF                NODE NAME
___1go_bu 4393 shiwei  cwd    DIR                1,4      288            12340897 /Users/shiwei/SW/awesomeProject
___1go_bu 4393 shiwei  txt    REG                1,4  3249792            12749552 /private/var/folders/0g/klmrv6hx1vq0sx_yf1y6r6mm0000gn/T/___1go_build_uds_server_go
___1go_bu 4393 shiwei  txt    REG                1,4  2580592 1152921500312766121 /usr/lib/dyld
___1go_bu 4393 shiwei    0u   CHR               16,1    0t276                1221 /dev/ttys001
___1go_bu 4393 shiwei    1u   CHR               16,1    0t276                1221 /dev/ttys001
___1go_bu 4393 shiwei    2u   CHR               16,1    0t276                1221 /dev/ttys001
___1go_bu 4393 shiwei    3   PIPE 0xa02917af91321490    16384                     ->0x1427e9dd29602369
___1go_bu 4393 shiwei    4   PIPE 0x1427e9dd29602369    16384                     ->0xa02917af91321490
___1go_bu 4393 shiwei    5u  unix  0xac4cdb6a9619963      0t0                     ->(none)

read()

func read(fd int, p []byte) (n int, err error) {
	var _p0 unsafe.Pointer
	if len(p) > 0 {
		_p0 = unsafe.Pointer(&p[0])
	} else {
		_p0 = unsafe.Pointer(&_zero)
	}
	r0, _, e1 := syscall(funcPC(libc_read_trampoline), uintptr(fd), uintptr(_p0), uintptr(len(p)))
	n = int(r0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

write()

func write(fd int, p []byte) (n int, err error) {
	var _p0 unsafe.Pointer
	if len(p) > 0 {
		_p0 = unsafe.Pointer(&p[0])
	} else {
		_p0 = unsafe.Pointer(&_zero)
	}
	r0, _, e1 := syscall(funcPC(libc_write_trampoline), uintptr(fd), uintptr(_p0), uintptr(len(p)))
	n = int(r0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

syscall_unix.go

Read()

func Read(fd int, p []byte) (n int, err error) {
	n, err = read(fd, p)
	...
	return
}

Write()

func Write(fd int, p []byte) (n int, err error) {
	...
	if faketime && (fd == 1 || fd == 2) {
		n = faketimeWrite(fd, p)
		if n < 0 {
			n, err = 0, errnoErr(Errno(-n))
		}
	} else {
		n, err = write(fd, p)
	}
	if race.Enabled && n > 0 {
		race.ReadRange(unsafe.Pointer(&p[0]), n)
	}
	if msanenabled && n > 0 {
		msanRead(unsafe.Pointer(&p[0]), n)
	}
	return
}
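The generated read()/write() wrappers pass `unsafe.Pointer(&_zero)` when `p` is empty, because `&p[0]` would panic on an empty slice. The exported Read/Write add bookkeeping (race, msan, faketime) around them. This Unix-only sketch exercises both the normal path and the empty-slice path through the exported wrappers. `roundTrip` and the temp-file pattern are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// roundTrip writes data to a temporary file with syscall.Write, rewinds with
// syscall.Seek and reads it back with syscall.Read. With an empty slice the
// wrappers pass a dummy pointer instead of &p[0], so len(data) == 0 is safe.
func roundTrip(data []byte) ([]byte, error) {
	f, err := os.CreateTemp("", "syscall-demo-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	fd := int(f.Fd())

	if _, err := syscall.Write(fd, data); err != nil {
		return nil, err
	}
	if _, err := syscall.Seek(fd, 0, 0); err != nil { // whence 0 = SEEK_SET
		return nil, err
	}
	buf := make([]byte, len(data))
	n, err := syscall.Read(fd, buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	got, err := roundTrip([]byte("hello"))
	fmt.Printf("%q %v\n", got, err) // "hello" <nil>
	got, err = roundTrip(nil)
	fmt.Printf("%q %v\n", got, err) // "" <nil>
}
```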

Reference