西维蜀黍

【Golang】性能调优 - GC 调优

GC 调优

Go 的 GC 被设计为极致简洁,与较为成熟的 Java GC 的数十个可控参数相比,严格意义上来讲,Go 可供用户调整的参数只有 GOGC 环境变量。当我们谈论 GC 调优时,通常是指减少用户代码对 GC 产生的压力,这一方面包含了减少用户代码分配内存的数量(即对程序的代码行为进行调优),另一方面包含了最小化 Go 的 GC 对 CPU 的使用率(即调整 GOGC)。

总的来说,我们可以在现在的开发中处理的有以下几种情况:

  1. 对停顿敏感:GC 过程中产生的长时间停顿、或由于需要执行 GC 而没有执行用户代码,导致需要立即执行的用户代码执行滞后(进而导致application的throughput下降)。
  2. 对资源消耗敏感:对于频繁分配内存的应用而言,频繁分配内存增加 GC 的工作量,原本可以充分利用 CPU 的应用不得不频繁地执行垃圾回收,影响用户代码对 CPU 的利用率,进而影响用户代码的执行效率(同样也导致了application的throughput下降)。

从这两点来看,所谓 GC 调优的核心思想也就是充分的围绕上面的两点来展开:优化内存的申请速度,尽可能的少申请内存,复用已申请的内存。或者简单来说,不外乎这三个关键字:控制、减少、复用

  ...


【Golang】源码 - io 包

Reader interface

// Reader is the interface that wraps the basic Read method.
//
// Read reads up to len(p) bytes into p. It returns the number of bytes
// read (0 <= n <= len(p)) and any error encountered. Even if Read
// returns n < len(p), it may use all of p as scratch space during the call.
// If some data is available but not len(p) bytes, Read conventionally
// returns what is available instead of waiting for more.
//
// When Read encounters an error or end-of-file condition after
// successfully reading n > 0 bytes, it returns the number of
// bytes read. It may return the (non-nil) error from the same call
// or return the error (and n == 0) from a subsequent call.
// An instance of this general case is that a Reader returning
// a non-zero number of bytes at the end of the input stream may
// return either err == EOF or err == nil. The next Read should
// return 0, EOF.
//
// Callers should always process the n > 0 bytes returned before
// considering the error err. Doing so correctly handles I/O errors
// that happen after reading some bytes and also both of the
// allowed EOF behaviors.
//
// Implementations of Read are discouraged from returning a
// zero byte count with a nil error, except when len(p) == 0.
// Callers should treat a return of 0 and nil as indicating that
// nothing happened; in particular it does not indicate EOF.
//
// Implementations must not retain p.
type Reader interface {
	Read(p []byte) (n int, err error)
}
  ...


【Golang】源码 - poll 包

  • /usr/local/go/src/internal/poll/fd_unix.go

fd_unix.go

FD struct

// FD is a file descriptor. The net and os packages use this type as a
// field of a larger type representing a network connection or OS file.
type FD struct {
	// Lock sysfd and serialize access to Read and Write methods.
	fdmu fdMutex

	// System file descriptor. Immutable until Close.
	Sysfd int

	// I/O poller.
	pd pollDesc

	// Writev cache.
	iovecs *[]syscall.Iovec

	// Semaphore signaled when file is closed.
	csema uint32

	// Non-zero if this file has been set to blocking mode.
	isBlocking uint32

	// Whether this is a streaming descriptor, as opposed to a
	// packet-based descriptor like a UDP socket. Immutable.
	IsStream bool

	// Whether a zero byte read indicates EOF. This is false for a
	// message based socket connection.
	ZeroReadIsEOF bool

	// Whether this is a file rather than a network socket.
	isFile bool
}
  ...


【Golang】源码 - syscall

syscall in Golang

SyscallRawSyscall 的有如下区别,通常系统调用使用Syscall,因为它可以防止阻塞同一个P中的其他goroutine的执行:

  • Syscall will call runtime·entersyscall(SB) and runtime·exitsyscall(SB) whereas RawSyscall will not
    • 因为 Syscall 可以做到进入和退出syscall的时候通知runtime。
  • 这两个函数runtime·entersyscall和runtime·exitsyscall的实现在proc.go文件里面。其实在runtime·entersyscall函数里面,通知系统调用时候,是会将G的M的P解绑,因而P可以去获取另一个M以执行其余的G,这样提升效率。 所以如果用户代码使用了 RawSyscall 来做一些阻塞的系统调用,是有可能阻塞其它的 G 的。RawSyscall 只是为了在执行那些一定不会阻塞的系统调用时,能节省两次对 runtime 的函数调用消耗

Syscall

我们来看下 Syscall 的具体实现:

// func Syscall(trap int64, a1, a2, a3 int64) (r1, r2, err int64);
// Trap # in AX, args in DI SI DX R10 R8 R9, return in AX DX
// Note that this differs from "standard" ABI convention, which
// would pass 4th arg in CX, not R10.

TEXT	·Syscall(SB),NOSPLIT,$0-56
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	$0, R10
	MOVQ	$0, R8
	MOVQ	$0, R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok
	MOVQ	$-1, r1+32(FP)
	MOVQ	$0, r2+40(FP)
	NEGQ	AX
	MOVQ	AX, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok:
	MOVQ	AX, r1+32(FP)
	MOVQ	DX, r2+40(FP)
	MOVQ	$0, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET

这段汇编中,主要执行了 6 个步骤:

  1. 调用 runtime.entersyscall 函数。通知 runtime 调度器,让出运行时间
  2. 读内存,把各个参数放到合适的寄存器
  3. 通知内核执行系统调用
  4. 判断系统调用的执行结果,并进行跳转
  5. 若执行成功,拷贝执行结果到返回值。若执行失败,置空返回值
  6. 调用 runtime.exitsyscall 函数,恢复该 goroutine 的运行

RawSyscall

RawSyscall 的汇编实现与 Syscall 一致,唯一的区别是没有调用 runtime.entersyscallruntime.exitsyscall,也就是说,直接使用 RawSyscall 可能出现阻塞的情况。

提到阻塞就不得不解释下,系统调用可以分两种:快系统调用、慢系统调用。快系统调指的是不会造成阻塞的系统调用,如:获取 pid。相应的,慢系统指的就是会造成阻塞的系统调用,如:读写磁盘、网络。虽然平时可能感觉这些慢系统调用也执行的很快,但它们的速度相比 CPU 还是太慢,在某些情形下,这个速度还会被放慢很多,甚至出现假死(hang)的情况。

因此,正如 golang 邮件列表里的讨论所言,除非你对你要用的具体系统调用非常了解,同时性能要求极高,其它场景下能别用就别用 RawSyscall

I would say that Go programs should always call Syscall. RawSyscall exists to make it slightly more efficient to call system calls that never block, such as getpid. But it’s really ann internal mechanism.

syscall Package

Package syscall contains an interface to the low-level operating system primitives. The details vary depending on the underlying system.

The primary use of syscall is inside other packages that provide a more portable interface to the system, such as “os”, “time” and “net”. Use those packages rather than this one if you can. For details of the functions and data types in this package consult the manuals for the appropriate operating system. These calls return err == nil to indicate success; otherwise err is an operating system error describing the failure. On most systems, that error has type syscall.Errno.

entersyscall

// /usr/local/Cellar/go/1.16.6/libexec/src/runtime/proc.go

// Standard syscall entry used by the go syscall library and normal cgo calls.
//
// This is exported via linkname to assembly in the syscall package.
//
//go:nosplit
//go:linkname entersyscall
func entersyscall() {
	reentersyscall(getcallerpc(), getcallersp())
}

func reentersyscall(pc, sp uintptr) {
	_g_ := getg()

	// Disable preemption because during this function g is in Gsyscall status,
	// but can have inconsistent g->sched, do not let GC observe it.
	_g_.m.locks++

	// Entersyscall must not call any function that might split/grow the stack.
	// (See details in comment above.)
	// Catch calls that might, by replacing the stack guard with something that
	// will trip any stack check and leaving a flag to tell newstack to die.
	_g_.stackguard0 = stackPreempt
	_g_.throwsplit = true

	// Leave SP around for GC and traceback.
  // 保存现场,在 syscall 之后会依据这些数据恢复现场
	save(pc, sp)
	_g_.syscallsp = sp
	_g_.syscallpc = pc
	casgstatus(_g_, _Grunning, _Gsyscall)
	if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
		systemstack(func() {
			print("entersyscall inconsistent ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
			throw("entersyscall")
		})
	}

	if trace.enabled {
		systemstack(traceGoSysCall)
		// systemstack itself clobbers g.sched.{pc,sp} and we might
		// need them later when the G is genuinely blocked in a
		// syscall
		save(pc, sp)
	}

	if atomic.Load(&sched.sysmonwait) != 0 {
		systemstack(entersyscall_sysmon)
		save(pc, sp)
	}

	if _g_.m.p.ptr().runSafePointFn != 0 {
		// runSafePointFn may stack split if run on this stack
		systemstack(runSafePointFn)
		save(pc, sp)
	}

	_g_.m.syscalltick = _g_.m.p.ptr().syscalltick
	_g_.sysblocktraced = true
	pp := _g_.m.p.ptr()
	pp.m = 0
	_g_.m.oldp.set(pp)
	_g_.m.p = 0
	atomic.Store(&pp.status, _Psyscall)
	if sched.gcwaiting != 0 {
		systemstack(entersyscall_gcwait)
		save(pc, sp)
	}

	_g_.m.locks--
}

exitsyscall

func exitsyscall() {
	_g_ := getg()

	_g_.m.locks++ // see comment in entersyscall
	if getcallersp() > _g_.syscallsp {
		throw("exitsyscall: syscall frame is no longer valid")
	}

	_g_.waitsince = 0
	oldp := _g_.m.oldp.ptr()
	_g_.m.oldp = 0
	if exitsyscallfast(oldp) {
		if trace.enabled {
			if oldp != _g_.m.p.ptr() || _g_.m.syscalltick != _g_.m.p.ptr().syscalltick {
				systemstack(traceGoStart)
			}
		}
		// There's a cpu for us, so we can run.
		_g_.m.p.ptr().syscalltick++
		// We need to cas the status and scan before resuming...
		casgstatus(_g_, _Gsyscall, _Grunning)

		// Garbage collector isn't running (since we are),
		// so okay to clear syscallsp.
		_g_.syscallsp = 0
		_g_.m.locks--
		if _g_.preempt {
			// restore the preemption request in case we've cleared it in newstack
			_g_.stackguard0 = stackPreempt
		} else {
			// otherwise restore the real _StackGuard, we've spoiled it in entersyscall/entersyscallblock
			_g_.stackguard0 = _g_.stack.lo + _StackGuard
		}
		_g_.throwsplit = false

		if sched.disable.user && !schedEnabled(_g_) {
			// Scheduling of this goroutine is disabled.
			Gosched()
		}

		return
	}

	_g_.sysexitticks = 0
	if trace.enabled {
		// Wait till traceGoSysBlock event is emitted.
		// This ensures consistency of the trace (the goroutine is started after it is blocked).
		for oldp != nil && oldp.syscalltick == _g_.m.syscalltick {
			osyield()
		}
		// We can't trace syscall exit right now because we don't have a P.
		// Tracing code can invoke write barriers that cannot run without a P.
		// So instead we remember the syscall exit time and emit the event
		// in execute when we have a P.
		_g_.sysexitticks = cputicks()
	}

	_g_.m.locks--

	// Call the scheduler.
	mcall(exitsyscall0)

	// Scheduler returned, so we're allowed to run now.
	// Delete the syscallsp information that we left for
	// the garbage collector during the system call.
	// Must wait until now because until gosched returns
	// we don't know for sure that the garbage collector
	// is not running.
	_g_.syscallsp = 0
	_g_.m.p.ptr().syscalltick++
	_g_.throwsplit = false
}

macOS

syscall_darwin.go

// 调用以下 function 会在runtime时,调用 C library来触发一个syscall
// Implemented in the runtime package (runtime/sys_darwin.go)
func syscall(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
func rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err Errno)
func syscallPtr(fn, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
  ...


【Golang】源码 - net 包 - unixsock

Test UDS Server

package main

import (
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
)

func echoServer(c net.Conn) {
	for {
		buf := make([]byte, 512)
		nr, err := c.Read(buf)
		if err != nil {
			return
		}

		data := buf[0:nr]
		println("Server got:", string(data))
		_, err = c.Write(data)
		if err != nil {
			log.Fatal("Writing client error: ", err)
		}
	}
}

func main() {
	log.Println("Starting echo server")
	ln, err := net.Listen("unix", "/tmp/go1.sock")
	if err != nil {
		log.Fatal("Listen error: ", err)
	}

	sigc := make(chan os.Signal, 1)
	signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)
	go func(ln net.Listener, c chan os.Signal) {
		sig := <-c
		log.Printf("Caught signal %s: shutting down.", sig)
		ln.Close()
		os.Exit(0)
	}(ln, sigc)

	for {
		fd, err := ln.Accept()
		if err != nil {
			log.Fatal("Accept error: ", err)
		}

		go echoServer(fd)
	}
}

Listen()

// /usr/local/go/src/net/dial.go

// Listen announces on the local network address.
//
// The network must be "tcp", "tcp4", "tcp6", "unix" or "unixpacket".
//
// For TCP networks, if the host in the address parameter is empty or
// a literal unspecified IP address, Listen listens on all available
// unicast and anycast IP addresses of the local system.
// To only use IPv4, use network "tcp4".
// The address can use a host name, but this is not recommended,
// because it will create a listener for at most one of the host's IP
// addresses.
// If the port in the address parameter is empty or "0", as in
// "127.0.0.1:" or "[::1]:0", a port number is automatically chosen.
// The Addr method of Listener can be used to discover the chosen
// port.
//
// See func Dial for a description of the network and address
// parameters.
func Listen(network, address string) (Listener, error) {
	var lc ListenConfig
	return lc.Listen(context.Background(), network, address)
}

// Listen announces on the local network address.
//
// See func Listen for a description of the network and address
// parameters.
func (lc *ListenConfig) Listen(ctx context.Context, network, address string) (Listener, error) {
	addrs, err := DefaultResolver.resolveAddrList(ctx, "listen", network, address, nil)
  // 对于 UDS,这里 addrs 一定等于 addrList[&UnixAddr{Nmae: "<uds_path>", Net: "unix"}]
	...
	sl := &sysListener{
		ListenConfig: *lc,
		network:      network,
		address:      address,
	}
	var l Listener
	la := addrs.first(isIPv4)
	switch la := la.(type) {
	case *UnixAddr:
		l, err = sl.listenUnix(ctx, la)
	default:
		...
	}
  ...
	return l, nil
}
  ...