Using sync.Once and how it works

Performance comparison: read-write lock vs. mutex

Defining the mutex and the read-write lock


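import (
	"sync"
	"time"
)

// RW is the common interface that both lock implementations satisfy.
type RW interface {
	Write()
	Read()
}

// cost simulates the time spent inside each critical section.
const cost = time.Microsecond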

type Lock struct {
	count int
	mu    sync.Mutex
}

func (l *Lock) Write() {
	l.mu.Lock()
	l.count++
	time.Sleep(cost)
	l.mu.Unlock()
}

func (l *Lock) Read() {
	l.mu.Lock()
	time.Sleep(cost)
	_ = l.count
	l.mu.Unlock()
}

type RWLock struct {
	count int
	mu    sync.RWMutex
}

func (l *RWLock) Write() {
	l.mu.Lock()
	time.Sleep(cost)
	l.count++
	l.mu.Unlock()
}

func (l *RWLock) Read() {
	l.mu.RLock()
	_ = l.count
	time.Sleep(cost)
	l.mu.RUnlock()
}

Use benchmarks to measure the performance difference between the mutex and the read-write lock

 
var wg sync.WaitGroup

func benchmark(b *testing.B, lock RW, read, write int) {
	for i := 0; i < b.N; i++ {
		//read
		for j := 0; j < read*100; j++ {
			wg.Add(1)
			go func() {
				lock.Read()
				wg.Done()
			}()
		}
		//write
		for k := 0; k < write*100; k++ {
			wg.Add(1)
			go func() {
				lock.Write()
				wg.Done()
			}()
		}
		wg.Wait()
	}
}

func BenchmarkReadMore(b *testing.B) {
	benchmark(b, &Lock{}, 9, 1)
}

func BenchmarkReadMoreWithRW(b *testing.B) {
	benchmark(b, &RWLock{}, 9, 1)
}

func BenchmarkReadWriteEqual(b *testing.B) {
	benchmark(b, &RWLock{}, 5, 5)
}

func BenchmarkWriteMore(b *testing.B) {
	benchmark(b, &Lock{}, 1, 9)
}

func BenchmarkWriteMoreWithRW(b *testing.B) {
	benchmark(b, &RWLock{}, 1, 9)
}

Test results

Forrest@LAPTOP-32G4HDVI MINGW64 /d/Go_WorkSpace/面经/mutex (main)
$ go test -bench .
goos: windows
goarch: amd64
pkg: deom
cpu: AMD Ryzen 7 5800H with Radeon Graphics
BenchmarkReadMore-16                   1        13955332000 ns/op
BenchmarkReadMoreWithRW-16             1        1462559300 ns/op
BenchmarkReadWriteEqual-16             1        7310712800 ns/op
BenchmarkWriteMore-16                  1        14232708800 ns/op
BenchmarkWriteMoreWithRW-16            1        12806891200 ns/op
PASS
ok      deom    50.568s

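The results show that the read-write lock saves the most time when reads dominate: at a 9:1 read/write ratio it takes about 1.46 s/op versus about 13.96 s/op for the mutex, roughly 9.5 times faster. In the write-heavy 1:9 case the gap narrows to about 12.81 s/op versus 14.23 s/op.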
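The reason the read-heavy case benefits is that RLock is shared between readers, while Lock is exclusive. The following is a minimal sketch of that property, not part of the benchmark above; `readersInside` is a hypothetical helper introduced only for this illustration. Each reader refuses to release the read lock until every other reader has also acquired it, which can only finish if all of them hold the lock at the same time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// readersInside starts n readers. Each one takes the read lock, counts
// itself as "inside", and then waits for all n readers to arrive before
// releasing. The function only returns if all n readers held RLock at the
// same time; with an exclusive sync.Mutex the same pattern would deadlock.
func readersInside(n int) int32 {
	var (
		mu      sync.RWMutex
		inside  int32          // readers that entered the critical section
		arrived sync.WaitGroup // released once every reader is inside
		done    sync.WaitGroup // released once every reader has finished
	)
	arrived.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			mu.RLock()
			defer mu.RUnlock()
			atomic.AddInt32(&inside, 1)
			arrived.Done()
			arrived.Wait() // hold RLock until all readers are inside
		}()
	}
	done.Wait()
	return atomic.LoadInt32(&inside)
}

func main() {
	fmt.Println("readers holding RLock at once:", readersInside(8))
}
```

No writer is involved here on purpose: a pending writer blocks new readers, so the sketch only demonstrates the reader-reader case.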

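A mutex has two modes of operation: normal and starvation.

In normal mode, waiting goroutines queue in FIFO order. A woken goroutine does not own the lock directly; it competes for it with newly arriving goroutines. The new arrivals have an advantage: they are already running on a CPU, and there may be several of them, so the freshly woken goroutine is likely to lose. When that happens, it is put back at the front of the wait queue. If a waiter fails to acquire the lock for more than 1ms, it switches the lock to starvation mode.

In starvation mode, ownership of the lock is handed directly from the unlocking goroutine to the first waiter in the queue. Newly arriving goroutines do not try to acquire the lock even if it appears unlocked, and they do not spin; they queue at the tail of the wait queue.

If a waiter acquires the lock and either (1) it is the last waiter in the queue, or (2) it waited for less than 1ms, it switches the lock back to normal mode.

Normal mode has considerably better performance, while starvation mode is important because it prevents pathological cases of tail latency.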

sync.Once

sync.Once can be used to implement the singleton pattern.


var instance *Instance
var once sync.Once

type Instance struct {
}

func GetInstance() *Instance {
	once.Do(func() {
		instance = &Instance{}
	})
	return instance
}

Implementation of sync.Once

// Once is an object that will perform exactly one action.
type Once struct {
	// done indicates whether the action has been performed.
	// It is first in the struct because it is used in the hot path.
	// The hot path is inlined at every call site.
	// Placing done first allows more compact instructions on some architectures (amd64/x86),
	// and fewer instructions (to calculate offset) on other architectures.
	done uint32
	m    Mutex
}

func (o *Once) Do(f func()) {
	if atomic.LoadUint32(&o.done) == 0 { // check
		// Outlined slow-path to allow inlining of the fast-path.
		o.doSlow(f)
	}
}

func (o *Once) doSlow(f func()) {
	o.m.Lock()                          // lock
	defer o.m.Unlock()
	
	if o.done == 0 {                    // check
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}

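done is on the hot path, and placing it first in the struct reduces the number of CPU instructions, which improves performance.

A brief explanation of that statement:

The hot path is a sequence of instructions that the program executes very frequently. Almost every call to sync.Once reads o.done, so it is clearly on the hot path; if the hot path compiles to fewer, more direct machine instructions, it generally runs faster.
Why does placing it first reduce instructions? The address of a struct's first field is the same as the address of the struct itself, so the first field can be read by dereferencing the struct pointer directly. For any other field, the offset from the start of the struct must be taken into account as well. In machine code, that offset is an extra value carried with the instruction, and the CPU has to add it to the pointer to obtain the field's address. As a result, the machine code that accesses the first field is more compact and faster.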
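The layout claim can be checked with unsafe.Offsetof. In this sketch, `onceLike` is a hypothetical struct that merely copies the field order of the Once definition quoted above; it is not the standard library type itself.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// onceLike mirrors the field order of the Once struct shown above.
type onceLike struct {
	done uint32
	m    sync.Mutex
}

// offsets returns the byte offsets of both fields within onceLike.
func offsets() (doneOff, mOff uintptr) {
	var o onceLike
	return unsafe.Offsetof(o.done), unsafe.Offsetof(o.m)
}

// sameAddress reports whether the struct and its first field share one address.
func sameAddress() bool {
	var o onceLike
	return unsafe.Pointer(&o) == unsafe.Pointer(&o.done)
}

func main() {
	doneOff, mOff := offsets()
	fmt.Println("offset of done:", doneOff)
	fmt.Println("offset of m:", mOff)
	fmt.Println("struct and first field share an address:", sameAddress())
}
```

The first field sits at offset 0, so loading it needs no displacement; the second field sits at a non-zero offset that has to be encoded in the instruction.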

sync.Pool


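import (
	"encoding/json"
	"fmt"
	"sync"
	"testing"
)

var buf, _ = json.Marshal(&Student{
	name:  "YST",
	Score: 19,
})

// Student is the type the benchmarks decode into. Note that name is
// unexported, so encoding/json skips it: only Score is encoded and decoded.
type Student struct {
	name  string
	Score int
}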

var StudentPool = sync.Pool{New: func() interface{} {
	return new(Student)
}}

func getStu() *Student {
	return StudentPool.Get().(*Student)
}

func putStu(stu *Student) {
	StudentPool.Put(stu)
}

The code above defines a Student struct and a StudentPool pool, and wraps the pool's Get and Put operations.
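One thing to watch with pooled objects is stale data: an object returned by Get may still carry the fields of its previous user. A common variation, sketched below under the assumption that callers want a clean object, resets the value before Put. The exported `Name` field and the `borrowAndReturn` helper are changes made only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Student uses an exported Name here, unlike the original example.
type Student struct {
	Name  string
	Score int
}

var studentPool = sync.Pool{New: func() interface{} {
	return new(Student)
}}

func getStu() *Student {
	return studentPool.Get().(*Student)
}

// putStu clears the object before handing it back, so the next Get
// never observes data left behind by the previous user.
func putStu(stu *Student) {
	*stu = Student{}
	studentPool.Put(stu)
}

// borrowAndReturn fills a pooled Student, returns it to the pool, and
// then copies out whatever the next Get yields.
func borrowAndReturn() Student {
	s := getStu()
	s.Name, s.Score = "YST", 19
	putStu(s)

	// The pool makes no promise about which object comes back: it may be
	// the one just returned or a fresh one from New. Either way it is clean.
	next := getStu()
	defer putStu(next)
	return *next
}

func main() {
	fmt.Printf("%+v\n", borrowAndReturn())
}
```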

Write benchmarks to compare memory allocation with and without the pool


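// BenchmarkMarshal decodes into a freshly allocated Student on every
// iteration. Despite its name, it measures json.Unmarshal.
func BenchmarkMarshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		stu := &Student{}
		err := json.Unmarshal(buf, stu)
		if err != nil {
			fmt.Println("err:", err)
			continue
		}
	}
}

// BenchmarkMarshalWithPool decodes into a Student taken from the pool.
func BenchmarkMarshalWithPool(b *testing.B) {
	for i := 0; i < b.N; i++ {
		stu := getStu()
		err := json.Unmarshal(buf, stu)
		putStu(stu) // return the object even when decoding fails
		if err != nil {
			fmt.Println("err:", err)
			continue
		}
	}
}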

`go test -bench . -benchmem`

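Test results: with the **pool**, each operation allocates less memory (224 B/op and 5 allocs/op, versus 248 B/op and 6 allocs/op without it).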

Forrest@LAPTOP-32G4HDVI MINGW64 /d/Go_WorkSpace/面经/pool (main)
$  go test -bench . -benchmem
goos: windows
goarch: amd64
pkg: demo
cpu: AMD Ryzen 7 5800H with Radeon Graphics
BenchmarkMarshal-16              2651370               454.0 ns/op           248 B/op          6 allocs/op
BenchmarkMarshalWithPool-16      2634880               442.2 ns/op           224 B/op          5 allocs/op
PASS
ok      demo    4.011s

http://example.com/2024/01/11/Go并发编程/

Author: Forrest

Published: January 11, 2024