【Concurrency Control】Optimistic Concurrency Control vs. Pessimistic Concurrency Control

Posted by 西维蜀黍 on 2019-02-17, Last Modified on 2023-07-18

Update

2020.6.1 update / 2021.5 update:

While discussing this topic with my roommate today, I found that everyone had been learning optimistic and pessimistic concurrency control from articles with titles along the lines of "optimistic locks and pessimistic locks: this one article is all you need!".

However, such articles use the term "optimistic lock", and this term can cause confusion:

  • "Optimistic lock" itself implies that a lock is used, yet a lock is by nature pessimistic, which contradicts "optimistic".
  • Moreover, optimistic concurrency control actually describes an approach to concurrency control in which no lock is involved at all, hence the potential confusion.
  • Calling it optimistic concurrency control is more accurate, so this post uses "optimistic concurrency control" rather than "optimistic lock" throughout.
  • It is also sometimes called lock-free concurrency control.

In fact, a search for "optimistic locking" still returns plenty of results, which suggests the confusion is not merely a translation artifact.

Having suddenly realized this today, I went back and updated this post.

Background

Concurrency Control

Concurrency control is needed in order to enforce certain consistency rules.

Some scenarios that require concurrency control:

  • Maintaining consistency (the C in ACID) in databases
  • Implementing mutual exclusion in programs, e.g. Java's synchronized methods

Conversely, when concurrency control is needed but no mechanism is applied, certain problems naturally arise. Typical ones include:

  • The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.
  • The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction (“dirty read”). The reading transactions end with incorrect results.
  • The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not.
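
The lost update problem above can be reproduced deterministically. The sketch below (the class name is illustrative, not from any library) interleaves two "transactions" by hand: both read the same initial balance, and the second write silently overwrites the first.

```java
// Deterministic sketch of the lost update problem (illustrative class name).
// Both "transactions" read the same snapshot; T2's write overwrites T1's.
public class LostUpdateDemo {
    static int balance = 100;

    public static void main(String[] args) {
        int t1Read = balance;        // T1 reads 100
        int t2Read = balance;        // T2 reads 100, before T1 writes back
        balance = t1Read + 50;       // T1 commits: balance becomes 150
        balance = t2Read - 30;       // T2 commits: balance becomes 70; T1's +50 is lost
        System.out.println(balance); // 70, while the correct serial result would be 120
    }
}
```

Either serial order of the two transactions would yield 120; the interleaving above yields 70 because T2 never saw T1's write.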

Pessimistic Concurrency Control vs. Optimistic Concurrency Control

Optimistic concurrency control corresponds to the optimist in everyday life who always expects things to turn out well, while pessimistic concurrency control corresponds to the pessimist who always expects things to go wrong. Each temperament has its strengths and weaknesses, and neither can be called better without considering the scenario.

Pessimistic Concurrency Control

Pessimistic concurrency control always assumes the worst case: whenever it fetches data, it assumes someone else will modify it, so before every modification it locks the data to prevent others from changing it.

Naturally, then, the most typical way to implement pessimistic concurrency control is with a **lock**:

  • Traditional relational databases use many such locking mechanisms: row locks, table locks, read locks, write locks, and so on. In every case the lock is acquired before the operation is performed.
  • In Java, the **exclusive lock** behind the synchronized keyword is an implementation of the pessimistic idea.

Once the data is locked, others cannot fetch or modify it even if they try. A thread that fails to acquire the lock therefore

  • may retry continuously until it obtains the lock, which is called spinning, or
  • may enter a blocked state (to save CPU time slices); when the lock is released, it is woken up and tries to acquire the lock again.

Pessimistic concurrency control generally causes a performance hit, and it can also go wrong, for example with deadlock.
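
As a minimal sketch of the pessimistic approach (the Counter class below is illustrative), Java's synchronized acquires the exclusive lock before every access, so concurrent increments cannot be lost:

```java
// Pessimistic concurrency control: take the exclusive lock before touching the data.
class Counter {
    private int value = 0;

    public synchronized void increment() { value++; } // lock held for the whole update
    public synchronized int get() { return value; }
}

public class PessimisticDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // always 40000: the lock serializes all updates
    }
}
```

Without synchronized, some of the 40,000 increments would be lost under contention; with it, threads that cannot get the lock simply block until it is released.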

Optimistic Concurrency Control

Optimistic concurrency control, by contrast, always assumes the best case: whenever it goes to modify data, it assumes nobody else will modify it concurrently, so it does not lock; instead, at update time it checks whether anyone else has updated the data in the meantime.

Representatives

Representative techniques embodying the optimistic concurrency control approach:

  • Multiversion concurrency control (MVCC)
  • Non-blocking algorithms, which all rely on atomic read-modify-write primitives provided by the CPU instruction set; the most typical non-blocking algorithm is CAS (compare-and-swap)

Scenarios:

  • Mechanisms such as the write_condition facility offered by databases are embodiments of the optimistic concurrency control approach.
  • Java's java.util.concurrent.atomic package provides many classes supporting atomic operations, such as AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference, all implemented with the CAS algorithm in the optimistic spirit.
  • Version control systems such as SVN and Git also embody the optimistic approach: when you commit, the version numbers are compared; if the remote repository's version differs from your local one, someone has already pushed, and you must first update your working copy and resolve the conflicts before you can push.
  • The If-Match header in HTTP: a GET returns an ETag for a resource, and subsequent PUTs send that ETag in the If-Match header; the first PUT succeeds, but the second does not, because its If-Match value is based on the first version of the resource.
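
The atomic classes mentioned above expose this check-before-write directly through compareAndSet: the write succeeds only if the current value still matches what the caller last observed. A minimal demonstration:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(10);

        // Succeeds: the current value matches the expected value 10.
        boolean first = value.compareAndSet(10, 11);

        // Fails: the value is now 11, not 10. A concurrent update by another
        // thread would be detected in exactly the same way.
        boolean second = value.compareAndSet(10, 12);

        System.out.println(first + " " + second + " " + value.get()); // true false 11
    }
}
```

No lock is taken at any point; a failed compareAndSet simply tells the caller its view of the data is stale.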

Optimistic concurrency control suits read-heavy, write-light applications, where it can raise throughput.

Phases of Optimistic Concurrency Control

Optimistic concurrency control transactions involve these phases:

  • Begin: Record a timestamp marking the transaction’s beginning.
  • Modify: Read database values, and tentatively write changes.
  • Validate: Check whether other transactions have modified data that this transaction has used (read or written). This includes transactions that completed after this transaction’s start time, and optionally, transactions that are still active at validation time.
  • Commit/Rollback: If there is no conflict, make all changes take effect. If there is a conflict, resolve it, typically by aborting the transaction, although other resolution schemes are possible. Care must be taken to avoid a TOCTTOU bug, particularly if this phase and the previous one are not performed as a single atomic operation.
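
The four phases can be sketched with a hypothetical versioned cell (VersionedCell is an illustrative name, not a library class). Note that Validate and Commit run inside one short critical section precisely to avoid the TOCTTOU window mentioned above:

```java
// Hypothetical versioned cell illustrating Begin/Modify/Validate/Commit.
class VersionedCell {
    private int value;
    private int version;

    // Begin: snapshot the value together with its version.
    synchronized int[] read() { return new int[] { value, version }; }

    // Validate + Commit as a single atomic step (closes the TOCTTOU window).
    synchronized boolean commit(int expectedVersion, int newValue) {
        if (version != expectedVersion) return false; // conflict: someone committed meanwhile
        value = newValue;
        version++;
        return true;
    }

    synchronized int value() { return value; }
}

public class OccDemo {
    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        int[] snapshot = cell.read();                             // Begin: read value + version
        int tentative = snapshot[0] + 5;                          // Modify: compute privately, no lock held
        boolean committed = cell.commit(snapshot[1], tentative);  // Validate + Commit
        System.out.println(committed + " " + cell.value());       // true 5
    }
}
```

A stale snapshot (one whose version has since advanced) makes commit return false, and the caller would typically resolve the conflict by retrying from Begin.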

Analysis of the Optimistic Concurrency Control Approach

From the introductions above, we can see that each approach has its pros and cons; neither can be considered universally better.

Optimistic concurrency control suits workloads with few writes (read-heavy scenarios), where conflicts rarely occur: it saves the overhead of locking and raises the system's overall throughput. In write-heavy workloads, conflicts arise frequently, causing the application layer to retry again and again, which actually lowers performance; pessimistic concurrency control is generally the better fit for write-heavy scenarios.

Two Common Implementations of the Optimistic Approach

Optimistic concurrency control can be implemented with **multiversion concurrency control (MVCC)** or with the CAS algorithm.

1 Multiversion Concurrency Control (MVCC)

Refer to https://swsmile.info/post/concurrent-control-mvcc/

2 Non-blocking Algorithms

// TODO

  • Wait-freedom
  • Lock-freedom
  • Obstruction-freedom

Benefits of Non-blocking Algorithms

There are several benefits of non-blocking algorithms compared to blocking algorithms. This section will describe these benefits.

Choice

The first benefit of non-blocking algorithms is that threads are given a choice about what to do when a requested action cannot be performed. Instead of just being blocked, the requesting thread can decide what to do. Sometimes there is nothing a thread can do; in that case it can choose to block or wait itself, freeing up the CPU for other tasks. But at least the requesting thread is given the choice.

On a single CPU system perhaps it makes sense to suspend a thread that cannot perform a desired action, and let other threads which can perform their work run on the CPU. But even on a single CPU system blocking algorithms may lead to problems like deadlock, starvation and other concurrency problems.

No Deadlocks

The second benefit of non-blocking algorithms is that the suspension of one thread cannot lead to the suspension of other threads. This means that deadlock cannot occur: two threads cannot end up blocked waiting for each other to release a lock they want. Since threads are not blocked when they cannot perform their requested action, they cannot be blocked waiting for each other. Non-blocking algorithms may still result in livelock, where two threads keep attempting some action but keep being told that it is not possible (because of the actions of the other thread).

No Thread Suspension

Suspending and reactivating a thread is costly. Yes, the costs of suspension and reactivation have gone down over time as operating systems and thread libraries have become more efficient. However, there is still a high price to pay for thread suspension and reactivation.

Whenever a thread is blocked it is suspended, thus incurring the overhead of thread suspension and reactivation. Since threads are not suspended by non-blocking algorithms, this overhead does not occur. This means that the CPUs can potentially spend more time performing actual business logic instead of context switching.

On a multi-CPU system, blocking algorithms can have a more significant impact on overall performance. A thread running on CPU A can be blocked waiting for a thread running on CPU B. This lowers the level of parallelism the application is capable of achieving. Of course, CPU A could just schedule another thread to run, but suspending and activating threads (context switches) are expensive. The fewer threads that need to be suspended, the better.

Reduced Thread Latency

Latency in this context means the time between the moment a requested action becomes possible and the moment the thread actually performs it. Since threads are not suspended in non-blocking algorithms, they do not have to pay the expensive, slow reactivation overhead. That means that when a requested action becomes possible, threads can respond faster and thus reduce their response latency.

Non-blocking algorithms often obtain this lower latency by busy-waiting until the requested action becomes possible. Of course, in a system with high thread contention on the non-blocking data structure, CPUs may end up burning a lot of cycles during these busy waits. This is a thing to keep in mind: non-blocking algorithms may not be the best choice if your data structure suffers high thread contention. However, there are often ways to redesign your application to have less thread contention.

The CAS Algorithm

Refer to https://swsmile.info/post/concurrent-control-cas/

Drawbacks of Optimistic Concurrency Control

The ABA Problem

Suppose thread T1 reads value A from memory location V. Meanwhile, another thread T2 also reads A from V, performs some operations that change it to B, and then changes the value at V back to A. When T1 later performs its CAS, it finds that memory still holds A, so the operation succeeds. Even though T1's CAS succeeds, there may be hidden problems.

A common way to solve the ABA problem is to add a version stamp (or timestamp): when updating, compare not only whether the current value matches the value read before the operation began, but also whether the stamp of the last modification matches.
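
Java ships this idea as java.util.concurrent.atomic.AtomicStampedReference, which pairs the reference with an integer stamp. A minimal demonstration (it relies on interned String literals, so the reference comparison inside compareAndSet is deterministic here):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int observedStamp = ref.getStamp(); // T1 observes value "A" at stamp 0

        // T2: A -> B -> A, bumping the stamp on every change.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // A plain value comparison would still see "A", but the stamp
        // comparison exposes the intervening A -> B -> A history.
        boolean success = ref.compareAndSet("A", "C", observedStamp, observedStamp + 1);
        System.out.println(success + " " + ref.getReference() + " " + ref.getStamp()); // false A 2
    }
}
```

With a plain AtomicReference, T1's final CAS would have succeeded despite the A-to-B-to-A history; the stamp makes the staleness visible.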

Long Spin Loops Are Expensive

A spinning CAS (on failure, loop and retry until it succeeds) that keeps failing for a long time imposes a very large execution cost on the CPU.

If the JVM can make use of the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction does two things:

  • It delays the pipelined execution of instructions (de-pipelining) so the CPU does not consume excessive execution resources; the length of the delay depends on the implementation, and on some processors it is zero.
  • It avoids the CPU pipeline flush caused by a memory order violation when exiting the loop, thereby improving CPU execution efficiency.
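
The spin-until-success loop described above looks like this in Java (addAndGet here is a hand-written illustration of the pattern, not a new API):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpinCasDemo {
    // Spin-CAS: keep retrying until compareAndSet succeeds.
    // Under heavy contention this loop is exactly where CPU cycles are burnt.
    static int addAndGet(AtomicInteger value, int delta) {
        while (true) {
            int current = value.get();
            int next = current + delta;
            if (value.compareAndSet(current, next)) {
                return next; // success: nobody changed the value in between
            }
            // failure: another thread updated the value; re-read and retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(40);
        System.out.println(addAndGet(value, 2)); // 42
    }
}
```

With a single thread the first CAS always succeeds; under contention, every failed iteration is pure spinning, which is the cost this section describes.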

Atomicity Is Guaranteed for Only One Shared Variable

When operating on a single shared variable, we can use a CAS loop to guarantee atomicity; when operating on multiple shared variables, a CAS loop cannot make the combined operation atomic.

At that point you can fall back to a lock, or use a trick: merge the multiple shared variables into a single one. For example, with two shared variables i = 2 and j = a, merge them into ij = 2a and run CAS on ij. Since Java 1.5, the JDK has provided the AtomicReference class to guarantee atomicity across referenced objects, so you can place multiple variables inside one object and perform CAS on that object.
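
The merge-into-one-object trick looks like this with AtomicReference (the Pair class below is illustrative): both fields live in one immutable object, so a single CAS swaps them together or not at all.

```java
import java.util.concurrent.atomic.AtomicReference;

public class PairCasDemo {
    // Illustrative immutable holder: updating i and j together means
    // swapping the whole Pair object with one CAS.
    static final class Pair {
        final int i;
        final String j;
        Pair(int i, String j) { this.i = i; this.j = j; }
    }

    public static void main(String[] args) {
        AtomicReference<Pair> ref = new AtomicReference<>(new Pair(2, "a"));

        Pair observed = ref.get();
        // Either both fields change, or (if another thread swapped the Pair
        // first) the CAS fails and neither field changes.
        boolean swapped = ref.compareAndSet(observed, new Pair(3, "b"));

        System.out.println(swapped + " " + ref.get().i + " " + ref.get().j); // true 3 b
    }
}
```

Immutability is the key design choice: because a Pair is never mutated in place, readers can never observe a half-updated (i, j) combination.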

When to Use Optimistic CAS vs. the Pessimistic Java synchronized Keyword

Conclusion first: CAS suits workloads with few writes (read-heavy scenarios, where conflicts are generally rare); Java's synchronized keyword suits workloads with many writes (write-heavy scenarios, where conflicts are generally frequent).

  • Under light resource contention (mild thread conflict), the synchronized lock wastes extra CPU on blocking and waking threads and on switching between user mode and kernel mode; CAS, being implemented on top of CPU instructions, needs no kernel entry and no thread switch, and spins only rarely, so it achieves higher performance.
  • Under heavy resource contention (severe thread conflict), CAS spins with high probability, wasting even more CPU, and ends up less efficient than synchronized.

A note: in Java concurrent programming, the synchronized keyword has always played the role of the veteran, and for a long time many people called it the "heavyweight lock". Since Java SE 1.6, however, various optimizations introduced to reduce the cost of acquiring and releasing locks, chiefly biased locking and lightweight locking, have made it much less heavy in many situations.

Under the hood, synchronized relies mainly on a lock-free queue; the basic strategy is to spin first and then block, competing for the lock again after a switch. It sacrifices a little fairness but gains high throughput. Under light thread contention it achieves performance similar to CAS; under severe thread contention its performance is far higher than CAS.

Reference