Memory Barrier/Fence
A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture’s memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
- Java Memory Model: https://swsmile.info/post/java-memory-model/
- Golang Memory Model: https://swsmile.info/post/golang-memory-model/
Why memory barriers are needed:
- The CPU applies re-ordering optimizations to pending variable-assignment instructions
- The compiler reorders pending variable-assignment instructions
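A memory barrier provides 3 functions:
- It ensures that instruction reordering never moves instructions that follow the barrier to a position before it, nor instructions that precede the barrier to a position after it; that is, by the time the barrier instruction executes, all operations before it have completed.
- It forces modifications held in the cache to be written to main memory immediately.
- If the operation is a write, it invalidates the corresponding cache lines in other CPUs.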
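In Java these effects are usually obtained through a volatile field rather than an explicit fence instruction. The following is a minimal sketch of that idea, not a definitive implementation; the class name VolatilePublish and the fields payload and ready are hypothetical names chosen for illustration. Because ready is volatile, the Java Memory Model guarantees that a reader which observes ready == true also observes the earlier write to payload.

```java
final class VolatilePublish {
    static int payload = 0;                 // plain field (hypothetical name)
    static volatile boolean ready = false;  // volatile flag (hypothetical name)

    // Publishes payload from the calling thread and returns what a reader thread saw.
    static int run() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {                // volatile read: cannot be hoisted out of the loop
                Thread.onSpinWait();
            }
            seen[0] = payload;              // ordered after the volatile read of ready
        });
        reader.start();
        payload = 99;                       // plain write
        ready = true;                       // volatile write: not reordered before the write above
        try {
            reader.join();                  // join makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());          // prints 99
    }
}
```

If ready were a plain field, the reader would be allowed to spin forever or to observe a stale payload.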
Guarantees
There are some minimal guarantees that may be expected of a CPU:
On any given CPU, dependent memory accesses will be issued in order, with respect to itself. This means that for:
Q = READ_ONCE(P); D = READ_ONCE(*Q);
the CPU will issue the following memory operations:
Q = LOAD P, D = LOAD *Q
and always in that order.
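The same pattern of dependent loads can be written in Java. This is only an analogy sketched under assumptions: the names DependentLoads, Node and P are hypothetical, and getOpaque is used as a rough counterpart of READ_ONCE. The point is that the second load cannot be issued until the first has produced the address it reads from.

```java
import java.util.concurrent.atomic.AtomicReference;

final class DependentLoads {
    // Hypothetical stand-in for the object that P points to.
    static final class Node {
        final int data;
        Node(int data) { this.data = data; }
    }

    // Plays the role of P: a shared location that holds a pointer.
    static final AtomicReference<Node> P = new AtomicReference<>();

    static int readThroughPointer() {
        Node q = P.getOpaque();   // Q = LOAD P
        return q.data;            // D = LOAD *Q, whose address depends on Q
    }

    static int demo() {
        P.setOpaque(new Node(7));
        return readThroughPointer();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 7
    }
}
```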
Example
When a program runs on a single-CPU machine, the hardware performs the necessary bookkeeping to ensure that the program executes as if all memory operations were performed in the order specified by the programmer (program order), so memory barriers are not necessary. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system, or memory-mapped peripherals, out-of-order access may affect program behavior. For example, a second CPU may see memory changes made by the first CPU in a sequence which differs from program order.
A program runs as a process, which may contain multiple software threads (such as pthreads, as opposed to hardware threads). Different processes do not share a memory space, so this discussion does not apply to two programs running in separate processes. It applies to two or more software threads running within a single process, where all threads share one memory space. Those threads may run concurrently on a multi-core processor.
The following multi-threaded program, running on a multi-core processor, gives an example of how such out-of-order execution can affect program behavior:
Initially, memory locations x and f both hold the value 0. The software thread running on processor #1 loops while the value of f is zero, then it prints the value of x. The software thread running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudo-code for the two program fragments is shown below.
The steps of the program correspond to individual processor instructions.
Thread #1 Core #1:
while (f == 0);
// Memory fence required here
print x;
Thread #2 Core #2:
x = 42;
// Memory fence required here
f = 1;
One might expect the print statement to always print the number “42”; however, if thread #2’s store operations are executed out-of-order, it is possible for f to be updated before x, and the print statement might therefore print “0”.
Similarly, thread #1’s load operations may be executed out-of-order, so it is possible for x to be read before f is checked, and again the print statement might print an unexpected value. For most programs neither of these situations is acceptable. A memory barrier must be inserted before thread #2’s assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. A memory barrier must also be inserted before thread #1’s access to x to ensure that the value of x is not read prior to seeing the change in the value of f.
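The two fragments can be sketched in Java with explicit fences placed where the pseudo-code says "Memory fence required here". This is one possible rendering under stated assumptions, not the only way to write it: the class name FenceExample is hypothetical, f is held in an AtomicInteger and accessed in opaque mode so that the spin loop re-reads it on every iteration, and the fences come from the static methods VarHandle.releaseFence() and VarHandle.acquireFence() (available since Java 9).

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

final class FenceExample {
    static int x = 0;                                    // plain field, as in the pseudo-code
    static final AtomicInteger f = new AtomicInteger(0); // the flag

    // Runs both threads once and returns the value thread #1 would print.
    static int run() {
        x = 0;
        f.set(0);
        int[] printed = new int[1];

        Thread t1 = new Thread(() -> {
            while (f.getOpaque() == 0) {   // while (f == 0);
                Thread.onSpinWait();
            }
            VarHandle.acquireFence();      // the load of f is not reordered with the load of x
            printed[0] = x;                // print x;
        });

        Thread t2 = new Thread(() -> {
            x = 42;                        // x = 42;
            VarHandle.releaseFence();      // the store to x is not reordered with the store to f
            f.setOpaque(1);                // f = 1;
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return printed[0];
    }

    public static void main(String[] args) {
        System.out.println(run());         // prints 42
    }
}
```

In everyday Java code, declaring f as volatile is the more common choice; the explicit fences are used here only to mirror the placement of the barriers in the pseudo-code.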
Java volatile
Refer to https://swsmile.info/post/java-volatile/
Reference
- https://en.wikipedia.org/wiki/Memory_barrier
- https://en.wikipedia.org/wiki/Memory_ordering
- https://www.kernel.org/doc/Documentation/memory-barriers.txt
- https://stackoverflow.com/questions/286629/what-is-a-memory-fence
- https://www.infoq.com/articles/memory_barriers_jvm_concurrency/