[Java Concurrent Learning II] Summary of the basic hardware knowledge of multi-threaded programming


by MrDTree

This article briefly sorts out the hardware knowledge related to multi-threading. It helps us understand more clearly the nature of how multiple threads work, as well as the implementation principles behind the keywords synchronized, volatile, and final.

We will find that every hardware component was introduced to solve certain problems, and then gave birth to new ones. (Programmers live in just such an endless loop...)


1. Cache concept

Let's start with the concept of a cache. Nowadays, a processor's operating speed is much faster than the read and write speed of main memory. To bridge the gap between the two, hardware designers introduced the cache.

As shown in the figure on the left, the cache is a storage component whose access speed is much faster than main memory's and whose capacity is much smaller. With it, the CPU no longer deals with main memory directly: the CPU reads and writes data in the cache, and the cache in turn reads from and writes to main memory.

As shown in the figure on the right, modern processors generally have multiple levels of cache, divided into a level-one cache (L1 Cache), a level-two cache (L2 Cache), and a level-three cache (L3 Cache). The level-one cache may be integrated directly into the processor core. Access speed: L1 > L2 > L3; storage capacity: L1 < L2 < L3.

2. Cache structure

The internal structure of the cache is a chained hash table, very similar to the underlying structure and principle of HashMap (see the HashMap principle analysis article). It is divided into several buckets, and each bucket is a linked list containing several cache entries (Cache Entry).

The cache entry can be further divided into three parts:

  1. Data block (cache line): stores data read from main memory or to be written back to it; one cache line can hold multiple variables
  2. Tag: holds information identifying the memory location of the data in the cache line
  3. Flag: status information of the cache line

When the CPU accesses memory, it decodes the memory address into three parts, and uses them to locate the corresponding data in the cache: index (the bucket number), tag (which cache entry within the bucket), and offset (the position of the variable within the cache line).

If the data is found and the Flag of the cache entry is valid, it is a cache hit; otherwise the cache loads the corresponding data from main memory, and during this process the processor stalls and cannot execute other instructions.
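The index/tag/offset decoding above can be sketched in Java. The line size and bucket count below are illustrative assumptions, not any particular CPU's geometry:

```java
// Sketch: decoding a memory address into offset, index, and tag for a
// hypothetical cache with 64-byte cache lines and 64 buckets.
public class CacheAddress {
    static final int LINE_SIZE = 64; // bytes per cache line -> low 6 bits form the offset
    static final int NUM_SETS  = 64; // buckets in the hash table -> next 6 bits form the index

    static long offset(long addr) { return addr % LINE_SIZE; }              // position inside the line
    static long index(long addr)  { return (addr / LINE_SIZE) % NUM_SETS; } // which bucket
    static long tag(long addr)    { return addr / LINE_SIZE / NUM_SETS; }   // identifies the entry in the bucket

    public static void main(String[] args) {
        long addr = 0x12345L;
        System.out.printf("addr=0x%x -> tag=%d index=%d offset=%d%n",
                addr, tag(addr), index(addr), offset(addr));
    }
}
```

Note how adjacent variables (addresses differing only in the offset bits) land in the same cache line, which is why one line can hold multiple variables.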

3. Cache Coherence Protocol (MESI)

Since each processor has its own cache, when multiple threads concurrently access the same shared variable, each processor running one of those threads keeps its own copy of the variable. When one processor updates its copy, how do the other processors detect this and react? That is the job of the cache coherence protocol, of which MESI is a common example.

Note: the concepts in the next two sections can be skimmed at first; they are best understood by coming back to them alongside the examples in section 3.3.

3.1 MESI concept

The MESI protocol divides the status of cache entries into four types:

  • M (Modified)
    The data in the cache line has been modified and is inconsistent with main memory. At any moment, across the caches of all processors, at most one cache entry with a given Tag value can be in this state.
    Before another CPU reads the data for the same Tag, the cache line is written back to main memory, and the entry then becomes Exclusive.

  • E (Exclusive)
    This cache line holds the only copy of the data at the corresponding memory address; no other CPU's cache may hold a valid copy. The entry's data is consistent with main memory.
    When another CPU reads that memory, the state becomes Shared; when this CPU modifies the cache line, the state becomes Modified.

  • S (Shared)
    The data in the cache line may be cached by multiple CPUs, and each cache's copy is consistent with main memory.
    When one CPU modifies the cache line's data, the corresponding cache lines in the other CPUs are invalidated, i.e. they become Invalid.

  • I (Invalid)
    The cache line is invalid and contains no valid copy of the data at any memory address. This is the initial state of a cache entry.
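The state transitions above can be summarized in a small Java sketch. It models only the local view of one cache entry; the bus messages that drive these transitions are covered in the next section, and the class and method names are illustrative:

```java
// Sketch: the four MESI states of a cache entry and the transitions
// described above, as seen from one processor's cache.
public class MesiStates {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // This CPU writes the line: it now holds the only, dirty copy.
    static State afterLocalWrite(State s) {
        return State.MODIFIED;
    }

    // Another CPU reads the line: M is written back first, then both
    // M and E drop to SHARED; S and I are unchanged.
    static State afterRemoteRead(State s) {
        switch (s) {
            case MODIFIED:
            case EXCLUSIVE:
                return State.SHARED;
            default:
                return s;
        }
    }

    // Another CPU writes the line: our copy is no longer valid.
    static State afterRemoteWrite(State s) {
        return State.INVALID;
    }

    public static void main(String[] args) {
        State s = State.INVALID;   // initial state of a cache entry
        s = State.EXCLUSIVE;       // loaded from memory, no other copies exist
        s = afterLocalWrite(s);    // local write -> MODIFIED
        s = afterRemoteRead(s);    // written back, then -> SHARED
        System.out.println(s);
    }
}
```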

3.2 MESI message

In addition, so that caches can communicate with each other and the memory reads and writes of the processors can be coordinated, the MESI protocol defines the following set of messages:

  • Read: notifies the other processors and main memory that the sender is preparing to read some data; the message carries the memory address to be read.
    Reply: Read Response, which returns the requested data. The reply may come from main memory or from another processor's cache.
  • Invalidate: notifies the other processors to set the cache entry for the specified memory address to state I, i.e. to delete their copies of that data.
    Reply: Invalidate Acknowledge. A processor that receives an Invalidate must reply with this message, indicating that it has deleted the copy in its cache.
  • Read Invalidate: a composite of Read and Invalidate. It tells the other processors that the sender is about to update (read-modify-write) a piece of data and asks them to delete their copies.
    Reply: Read Response and Invalidate Acknowledge; a processor that receives the request must return both.
  • Writeback: carries the data to be written to main memory together with its memory address. It has no reply message.

When a processor performs memory reads and writes, it sends the corresponding request messages onto the bus when necessary. Meanwhile, every processor sniffs (also called snoops) the request messages that other processors put on the bus and, under the right conditions, replies with the corresponding response message.

3.3 Examples

The concepts above only need a quick look. The main thing to understand is this: the MESI protocol controls access to memory data much like a read-write lock. Read operations on the same address can proceed concurrently, while write operations on the same address are exclusive. This is how it keeps data consistent across caches.

Now let's take two examples to look at the specific process:

Concurrent Read
When Processor 0 wants to read data S from its cache, if the cache entry holding S is in state M, E, or S, the processor can read the data directly.

If the cache entry holding S is in state I, Processor 0's cache contains no valid copy of S. Processor 0 then sends a Read message onto the bus. After receiving the message, another processor whose corresponding cache entry is not in state I (say, Processor 1), or main memory (when every other processor's entry is in state I, the data comes from main memory), replies with a Read Response returning the valid data of S to the sender.

Note that if a responding processor (such as Processor 1) holds the entry in state M, it first writes the data back to main memory (its state becoming E at that point), and then, after returning the Read Response, updates the state to S.

In this way Processor 0 always reads the latest data: even if other processors have modified it, Processor 0 obtains their latest modification.

Exclusive write
When Processor 0 wants to write data to address A, if the cache entry for A is in state E or M, Processor 0 holds exclusive ownership of the data: it can write directly to A and then set the entry's state to M.

If the entry to be written is in state S, Processor 0 must first send an Invalidate message onto the bus to gain exclusive ownership. Once it has received the Invalidate Acknowledge messages returned by all other processors, Processor 0 knows it has exclusive ownership; it then updates the data at A and sets the corresponding entry's state to M.

If the entry to be written is in state I, Processor 0 sends a Read Invalidate message onto the bus to gain exclusive ownership; the remaining steps are the same as for state S.

Note that if a processor receiving the Invalidate message holds the corresponding entry in state M, it first writes the data back to main memory (so that the sender of the Read Invalidate can read the latest value) and then changes the state to I.

In this way, when Processor 0 and other processors write concurrently, only one processor at a time can obtain exclusive ownership, i.e. writes are mutually exclusive.
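The exclusive-write walkthrough above boils down to a simple decision: which bus message, if any, the writer must send first, based on the state of its cache entry. A minimal sketch (the message names follow MESI; the class itself is illustrative):

```java
// Sketch: which bus message a processor sends before writing, depending
// on the MESI state of the cache entry it wants to write.
public class WritePath {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static String messageBeforeWrite(State s) {
        switch (s) {
            case MODIFIED:
            case EXCLUSIVE:
                return "none";            // already exclusive: write directly, state -> M
            case SHARED:
                return "Invalidate";      // revoke the other copies, then write
            case INVALID:
                return "Read Invalidate"; // fetch the data and revoke copies, then write
            default:
                throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + messageBeforeWrite(s));
        }
    }
}
```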

Write buffer and invalidation queue

According to the MESI protocol above, concurrent reads and mutually exclusive writes should already solve the data-consistency problem when multiple threads access the same shared variable. Why, then, do we still run into the "visibility" problem that makes our programs thread-unsafe?

The reason lies in the introduction of write buffers and invalidation queues. Although the MESI protocol solves cache coherence, it has a performance defect: every time a processor writes data, it must wait until all other processors have deleted the corresponding data from their caches and returned their Read Response and Invalidate Acknowledge messages before performing the write. This process is very time-consuming.

So, having solved the cache-coherence problem, the hardware designers introduced new components to deal with the performance problem that emerged in its place: write buffers and invalidation queues.

1. Write buffer

The write buffer is a high-speed storage component inside the processor, smaller in capacity than the cache. Each processor has its own write buffer, and one processor cannot read the contents of another processor's write buffer. The write buffer was introduced mainly to solve the MESI write-latency problem described above.

1.1 Write operation process

After the write buffer is introduced, when the processor wants to write data:

  • If the corresponding cache entry is in state E or M, it writes directly without sending any message (as before).
  • If the entry is in state S, the processor stores the write-operation information in the write buffer and sends an Invalidate message (no longer waiting for the response messages).
  • If the entry is in state I (a "write miss"), the processor stores the write-operation information in the write buffer and sends a Read Invalidate message (again without waiting for the responses).

From the processor's point of view, once a write has been placed in the write buffer, that write is considered complete. In reality, only when the processor has received the Read Response and Invalidate Acknowledge messages from all other processors does it move the corresponding write from the write buffer into the cache line; only then is the write truly complete.

The write buffer lets the processor perform a write operation without extra waiting, reducing write latency and improving the processor's instruction-execution efficiency.

1.2 Read operation process

After the write buffer is introduced, when the processor reads data, the latest update of that data may still be sitting in the write buffer. The processor therefore looks for the data in the write buffer first, and only searches the cache when it is not found there.

This technique, in which the processor reads data directly from its own write buffer, is called store forwarding.
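The write path (buffer the write and keep going) and the read path (check the write buffer first, i.e. store forwarding) can be sketched together. The maps below stand in for real hardware structures and are purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a per-processor write buffer with store forwarding. Writes are
// buffered until the acknowledgements arrive; reads check the buffer first.
public class WriteBuffer {
    private final Map<Long, Integer> buffered = new HashMap<>(); // pending writes
    private final Map<Long, Integer> cache = new HashMap<>();    // this CPU's cache

    // Write path: record the write and continue; the Invalidate /
    // Read Invalidate message would be sent here without waiting.
    void write(long addr, int value) {
        buffered.put(addr, value);
    }

    // Read path: store forwarding first, then the cache (0 stands in for
    // a miss that would be served from main memory).
    int read(long addr) {
        Integer pending = buffered.get(addr);
        if (pending != null) return pending;
        return cache.getOrDefault(addr, 0);
    }

    // Called once every other processor has replied (Read Response /
    // Invalidate Acknowledge): only now does the write truly complete.
    void acknowledgementsReceived(long addr) {
        Integer pending = buffered.remove(addr);
        if (pending != null) cache.put(addr, pending);
    }

    public static void main(String[] args) {
        WriteBuffer wb = new WriteBuffer();
        wb.write(8L, 42);
        System.out.println(wb.read(8L)); // forwarded from the write buffer
        wb.acknowledgementsReceived(8L);
        System.out.println(wb.read(8L)); // now served from the cache
    }
}
```

Note that only this processor can see the buffered value; another processor's read would go through the bus, which is exactly the visibility gap discussed below.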

2. Invalidation queue

With an invalidation queue, after receiving an Invalidate message the processor does not immediately delete the copy of the data at the specified address. Instead, it stores the message in its invalidation queue and returns the Invalidate Acknowledge right away, thereby reducing the time the writing processor has to wait.

Note that some processors (such as x86) may not use invalidation queues.

3. New problems caused by write buffers and invalidation queues

The introduction of write buffers and invalidation queues brings a performance improvement, but it also brings two new problems: memory reordering and visibility.

3.1 Memory reordering

Processor 1:

    data = 1;      // S1
    ready = true;  // S2

Processor 2:

    while (!ready) continue;  // L3
    int r = data;             // L4 (read data)

As shown above, the data variable's initial value is 0 and the ready variable's initial value is false. Processor 1 and Processor 2 execute the code above on their respective threads, and the absolute time order of execution is S1 > S2 > L3 > L4.

  • Take a StoreStore (write-write) pair as an example of reordering caused by the write buffer.
    Suppose the S1 write to data is placed in the write buffer and has not actually reached the cache, while the S2 write to ready has already reached the cache. Then when L3 reads ready, it gets the correct value true according to the MESI protocol; but when L4 reads data, it gets the initial value 0 rather than the modified value 1 still sitting in the other processor's write buffer.
    From Processor 2's point of view, S1 and S2 appear to have executed in reverse order: reordering has occurred.

  • Take a LoadLoad (read-read) pair as an example of reordering caused by the invalidation queue. The setup is the same as above: S2 has been synchronized to the cache, while S1 sits in the write buffer and an Invalidate message has been sent. When L3 executes, the correct value true is read. But at L4, because of the invalidation queue, Processor 2 has returned an Invalidate Acknowledge without actually deleting the data from its cache, so it reads the stale value of data from its own cache: 0.

3.2 Visibility

The contents of one processor's write buffer cannot be read by other processors. As a result, after a processor updates a shared variable, other processors may not see the updated value: this is the visibility problem.

The write buffer is the hardware source of the visibility problem.

Memory barrier

To solve the visibility and reordering problems caused by write buffers and invalidation queues, the long-suffering hardware designers introduced yet another mechanism: memory barriers.

A memory barrier is an instruction inserted between two CPU instructions that prohibits the processor from reordering them (like a fence), thereby guaranteeing ordering. In addition, to achieve the fencing effect, it forces the processor to flush the values in the write buffer to the cache and to drain the invalidation queue before the subsequent write or read, thereby also guaranteeing visibility "as a side effect".

Memory barriers are in fact the underlying implementation of volatile and synchronized; the details will be discussed in the next chapter.
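As a preview, the example from section 3.1 can be fixed in Java by declaring ready volatile: the JVM emits the appropriate memory barriers around the volatile write and read, so a reader that sees ready == true is guaranteed by the Java Memory Model to also see data == 1. The class below is a minimal sketch of that guarantee, not of the barrier instructions themselves:

```java
// Sketch: the reordering/visibility example from section 3.1, made safe
// with volatile. The volatile write (S2) and read (L3) act as the memory
// barriers described above.
public class VolatileVisibility {
    static int data = 0;
    static volatile boolean ready = false;
    static int observed = -1;

    static int runOnce() throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 1;          // S1
            ready = true;      // S2: volatile write; earlier writes become visible
        });
        Thread reader = new Thread(() -> {
            while (!ready) { } // L3: volatile read; sees the flush from S2
            observed = data;   // L4: guaranteed to see 1
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce()); // prints 1
    }
}
```

Without volatile on ready, the reader could spin forever or observe data == 0, exactly the two failure modes (visibility and reordering) described in this article.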

This article is summarized from: Huang Wenhai, Java Multithreaded Programming Practical Guide, Chapter 12.
