Is mastering MySQL transactions enough? A conversation with a Tencent interviewer reveals exactly what gets tested!

With one sword I cleave the vast earth, yet where has all the splendor fallen? Is there truly no place for me under heaven? Why must this interviewer always spar with me? Isn't that just embarrassing?

Every time I suffer and come to an understanding, I walk back down the mountain alone, unhurried and quietly sad. The sea of learning is boundless; only by grinding away alone can one become the best in the trade.

Preface

One sunny morning, a man walked toward me wearing a snow-white shirt and gray-white casual shorts, baring a pair of dark legs that showed a man's true colors, a silver MacBook Pro in hand and white sneakers on his feet. His hair was so sparse it even glinted a little under the lights, above an unhurried face.

My heart couldn't help but tremble. I was afraid I was about to meet someone formidable today: a full-blown uncle of an architect.

Sure enough, he picked up my resume, scanned it quickly, then eyed me from the corner of his eye, as if weighing something. He started asking the moment he sat down.

Introduction and principles of transactions

Interviewer: I see your resume says you are proficient in MySQL optimization. Let's start with your understanding of MySQL transactions.

I calmed myself down a little. This isn't hard, I thought; it's just how transactions are defined and executed. Little did I know that everything that followed was designed to torture me.

Zhazhahui: Well, a database transaction is a logical processing unit: a set of SQL statements accessing various data items, where either all of the statements execute successfully or none of them take effect.

The classic example is a bank transfer. Transaction A transfers money: the sending account is debited and the receiving account is credited. Either both operations succeed or both fail; that is what keeps the data consistent.
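
A minimal sketch of such a transfer, assuming a hypothetical accounts(id, balance) table (the table, ids, and amount are illustrative, not from the interview):

```sql
-- Hypothetical accounts(id, balance) table; ids and amount are illustrative.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit the sender
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit the receiver
COMMIT;  -- both changes become visible together; ROLLBACK would undo both
```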

Zhazhahui: In general, when a request enters MySQL, the data is processed in the service layer according to the request type, and a background thread later flushes the data to disk. This avoids reading and writing the disk on every request.

After answering, I smiled, smoothed my hair back into shape, and straightened the bow tie at my neck.

Interviewer (thinking): Oho, this kid has some ideas. Daring to break ground over Tai Sui's head, are we? Coming here to bully me over my thinning hair. The little rascal.

Interviewer: You just mentioned MySQL's internal workings. Do you know the properties of transactions? How is data flushed? How is data modified in the service layer? What are the benefits of working this way? How are frequent writes avoided? Do read requests and write requests go through the service layer the same way?

I groaned inside. This old bald donkey has a death wish, grilling me like that. It seemed I had to get serious; there was a murderous aura in the air.

I smiled awkwardly at him and spoke right away.

Zhazhahui: The four key properties of transactions in MySQL are:
Atomicity,
Consistency,
Isolation,
Durability, ACID for short. Together they ensure the consistency of the data.

Interviewer: Good. Tell me your understanding of these four properties.

Zhazhahui:

  • Atomicity: As the name suggests, an atom is the smallest indivisible unit, so a transaction is an indivisible minimal unit of work. The entire transaction either executes successfully in full, or fails in full and is rolled back.

Success and failure are both state transitions; if only half of the effects were applied, the data would be incomplete, which violates atomicity.

  • Consistency: The database moves from one consistent state to another consistent state; the state before and after the transaction must both be valid.

  • Isolation: Modifications made by one transaction are invisible to other transactions until it finally commits.

  • Durability: Once a transaction commits, its changes are permanently saved to disk; the result of its execution remains in effect.

Interviewer: How does the bottom layer of the database implement these four characteristics?

I stole a glance at his big, greasy face, but he only looked back at me calmly, the very picture of easygoingness. Inside, my heart was churning: the first question alone had dug this deep. What would come next? I felt like I was dangling in midair.

Zhazhahui:

  • Atomicity: Guaranteed by the undo log, also known as the rollback log, which mainly records the data as it was before a modification.

The undo log records the logical inverse of each change. Because the prior state of every operation is recorded, the operations can be rolled back when an error occurs or the transaction is rolled back.
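
A small sketch of a rollback relying on those undo records (the table and values are illustrative; t1 is also used as an example later):

```sql
-- Hypothetical t1(id, username) table; the undo log keeps the pre-image.
START TRANSACTION;
UPDATE t1 SET username = 'new_name' WHERE id = 3;  -- old value saved to undo log
ROLLBACK;  -- InnoDB applies the undo record, restoring the old username
```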

Interviewer: How is the request process implemented?

Zhazhahui: Take a write request that is an insert. When it enters the database service layer, the data is first written into the change buffer under the buffer pool; the physical change of the write is recorded in the redo log, and a logical statement that would delete the inserted row is recorded in the undo log. Finally, a MySQL background thread periodically flushes the data to disk.

Before MySQL 5.5 this buffer was called the insert buffer and only optimized inserts; it now also covers update and delete operations and is called the change buffer.

Interviewer: What are the benefits of this change buffer?

Zhazhahui:
It avoids frequent disk I/O on writes. Without it, the relevant data would first have to be read from disk before it could be modified.

In most scenarios there are far more reads than writes. Under high concurrency, batching the buffered changes avoids the resource overhead of processing each write separately, which would otherwise drag down a large number of read requests.
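
For reference, the change buffer's scope is tunable via the real innodb_change_buffering variable (a sketch; the SET is purely illustrative, since 'all' is already the default):

```sql
-- Which operations are buffered: none|inserts|deletes|changes|purges|all
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';
SET GLOBAL innodb_change_buffering = 'all';  -- illustrative; 'all' is the default
```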

  • Durability: Guaranteed by the redo log, which exists to make transactions durable. It consists of two parts: the redo log buffer, which lives in memory, and the redo log file, which lives on disk.

    When a transaction commits, all of its modification information is written to the log. Suppose there is a table t1(id, username) and you want to insert the row (3, 'Zhazhahui').

```sql
START TRANSACTION;
-- insert the new row inside the transaction
INSERT INTO t1 (id, username) VALUES (3, 'Zhazhahui');
-- the redo log records the change: id = 3, username = 'Zhazhahui'
COMMIT;
```

Interviewer: What is the role of this redo log?

Zhazhahui: To improve performance, each modification is not synchronized to disk in real time; it is first kept in the Buffer Pool, which acts as a cache. A background thread then synchronizes the buffer pool with the disk.

Interviewer: What if the server crashes or loses power before the data in the buffer pool has been flushed to disk? Part of the modifications of committed transactions would be lost!

Zhazhahui: That is exactly why the redo log is introduced. It records the modifications of successfully committed transactions and is itself persisted to disk, so after a restart the system reads the redo log and restores the latest data.

Interviewer: Can you give a scenario showing how they are applied?

Zhazhahui: Suppose the database crashes at some point while related transactions are still in flight: some have committed and some have not. When the database restarts and performs crash recovery, the changes of committed transactions are loaded into memory via the redo log and written into the data files, while uncommitted transactions are rolled back via the undo log.

Interviewer: Well, continue

By now my scalp was prickling. I had answered to this depth and, without so much as a change of expression, he just waved me on? You really are a piece of work.

Zhazhahui:

  • Isolation: implemented mainly through MVCC, multi-version concurrency control.

InnoDB's MVCC is implemented by storing two hidden columns behind each row of records: one stores the creation time of the row, and the other stores the expiration (or deletion) time of the row. Of course, what is stored is not an actual timestamp but the system version number.
-- Excerpt from the definition of MVCC in the book "High Performance MySQL"

Interviewer: How does MVCC handle the different kinds of SQL statements?

Zhazhahui:

  • SELECT: compares the current transaction's system version number against the version numbers stored in the rows, first filtering out rows whose deletion version has passed, then returning the rows whose creation version is less than or equal to the current version (see the sketch after this list).
  • INSERT: InnoDB saves the current system version number as the creation version of each inserted row.
  • DELETE: InnoDB saves the current system version number as the deletion identifier of each deleted row.
  • UPDATE: InnoDB inserts a new row whose creation version is the current system version number, and writes the current system version number into the original row as its deletion identifier.
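
As a rough mental model, the visibility check behaves like the predicate below (conceptual pseudocode only: create_version and delete_version stand in for InnoDB's hidden columns, which cannot actually be queried like this):

```sql
-- Conceptual only; the hidden version columns are not directly queryable.
SELECT *
FROM t1
WHERE create_version <= current_version          -- row already existed for us
  AND (delete_version IS NULL
       OR delete_version > current_version);     -- row not yet deleted for us
```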

Interviewer: What are the benefits of this mechanism?

Zhazhahui:
The core idea is to separate reads from writes through multiple versions of the data: each request executes against its own version, so reads and writes need no locks against each other and can run in parallel.
The MVCC mechanism also depends on the undo log and the read view.

undo log: records the multiple historical versions of each row.

read view: used to judge which version of the data is visible to the current request.

Interviewer: Then draw me a diagram of it.

What the hell, this thing needs diagrams too? I just stared at him with big watery eyes, waiting for him to go on.

Interviewer: Didn't you understand? Draw a schematic diagram with the whiteboard pen.

Zhazhahui: I hurried to respond. Uh... uh, okay.

Zhazhahui: So, like this, each modification request corresponds to one version.

Interviewer: How does MVCC ensure that multiple versions of data are eventually consistent?

Zhazhahui: Because operations run against multiple versions, reads and writes can proceed in parallel.

Before a transaction's operations run, the requests first enter the read and write queues, and then the read and write requests execute separately. If a write request has executed its transaction but not yet committed, it holds the latest version.

If a read request comes in at this point, it also executes immediately, but the version it reads with predates the uncommitted write, so it reads the data according to that earlier version.

In this way, before the write transaction commits, every read operation sees the same data, because they are all governed by the same system version: they all read the same version of the data.
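
A small two-session sketch of this snapshot behavior at the default REPEATABLE READ level (t1 is the earlier example table; the session labels are illustrative comments, not SQL):

```sql
-- Session A:
START TRANSACTION;
UPDATE t1 SET username = 'updated' WHERE id = 3;  -- new version, not yet committed

-- Session B (concurrently, before A commits):
START TRANSACTION;
SELECT username FROM t1 WHERE id = 3;             -- still sees the old version

-- Session A:
COMMIT;  -- B's already-open snapshot keeps reading the old version
```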

Interviewer: MySQL is multi-threaded, so there may be multiple threads modifying the same row of data at the same time. How is that handled?

Zhazhahui: Generally this matters once the system handles truly concurrent load, say tens of thousands of requests per second; that is when write conflicts become a real problem.

If transactions A and B both need to update the same row, the one that updates first, say transaction A, acquires the row lock and locks the data. When transaction B then tries to update, it also tries to acquire the row lock on that row, but the lock is already held by transaction A, so transaction B can only wait.

Zhazhahui: If, for example, transaction A is a very time-consuming big SQL statement and does not release the lock for a long time, transaction B's wait will time out.

Query the global lock wait timeout:

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
```

Interviewer: That is when the column in the UPDATE's WHERE condition has an index, right?

Zhazhahui: Yes, because InnoDB's row locks are implemented on indexes; only when rows are located through an index does it take row locks.

Interviewer: If there is no index, there is no way to quickly locate the data row?

Zhazhahui: Without an index it degenerates toward a table-wide lock: all rows are locked while they are fetched, MySQL filters out the rows that match the condition, and the locks on the non-matching rows are released, so that afterwards only the matching rows remain locked.

Zhazhahui: That reduces concurrency, and the performance overhead is large.
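
A sketch of how you might avoid that degradation (the column and index names are illustrative):

```sql
-- Without an index on username, this UPDATE has to lock far more rows:
UPDATE t1 SET username = 'x' WHERE username = 'old_name';
-- A secondary index lets InnoDB take row locks only on the matching rows:
ALTER TABLE t1 ADD INDEX idx_username (username);
```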

Zhazhahui: And consistency, in the end, is actually achieved through atomicity, durability, and isolation.

At this point I felt somewhat relaxed, a little proud even; I felt I had hit my peak. I picked up the mineral water in front of me, took a sip, and wiped off my sweat while I was at it. Everything was going so smoothly. The interviewer's face, however, did not change, as if something were brewing. And sure enough, he started again.

The working principle of the MySQL kernel

Interviewer: Then please continue: how is the data flushed? And how does this working mode avoid frequent reads and writes?

Zhazhahui: To talk about flushing, I first have to talk about the InnoDB buffer pool.

The buffer pool mainly holds data pages, index pages, the change buffer, the adaptive hash index, lock information, and so on.

Different requests operate on the corresponding part of the data in the buffer pool. The buffer pool is synchronized to disk periodically according to configuration parameters, and the flushing consists mainly of two parts: data and logs.

Data

Data flushing is governed by the innodb_max_dirty_pages_pct parameter. Before explaining it, I have to explain dirty pages.

Dirty pages are pages that exist in memory but have not yet been synchronized to persistent storage. Because the database reads and writes data in units of pages, they are called dirty pages.

The innodb_max_dirty_pages_pct parameter can be adjusted dynamically; the minimum value is 0, the maximum is 99.99, and the default is 75.

The ratio is computed from the buffer pool size and the number of dirty pages it holds; once the threshold is reached, a background thread is invoked to flush the data to the persistence layer.

The benefit is that writes to other data pages can be merged in, improving write efficiency.
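
A quick way to inspect these knobs (a sketch; the SET value is illustrative, not a recommendation):

```sql
-- Current dirty-page ratio threshold (default 75)
SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct';
SET GLOBAL innodb_max_dirty_pages_pct = 60;  -- illustrative value
-- How many dirty pages the buffer pool currently holds
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
```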

Log

The innodb_flush_log_at_trx_commit variable controls the frequency of log buffer flushing.

  • 0: The log buffer is written to the log file and flushed to persistent storage once every second; nothing is done at transaction commit.

  • 1: The log buffer is written to the log file and flushed to persistent storage on every transaction commit. This is the default (and safest) setting; it guarantees that no committed transaction is lost, unless the disk or the operating system only "pseudo" flushes, i.e. writes to a disk buffer.

  • 2: The log buffer is written to the log file on every commit, but it is not flushed to persistent storage; InnoDB flushes to persistent storage once every second. Setting 2 is also commonly chosen: if only the MySQL process "hangs", no transactions are lost; if the whole server "hangs" or loses power, up to 1s of transactions may still be lost.

Why does "writing the log buffer to the log file" above not count as reaching persistent storage?

Because on most operating systems, writing the log buffer to the log file merely moves the data from InnoDB's memory buffer into the operating system's cache, which is also in memory; it does not actually put the data on persistent storage. An explicit flush (fsync) is still needed to synchronize the data to disk.
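
Inspecting or changing the policy is straightforward (a sketch; which value to set is a durability/performance trade-off, not a recommendation):

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- 1 = flush on every commit (default)
```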

Zhazhahui: The buffer pool exists to reduce frequent read and write requests. From that point of view:

  • Read request: a request first looks up the cached data in the buffer pool, so that not every access incurs disk I/O (see the counters sketch after this list).
  • Write request: when a transaction executes, the changes are first recorded in the log files according to the log buffer configuration; the data itself lands on disk only when the dirty-page ratio reaches the threshold or memory runs short.
    In this way, a large number of random disk I/Os are rewritten as sequential I/O.
    And each update request obtains the latest version of the data it operates on via MVCC, so multiple versions can execute concurrently.
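
Two real status counters give a feel for how well the buffer pool absorbs reads (a rough sketch; a high ratio of the first to the second means most reads never touch the disk):

```sql
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads served from disk
```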

Interviewer: Yeah

I felt squeezed dry in an instant. No matter how he pressed, luckily I had prepared somewhat beforehand. Suddenly I felt sweat dripping down my back. I saw the interviewer look over my resume, his expression peaceful, as if contemplating something.

Interviewer: Young man, let's stop here for today. You look a bit overheated. Go back and wait for our notice.

So all of that was just the appetizer for the old-timer! How can he do this? Can't he at least give me an answer? He made me sweat this much; how much water will I have to drink to make up for it? I nearly burst on the spot.

Zhazhahui: Alright then, that's fine. . . .

I can't help it; I just refuse to give up. If this helped, follow along for continuously updated articles. You can search "Lotus Nezha" on WeChat to read them first, and join the online technical Q&A group chat.
