WeChat Mobile Database Component WCDB Series (2): Three Solutions for Database Repair


Preface

SQLite DB corruption has long been a problem, affecting mobile systems such as Android and iOS as well as desktop systems such as Windows and Linux. Since all WeChat messages are stored in the DB and the server keeps no backups, a corrupted DB wipes out the user's messages, which is obviously unacceptable.

We are about to open-source our mobile database component WCDB (WeChat Database), which is dedicated to solving the data loss caused by DB corruption.

The previous article, "WeChat SQLite Database Repair Practice", introduced how WeChat repairs SQLite databases and reduces the corruption rate. This time we look in depth at the concrete design and evolution of WeChat's database repair.

Our needs

Specifically, WeChat needs a DB recovery solution that meets the following conditions:

  • A high recovery success rate.  Since core user data is involved, a best-effort "just give it a try" solution is not enough. A 100% success rate is not realistic, but 90% or even above 99% is what we want.

  • Support for encrypted DBs.  The Android WeChat client uses an encrypted SQLCipher DB. Encryption scrambles the layout of the data: flipping a single byte of ciphertext can often make a large stretch of data unrecognizable after decryption. That is bad news for data recovery, and our solution must cope with it.

  • Ability to handle large amounts of data.  Statistics show that the DB of some heavy users already exceeds 2GB, and the recovery scheme must not break down at that scale.

  • No impact on the user experience.  Statistics show that fewer than one in ten thousand users ever hit DB corruption. If the recovery scheme requires advance preparation (such as a backup), that preparation must be invisible to the user; the experience of all users cannot be sacrificed for the very few.

After years of continuous improvement, WeChat has successively adopted three different DB recovery solutions, each getting closer to the goals above.

The official Dump recovery solution

Googling "SQLite DB recovery", it is not hard to find the approach of restoring a DB with the .dump command. The .dump command outputs the contents of the entire database as a series of SQL statements; executing those statements against an empty DB reproduces the same DB.

The principle behind .dump is simple. Every SQLite DB has a sqlite_master table that stores the information of all tables and indexes (their definitions, not the data inside). Traversing it yields the names of all tables and their CREATE TABLE ... SQL statements. Output each CREATE TABLE statement, then traverse the table with SELECT * FROM <table>, emitting one INSERT statement per row read; once the traversal finishes, the whole DB has been dumped. This traversal works like any ordinary query, so it returns SQLITE_CORRUPT when it hits damage; we ignore the error and move on to the next table, which in the end recovers all undamaged tables plus the readable prefix of each damaged one. Executing the dumped SQL statements line by line then yields an equivalent new DB. Since all of this runs directly on top of SQLite, it naturally supports encrypted SQLCipher DBs with no extra work.
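The dump-and-replay procedure described above can be sketched in a few lines. This is a simplified illustration using Python's sqlite3 module, not WeChat's actual code; it ignores corruption errors so that intact tables and the readable prefix of damaged ones still come out:

```python
import sqlite3

def dump_db(path):
    """Yield SQL statements that recreate as much of the DB as is readable."""
    con = sqlite3.connect(path)
    cur = con.cursor()
    # sqlite_master lists every table together with its CREATE statement.
    cur.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    for name, create_sql in cur.fetchall():
        yield create_sql + ";"
        try:
            for row in con.execute(f'SELECT * FROM "{name}"'):
                values = ", ".join(repr(v) for v in row)
                yield f'INSERT INTO "{name}" VALUES ({values});'
        except sqlite3.DatabaseError:
            # SQLITE_CORRUPT: keep what we already read, move to the next table.
            continue
    con.close()
```

Replaying the yielded statements against a freshly created empty DB then gives the recovered copy.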

(Figure: Sample dump output)

This solution requires no advance preparation; only users whose DB is already corrupted pay the few minutes it takes to run, and everyone else never notices it. The data volume mainly affects the temporary space needed for recovery: storing the dumped SQL takes about double the DB's size, and another DB's worth of space is needed for the new DB being restored. As for the success rate we care about most: after going online it was about 30%. That "success" was defined as restoring at least one record, which means that for most users recovery failed outright!

Investigation showed that these failures happened because the sqlite_master table could not be read; in particular, if the first page is damaged, everything after it becomes unreadable and nothing can be recovered at all. This embarrassingly low recovery rate persisted for a long time, and other solutions gradually emerged.

The backup and recovery solution

Since damaged data cannot always be repaired, the most intuitive remedy is a backup, so a backup-and-recovery scheme was put on the agenda. The idea is simple, and SQLite offers several mechanisms that can serve as a backup, specifically:

  • Copy:  it could not be more straightforward. Since a SQLite DB is itself a file (the main DB plus journal or WAL), directly copying the files achieves a backup.

  • Dump:  the command from the previous recovery solution, used for its original purpose. Run .dump while the DB is intact to output all of its contents as SQL statements; restoring means executing that SQL.

  • Backup API:  a backup mechanism provided by SQLite itself, which copies pages batch by batch into a new DB and supports online (hot) backup.
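As a sketch of the third option: SQLite's Online Backup API is exposed directly by Python's sqlite3 module. Copying a limited number of pages per step is what allows the source DB to stay usable during the backup (an illustration, not WeChat's implementation):

```python
import sqlite3

def backup_db(src_path, dst_path, pages_per_step=64):
    """Hot-copy src to dst using SQLite's Online Backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # Copying pages_per_step pages at a time lets other readers and writers
    # proceed between batches -- this is what makes the backup "hot".
    src.backup(dst, pages=pages_per_step)
    dst.close()
    src.close()
```

After the call, dst_path holds a page-for-page consistent copy of the source DB.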

Which of these is best? For a mobile app, the indicators that matter are  backup size, backup performance, and recovery performance . As a heavy user of the DB, WeChat cares mainly about backup size and backup performance: some users have a 2GB DB, and if the backup itself is also 2GB, they will hardly accept it. Performance mainly affects the experience and the backup success rate; as a feature users should never perceive, a backup cannot hog system resources and cause stutter, and the longer it takes, the higher the chance of accidents such as being killed by the system.

A quick test of the options above basically settled the choice. The test DB is about  50MB , with roughly  100,000  records:

(Figure: Performance comparison of alternatives)

As can be seen, the best compromise is  Dump + compression . Its backup size has a clear advantage and its backup performance is acceptable; its recovery performance is poor, but since recovery scenarios are rare, that is an acceptable shortcoming.

WeChat then optimized the Dump + gzip scheme. Since formatting SQL statement output takes considerable time, a custom binary format is used to carry the dump output. The second-biggest cost, compression, runs concurrently on other threads, and on dual-core or better devices adds essentially no extra time. Because the data must stay confidential, the binary dump is also encrypted. Another advantage of the custom binary format is that restoring no longer compiles SQL statements repeatedly: one compilation serves all inserts for a table, improving restore performance as well. Compared with plain Dump + compression, the optimized scheme  raised backup rows per second by 150% and restore rows per second by 40% .
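The "compile once, insert many rows" idea during restore corresponds to reusing a single prepared INSERT statement for a whole table. In Python's sqlite3 that is what executemany does; this is a simplified illustration of the principle, not WCDB's binary format:

```python
import sqlite3

def restore_table(con, table, rows):
    """Insert all rows of one table through a single prepared statement."""
    if not rows:
        return
    placeholders = ", ".join("?" * len(rows[0]))
    # The INSERT is compiled once and rebound for every row, which is much
    # cheaper than parsing and compiling a fresh textual INSERT per row.
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    con.commit()
```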

(Figure: Performance optimization effect)

Even the optimized scheme is time- and power-consuming for extra-large DBs, and a mobile app may not readily get the chance to run such heavy work; backing up too often also causes stutter. Developers have to weigh this themselves. Android WeChat, for example, performs DB backups only while the device is  charging with the screen off ; if that state ends during a backup, the backup is suspended until the next opportunity.

After the backup solution went online, the recovery success rate reached 72% . However, for some heavy users whose DB got corrupted, the backup took too long and never succeeded, and it is precisely those users who are most sensitive to DB data loss. So a new solution came into being.

The B-tree parsing recovery solution (RepairKit)

The high cost of the backup solution forced us to consider alternatives, so we returned to the earlier Dump solution. The Dump scheme is essentially an attempt to read information out of a corrupted DB. Generally, this attempt has two outcomes:

  • The basic format of the DB survives, but individual pieces of data are damaged. SQLite returns an
    SQLITE_CORRUPT error when the damaged data is read, but whatever was read before that point can be recovered.

  • The basic format is lost (the file header or sqlite_master is damaged). SQLITE_CORRUPT is returned as soon as the tables are enumerated, and there is no way to recover anything.

The first case can be considered expected behavior; after all, damaged data can at best be  partially recovered . Judging from the earlier statistics, many users hit the second case. Is there any room for salvage there?

To answer that, we must first understand what sqlite_master is. It is a special table that every SQLite DB has. Whether you consult the official Database File Format document or execute  SELECT * FROM sqlite_master; , you will find that this system table stores the following information: the name of each table or index, its type (table/index), the SQL statement that created it, and its RootPage. The name and structure of sqlite_master itself are fixed, defined by the file format, and its RootPage is always page 1.
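The contents of sqlite_master can be inspected directly; a small helper like this (illustrative) shows the fields the article relies on, with rootpage giving the page number of each table's B-tree root:

```python
import sqlite3

def schema_info(path):
    """Return (type, name, rootpage, sql) for every object in the DB."""
    con = sqlite3.connect(path)
    rows = con.execute(
        "SELECT type, name, rootpage, sql FROM sqlite_master"
    ).fetchall()
    con.close()
    return rows
```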

(Figure: sqlite_master table)

Under normal circumstances, when the SQLite engine first opens a DB it traverses sqlite_master, re-parses the SQL statements stored there, and keeps the results in memory for compiling later SQL statements. If sqlite_master is damaged and cannot be parsed, the normal SQLite path taken by "Dump recovery" is naturally stuck at this very first step. To open a DB with a damaged sqlite_master, we have to bypass this part of the SQLite engine. Since the engine's initialization logic is quite involved, to avoid side effects we decided not to hack the existing logic for reuse, but to imitate a minimal system that can only read data.

Although this faked, minimized system can skip many correctness checks, the information stored in sqlite_master is still essential for recovery, especially RootPage: it is the location of the root node of the B-tree corresponding to each table, and without it we would not even know where to start parsing a table.

sqlite_master holds relatively little information, and it only changes when the table structure changes (e.g. on CREATE TABLE, ALTER TABLE and so on). Backing it up is therefore very cheap, generally taking a few to a few tens of milliseconds, and consistency is easy to guarantee: simply back it up again whenever such a statement executes. With the backup in place, our logic reads the DB's built-in sqlite_master and falls back to the backed-up information when that fails.

Besides the integrity of the file header and sqlite_master, DB initialization also faces encryption. SQLCipher encrypts the database, so the recovery logic needs matching decryption logic. In SQLCipher's implementation, the DB is fully encrypted page by page, header included. The key is derived from the original password entered by the user and a salt randomly generated when the DB is created. It follows that if the salt is stored incorrectly, there is no way to derive the encryption key, and every page becomes unreadable. Since the salt is generated once at DB creation and never modified afterwards, it too can be included in the backup.
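The role of the salt can be sketched as follows. In SQLCipher the first 16 bytes of the DB file hold the salt, and the page key is derived from the user password with PBKDF2; the iteration count and hash vary by SQLCipher version, and the 64,000 rounds of HMAC-SHA1 shown here follow SQLCipher 3 defaults, so treat the constants as assumptions. Lose those 16 bytes and the key cannot be re-derived:

```python
import hashlib

def derive_key(password: bytes, db_header: bytes) -> bytes:
    """Derive the SQLCipher page key from the password and the file's salt."""
    salt = db_header[:16]  # random salt written once, when the DB is created
    # SQLCipher 3 default KDF: PBKDF2-HMAC-SHA1, 64000 iterations, 32-byte key.
    return hashlib.pbkdf2_hmac("sha1", password, salt, 64000, dklen=32)
```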

At this point the data needed for initialization is secured, and the read logic can be imitated. Our routine ways of reading a DB (including Dump recovery) all execute SQL statements, which involves the SQL execution engine, the most complex subsystem of SQLite. Our recovery task only needs to traverse every node of the B-tree and read out the data; no complex query logic is required, so the most complex part, the SQL engine, can be omitted. Likewise, because our system is read-only and the restored data is written to the new DB through the ordinary SQLite interface, the equally complex logic for B-tree rebalancing, journaling, and synchronization can also be omitted. The final minimal recovery system only needs:

  • the read side of the VFS interface (Open/Read/Close), or simply stdio's fopen/fread or POSIX open/read

  • The decryption logic of SQLCipher

  • B-tree parsing logic

With these three pieces, the minimal read-only system can be built.
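Parsing starts with the 100-byte file header on page 1. A minimal sketch that reads the magic string, page size, and page count at the offsets given in the Database File Format document:

```python
import struct

def read_db_header(header: bytes):
    """Parse the 100-byte SQLite file header into (page_size, page_count)."""
    assert header[:16] == b"SQLite format 3\x00", "not a SQLite database"
    # Page size: big-endian 16-bit value at offset 16; the value 1 means 65536.
    (page_size,) = struct.unpack_from(">H", header, 16)
    if page_size == 1:
        page_size = 65536
    # Database size in pages: big-endian 32-bit value at offset 28.
    (page_count,) = struct.unpack_from(">I", header, 28)
    return page_size, page_count
```

From here, page N of the file lives at byte offset (N - 1) * page_size, which is how the RootPage from sqlite_master is turned into a file position.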

(Picture: Minimize the system)

The Database File Format document describes the SQLite file format in detail; implementing B-tree parsing against it lets us read a SQLite DB. Decrypting SQLCipher is trickier; fortunately, SQLCipher's encryption layer can be extracted on its own, and its decryption logic applied directly.

With the above logic in place, the data in the DB can be read out for recovery, but there was still one episode. When querying a table through SQLite, every row has the same number of columns; this is guaranteed at the schema level. One layer below the schema, in the B-tree layer, there is no such guarantee: each B-tree row (or entry, or record) may have a different number of columns. Normally, when SQLite inserts a row, the column count of the B-tree record matches the table's actual column count. But when ALTER TABLE ADD COLUMN is executed on a table, a column is added to the whole table while the existing B-tree rows are left untouched, keeping their original column count. When SQLite later reads one of those pre-ALTER TABLE rows, it automatically fills the missing columns with their default values. The recovery code must make the same judgment and fill in the same defaults; otherwise rows will come out with missing columns and cannot be inserted into the new DB.
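This column-count quirk is easy to reproduce: rows inserted before an ALTER TABLE ADD COLUMN physically keep their old record length, and SQLite supplies the declared default on read. A recovery tool walking raw B-tree records has to do the same:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msg (id INTEGER)")
con.execute("INSERT INTO msg VALUES (1)")          # record stored with 1 column
con.execute("ALTER TABLE msg ADD COLUMN status INTEGER DEFAULT 0")
con.execute("INSERT INTO msg VALUES (2, 1)")       # record stored with 2 columns
# The old record still has one column on disk, but SQLite fills in the default:
rows = con.execute("SELECT id, status FROM msg ORDER BY id").fetchall()
print(rows)  # [(1, 0), (2, 1)]
```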

After B-tree parsing recovery went online, the success rate was about 78% . This rate is computed as successfully restored pages divided by total pages; since the system is our own, we know the total page count, and the restored-page proportion reflects reality better than a per-user count. The advantages of B-tree parsing are a low preparation cost, no need for frequent backup refreshes, and, for most apps with relatively few tables, a backup overhead small enough to be negligible; a successful recovery restores the very latest data as of the moment of corruption, unconstrained by any backup's age. The disadvantage is that, like Dump, if the damage hits the middle of a table, such as a non-leaf node, the data below it cannot be read.

Combining the solutions

Because B-tree parsing recovery and backup recovery work on different principles, they fail in different scenarios, and the two can be combined to cover more corruption cases. In the WeChat database, some data is temporary or can be re-pulled from the server and need not be repaired; other data is irreplaceable or expensive to regain, and must be repaired.

If the repair runs through without errors, B-tree parsing repair is undoubtedly better than backup recovery: a backup is always somewhat stale, so some of the newest records are inevitably lost, whereas parsing repair works directly on the corrupted DB and has no staleness problem. If the damage falls entirely within data that does not need repairing, parsing repair may even complete without a single error.

If the repair does hit errors, the B-tree of a table that needs repairing is probably damaged, so some or all of its rows will be missing. At that point, repairing from the backup can fill in part of what is missing.

The earliest Dump repair is by now essentially covered by B-tree parsing repair: if B-tree repair fails, Dump recovery is unlikely to succeed either. Even so, if all of the above attempts fail, Dump recovery is tried as a last resort.
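The combined strategy amounts to a fallback chain; here is a sketch in which repair_by_btree, repair_by_backup, and repair_by_dump are hypothetical stand-ins for the three schemes, each returning True on success:

```python
def recover(db_path, strategies):
    """Try each repair strategy in order; stop at the first that succeeds."""
    for strategy in strategies:
        if strategy(db_path):
            return strategy.__name__
    return None

# Order reflects the article: B-tree parsing first (freshest data), then the
# backup, then a last-resort Dump. The strategy functions are hypothetical.
```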

(Picture: Recovery plan combination)

The three repair methods above involve, in principle, only the SQLite file format and the basic file system, so they are cross-platform. In practice, each platform can adjust the strategy to its own characteristics; for example, Android uses  JobScheduler  to run backups while charging with the screen off.

Our components

WCDB (WeChat Database), WeChat's mobile database component, includes the repair solutions above, as well as encryption, connection-pool concurrency, ORM, performance optimizations and other features. It will be open-sourced in the near future; stay tuned.

For more information, please see previous articles:

"WeChat mobile terminal database component WCDB series (1)-iOS basic articles"

"WeChat SQLite Database Repair Practice"