Distributed database backup and recovery: what you need to know

1. Overview

Today data is a means of production and has become the lifeblood of every industry, so the importance of data security is self-evident. Data backup is the foundation of data security, and a complete backup combined with an effective recovery method is the essential safeguard against any emergency. Backup also plays an important role in reusing data and extracting greater value from it.

The focus of data backup is the database itself. Database backups are taken on a regular schedule so that, when system failures, human error, or other incidents occur, the data can be restored.

As a new-generation distributed transactional database, SequoiaDB adopts a "separation of computing and storage" architecture, the same direction taken by industry leader AWS for the distributed databases it was developing at the same time. This design is well suited to the convenient application development required in the microservices era.

The separation of computing and storage also gives SequoiaDB a variety of backup and recovery options: the database can be backed up and restored at either the storage engine layer or the SQL instance layer.

Computing and storage separation architecture diagram

2. Backup and recovery of the storage engine layer

SequoiaDB uses its self-developed distributed storage engine, which supports both physical backup (db.backup) and logical backup (sdbexprt).

Physical backup means dumping the database's physical files (data files, log files, and so on); if the database fails, these files can be used to restore it. Physical backups are usually divided into full backups and incremental backups. In SequoiaDB, a physical backup can cover the entire database or only specified data groups.

When performing physical backup and recovery with SequoiaDB, keep in mind the difference between a distributed database and a traditional relational database: recovering a distributed database involves the physical files on multiple servers, not just those on a single machine.
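To make the two backup scopes concrete, here is a minimal sketch using the db.backup() options shown later in this article; the data group name group1 is a placeholder, and the coordinator address is assumed to be localhost:11810:

# Back up the entire cluster; %g in the path expands to each data group's name
/opt/sequoiadb/bin/sdb 'db = new Sdb("localhost", 11810)'
/opt/sequoiadb/bin/sdb 'db.backup( { Name : "cluster_backup", Path : "/sdbdata/backup/full/%g", Overwrite : true } );'
# Back up only one data group (group1 is a placeholder name)
/opt/sequoiadb/bin/sdb 'db.backup( { Name : "group1_backup", Path : "/sdbdata/backup/full/%g", GroupName : [ "group1" ] } );'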

Logical backup means using tools to export database objects (users, tables, stored procedures, and so on); the same tools can later import the exported files back into the database. SequoiaDB's logical backup can export collections (tables) in json or csv format, enabling fast and smooth migration.

Physical backup and recovery method

The following uses a three-node SequoiaDB cluster (three replicas) to illustrate physical backup and recovery at the storage engine layer.

1. Preparation

  • Host information:

IP            Hostname   SequoiaDB user   Remarks
192.168.1.3   sdb03      sdbadmin
192.168.1.4   sdb04      sdbadmin
192.168.1.5   sdb05      sdbadmin         Backup disk (mount directory /sdbdata/backup)

  • Share the backup directory via NFS
[root@sdb05 ~]# cat /etc/exports
/sdbdata/backup   *(insecure,rw,sync,no_wdelay,insecure_locks,no_root_squash)
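After editing /etc/exports, the NFS service must be running and the export published before the other hosts can mount it. A minimal sketch, assuming a systemd-based system such as CentOS 7 (service names may differ on other distributions):
[root@sdb05 ~]# systemctl enable nfs-server
[root@sdb05 ~]# systemctl start nfs-server
[root@sdb05 ~]# exportfs -ra
[root@sdb05 ~]# showmount -e 192.168.1.5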


[root@sdb03 ~]# mount -t nfs -o rw,bg,hard,nointr,tcp 192.168.1.5:/sdbdata/backup /sdbdata/backup
[root@sdb03 ~]# chown -R sdbadmin:sdbadmin_group /sdbdata/backup


[root@sdb04 ~]# mount -t nfs -o rw,bg,hard,nointr,tcp 192.168.1.5:/sdbdata/backup /sdbdata/backup
[root@sdb04 ~]# chown -R sdbadmin:sdbadmin_group /sdbdata/backup
[root@sdb04 ~]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/centos-root        15G  4.4G   11G  29% /
/dev/vdb                      985G  129G  806G  14% /sdbdata
/dev/vda1                     497M  172M  326M  35% /boot
/dev/vda3                      50G   13G   38G  26% /opt
192.168.1.5:/sdbdata/backup   985G  129G  807G  14% /sdbdata/backup

2. Full backup

  • Full backup script
[sdbadmin@sdb05 ~]# cat backup_full.sh
#!/bin/bash
date
# Export the table definitions of the MySQL instance (-d: schema only, no data)
/opt/sequoiasql/mysql/bin/mysqldump --login-path=mysql -A -d > /sdbdata/backup/full/create`date +%y%m%d%H`.sql
# Save the node list so data groups and data paths can be looked up during recovery
/opt/sequoiadb/bin/sdblist -l -m list > /sdbdata/backup/full/sdblist`date +%y%m%d%H`.sql
# Full backup of the whole cluster; %g in the path expands to the data group name
/opt/sequoiadb/bin/sdb 'db=new Sdb()'
/opt/sequoiadb/bin/sdb 'db.backup( { Name : "cluster_backup", Path : "/sdbdata/backup/full/%g", Overwrite : true, Description : "full_backup" } );'
date


db.backup() common parameters:
Name: backup name; defaults to the current time in YYYY-MM-DD-HH:mm:ss format (e.g. 2016-01-01-15:00:00)
Description: description text attached to the backup
Path: backup path; defaults to the bkuppath configured for each node
EnsureInc: whether to take an incremental backup, true or false; default false
Overwrite: whether to overwrite an existing backup of the same name, true or false; default false
GroupName: data groups to back up, e.g. ["datagroup1", "datagroup2"]
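Once the script has run, the backup can be checked from the sdb shell. A minimal sketch, assuming db.listBackup() accepts the same Name and Path options as db.backup():
# List the metadata of the backup just taken (adjust host and port as needed)
/opt/sequoiadb/bin/sdb 'db = new Sdb("localhost", 11810)'
/opt/sequoiadb/bin/sdb 'db.listBackup( { Name : "cluster_backup", Path : "/sdbdata/backup/full/%g" } );'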
  • Perform a full backup and view the records of table test100.b

  • Perform a full backup

3. Incremental backup

  • Incremental backup script
[sdbadmin@sdb05 ~]# cat backup_incre.sh
#!/bin/bash
date
/opt/sequoiasql/mysql/bin/mysqldump --login-path=mysql -A -d > /sdbdata/backup/full/create`date +%y%m%d%H`.sql
/opt/sequoiadb/bin/sdblist -l -m list > /sdbdata/backup/full/sdblist`date +%y%m%d%H`.sql
# Incremental backup on top of the existing cluster_backup (EnsureInc : true)
/opt/sequoiadb/bin/sdb 'db=new Sdb();'
/opt/sequoiadb/bin/sdb 'db.backup( { Name : "cluster_backup", Path : "/sdbdata/backup/full/%g", EnsureInc : true } );'
date
  • Insert 1 million records into table test100.b, then perform the incremental backup

  • Take incremental backups
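Since backups are normally taken on a schedule (see the overview), the two scripts above would typically be driven by cron. A sketch of a possible crontab for the sdbadmin user; the times and script paths are illustrative assumptions:
# crontab -e for sdbadmin; adjust times and paths to your own backup window
# Full backup every Sunday at 02:00
0 2 * * 0 /home/sdbadmin/backup_full.sh >> /sdbdata/backup/backup_full.log 2>&1
# Incremental backup Monday to Saturday at 02:00
0 2 * * 1-6 /home/sdbadmin/backup_incre.sh >> /sdbdata/backup/backup_incre.log 2>&1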

4. Full recovery

  • Full recovery script
(1) Configure passwordless SSH from sdb05 to all hosts
[sdbadmin@sdb05 ~]#ssh-keygen -t rsa
[sdbadmin@sdb05 ~]#ssh-copy-id sdbadmin@sdb03
[sdbadmin@sdb05 ~]#ssh-copy-id sdbadmin@sdb04
[sdbadmin@sdb05 ~]#ssh-copy-id sdbadmin@sdb05


(2) Full recovery script
[sdbadmin@sdb05 ~]# cat restore_full.sh
#!/bin/bash
# Stop SequoiaDB on all hosts
for hostname in {sdb03,sdb04,sdb05}
do
ssh $hostname /opt/sequoiadb/bin/sdbstop -t all
done
# Restore each data group (excluding the coordinator) from the full backup only (-b 0 -i 0)
for groupname in `cat /sdbdata/backup/full/sdblist20200717.sql | awk '{print $8}' | grep -Evi "GroupName|SYSCoord"`
do
/opt/sequoiadb/bin/sdbrestore -p /sdbdata/backup/full/$groupname/ -n cluster_backup -b 0 -i 0
done
# Copy the restored data files to the other two hosts via scp
for hostname in {sdb03,sdb04}
do
   for dbpath in `cat /sdbdata/backup/full/sdblist20200717.sql | awk '{print $10}' | grep -Evi "dbpath|11810"`
   do
     ssh -t sdbadmin@$hostname "rm -rf $dbpath*"
     scp -pr $dbpath* $hostname:$dbpath &
   done
done
# Ask whether to start SequoiaDB on all hosts
/bin/read -p " Do you want to start SequoiaDB(yes or no)?  " Do
while [[ "$Do" != "no" ]] && [[ "$Do" != "yes" ]]
do
/bin/read -p "Do you want to start SequoiaDB(yes/no)?" Do
done
if [ "$Do" == "yes" ]
then
   for hostname in {sdb03,sdb04,sdb05}             
   do                                              
   ssh $hostname /opt/sequoiadb/bin/sdbstart -t all
   done                                            
elif [ "$Do" == "no" ]                         
then 
   echo "please start SequoiaDB by hand"
fi


sdbrestore common parameters:
--bkpath / -p: path where the backup files are located
--increaseid / -i: sequence number of the incremental backup to restore (default -1, meaning the latest)
--beginincreaseid / -b: sequence number of the incremental backup to start restoring from (default -1)
--bkname / -n: name of the backup
--action / -a: action to perform, restore or list; the default is restore
--diaglevel / -v: diagnostic log level; the default is 3 (WARNING)
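Before running the recovery script, it can be useful to check what a backup actually contains. A minimal sketch using the list action from the parameter list above; group1 is a placeholder data group name:
# Only prints the backup information for the given path; no data files are touched
/opt/sequoiadb/bin/sdbrestore -p /sdbdata/backup/full/group1/ -n cluster_backup -a list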
  • Clear the records of the test100.b table, then perform the full recovery

  • Perform full recovery

  • View the number of records in the test100.b table after full recovery

5. Incremental recovery

  • Incremental recovery script
[sdbadmin@sdb05 ~]# cat restore_incre.sh
#!/bin/bash
# Stop SequoiaDB on all hosts
for hostname in {sdb03,sdb04,sdb05}
do
ssh $hostname /opt/sequoiadb/bin/sdbstop -t all
done
# Restore each data group (excluding the coordinator), applying incremental backups up to the latest (-b -1)
for groupname in `cat /sdbdata/backup/full/sdblist20200717.sql | awk '{print $8}' | grep -Evi "GroupName|SYSCoord"`
do
/opt/sequoiadb/bin/sdbrestore -p /sdbdata/backup/full/$groupname/ -n cluster_backup -b -1
done
# Ask whether to start SequoiaDB on all hosts
/bin/read -p " Do you want to start SequoiaDB(yes or no)?  " Do
while [[ "$Do" != "no" ]] && [[ "$Do" != "yes" ]]
do
/bin/read -p "Do you want to start SequoiaDB(yes/no)?" Do
done


if [ "$Do" == "yes" ]
then
   for hostname in {sdb03,sdb04,sdb05}             
   do                                              
   ssh $hostname /opt/sequoiadb/bin/sdbstart -t all
   done                                            
elif [ "$Do" == "no" ]                         
then 
   echo "please start SequoiaDB by hand"
fi 
  • Perform incremental recovery

  • View the number of records in the test100.b table after incremental recovery

  • After incremental restoration, the data of table test100.b is completely restored

Logical backup and recovery method

SequoiaDB provides sdbexprt and sdbimprt as its logical backup and recovery tools.

1. Use of sdbexprt tool

sdbexprt exports collections from a SequoiaDB database to data files in json or csv format. It supports exporting a single collection to a file, as well as exporting multiple collections in batch to a specified directory.

  • sdbexprt common parameter description

  • SequoiaDB uses the json document model to define its data storage (similar to object storage), so logical backup and recovery in json format is smoother and more efficient. The following examples export data in json format:
(1) Export the single collection sbtest.sbtest1:
sdbexprt \
--hostname   "localhost" \
--svcname    "11810"     \
--user       "sdbadmin"  \
--password   "sdbadmin"  \
--type       'json'      \
--csname     'sbtest'    \
--clname     'sbtest1'   \
--file       '/tmp/sbtest.sbtest1.json'


(2) Export all collections in the sbtest collection space in batch:
sdbexprt \
--hostname   "localhost" \
--svcname    "11810"     \
--user       "sdbadmin"  \
--password   "sdbadmin"  \
--type       'json'      \
--cscl       'sbtest'    \
--dir        '/tmp'
Note:
The correspondence between the SQL instance layer and the storage engine layer is that a database name maps to a collection space and a table name maps to a collection in the underlying storage.
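This mapping can be verified by counting the same table from both layers. A minimal sketch, assuming the MySQL instance is reachable through the --login-path used earlier and the coordinator listens on localhost:11810:
# SQL instance layer: database sbtest, table sbtest1
/opt/sequoiasql/mysql/bin/mysql --login-path=mysql -e 'SELECT COUNT(*) FROM sbtest.sbtest1;'
# Storage engine layer: collection space sbtest, collection sbtest1
/opt/sequoiadb/bin/sdb 'db = new Sdb("localhost", 11810); db.sbtest.sbtest1.count()'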

2. Use of sdbimprt tool

sdbimprt is the data import tool shipped with SequoiaDB. It can efficiently and smoothly import data in json or csv format into the database.

To maximize I/O and network throughput for files at the 100 GB level, it is recommended to split the import file into multiple sub-files, store them on different servers, and load them in parallel from those servers; during loading, the -j (--jobs) parameter enables multi-threaded import. A parallel-loading sketch follows the sdbimprt example below.

sdbimprt parallel import architecture diagram

  • sdbimprt common parameter description

  • Let's take importing json format data as an example
sdbimprt \
--hosts        'localhost:11810'  \
--user         'sdbadmin'         \
--password     'sdbadmin'         \
--csname       'sbtest'           \
--clname       'sbtest1'          \
--insertnum    10000              \
--jobs         20                 \
--type         'json'             \
--file         '/tmp/sbtest.sbtest1.json' \
--coord        false              \
--ignorenull   true               \
--verbose      true               \
--force        false              \
--errorstop    true               \
--sharding     true               \
--transaction  false              \
--allowkeydup  true
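Building on the splitting strategy described before the architecture diagram, the sketch below splits a large json export into four sub-files and loads them in parallel on one host; the file name, split count, thread count and connection details are illustrative assumptions (split -n requires GNU coreutils):
#!/bin/bash
# Split the export into 4 sub-files without breaking lines (each json record is one line)
split -n l/4 /tmp/sbtest.sbtest1.json /tmp/sbtest.sbtest1.part.
# Load each sub-file with its own sdbimprt process; --jobs enables multiple threads per process
for part in /tmp/sbtest.sbtest1.part.*
do
  sdbimprt --hosts 'localhost:11810' --user 'sdbadmin' --password 'sdbadmin' \
           --csname 'sbtest' --clname 'sbtest1' --type 'json' --jobs 10 --file "$part" &
done
wait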

3. Backup operations at the SQL instance layer

In SequoiaDB's online transaction scenarios, the SQL instance layer is mainly MySQL-based; native MySQL supports both physical backup (Xtrabackup, etc.) and logical backup (mysqldump, mysqlpump, mydumper).

The physical files of a native MySQL database exist only on a single server, which is very different from a distributed database, so physical backup tools (Xtrabackup, etc.) are not suitable for SequoiaDB's SQL instance layer. Logical backup tools (mysqldump, mysqlpump, mydumper) are fully applicable. The following uses mysqldump and mydumper as examples.

1. Use of mysqldump tool

mysqldump is the logical backup tool that ships with MySQL. It connects to the MySQL server over the client protocol, queries the data to be backed up, and converts the query results into the corresponding INSERT statements.

mysqldump's advantages are simplicity and flexibility: the dump is in a clear, easily edited text format, and backups can be taken at different levels (instance, database, or table). Its disadvantage is that the backup runs serially and cannot be parallelized, so it is relatively slow.

  • mysqldump common parameter description

  • Use mysqldump to implement database backup, common scenarios
(1) Back up all databases:
mysqldump -h 192.168.3.6 -P 3306 -u root -proot123456 -A > /backup/all.sql
(2) Back up specified databases (db1 and db2 are example names):
mysqldump -h 192.168.3.6 -P 3306 -u root -proot123456 -B db1 db2 > /backup/database.sql
(3) Back up specified tables in a database (db1, table1 and table2 are example names):
mysqldump -h 192.168.3.6 -P 3306 -u root -proot123456 db1 table1 table2 > /backup/table.sql
(4) Restore from a mysqldump backup file:
mysql -h 192.168.3.6 -P 3306 -u root -proot123456 < /backup/all.sql
or, from within the mysql client:
mysql> source /backup/all.sql

2. Use of mydumper&myloader tools

mydumper & myloader are open-source (GNU GPLv3) tools for multi-threaded backup and recovery of MySQL databases. Their developers come mainly from MySQL, Facebook and SkySQL, and the tools are currently developed and maintained by Percona.

  • Description of common parameters of mydumper:

  • Description of common parameters of myloader:

  • Example: back up and restore the sbtest database
$ time mydumper -h 192.168.3.6 -P 3306 -u root -p root123456 -t 6 -c -e -B sbtest -o /home/data/
$ time myloader -h 192.168.3.6 -P 3306 -u root -p root123456 -t 6 -B sbtest -o -d /home/data

Summary

In addition to the techniques introduced above, in practice we can also use third-party backup software or an all-in-one backup appliance for centralized backup management, providing real-time and scheduled backup of databases, files, operating systems, and virtual machines. You are also welcome to share your own experience of backing up and recovering data with third-party tools.

Data backup and recovery is a compulsory course for every DBA and system operator. In a distributed environment, even though distributed databases have their own mechanisms to keep data safe, backup and recovery remain a key concern in production, and everyone should be proficient in them.