An illustrated deep dive: memory was always running short, so I leaned on HBase to convince the leader to back the new project

Hello everyone, I'm Xiao Yu

I've been using the HBase database at work recently, so I took the opportunity to write down what I've learned about HBase and share it with everyone. HBase's content is genuinely vast; what follows are the technical points Xiao Yu expects to use in day-to-day work, and I hope they help you too.

It is fair to say that the Internet is built on databases of all kinds, and the mainstream choices today fall into a few camps: relational databases and their distributed solutions, represented by MySQL; cache databases, represented by Redis; retrieval databases, represented by ES; and persistent distributed KV databases. In the open source world, especially in China, HBase is almost the default choice for a distributed, persistent KV database. HBase covers many business scenarios, such as user profiling, real-time (and offline) recommendation, real-time risk control, social feed streams, product order history, social chat records, monitoring systems, user behavior logs, and so on.


No matter which technology products we use, every one of us generates a great deal of data, and small single-node databases struggle to store and query it all. That is where HBase, a distributed big-data store, comes in. HBase is a column-oriented database management system built on top of the Hadoop file system. Its data model resembles Google's BigTable, and it is part of the Hadoop ecosystem: data is stored in HDFS, and clients can randomly access data in HDFS through HBase. Its main characteristics are:

Does not support complex transactions, only row-level transactions: reads and writes of a single row are atomic;

Uses HDFS as its underlying storage, and therefore supports structured, semi-structured, and unstructured data;

Supports horizontal scaling by adding machines;

Supports data sharding;

Supports automatic failover between RegionServers;

Provides an easy-to-use Java client API;

Supports BlockCache and Bloom filters;

Filters support predicate pushdown.

HBase principle


HBase is a distributed, column-oriented open source database (strictly speaking, it is oriented toward column families). HDFS provides HBase with reliable underlying data storage; MapReduce provides HBase with high-performance computing; and Zookeeper provides HBase with stable service and a failover mechanism. So we say HBase is a distributed database solution that uses many cheap machines to achieve high-speed storage and retrieval of massive data.

Columnar storage

Let's first look at the row-oriented storage of a traditional relational database, as shown below:

As you can see, only the first row (ID: 1, Xiao Yu) has all its fields filled in; the rows for Xiao Na and Xiao Zhi are partly empty. In a row-oriented structure, every row has the same fixed set of columns, and a column that is not filled in still occupies an empty slot.

Now let's take a look at how a non-relational, column-oriented database stores the same data:

As you can see, what used to be one row of Xiao Yu's data with seven columns has now become seven rows, one per column. Previously those seven values lived in a single row sharing the primary key ID: 1; in columnar storage they become seven rows, each carrying its own copy of the key, which is why Xiao Yu's primary key ID: 1 is repeated seven times. The biggest advantage of this arrangement is that we never need to store values we don't have, which greatly saves space. And because query predicates are defined over columns, the entire database is effectively self-indexing.

Comparison of NoSQL and relational databases

Compare the two in the following figure:

RDBMS and Hbase comparison

HBase stores data by column family. A column family can contain many columns, and column families must be specified when the table is created. To deepen the understanding of HBase column families, below is a simple relational database table alongside the corresponding HBase table:

The main difference:

HBase architecture

HBase is made up of several core components: Client, Zookeeper, Master, HRegionServer, and HDFS.


The Client communicates with HMaster and HRegionServer through HBase's RPC mechanism: it talks to HMaster for management operations and to HRegionServer for data read/write operations.


HBase uses Zookeeper for Master high availability, RegionServer monitoring, the metadata entry point, and cluster configuration maintenance. The specific work is as follows:

  1. Zookeeper ensures that only one Master runs in the cluster. If the Master fails, a new Master is elected through the competition mechanism to take over service

  2. Zookeeper monitors the status of each RegionServer. When a RegionServer comes online, goes offline, or fails, the Master is notified through callbacks

  3. Zookeeper stores the unified entry address of the metadata

When using the HBase client, you need to supply Zookeeper's addresses and node path in order to establish a connection with Zookeeper, as shown in the following code:
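The original code screenshot is missing; below is a minimal sketch of establishing a connection with the HBase 2.x Java client (the quorum host names are placeholders to replace with your own):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnectDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Zookeeper quorum: comma-separated host list (placeholder hosts)
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // The Connection is heavyweight and thread-safe: create once, share everywhere
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("connected: " + !connection.isClosed());
        }
    }
}
```

The connection is intentionally created once and closed with try-with-resources; per-request objects such as Table are cheap and obtained from it.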


The main responsibilities of the master node are as follows:

  1. Assign Regions to RegionServers

  2. Maintain load balance for the entire cluster

  3. Maintain the metadata of the cluster, discover failed Regions and reassign them to healthy RegionServers; when a RegionServer fails, coordinate the splitting of its HLog


HRegionServer internally manages a series of HRegion objects. Each HRegion contains multiple Stores, one per column family (ColumnFamily) of the Table; that is, a Store manages one column family (CF) of one Region. Each Store consists of one MemStore and zero or more StoreFiles. The Store is the storage core of HBase.


When data is written, it first goes to the Write Ahead Log (HLog). All Regions served by one HRegionServer share the same log file. Data is not written directly to HDFS; it is buffered and written in batches once a certain amount has accumulated. After the write completes, the corresponding log entries are marked as persisted.


MemStore is a sorted in-memory buffer. User data is first written into the MemStore; when the MemStore is full, it is flushed into a StoreFile (whose underlying storage format is HFile). When the number of StoreFiles grows past a certain threshold, a Compact is triggered that merges multiple StoreFiles into one, so StoreFiles gradually grow larger through merging. When the total size of all StoreFiles (HFiles) in a Region exceeds the threshold (hbase.hregion.max.filesize), a Split is triggered: the current Region is split into two Regions, the parent Region goes offline, and HMaster assigns the two new child Regions to appropriate HRegionServers, so that the load of the original Region is spread across two Regions.

Region addressing mode

Addressing goes through Zookeeper and the .META. table, mainly in the following steps:

  1. The Client asks ZK for the address of the RegionServer hosting .META.

  2. The Client asks the RegionServer hosting .META. for the address of the RegionServer holding the data to be accessed, and caches the .META. information for faster subsequent access

  3. The Client requests the RegionServer where the data lives and obtains the required data


HDFS provides HBase with its ultimate underlying data storage, and at the same time underpins HBase's high availability (the HLog is stored in HDFS).

HBase components

Column Family

Column Family is also called a column family. HBase partitions data storage by column family. A column family can contain any number of columns, enabling flexible data access. Column families must be specified when an HBase table is created, just as specific columns must be specified when a relational table is created. However, more column families is not better: the official recommendation is at most 3, and in practice we generally use a single column family.


Rowkey

The Rowkey concept is exactly like a primary key in MySQL: HBase uses the Rowkey to uniquely identify a row of data. HBase supports only three query patterns: single-row lookup by Rowkey, range scan by Rowkey, and full table scan.


Region: the concept of a Region is similar to a partition or shard in a relational database. HBase assigns the data of a large table to different Regions based on Rowkey ranges, and each Region is responsible for storing and serving a certain range of data. This way, even a huge table, once cut into Regions, can be accessed with very low latency.

TimeStamp multi-version

TimeStamp is the key to HBase's multi-versioning. HBase uses different timestamps to identify different versions of the data under the same rowkey. When writing, if the user does not specify a timestamp, HBase automatically assigns one consistent with the server time. Within the same rowkey, versions are sorted in reverse timestamp order, so the latest version is returned by default; users can read older versions by specifying a timestamp value.
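As a hedged sketch of multi-versioning with the Java client (using classes from org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util.Bytes; `connection` is assumed to be an open Connection, and the table "t" with column family "cf" is illustrative):

```java
// Assumes an existing Connection "connection", table "t", column family "cf"
Table table = connection.getTable(TableName.valueOf("t"));

// Write two versions of the same cell with explicit timestamps
Put p1 = new Put(Bytes.toBytes("row1"));
p1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), 1000L, Bytes.toBytes("v1"));
table.put(p1);
Put p2 = new Put(Bytes.toBytes("row1"));
p2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), 2000L, Bytes.toBytes("v2"));
table.put(p2);

// By default a Get returns only the newest version ("v2");
// readVersions(n) asks for up to n versions, newest first
Get g = new Get(Bytes.toBytes("row1"));
g.readVersions(2);
Result r = table.get(g);
```

Note that the column family must have been created with more than one max version for the older cell to remain readable.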

HBase write logic

HBase write process

There are three main steps:

  1. The Client locates the RegionServer hosting the Region that the data belongs to

  2. The write is appended to the HLog, which is stored in HDFS; when a RegionServer fails, the HLog is used to recover the data

  3. The write goes into the MemStore; only when both the HLog append and the MemStore write succeed is the request considered complete. The MemStore is gradually flushed to HDFS later

MemStore flushing

To improve write performance, HBase does not flush a write to disk as soon as it lands in the MemStore; it waits until certain conditions are met. Which scenarios trigger a flush? They can be summarized as follows:

  1. A global parameter controls overall memory usage: when the combined size of all MemStores exceeds its maximum share of the heap, flushing is triggered. By default this threshold is 40% of the heap. Crossing it does not mean every MemStore is flushed; a second parameter, defaulting to 35% of the heap, controls when flushing stops: once the combined MemStore size drops back to 35% of the heap, flushing stops. This limits the impact of flushing on the business and smooths the system load.

  2. When a single MemStore reaches hbase.hregion.memstore.flush.size, a flush is triggered; the default is 128M.

  3. As mentioned earlier, the HLog guarantees HBase data consistency. Too many HLogs would make failure recovery take too long, so HBase caps the number of HLog files; when the cap is reached, a flush is forced. The parameter is hbase.regionserver.maxlogs, and the default is 32.

  4. A flush can be triggered manually through the hbase shell or the Java API.

  5. Shutting down a RegionServer cleanly triggers a flush; once all data is flushed, the HLog is not needed for recovery.

  6. When a RegionServer fails, its Regions are migrated to other healthy RegionServers. After a Region's data is recovered, a flush is triggered, and the Region serves traffic only after the flush completes.

HBase middle layer

Phoenix is an open source SQL layer over HBase that lets you operate on HBase data through standard JDBC. Before Phoenix, accessing HBase meant calling its Java API; compared with a single line of SQL, that API is far too cumbersome. Phoenix's philosophy is "we put the SQL back in NoSQL": you can use standard SQL to operate on data in HBase, which also means you can integrate common persistence frameworks such as Spring Data JPA or MyBatis to operate HBase.

Secondly, Phoenix's performance is also very good. Its query engine converts SQL queries into one or more HBase scans and produces standard JDBC result sets through parallel execution. By using the HBase API directly together with coprocessors and custom filters, it delivers millisecond-level performance for small queries and second-level performance for queries over tens of millions of rows. Phoenix also offers features that HBase itself lacks, such as secondary indexes. For these reasons, Phoenix has become the best SQL middle layer for HBase.
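A minimal sketch of querying HBase through Phoenix's JDBC driver (assuming the Phoenix client jar is on the classpath; the Zookeeper address and table name are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixDemo {
    public static void main(String[] args) throws Exception {
        // Phoenix JDBC URL format: jdbc:phoenix:<zookeeper quorum>
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1:2181");
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery("SELECT id, name FROM USER_TABLE LIMIT 10");
            while (rs.next()) {
                System.out.println(rs.getLong("id") + " " + rs.getString("name"));
            }
        }
    }
}
```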

HBase installation and use

Download the HBase compressed package and unzip it first

Open hbase-env.sh and configure JAVA_HOME:

Configure hbase-site.xml:
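The original configuration screenshot is missing; a minimal sketch of hbase-site.xml looks roughly like this (the host name and paths are placeholders to replace with your own):

```xml
<configuration>
  <!-- Where HBase stores its data: file:// for standalone, hdfs:// for a cluster -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://your-hostname:9000/hbase</value>
  </property>
  <!-- Zookeeper quorum used by clients and servers -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>your-hostname</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```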

Replace the host name above with your own, then start HBase. The web UI looks like this:

HBase commands

Below are some of the HBase shell commands Xiao Yu uses most often:
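The original command screenshot is missing; a few common hbase shell commands, sketched as examples (the table, column family, and values are illustrative):

```
# start the shell
$ bin/hbase shell

# create table 'user' with column family 'info'
create 'user', 'info'

# insert a cell, read a row, scan the table
put 'user', 'row1', 'info:name', 'xiaoyu'
get 'user', 'row1'
scan 'user'

# disable, then drop a table
disable 'user'
drop 'user'
```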


HBase API usage

The API is as follows:

Examples are as follows:
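The original example screenshot is missing; below is a hedged sketch of basic CRUD with the HBase 2.x Java client (using classes from org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util.Bytes; `connection` is assumed to be an open Connection, and the table and qualifier names are illustrative):

```java
// Assumes an existing Connection "connection" (see the connection sketch earlier)
Admin admin = connection.getAdmin();

// Create table "user" with a single column family "info"
TableDescriptor desc = TableDescriptorBuilder
        .newBuilder(TableName.valueOf("user"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
        .build();
admin.createTable(desc);

Table table = connection.getTable(TableName.valueOf("user"));

// Put: write one cell
Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("xiaoyu"));
table.put(put);

// Get: read one row back
Result result = table.get(new Get(Bytes.toBytes("row1")));
byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));

// Scan: iterate over all rows
try (ResultScanner scanner = table.getScanner(new Scan())) {
    for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
    }
}
```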

HBase application scenarios

Object storage system

HBase MOB (Medium Object Storage) is a new feature introduced in hbase-2.0.0 to address HBase's poor performance when storing medium-sized files (0.1MB~10MB). It is suitable for storing pictures, documents, PDFs, and short videos in HBase.

OLAP storage

Kylin's bottom layer uses HBase for storage, chosen for its high concurrency and massive storage capacity. Building a cube in Kylin generates a large volume of pre-aggregated intermediate data with a high expansion rate, which places high demands on the storage capacity of the database.

Phoenix is an SQL engine built on HBase: through Phoenix you can operate HBase directly over a JDBC interface. Although it supports upsert operations, it is used more in OLAP scenarios; its disadvantage is inflexibility.

Time series data

OpenTSDB, which records and displays the values of metrics at each point in time, is generally used in monitoring scenarios and is an application layered on top of HBase.

User Portrait System

The key characteristics here are dynamic and sparse columns. The number of dimensions describing a user is variable and may grow dynamically (hobbies, gender, address, and so on), and not every dimension has data for every user.

Message/Order System

This scenario demands strong consistency and good read performance, and HBase can guarantee strong consistency.

Feed stream system storage

A feed stream system reads far more than it writes, and features a simple data model, high concurrency, peak-and-valley access patterns, persistent reliable storage, and message ordering. HBase's rowkeys, sorted in lexicographic order, suit this scenario well.

HBase optimization


Pre-create Regions

By default, a table is created with a single Region partition. During bulk imports, all clients write to that one Region, and it does not split until it grows large enough. One way to speed up batch writes is to create some empty Regions in advance, so that data written to HBase is load-balanced across the cluster according to the Region partitions.
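A hedged sketch of pre-splitting at table creation with the Java client (the split points and table name are illustrative; `connection` is assumed to be an open Connection):

```java
// Assumes an existing Connection; pre-split the table into 4 Regions
byte[][] splitKeys = {
    Bytes.toBytes("1000"),
    Bytes.toBytes("2000"),
    Bytes.toBytes("3000")
};
TableDescriptor desc = TableDescriptorBuilder
        .newBuilder(TableName.valueOf("orders"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        .build();
// Each split key marks a Region boundary, so writes spread across servers
connection.getAdmin().createTable(desc, splitKeys);
```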

Rowkey optimization

Rowkeys in HBase are stored in lexicographic order. When designing a rowkey, take full advantage of this sorting: store data that is often read together in one block, and keep data likely to be accessed soon close together.

In addition, if rowkeys are generated monotonically, it is recommended not to write them in increasing order directly; instead, reverse the rowkey so that keys are distributed roughly evenly. The advantage of this design is load balancing across RegionServers; otherwise all new data tends to pile up on a single RegionServer. This can also be combined with pre-splitting the table.
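Both points can be demonstrated without a cluster. The sketch below (plain Java, no HBase dependency) shows the lexicographic ordering HBase uses and the reverse-key trick for monotonically increasing keys:

```java
import java.util.Arrays;

public class RowKeyDemo {
    // Reverse a rowkey string so that monotonically increasing keys
    // spread across Regions instead of piling onto the last one
    public static String reverseKey(String rowkey) {
        return new StringBuilder(rowkey).reverse().toString();
    }

    public static void main(String[] args) {
        // Sequential date-based keys all share a prefix and hit one Region;
        // reversed, their leading characters differ and spread out
        String[] keys = {"20230501", "20230502", "20230503"};
        for (String k : keys) {
            System.out.println(k + " -> " + reverseKey(k));
        }
        // Lexicographic order is what HBase uses: "row10" sorts before "row2"
        String[] rows = {"row2", "row10", "row1"};
        Arrays.sort(rows);
        System.out.println(Arrays.toString(rows)); // [row1, row10, row2]
    }
}
```

The lexicographic quirk above is also why fixed-width, zero-padded numeric keys are commonly preferred.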

Reduce the number of column families

Don't define too many ColumnFamilies in one table. Currently HBase does not handle tables with more than 2~3 ColumnFamilies well, because when one ColumnFamily flushes, its neighboring ColumnFamilies are flushed along with it through the cascade effect, ultimately causing the system to generate more I/O.

Caching strategy

When creating a table, you can use HColumnDescriptor.setInMemory(true) to place the table in the RegionServer's cache, so that reads are served from memory.

Set storage lifetime

When creating a table, you can set the time-to-live of the data through HColumnDescriptor.setTimeToLive(int timeToLive); expired data is deleted automatically.
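The two column-family settings above can be sketched together using the HBase 2.x builder API, which replaces HColumnDescriptor (the table name and TTL value are illustrative; `connection` is assumed to be an open Connection):

```java
// Column family "cf": kept in the RegionServer cache, data expires after 7 days
ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
        .newBuilder(Bytes.toBytes("cf"))
        .setInMemory(true)
        .setTimeToLive(7 * 24 * 3600)   // TTL is specified in seconds
        .build();
TableDescriptor desc = TableDescriptorBuilder
        .newBuilder(TableName.valueOf("events"))
        .setColumnFamily(cf)
        .build();
connection.getAdmin().createTable(desc);
```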

Hard disk configuration

Each RegionServer manages 10~1000 Regions. If each Region is 1~2G, each server needs at least 10G and at most 1000*2G=2TB. With 3 replicas, that is 6TB. One option is three 2TB disks; another is twelve 500G disks. With sufficient bandwidth, the latter provides greater throughput, finer-grained redundant backup, and faster recovery from a single-disk failure.

Allocate appropriate memory to the RegionServer service

Without affecting other services, the more memory the better. For example, append export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS" at the end of hbase-env.sh in HBase's conf directory to give the RegionServer 16000m of memory.

Number of backups of write data

The number of backups (replicas) is proportional to read performance and affects high availability, and is inversely proportional to write performance. There are two ways to configure it. One is to copy hdfs-site.xml into HBase's conf directory and add or modify the dfs.replication configuration item to the desired replica count; this change takes effect for all HBase user tables. The other is to modify the HBase code so that HBase supports setting the replica count per column family, set at table-creation time (the default is 3); that replica count takes effect only for the configured column family.

WAL (Write Ahead Log)

A switch can be set so that HBase skips writing the log before writing data; logging is on by default. Turning it off improves write performance, but if the system fails (the RegionServer being written to goes down), data may be lost. To configure WAL behavior when writing through the Java API, call Put.setWriteToWAL(boolean) on the Put instance.

Batch write

HBase's Put supports both single and batch inserts. Generally, batch writes are faster and save network overhead. From the Java client, collect the Puts into a Put list first, then call the table's put(List&lt;Put&gt;) method to write them in one batch.
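A hedged sketch of batch writing with the Java client (row count and values are illustrative; `connection` is assumed to be an open Connection):

```java
// Assumes an existing Connection and table "user" with column family "info"
Table table = connection.getTable(TableName.valueOf("user"));

List<Put> puts = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    Put put = new Put(Bytes.toBytes("row" + i));
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("c"), Bytes.toBytes("v" + i));
    puts.add(put);
}
// One batched call instead of 1000 single-row round trips
table.put(puts);
```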

At last

While learning HBase, you will find that its design is actually very similar to Elasticsearch: for example, HBase's Flush &amp; Compact mechanism is essentially the same design as Elasticsearch's, so understanding one makes the other easier.

In essence, HBase is positioned as a distributed storage system, while Elasticsearch is a distributed search engine. The two are not equivalent but complementary. HBase has limited search capability, supporting only the Rowkey index; other advanced lookup features you have to build yourself. Therefore some systems combine HBase and Elasticsearch to get storage + search: HBase makes up for Elasticsearch's weaker storage capacity, and Elasticsearch makes up for HBase's weak search capability.

In fact, this goes beyond HBase and Elasticsearch: every distributed framework or system shares certain commonalities, and the differences lie in their different concerns. Xiao Yu's feeling is that when learning a piece of distributed middleware, first clarify its core concern, then compare it with other middleware to extract commonalities and distinctions and further deepen your understanding.