Table structure design of HBase one-to-many relationship

    When I first started using HBase, I only used it to store simple Java objects or simple data, so I generally used just one column family and one column label.
    Recently I was given the task of moving the messages in our system to HBase, so I started looking into how one-to-many relationships are modeled in HBase. The information I found online was not very detailed, so this post records my design and ideas. These ideas are unproven and still need to be verified; if anything here is wrong, corrections are welcome.
    First, the two materials I referred to. Background: a main post and its N replies form a one-to-many relationship. Anyone who has learned a little about databases should understand this, so I will not draw a diagram:
1. The officially recommended material:
http://wiki.apache.org/hadoop/Hbase/DataModel 
2. A simple introduction to the HBase one-to-many table structure by an expert blogger (I think he actually drew on material 1, but it is not very... reasonable: the comment_title in the list that follows appears to be wrong, and the one-to-many example is quite puzzling):
http://doudouclever.blog.163.com/blog/static/17511231020127893233972/ 
The final solution is this table (according to the official material):
Table     | Row Key | Family          | Attributes (Column Keys/Qualifiers)
BlogTable | ID      | info:           | Author, Title, URL
          |         | text:           | No column key, 3 versions
          |         | comment_title:  | Column keys are written like YYYYMMDDHHmmss. Should be IN-MEMORY and have 1 version
          |         | comment_author: | Same keys. 1 version
          |         | comment_text:   | Same keys. 1 version
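
The layout in the table above can be pictured as nested maps: row key, then column family, then column key, then value. The following is a minimal sketch of that picture in plain Python. It is not the HBase client API; the helper names (`new_table`, `put`, `add_comment`), the row key "post-1", and the timestamp-style column keys are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for one HBase table:
# row key -> column family -> column key (qualifier) -> value.
# This only models the layout; it is not the HBase client API.
def new_table():
    return defaultdict(lambda: defaultdict(dict))

def put(table, row_key, family, column_key, value):
    """Store one cell. The column key may be '' (no column key)."""
    table[row_key][family][column_key] = value

def add_comment(table, row_key, stamp, author, title, text):
    """Add one comment: the same column key is written in each comment family."""
    put(table, row_key, "comment_author", stamp, author)
    put(table, row_key, "comment_title", stamp, title)
    put(table, row_key, "comment_text", stamp, text)

blog_table = new_table()

# Fixed columns that describe the post itself.
put(blog_table, "post-1", "info", "Author", "Zhang San")
put(blog_table, "post-1", "info", "Title", "Message header")
put(blog_table, "post-1", "text", "", "This is the content Hello World!")

# Each new comment adds column keys; the row key stays the same.
# The YYYYMMDDHHmmss values below are invented sample timestamps.
add_comment(blog_table, "post-1", "20120101103000",
            "Li Si", "Reply header 1", "This is the reply content 1")
add_comment(blog_table, "post-1", "20120101104500",
            "Wang Wu", "Reply header 2", "This is the reply content 2")

row = blog_table["post-1"]
comment_keys = sorted(row["comment_author"])
```

Because the three comment families share the same column key, one comment can be reassembled by reading that key from each family.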

Because I read the second material first and had not looked at the official one, I misunderstood the design at first. I kept trying to work out how this one-to-many design is meant to work, but it is actually enough to understand the following two points:
1. HBase's two-dimensional table structure: the three important concepts are the Column Family (hereinafter CF), the Column Key/Qualifier (hereinafter CK), and the RowKey. A CF can contain several CKs. A CF is like a merged header cell, and a CK is the specific column label under it, which can be empty. The RowKey is the row label and can be understood as the primary key. As shown in the figure below:
[Figure: HBase table structure (RowKey, Column Family, Column Key)]
2. In HBase, the Column Keys in a Column Family can be added dynamically.
The data stored in the relational database looks like this (some fields are omitted for simplicity):
Header table (main post):
ID | Author    | Title          | Body
1  | Zhang San | Message header | This is the content Hello World!

Detail table (the reply list):
ID | HeadID | CommentAuthor | Title          | Body
1  | 1      | Li Si         | Reply header 1 | This is the reply content 1
2  | 1      | Wang Wu       | Reply header 2 | This is the reply content 2

To store this in HBase, the detail records, which a relational database extends vertically (for the same header, rows are appended one by one to the bottom of the detail table), have to be turned into HBase's horizontal extension (for the same RowKey, a new ColumnKey is added for each detail record). The data as stored in HBase is shown below; ITeye's table editor cannot merge cells, so it is shown as an Excel screenshot:
[Figure: Excel screenshot of the example data laid out as an HBase row]
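
The vertical-to-horizontal transformation can be sketched in plain Python using the two relational tables above. This is only an illustration under assumptions: the family names (`info`, `text`, `comment_author`, `comment_title`, `comment_text`) are borrowed from the official table rather than taken from the screenshot, the function names are hypothetical, and nested dicts stand in for HBase.

```python
# Relational rows from the example above.
headers = [
    {"ID": 1, "Author": "Zhang San", "Title": "Message header",
     "Body": "This is the content Hello World!"},
]
details = [
    {"ID": 1, "HeadID": 1, "CommentAuthor": "Li Si",
     "Title": "Reply header 1", "Body": "This is the reply content 1"},
    {"ID": 2, "HeadID": 1, "CommentAuthor": "Wang Wu",
     "Title": "Reply header 2", "Body": "This is the reply content 2"},
]

def to_wide_rows(headers, details):
    """Build {row_key: {family: {column_key: value}}} from header/detail rows."""
    rows = {}
    for h in headers:
        rows[str(h["ID"])] = {
            "info": {"Author": h["Author"], "Title": h["Title"]},
            "text": {"": h["Body"]},  # empty column key
            "comment_author": {},
            "comment_title": {},
            "comment_text": {},
        }
    for d in details:
        row = rows[str(d["HeadID"])]   # foreign key selects the row
        column_key = str(d["ID"])      # detail primary key becomes the column key
        row["comment_author"][column_key] = d["CommentAuthor"]
        row["comment_title"][column_key] = d["Title"]
        row["comment_text"][column_key] = d["Body"]
    return rows

def to_cells(rows):
    """Flatten to (row_key, family, column_key, value), one tuple per stored cell."""
    return [(rk, fam, ck, val)
            for rk, fams in rows.items()
            for fam, cols in fams.items()
            for ck, val in cols.items()]

wide = to_wide_rows(headers, details)
cells = to_cells(wide)
```

Here the two detail rows end up as two column keys per comment family inside the single row "1". The flattened `cells` list contains only cells that were actually written, so a column key that is absent from a row has no entry at all.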
Conclusion: as the figure shows, HBase takes the fields of the relational detail table (the reply list) as Column Families and the primary key of the detail table as the ColumnKey, which achieves the one-to-many effect. In a relational database, new detail records are added vertically as rows; in HBase, they are added as new ColumnKeys.
Problems that may arise from this:
  • 1. The HBase documentation does not recommend using many Column Families; more than 3 is discouraged, see http://hbase.apache.org/book/number.of.cfs.html
        Yet this one-to-many design requires several Column Families. This contradiction still puzzles me.
  • 2. RowKey generation: traditional database primary keys are usually integer values generated incrementally, but using such values as the RowKey in HBase concentrates the load on a single RegionServer, so how to generate the RowKey needs further discussion.
  • 3. With this one-to-many design, a post with a very large number of replies (say, more than ten thousand) produces a very large number of ColumnKeys, which means the HBase table becomes very wide. I have read posts saying that HBase is not a two-dimensional structure in the traditional sense and simply does not reserve storage for empty cells (I may not be understanding or describing this correctly). In any case, this "wide" table structure clashes with the way of thinking behind traditional database table design, and I do not know whether it will cause problems.
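
For problem 2, one commonly described mitigation (not something this post has tested) is to put a short hash-derived prefix in front of the incremental ID, so that consecutive IDs no longer sort next to each other. The sketch below assumes a hypothetical bucket count of 16 and a made-up key format; neither comes from the materials above.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; in practice chosen to suit the cluster

def prefixed_row_key(post_id: int, buckets: int = NUM_BUCKETS) -> str:
    """Return '<bucket>-<zero-padded id>', with the bucket derived from a hash of the id."""
    digest = hashlib.sha256(str(post_id).encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # Zero-padding keeps keys within one bucket sorted by numeric id.
    return f"{bucket:02d}-{post_id:010d}"

# Consecutive IDs are spread over several prefixes instead of one key range.
keys = [prefixed_row_key(i) for i in range(1, 1001)]
prefixes = {k.split("-")[0] for k in keys}
```

Because the prefix is computed from the ID, a reader can rebuild the full key from the ID alone. The trade-off is that a scan over a range of consecutive IDs is no longer a single contiguous scan.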

Reference : https://blog.csdn.net/codepython/article/details/41910033