Looking back on 2018 | AnalyticDB: Do not forget the original intention and move forward

Abstract: 2018 was an extraordinary year for AnalyticDB, with major progress in architecture evolution, stability, ecosystem building, and compatibility.

Preface

The analytical database AnalyticDB (hereinafter ADB) is the only PB-scale real-time data warehouse developed in-house by Alibaba that has been proven at very large scale and on core business workloads. Its external customers now include large and medium-sized traditional enterprises, government agencies, and numerous Internet companies, spanning more than a dozen industries. Within Alibaba, ADB also handles high-concurrency, low-latency analytical processing for many core businesses, such as advertising and marketing, merchant data services, Cainiao logistics, and Hema new retail.

In 2018, we added R&D centers in Shenzhen and the Bay Area to welcome more professional talent. We also received strong support from customers, who challenged us with many business scenarios. This year is bound to leave a deep imprint on ADB's development history: over the past year, ADB developed rapidly in both architecture and product. With this article I would like to record the 2018 that we traveled through together.

Architecture evolution

1. Access layer and SQL Parser

1. Full adoption of the self-developed parser component, FastSQL

For historical reasons, different ADB modules used different parser components. For example, the storage node used Druid while the access layer parsed SQL with an Antlr parser, which made it difficult to improve SQL compatibility. ADB has been online for many years and serves many internal and external data businesses, and anyone familiar with databases knows that replacing the parser in such a system is hard, comparable to changing an engine in mid-flight.

After more than half a year of hard work, ADB completed a unified upgrade of these parser components and replaced them with FastSQL (based on Druid; after 8 years of refinement in the open source community, its syntax support is very complete). After the upgrade, SQL compatibility improved greatly and parsing in complex scenarios became 30 to 100 times faster. FastSQL also integrates seamlessly with the optimizer, providing constant folding, function transformation, expression conversion, function type inference, constant inference, semantic deduplication, and other capabilities that help the optimizer generate the optimal execution plan.
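Constant folding, one of the rewrites listed above, can be illustrated with a small sketch. This is not FastSQL's code or API: the `Const`, `Column`, and `BinOp` node types and the `fold` function are hypothetical names chosen for the example, and only integer arithmetic is handled.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr: Expr) -> Expr:
    """Recursively replace constant-only subtrees with a single Const node."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            # Both operands are known at plan time, so evaluate once here.
            return Const(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# price * (60 * 60 * 24): the constant subtree is evaluated at plan time.
seconds_per_day = BinOp("*", BinOp("*", Const(60), Const(60)), Const(24))
folded = fold(BinOp("*", Column("price"), seconds_per_day))
```

After folding, the executor multiplies each row by one precomputed constant instead of re-evaluating the constant subexpression per row.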

2. Real-time write performance improved by 10 times

Starting with ADB v2.7.4, the SQL parser was deeply optimized, which greatly improves the performance of the INSERT path. In production, performance improved by 10 times. Taking 4*C4 on the cloud as an example, a stress test on an item table reached 150,000 TPS.

3. Streaming return of massive result sets

In versions before ADB v2.7, the data produced by the computing framework had to be accumulated in memory and was returned to the client only after execution had fully completed. Under high concurrency or with large result sets, this carried a risk of memory overflow. Starting with v2.7, streaming return is the default: data is returned to the client as it is produced, which reduces latency and greatly improves the stability of queries with large result sets.
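The difference between the two return modes can be sketched with Python generators. This is only an illustration of the idea, not ADB's implementation: `produce_rows`, `return_buffered`, `return_streaming`, and the `batch_size` parameter are hypothetical names.

```python
from typing import Iterable, Iterator, List


def produce_rows(n: int) -> Iterator[tuple]:
    """Stand-in for the computing framework producing rows one by one."""
    for i in range(n):
        yield (i, f"row-{i}")


def return_buffered(rows: Iterable[tuple]) -> List[tuple]:
    """Buffered style: hold the full result in memory, then return it."""
    return list(rows)


def return_streaming(rows: Iterable[tuple], batch_size: int = 1000) -> Iterator[List[tuple]]:
    """Streaming style: hand over a batch as soon as it fills up."""
    batch: List[tuple] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield batch


# Only the first three rows are ever produced here; the remaining rows
# are never materialized because the generator is lazy.
first_batch = next(return_streaming(produce_rows(10_000_000), batch_size=3))
```

In the streaming version, peak memory is bounded by `batch_size` rather than by the size of the result set, which is the property that removes the overflow risk described above.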

2. Query Optimizer

In 2018, on the one hand, services both on the cloud and within the group were fully migrated from ADB's previous-generation LM engine to the Xihe MPP engine. On the other hand, more and more customers no longer wanted to clean their data with offline or stream computing first, and demand for real-time data warehouse scenarios exploded. At the same time, more and more visualization tools that automatically generate SQL began to connect to ADB. All of this posed extremely high challenges for the optimizer team.

To meet these challenges, the optimizer team was built from scratch during the year and gradually grew into a capable international team. While still being built up, the team delivered the following results in the ADB optimizer this year:

1. Build and improve the RBO Plus optimizer

Unlike a traditional RBO optimizer, the RBO Plus design implements the following key features:

1). It introduces a cost model and estimation: using ADB's efficient storage interface, it adds dynamic selectivity and cardinality estimation to optimize join reordering, so that ADB can handle complex queries that join 10 or more tables.

2). Execution plans for data shuffling, aggregation, and so on are specially optimized for MPP, so that business scenarios coming from the LM engine see close to zero performance regression.

3). Common and complex correlated subquery scenarios are deeply optimized in both functionality and performance, and a series of rule algorithms for correlated subqueries were designed. For the corresponding queries in various standard benchmarks, some scenarios improved by up to 20x.

4). A parameterized plan cache is designed for point queries with ultra-low (millisecond) latency, reducing the optimization time of these scenarios by more than 10 times.
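The parameterized plan cache in item 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not ADB's optimizer: `parameterize`, `PlanCache`, and the string "plan" are hypothetical, and a regular expression stands in for the literal extraction that a real parser would do on the syntax tree.

```python
import re
from typing import Dict, List, Tuple

# Quoted strings and numeric literals; identifiers such as t1 are not
# matched because \b requires a word boundary before the digits.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql: str) -> Tuple[str, List[str]]:
    """Replace literals with '?' and return the template plus the literals."""
    params: List[str] = []

    def _swap(match: "re.Match[str]") -> str:
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(_swap, sql), params


class PlanCache:
    def __init__(self) -> None:
        self._plans: Dict[str, str] = {}
        self.optimizer_runs = 0  # counts how often the expensive path ran

    def _optimize(self, template: str) -> str:
        # Placeholder for the real (expensive) optimization step.
        self.optimizer_runs += 1
        return f"PLAN({template})"

    def get_plan(self, sql: str) -> Tuple[str, List[str]]:
        template, params = parameterize(sql)
        plan = self._plans.get(template)
        if plan is None:
            plan = self._plans[template] = self._optimize(template)
        return plan, params


cache = PlanCache()
plan_a, params_a = cache.get_plan("SELECT name FROM users WHERE id = 42 AND city = 'Hangzhou'")
plan_b, params_b = cache.get_plan("SELECT name FROM users WHERE id = 7 AND city = 'Shenzhen'")
```

Both statements map to the same template, so the optimizer runs once and the second call only pays for literal extraction and a dictionary lookup.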

2. Build a CBO optimizer

As business queries grow more complex, RBO and RBO Plus both run into limitations and challenges. CBO has become the key for the ADB optimizer to become a general-purpose commercial optimizer. Rather than adopting Calcite, an optimizer that is widely used but has many limitations, we set out to build an independent and controllable CBO optimizer to strengthen ADB's core competitiveness:

1) Establish an efficient statistics collection system that balances accuracy against collection cost and provides the basic "information infrastructure" for the CBO;

2) Build a CBO framework on the Cascades architecture and make it an extensible optimization platform.

3. Xihe MPP engine

This year, the ADB architecture fully switched from the previous-generation LM engine to the Xihe MPP engine. The Xihe engine had to carry this switch to completion and give customers more flexible and free-form queries, while also eliminating, through substantial performance optimization, the overhead that the engine switch caused in certain scenarios.

1. Fully binary computation

Computation operates directly on binary data, which eliminates the serialization and deserialization overhead of shuffle. It is also deeply integrated with storage to unify storage and computation, so there is no redundant serialization or deserialization anywhere in the computation process.

2. Memory pooling

Pooling is implemented with fixed-length memory slices, which reduces the cost of repeatedly growing memory during computation. Pooling also puts memory fully under the engine's own control, avoiding GC overhead in complex SQL scenarios.
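A pool of fixed-length slices can be sketched like this. It is a simplified illustration, not the Xihe engine's memory manager: `SlicePool` and its methods are hypothetical names, and Python's `bytearray` stands in for the memory the engine manages itself.

```python
from typing import List, Tuple


class SlicePool:
    """Pre-allocates one buffer and hands out fixed-length slices of it."""

    def __init__(self, slice_size: int, slice_count: int) -> None:
        self.slice_size = slice_size
        self._buffer = bytearray(slice_size * slice_count)  # allocated once
        self._free: List[int] = list(range(slice_count))    # free slice indices

    def acquire(self) -> Tuple[int, memoryview]:
        """Return (slice index, writable view) without allocating new memory."""
        if not self._free:
            raise MemoryError("pool exhausted")
        index = self._free.pop()
        start = index * self.slice_size
        return index, memoryview(self._buffer)[start:start + self.slice_size]

    def release(self, index: int) -> None:
        """Give the slice back so a later acquire() can reuse it."""
        self._free.append(index)

    @property
    def free_slices(self) -> int:
        return len(self._free)


pool = SlicePool(slice_size=4096, slice_count=4)
index, view = pool.acquire()
view[:5] = b"hello"   # writes go straight into the pooled buffer
pool.release(index)
```

Because every slice has the same length, acquiring and releasing are constant-time list operations, and the backing buffer never has to grow or be collected while queries run.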

3. CodeGen deep optimization

1). Caching: CodeGen constants are extracted from operators and expressions and the generated code is cached, which avoids dynamically compiling code on every request in high-concurrency scenarios and lays a good foundation for high-concurrency, high-QPS services.

2). Operator fusion reduces the cost of materialization between operators.

4. Other optimizations

1). Columnar computation: in cooperation with the JVM team, JDK 11 was introduced so that columnar computation can use the new SIMD instruction sets for optimization;

2). Automatic algorithm optimization: based on data distribution and data sampling, some execution algorithms and memory usage are optimized automatically;

3). Adaptive operator spill: supports dynamic memory allocation and preemption, and lets join, agg, and other operators spill to disk;

4). A new serialization and deserialization framework replaces the original JSON protocol, and Netty replaces the original Jetty;

5). Run-time statistics collection is supported, including memory and CPU cost statistics at the operator, stage, and query levels, along with partial automatic identification of slow SQL.

4. Basalt storage engine

To serve business queries that need both high-concurrency writes and low latency, ADB uses a read-write separation architecture. In earlier versions, this architecture had two problems: 1. Written data became visible asynchronously, whereas some scenarios require strong read-write consistency. 2. Queries slowed down when a large amount of incremental data was written. This year the storage engine made major breakthroughs on both fronts: the real-time visibility and the query performance of incremental data.

1. Support strong read and write consistency

In the latest ADB version, a complete read-write separation architecture with consistency guarantees has been designed, as shown in the following figure:

The blue line represents the write path and the orange line the read path. Take a single write followed by a read as an example. When a user writes new data to a table and then immediately issues a query, the FN collects the latest written version number of the table from all BNs (step 3) and sends that version number (denoted V1) along with the query request to the corresponding CN node (step 4). The CN then compares its locally consumed version of the table (denoted V2) with the requested version number.

If V1>V2, the CN serves the query only after it has consumed the latest written data (step 5); if V1
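The version comparison on the CN can be sketched as below. This is an illustration only: `ComputeNode` and its methods are hypothetical, the write log is a simple in-memory queue, and, because the sentence above is cut off in the source, answering immediately when V1 is not greater than V2 is an assumption of this sketch rather than something the text states.

```python
from collections import deque
from typing import Deque, List, Tuple


class ComputeNode:
    """Toy CN that consumes a versioned write log before answering reads."""

    def __init__(self) -> None:
        self.consumed_version = 0                       # V2 in the text
        self.rows: List[str] = []
        self._pending: Deque[Tuple[int, str]] = deque()  # (version, row) not yet consumed

    def enqueue_write(self, version: int, row: str) -> None:
        self._pending.append((version, row))

    def _consume_up_to(self, version: int) -> None:
        while self._pending and self._pending[0][0] <= version:
            v, row = self._pending.popleft()
            self.rows.append(row)
            self.consumed_version = v

    def query(self, requested_version: int) -> List[str]:
        """requested_version is V1, collected by the FN from the BNs."""
        if requested_version > self.consumed_version:
            # V1 > V2: catch up on the newest writes before serving (step 5).
            self._consume_up_to(requested_version)
        # Assumption: otherwise the local data is already fresh enough.
        return list(self.rows)


cn = ComputeNode()
cn.enqueue_write(1, "order-1")
cn.enqueue_write(2, "order-2")
stale_ok = cn.query(requested_version=0)   # nothing newer was requested
fresh = cn.query(requested_version=2)      # forces consumption up to V1 = 2
```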

2. Improve the query performance of the incremental data area

On an ADB storage node, a sudden burst of large-scale real-time writes can pile up a lot of data in the incremental data area within a short time. Queries that hit the incremental data area then trigger many table scans, which slows down overall analysis performance. For this reason, RocksDB was introduced into the incremental data area to build an index over the incremental data and keep access to this area fast. Its layered LSM-Tree storage structure provides good write performance together with better query capability.

5. Distributed Meta Service

Metadata stability was a challenge in earlier versions: modules raced with one another when accessing metadata, pressure on the meta database was high, and the DDL experience was poor. This year we refactored the metadata module and launched the GMS service, which provides unified metadata management and distributed DDL capabilities, and uses distributed caching to reduce pressure on the meta database and make metadata access efficient.

Global metadata service release

1). Global Meta Service (GMS) went into production, providing distributed DDL and data scheduling capabilities. Tables are now created and deleted synchronously, which improves the user experience.

2). The table partition allocation algorithm was improved and optimized for compute scheduling, supporting uniform data distribution in multi-table-group scenarios.

3). The stability of data update (online) and data redistribution (Rebalance) improved greatly.

6. Hardware acceleration

Although GPUs are widely used for general-purpose computing, they are mostly applied in fields such as graphics processing, machine learning, and high-performance computing. Combining the GPU's computing power with ADB is not an easy problem: making good use of GPUs raises many challenges in GPU resource management, code generation, video memory management, data management, and execution plan optimization.

In 2018, ADB introduced the GPU as a compute acceleration engine. Calculations that previously relied on an offline analysis engine and finished the next day can now complete with only seconds of delay. Data value was successfully brought online, delivering huge value to customers.

1. GPU resource management

Getting access to GPU resources is the first problem to solve. ADB calls the CUDA API through jCUDA, which provides wrappers for managing and configuring GPU devices and for launching GPU kernels. This module acts as a bridge between Java and the GPU, making it easy for the JVM to use GPU resources.

2. CodeGen

ADB's execution plans were designed for the CPU and cannot run on the GPU as-is. Because of the particular architecture of the GPU, its programming model also differs from that of the CPU. To solve this problem, a new CodeGen module was introduced. CodeGen first compiles the physical plan into LLVM IR using the LLVM API, and the IR is then optimized and converted into PTX code. CUDA is then called to convert the PTX code into native executable code and launch the GPU kernel it contains.

CodeGen can generate different code for different processors and can fall back to CPU execution when no GPU is available. Compared with the traditional volcano model, ADB's CodeGen module effectively reduces function call overhead and makes full use of the GPU's parallelism. In addition, the code generator applies operator fusion, for example fusing group-by with aggregation and join with aggregation, which greatly reduces the copying of intermediate results (especially join results) and the use of video memory.
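The effect of fusing a join with an aggregation can be shown with a small CPU-side sketch. It is not ADB's generated code and does not run on a GPU: the `orders` and `customers` data and the `unfused` and `fused` functions are made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

orders: List[Tuple[int, float]] = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)]  # (customer_id, amount)
customers: Dict[int, str] = {1: "east", 2: "west"}                            # customer_id -> region


def unfused(orders: List[Tuple[int, float]], customers: Dict[int, str]) -> Dict[str, float]:
    """Join first, materialize the joined rows, then aggregate them."""
    joined = [(customers[c], amount) for c, amount in orders if c in customers]
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in joined:
        totals[region] += amount
    return dict(totals)


def fused(orders: List[Tuple[int, float]], customers: Dict[int, str]) -> Dict[str, float]:
    """Probe the join and update the aggregate in the same loop."""
    totals: Dict[str, float] = defaultdict(float)
    for c, amount in orders:
        region = customers.get(c)
        if region is not None:
            totals[region] += amount   # no intermediate joined row is stored
    return dict(totals)


result = fused(orders, customers)
```

Both functions return the same totals, but the fused version never builds the `joined` list, which is the intermediate result whose copying and memory footprint the paragraph above refers to.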

3. Video memory management

ADB developed the VRAM Manager to manage the video memory of each GPU. Unlike the way other GPU database systems on the market use GPUs, and in order to improve video memory utilization and concurrency, we designed a slab-based VRAM Manager that fits ADB's multi-partition, multi-threaded characteristics and manages all video memory allocation requests in one place.

Performance tests show an average allocation time of 1 ms, significantly faster than the 700 ms of the GPU's built-in video memory allocation interface, which helps improve the overall concurrency of the system.
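The general idea of slab-style allocation can be sketched as follows. This is a simplified model, not ADB's VRAM Manager: `SlabAllocator`, its size classes, and the offset-based arena are assumptions for illustration, and no real video memory is touched.

```python
from typing import Dict, List, Tuple


class SlabAllocator:
    """Carves one arena into chunks of a few size classes and recycles them."""

    def __init__(self, arena_size: int, size_classes: Tuple[int, ...] = (256, 1024, 4096)) -> None:
        self.arena_size = arena_size
        self.size_classes = sorted(size_classes)
        self._next_offset = 0  # bump pointer into never-used arena space
        self._free: Dict[int, List[int]] = {s: [] for s in self.size_classes}

    def _class_for(self, nbytes: int) -> int:
        """Smallest size class that can hold the request."""
        for size in self.size_classes:
            if nbytes <= size:
                return size
        raise ValueError("request larger than the biggest size class")

    def alloc(self, nbytes: int) -> Tuple[int, int]:
        """Return (offset, chunk size); reuses a freed chunk when one exists."""
        size = self._class_for(nbytes)
        if self._free[size]:
            return self._free[size].pop(), size
        if self._next_offset + size > self.arena_size:
            raise MemoryError("arena exhausted")
        offset = self._next_offset
        self._next_offset += size
        return offset, size

    def free(self, offset: int, size: int) -> None:
        """Put the chunk on its size class free list for later reuse."""
        self._free[size].append(offset)


vram = SlabAllocator(arena_size=8192)
first = vram.alloc(300)     # rounded up to the 1024-byte class
vram.free(*first)
second = vram.alloc(1000)   # served from the free list, same chunk
```

Once the arena is reserved, each allocation is a list pop or a pointer bump with no call into the device allocator, which is the kind of saving the measurements above point to. The sketch does not reproduce those figures.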

4. Plan optimization

SQL is sent from the FN to the CN. The Task Manager first decides whether the query should be processed by the CPU or the GPU according to the amount of data to compute and the characteristics of the query, and then generates a physical plan suitable for GPU execution from the logical plan. On receiving the physical plan, the GPU Engine first rewrites it. If the plan qualifies for fusion, compound operator fusion is applied, which greatly reduces the cost of transferring temporary data between operators.

Productization capability upgrade

1. Improved ease of use

1. Full switch to the Xihe MPP engine

Starting with version 2.6.2, the MPP engine is the default both within the group and on the public cloud, a complete farewell to the query restrictions of the previous-generation LM engine. MPP lets SQL be written more freely and flexibly, and ADB customers have entered the era of Full MPP.

2. SQL compatibility is greatly improved

With the access layer and storage layer parser modules all upgraded to FastSQL, and with the successful switch to MPP, SQL compatibility from ADB v2.7 onward is much better than in earlier versions. Other SQL improvements include streaming return with no limit on the size of the returned result set, MySQL-compatible paging, and so on.

3. Automatic scaling and specification upgrade and downgrade

In June, a major feature was released: automatic scale-out and scale-in plus flexible specification changes. Beyond basic scaling, customers can also switch between specifications, for example from 10c4 to 4c8 or from 4c8 back to 10c4. Real-time tables also support switching back and forth between high-performance SSD instances and large-capacity SATA instances.

Online effect: during a configuration switch, reads are not interrupted and writes are interrupted for about 1-2 minutes. Going forward, with dynamic partition migration, writes can also be served without any interruption.

4. New console and DMS go online

The new public cloud console displays richer content, supports console management and viewing user profiles, and adds more points of interaction with users. User management, ACLs, and sub-account authorization are more convenient. DMS has been redesigned with a much better experience: import and export are convenient, and it supports SQL prompts and SQL memory.

5. Release of the user-facing monitoring and alerting system

To strengthen customers' self-service capability, user-side monitoring supports metrics including CPU usage, query and write volume, data skew, Top Slow SQL, and more.

6. Network-wide database monitoring dashboard goes online

This dashboard is the eye over all databases across the network: the running state of databases on the cloud and within the group can be seen at a glance. By instrumenting and analyzing metrics at the database and system layers, it continuously monitors database health, as shown in the following figure (demo data). Abnormal metrics can be pushed to the customer's DingTalk group, which greatly improves the efficiency of on-call operations.

7. Release of multiple availability zones

This year, the public cloud released multi-zone support, which completely solves the problem of a single region selling out. Enterprise customers can also choose availability zones selectively for disaster recovery.

2. Release new core functions

1. Vector analysis

In September this year, vector analysis was officially released, enabling fused analysis of structured and unstructured data. Vectors are partitioned according to vector clustering: data is partitioned by the clustering results so that vectors that are close to one another are stored near each other. In one proprietary cloud project, it supports 1:1 billion face recognition at more than 10,000 QPS, with latency within 100 milliseconds and data volume at the level of several terabytes.

It also supported, for the first time, real-time fused analysis of face recognition, algorithmic recommendation, and structured data in new retail scenarios such as Intime and Hema, connecting online and offline membership systems within milliseconds and supporting real-time, data-driven offline interaction and marketing.
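Clustering-based vector partitioning, as described above, can be sketched in a few lines. This is a toy model, not ADB's vector engine: the function names and the `probes` parameter are hypothetical, the centroids are assumed to come from a prior clustering step such as k-means, and exact Euclidean distance is used throughout.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


def build_partitions(vectors: Dict[str, Vector], centroids: List[Vector]) -> Dict[int, List[Tuple[str, Vector]]]:
    """Store each vector in the partition of its nearest centroid."""
    partitions: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}
    for key, v in vectors.items():
        partitions[nearest_centroid(v, centroids)].append((key, v))
    return partitions


def search(query: Vector, centroids: List[Vector],
           partitions: Dict[int, List[Tuple[str, Vector]]], probes: int = 1) -> Optional[str]:
    """Scan only the `probes` partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [item for i in order[:probes] for item in partitions[i]]
    if not candidates:
        return None
    return min(candidates, key=lambda kv: math.dist(query, kv[1]))[0]


centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = {"a": (0.5, 0.2), "b": (1.0, 1.0), "c": (9.0, 9.5), "d": (10.2, 10.0)}
partitions = build_partitions(vectors, centroids)
match = search((9.8, 10.1), centroids, partitions)
```

Because nearby vectors share a partition, a query only scans the partitions closest to it instead of the whole data set. Raising `probes` trades speed for recall when the true nearest neighbor sits just across a partition boundary.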

2. Full text search

From ADB v2.7.4 onward, full-text search is available through SQL, unifying common structured data analysis with flexible unstructured data analysis. Using the same SQL language to operate on multiple types of data reduces learning and development costs. On the one hand, it provides fused retrieval over structured data and unstructured text, with multi-modal analysis capabilities. On the other hand, it provides complete distributed computing capabilities based on MPP+DAG technology. It also has built-in intelligent word segmentation components from Taobao and Tmall search, so word segmentation is both better and faster.

3. New data type JSON & Decimal

In version 2.7, the JSON data type was officially released. It fully supports retrieval and analysis of all JSON types, including Object, NULL, and Array, and gives the business the flexibility of going schema-less along with fast retrieval performance. To better serve financial customers, ADB also officially released the Decimal data type in version 2.7, taking another important step toward compatibility with traditional database data types.

3. Ecosystem building

1. Data access

Customer data is often diverse and stored in many places. To provide lower-cost, more efficient data access and to build real-time data warehouse capabilities, ADB made many improvements in data access this year.

1). Copy From OSS & MaxCompute has been developed and will go online after New Year's Day.

2). ADB Uploader was released, making it easy to import local files quickly.

3). ADB released a Logstash plug-in, so that log data can be written directly to ADB without passing through MQ or HUB in between.

4). The ADB Client SDK was released and open sourced. It simplifies client-side write logic and greatly improves aggregated write performance.

5). The stability of batch table import improved greatly, and the MaxCompute SDK upgrade and OpenMR switch were completed at the same time.

6). The Connector Service went online, providing a unified data source access layer.

Next, ADB plans to connect more data sources on top of the existing framework (the gray part in the figure).

2. Industry cloud access

Access to three major industry clouds (financial cloud, logistics cloud, and Jushita) was completed, so that small and medium-sized enterprises in finance, logistics, and e-commerce can also enjoy low-cost real-time data analysis and improve their fine-grained, data-driven operations.

Looking to the future

2018 was an extraordinary year for AnalyticDB, with considerable progress in architecture evolution, stability, ecosystem building, and compatibility. This year we were selected into the Contenders quadrant of "The Forrester Wave: Cloud Data Warehouse, Q4 2018", a research report released by Forrester, a globally recognized IT consulting firm, and into the Magic Quadrant for Data Management Solutions for Analytics released by Gartner, marking our entry into the global analytics market.

Looking ahead, we will continue to broaden and deepen our work on analysis performance, stability, and productization (ease of use, data channels, task management, visualization, and the rest of the surrounding ecosystem). ADB aims to help customers move their data analysis, and the value it delivers, from traditional offline analysis to the next generation of online real-time analysis. Thank you to all the customers who have grown with us along the way!