Use go-randgen to test join queries

Use go-randgen to test join queries

In the database query, join is one of the most commonly used queries. Due to the complexity of the join algorithm implementation, the probability of problems is high. We analyze the join problems that have occurred in TiDB and classify the scenes that are prone The following categories:

The same join query, the join key is a different data type

Join on the partition table

The same join query, different join implementation algorithms

Special query conditions

We started with these scenarios and used the go-randgen framework to test TiDB in the past few months. The following will describe the go-randgen test framework and the related content of our test work in detail, divided into the following 4 parts:

Introduction to go-randgen testing framework

Introduce the use of go-randgen through examples

Practice and effect of go-randgen tool in TiDB test

The future can be based on the further work of go-randgen and the introduction of other related work

Introduction to go-randgen go-randgen is a fully configurable testing framework that allows you to create random data sets and run randomly generated queries on them, and then verify the correctness of the query results through A/B test. Use example of go-randgen Take the join test as an example. You can refer to github.com/pingcap/go- for the related syntax format of go-randgen, which is divided into 3 steps:

Define the zz file, specify the rules for table generation, such as data type, table type, number of rows, etc.

Define the yy file and specify the rules for generating random SQL

Run A/B test using the generated table structure and SQL

The following example describes each step in detail: 1. Define join.zz.lua. The zz file in this example can generate 6 tables, and each table has 17 fields corresponding to the types defined in files.types. The 6 tables are:

table_400_undef_undef_1 (400 rows of data)

table_400_undef_4_1 (400 rows of data and 4 partitions)

table_300_undef_undef_1 (300 rows of data)

table_300_undef_4_1 (300 rows of data and 4 partitions)

table_290_undef_undef_1 (290 rows of data)

table_290_undef_4_1 (290 rows of data and 4 partitions)

tables = {rows = {400, 300, 290}, partitions = {'undef', 4},}

fields = {types = {'int','tinyint','smallint','bigint','decimal(40, 20)','float','double','char(20)','varchar(20 )','enum','set','datetime','bool','bit(64)','timestamp','year','date'}, keys = {'key'},}

data = {numbers = {'null','tinyint','smallint','decimal', }, smallint = {null,'smallint'}, mediumint = {null,'mediumint'}, tinyint = {null, ' tinyint'}, bool = {1, 0, null}, year = {'null','year'}, datetime = {'null','datetime'}, timestamp = {'null','datetime'}, date = {'null','date'}, strings = {'null','letter','english'},} Copy the code 2. Define the join.yy file. In the yy file in this example, the inl_merge_join and inl_hash_join algorithm query statements are specified by the hint. In addition to the specified fields in the generated sql statement, the tables and fields in the query conditions will be randomly combined. The generated SQL example: SELECT/*+ inl_hash_join(t1) */t1.pk, t2.pk from table_290_undef_undef_1 t1, table_400_undef_undef_1 t2 where t1. col_enum_key_signed= t2. col_int_key_signedand t1. col_smallint_key_signed<-5418830167423061551 order by t1.pk, t2.pk ; Copy code query: select

select: SELECT hint_begin inl_merge_join(t1, t2) */col_list FROM _table t1, _table t2 where condition and condition1 order by t1.pk, t2.pk; SELECT hint_begin inl_hash_join(t1) */col_list from _table t1, _table t2 where condition and condition1 order by t1.pk, t2.pk;

col_list: t1.pk, t2.pk

condition: t1. _field = t2. _field

condition1: t1. _field_int <_int

hint_begin:/*+ Copy code 3. Run A/B test according to join.zz.lua and join.yy files. In this example, the TiDB query result will be compared with the MySQL query result. SQL with inconsistent query results will be recorded in the dump subdirectory of the current directory. ./go-randgen exec -Z join.zz.lua -Y join.yy --dsn1 "root:password@tcp(127.0.0.1:3306)/test" --dsn2 "root:@tcp(127.0.0.1: 4000)/test" -Q 2000 2020/12/25 16:37:18 Open DB ok, starting generate data in two db by ddls 2020/12/25 16:37:18 load zz from join.zz.lua 2020/12/25 16:37:20 generating data ok 2020/12/25 16:37:20 starting execute sqls generated by yy 2020/12/25 16:37:20 load yy from join.yy 2020/12/25 16: 37:32 dump ok Copy code go-randgen's practice in TiDB testing Through go-randgen to test TiDB's join algorithm, we have found 10 correctness-related issues, such as:

By covering different types, compare column values. Such as: select * from _table where _field> _field. It was found that the time column and the year column were compared incorrectly, which was recorded in tidb/issues/20121.

Test the distinct statement. Such as: select count(distinct(t1. _field)), count(distinct t1. _field, t1. _field) from table_400_utf8_undef t1, table_290_utf8_undef t2 where t1. _field = t2. _field and t1. _field = t2. _field_ t1. _field_t1. != _int. An error in the calculation of distinct is found, which is recorded in tidb/issues/20237.

In addition to the random type, by expanding the coverage of a single statement and combining statements randomly, the SQL statement context is relevant. Such as: alter table _table add index {print(string.format("t%d", math.random(10,2000000)))) (_field); SELECT t1.pk, t2.pk from t t1 left join t t2 on t1. _field = t2. _field where t1. _field != _int order by t1.pk, t2.pk. After discovering that the index was added, the query reported an error, which was recorded in tidb/issues/20698.

The issues found remind us to be in awe of the quality of TiDB, and also confirm that it is feasible to analyze the problems found in the past, summarize the scenarios and expand the scope of test points. Subsequent join tests will also continue to cover more data types, try more statement combinations and scenario combinations, such as adding data to and deleting data in a transaction, and then randomly combining them with join queries. In the future, we can improve the coverage of join test by continuously improving the zz and yy files. However, the SQL generated by go-randgen has a very fixed structure. If the test points cannot be known in advance, the query statement cannot be constructed to carry out effective test coverage. Is there a way to randomly generate join query statements for testing? We are currently implementing the function of randomly generating join queries in Horoscope (optimizer detection tool). In addition, we refer to the ideas in Manuel Rigger's "Testing Database Engines via Pivoted Query Synthesis" paper. Horoscope will randomly select a row of data in some tables as the pivot row to construct the query, so that the results returned by the query will contain these selections. Row. Due to the complexity of join queries, join testing will be a long-term but valuable thing. In addition, optimizer testing, region testing, TiDB cluster chaos testing, and transaction testing are also very important and valuable tasks. If you are interested, welcome to use go-randgen or other tools to test TiDB. If you find any problems, you can create an issue on GitHub and report it to us. If you have better test methods, test ideas and test tools, welcome to communicate with us in TUG. .markdown-body pre,.markdown-body pre>code.hljs{color:#333;background:#f8f8f8}.hljs-comment,.hljs-quote{color:#998;font-style:italic}.hljs- keyword,.hljs-selector-tag,.

Author: PingCAP link: juejin.cn/post/691014... Source: Nuggets forward ----- PingCAP