Can sharding (sub-databases and sub-tables) scale without limit?

Preface

A rookie like me always has all kinds of questions. At first they were about the JDK API, NIO, and the JVM. After a few years of work, new questions come up about the availability and scalability of services. Which question is it this time? A well-worn topic, in fact: scaling out services.

How services usually evolve

Let's start from the beginning.

  1. Monolithic applications. Almost every startup begins with an architecture such as SSM or SSH. There is not much to say here; nearly every programmer has been through it.
  2. RPC applications. As the business grows, we need to scale the service horizontally. Scaling out is simple as long as the service is stateless, as shown in the figure below:

As the business keeps growing, the relationships between services become intricate. At the same time, many services do not need to connect to the DB at all and only need the cache, so they can be split out to save precious DB connections. As shown below:

I believe most companies are at this stage. Dubbo was born to solve this problem.

  3. Sub-databases and sub-tables

If your company's products are very popular and the business keeps growing at high speed, the data volume increases, SQL operations get slower and slower, and the database becomes the bottleneck. At that point you will certainly think of splitting into sub-databases and sub-tables. Whether you split by ID hash or by range, either way works. As shown below:

Now there should be no problem. No matter how many users there are or how high the concurrency is, we just keep adding databases and keep adding applications without limit.

This is also the title of this article: can sub-databases and sub-tables give us unlimited expansion?

In fact, an architecture like the one above cannot.

The problem is similar to the one we saw with RPC: too many database connections!

Usually our RPC applications access the database through middleware, so the application itself does not know which database it will hit. The routing rules are decided by the middleware, for example Sharding-JDBC. As a result, every application must connect to all databases. In the architecture diagram above, each RPC application connects to 3 MySQL instances. If there are 30 RPC applications and each has a connection pool of size 8, every MySQL instance has to maintain 240 connections. The default number of MySQL connections is 100 and the maximum is 16384 (these figures match older MySQL releases; newer ones default to 151 and allow a higher maximum). That is, with a pool size of 8 per application, once there are more than 2048 applications no more can connect, and the system cannot scale any further. Note that each physical database usually hosts many logical databases, and the microservices movement is in full swing, so 2048 is not as large as it seems.
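The arithmetic above can be checked with a minimal sketch. The function names are hypothetical and only restate the numbers from the paragraph (30 applications, pool size 8, a cap of 16384 connections):

```python
# Illustrative arithmetic only; the function names are hypothetical.

def connections_per_database(app_count: int, pool_size: int) -> int:
    """Connections one database holds when every app connects to every database."""
    return app_count * pool_size


def max_apps(max_connections: int, pool_size: int) -> int:
    """Largest number of apps one database can serve before hitting its cap."""
    return max_connections // pool_size


print(connections_per_database(30, 8))  # 240
print(max_apps(16384, 8))               # 2048
```

Neither function takes the number of databases as an input, which is the point: adding databases does not lower the connection count each one has to carry.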

You might say the connection problem can be solved by putting a proxy in front. In fact, the proxy's performance becomes a problem too. Why? The proxy's own connection count cannot exceed 16384 either. If concurrency grows past 16384, say to 163840, the proxy cannot solve the problem.

What can we do? Let us look at the architecture diagram above again:

The root of the problem is that every RPC application has to connect to all databases, so the number of connections on each database grows as the applications scale out. Even if we add more databases, the connection problem is not solved.

So what should we do?

Unitization

Unitization sounds lofty. At conferences, when speakers share impressive terms such as "two sites, three centers", "three sites, five centers", and "multi-site active-active", unitization usually shows up alongside them.

Here we will not go into anything that grand; we only talk about the problem of too many database connections.

In fact, the idea is very simple: do not let every application connect to all databases.

Suppose we split the data into 10 databases by range, and there are 10 applications. We let each application connect to only one database. When the number of applications grows to 20 and the database connections are no longer enough, we split the 10 databases into 20. In this way, no matter how many applications you scale out to, the problem of too many database connections is solved.
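To make the difference concrete, here is a small sketch that compares the per-database connection count in both layouts. The round-robin assignment (`app % db_count`) and the pool size of 8 are assumptions for illustration, not something the article prescribes:

```python
# Hypothetical helpers comparing the two layouts; pool size 8 is assumed.

def all_to_all(app_count: int, db_count: int, pool_size: int) -> list[int]:
    """Connections per database when every app connects to every database."""
    return [app_count * pool_size] * db_count


def unitized(app_count: int, db_count: int, pool_size: int) -> list[int]:
    """Connections per database when app i connects only to database i % db_count."""
    load = [0] * db_count
    for app in range(app_count):
        load[app % db_count] += pool_size
    return load


print(max(all_to_all(20, 10, 8)))  # 160: grows with the number of apps
print(max(unitized(20, 10, 8)))    # 16: two apps share each database
print(max(unitized(20, 20, 8)))    # 8: after splitting 10 databases into 20
```

In the unitized layout the load on a database depends on how many applications share it, so splitting databases brings the count back down, whereas in the all-to-all layout it does not.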

Note: the prerequisite is that every request arriving at an application only needs data from the database that this application is connected to.

In other words, by the time a user's request comes in through DNS, it is already known which application it should go to, so the rule is fixed as early as the DNS layer. That is a bit of an exaggeration, but the request must know which database it is going to before it enters the application.

This usually requires a shared rule. For example, route by a hash of the user ID, and let the configuration center broadcast the hash rule. That way all components apply the same rule and reach the correct database. As shown below:

At this point, we finally solved the problem of unlimited expansion.
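The shared rule described above, a user ID hash that every component applies identically, might look like the following sketch. The names `UNIT_COUNT`, `unit_for_user`, `gateway_route`, and `database_for` are hypothetical, and CRC32 is chosen only because it gives the same result in every process; the article does not name a specific hash:

```python
import zlib

# Assumed value that a configuration center would broadcast to all components.
UNIT_COUNT = 10


def unit_for_user(user_id: str, unit_count: int = UNIT_COUNT) -> int:
    """Map a user ID to a unit with a stable hash (CRC32, for illustration only)."""
    return zlib.crc32(user_id.encode("utf-8")) % unit_count


def gateway_route(user_id: str) -> str:
    """Entry layer: pick the application that serves this user's unit."""
    return f"app-{unit_for_user(user_id)}"


def database_for(user_id: str) -> str:
    """Data layer: pick the database that belongs to the same unit."""
    return f"db-{unit_for_user(user_id)}"


user = "user-12345"
# Both layers use the same rule, so the request lands on the application
# that is connected to the database holding this user's data.
print(gateway_route(user), database_for(user))
```

Because the entry layer and the data layer derive the unit from the same function and the same `UNIT_COUNT`, a request never needs a database outside its own unit. Changing the unit count would have to be broadcast to every component at once, along with a data migration, which this sketch does not cover.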

Summary

Starting from a monolithic application, this article walked through the typical evolution of a backend and showed that sub-databases and sub-tables alone cannot solve the problem of unlimited expansion. Only unitization can. Unitization brings more complexity, but the benefits are self-evident.

Further thoughts on unitization

With unitization, the problem of unlimited expansion is solved, but we have not yet considered single points of failure, that is, the availability of the service. After all, each database here is a single point.

That is another topic: multi-site active-active. Due to space limitations, we will talk about it next time.