Query Optimization Process

Klustron

Klustron is a distributed database system that separates computing from storage. When a SQL query is sent to any computing node (CN) of Klustron, the Klustron parser first parses the query text and performs some basic validity checks. The CN then applies logical optimizations to the query, such as query rewriting, partition pruning, column pruning, and predicate pushdown.

Klustron adopts a maximum-pushdown strategy during logical optimization.

Pushing computation down to the data nodes (DN) not only reduces network communication between CN and DN, but also makes full use of the resources and concurrent execution capabilities of each DN, which speeds up query processing.
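
As a rough illustration of why pushdown reduces traffic, the sketch below models two data nodes as in-memory row lists and counts how many rows cross the network when a filter runs on the CN versus on each DN. The node names, the sample rows, and the row-count metric are illustrative assumptions, not Klustron internals.

```python
# Toy model: each data node (DN) holds a list of rows. Names and data are hypothetical.
DATA_NODES = {
    "dn1": [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": 20}],
    "dn2": [{"id": 4, "amount": 700}, {"id": 5, "amount": 10}, {"id": 6, "amount": 30}],
}


def filter_on_cn(predicate):
    """No pushdown: every row is shipped to the computing node, then filtered there."""
    shipped = [row for rows in DATA_NODES.values() for row in rows]
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)


def filter_on_dn(predicate):
    """Pushdown: each DN filters locally and ships only the matching rows."""
    shipped = [row for rows in DATA_NODES.values() for row in rows if predicate(row)]
    return shipped, len(shipped)


def is_large(row):
    # Stands in for a predicate such as: WHERE amount > 100
    return row["amount"] > 100


rows_cn, shipped_cn = filter_on_cn(is_large)
rows_dn, shipped_dn = filter_on_dn(is_large)
print(shipped_cn, shipped_dn)  # prints "6 2"
```

Both plans return the same two rows, but the pushed-down plan ships only the matches, and each DN can run its filter at the same time as the others.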

The optimized operators are divided into two categories:

  • Pushdown operators: The RemoteScan operator is pushed to the corresponding data node for execution, and the resulting data is pulled back to the computing node for subsequent processing. Operators that can be pushed down include filter conditions such as WHERE or HAVING; aggregation operators such as COUNT and GROUP BY, which are computed in two phases; sorting operators such as ORDER BY; JOIN and subqueries; and project and distinct operators.

  • Operators that cannot be pushed down: For example, cross-shard joins must pull data from the data nodes to the computing node for calculation. For these operators the optimizer chooses the best execution method, such as selecting an appropriate parallelism strategy.
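
The two-phase aggregation mentioned above can be sketched as follows: each DN computes a partial COUNT per group, and the CN merges the partial results. This is a minimal sketch under stated assumptions. The shard contents and the names `partial_count` and `final_count` are hypothetical, and it does not reflect how Klustron implements the operator internally.

```python
from collections import Counter

# Hypothetical shards of an "orders" table; each list lives on one data node.
SHARDS = {
    "dn1": [("east", 10), ("west", 5), ("east", 7)],
    "dn2": [("west", 3), ("north", 8), ("east", 1)],
}


def partial_count(rows):
    """Phase 1, run on a DN: count rows per group locally."""
    return Counter(region for region, _amount in rows)


def final_count(partials):
    """Phase 2, run on the CN: sum the per-DN partial counts for each group."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Roughly: SELECT region, COUNT(*) FROM orders GROUP BY region
partials = [partial_count(rows) for rows in SHARDS.values()]
result = final_count(partials)
print(sorted(result.items()))  # prints [('east', 3), ('north', 1), ('west', 2)]
```

Only one small record per group leaves each DN, instead of every underlying row.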

[Figure: global execution process]

[Figure: optimization process]

[Figure: maximum pushdown strategy]

In summary, to achieve optimal performance, choose the partition key with the business's actual SQL workload in mind, so that cross-node data operations are kept to a minimum.
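
To make the effect of the partition key concrete, the sketch below estimates how many shards an equality lookup must visit under two candidate partition keys. The shard count, the CRC32-based placement function, and the column names are assumptions chosen for illustration; Klustron's actual partitioning and routing logic may differ.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(value):
    """Map a partition-key value to a shard with a stable hash (illustrative only)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


def shards_touched(partition_key, filter_column, filter_value):
    """Return the shards a query 'WHERE filter_column = filter_value' must visit."""
    if filter_column == partition_key:
        # The filter pins the partition key, so a single shard holds all matches.
        return {shard_of(filter_value)}
    # The filter says nothing about placement, so every shard must be scanned.
    return set(range(NUM_SHARDS))


# Workload assumption: most queries look up orders by customer_id.
by_customer = shards_touched("customer_id", "customer_id", 42)
by_order = shards_touched("order_id", "customer_id", 42)
print(len(by_customer), len(by_order))  # prints "1 4"
```

If the dominant queries filter on `customer_id`, partitioning by that column lets each lookup be served by one shard, whereas partitioning by `order_id` forces the same lookup to fan out to all shards.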