Skip to main content

Online DDL and Repartitioning

KlustronAbout 3 min

Online DDL and Repartitioning

Key Takeaway:

The online DDL and repartitioning feature of Klustron facilitates seamless modification of table structures without disrupting business system operations.

1 Why is this feature needed?

Business requirements and changes in data volume can result in the need to modify the existing table structure. Some possible scenarios include:

  • Inefficient partitioning leading to slower query and insertion operations.
  • Reduced data size, requiring fewer partitions than before.
  • Need to add or remove columns or modify column data types.
  • Modifying primary keys or adding indexes.

When we encounter these challenges, we need to modify the table structure to meet new business requirements.

However, the tables that need modification are often actively accessed by the business system and may even be hotspots. This poses a challenge: How can we make these changes without disrupting the ongoing business operations?

To address this challenge, the Klustron team has developed the online DDL and table repartitioning feature, referred to as "repartition." This feature not only supports transitioning between single-table and partitioned-table structures, modifying table partitioning rules, and adjusting partitioning parameters but also enables all DDL operations to be performed online (without table locking and without disrupting business operations).

2 How?

![img](Online DDL和repartitioning/1.png)

Here's how Klustron achieves table changes without impacting business operations:

  • Exporting full table data: The node_mgr uses the mydumper tool to export data from the source table and transfer the data files to the computing node's server.
  • Loading full table data: The node_mgr utilizes the klustron_loader tool to load the exported data into the target table.
  • Binlog catch-up: Using the binlog2sync tool, the node_mgr synchronizes the remaining binlog events starting from the recorded position during the data export.
  • Renaming the source and target tables: Once the remaining binlog events are synchronized, the binlog2sync tool renames the target table to the source table name, and the business can resume normal operations.

In summary, Klustron's approach to online DDL is similar to Percona's gh-ost tool. Both methods involve copying the source table data and following the binlog to perform table structure changes without locking the table.

**3 **More Detail

Methods:

Klustron's online DDL offers two operational methods:

Method 1: Using the visual and user-friendly Klustron cluster management tool,

XPanel

![](Online DDL和repartitioning/2.png)

![](Online DDL和repartitioning/3.png)

As shown in the figure, users can perform online DDL or table repartitioning by simply filling in the relevant information, such as cluster, source table, and target table, in the table repartitioning interface of XPanel, and then submitting the request.

Method 2: Using the API of the cluster management tool, cluster_mgr

curl -d '

{

 "version":"1.0",

 "job_id":"",

 "job_type":"table_repartition",

 "timestamp":"1435749309",

 "user_name":"kunlun_test",

 "paras":{

  "src_cluster_id":"1",

  "dst_cluster_id":"3",

  "repartition_tables":"postgres_$$_public.transfer_accout=>postgres_$$_public.account"

 }

}

' -X POST http://127.0.0.1:58000/HttpService/Emit

We can illustrate the scenario of table structure modification through an example.

![](Online DDL和repartitioning/4.png)

In the figure, we use online DDL to transform a partitioned table "transfer_account" into a regular table "account."

There are some important points to note regarding online DDL:

Column Order: All column names from the source table must appear in the new table, but the order of these columns can be different. Additional columns can be added to the new table, but they must have default values or allow NULL values. A single online DDL operation can include multiple actions, such as adding columns (which can be placed at any position in the table), rearranging column order, and modifying column constraints and default values.

Column Data Types: Columns with the same name in the source table and new table must have identical data types or belong to the same data type category. The data type in the target table can be wider but not narrower than that in the source table. For example, if the source table column is of type "int," the corresponding column in the target table can be of type "bigint", but not "smallint" or "tinyint".

Indexes: The source table and new table can have completely different index definitions and primary key definitions. However, the data in the source table must satisfy all the unique index and primary key constraints of the new table, otherwise the data import will fail. For example, the new table may be intended to modify the primary key or add indexes or unique indexes.

Partitioning: The source table and new table can have completely different table partitioning rules and parameters. Both tables can be a regular table, mirrored table, or partitioned table. For example, the source table may be a regular table without partitioning, while the new table is partitioned according to certain rules.

Table Constraints: The source table and new table can have different table-level check constraints and trigger definitions. However, the data in the source table must pass the check constraints of the new table; otherwise, the data import will fail.

4 Q&A

Q1: MySQL also has online DDL functionality. How does your implementation differ from MySQL's, and what advantages does Klustron have?

Answer: The main differences are:

1: MySQL's online DDL handles concurrent DML operations by recording and replaying modifications on the new table, while Klustron uses binlogs.

2: MySQL's online DDL requires support from the storage engine, but Klustron does not.

3: MySQL's online DDL has limitations on certain DDL operations, such as dropping primary keys, whereas Klustron can handle all types of DDL operations.

Therefore, Klustron's online DDL has a broader range of applications and supports a wider range of DDL operations compared to the native online DDL in MySQL.

Q2: Is the target table an intermediate table? Will it eventually be renamed to the name of the source table? Will the old source table be dropped before the final rename?

Answer: Yes, by default, the target table is renamed to the source table once the synchronization is complete. However, users can choose not to rename it. The old source table is not immediately dropped but is instead renamed with a suffix "tb_repartition${job_id}". It is automatically deleted after 7 days by default. Users can also manually clean it up through XPanel or specify the number of days before automatic deletion.