Skip to main content

MySQL 8.0 Atomic DDL

KlustronAbout 4 min

MySQL 8.0 Atomic DDL

Key Takeaway

The introduction of atomic DDL in MySQL 8.0 stands out as a pivotal enhancement, significantly resolving longstanding challenges like DDL crash-safety that have troubled MySQL users by fundamentally restructuring MySQL's data dictionary.

The atomic DDL feature of MySQL 8.0 is a crucial advancement in this release. It solves the widely criticized issue of system anomalies caused by crashes during DDL execution in MySQL, achieving transactional atomicity for DDL operations. This tech talk will primarily focus on the principles and implementation methods of atomic DDL, as well as new features developed by Klustron based on this foundation, hoping to provide a comprehensive understanding of this key feature and a deeper insight into Klustron.

01 Overview of MySQL DDL:

Let's start with a concise overview of the evolution of MySQL DDL.

l Before MySQL 5.5, DDL operations were limited to the Copy method. This involved data duplication and prohibited concurrent writing to the table being modified.

l With the advent of MySQL 5.5, Inplace DDL operations were introduced. For DDL operations compatible with the inplace method, data duplication was unnecessary, allowing changes to be applied directly to the existing table data. However, concurrent modifications were still not supported.

l MySQL 5.6 marked the introduction of Online DDL. This enabled most DDL operations to be performed with concurrent reading and writing capabilities, greatly minimizing the DDL operations' impact on the overall system.

l MySQL 8.0 introduced instant DDL, a feature that accomplishes DDL operations instantaneously by only altering the data dictionary instead of the actual data.

Exploring the Pre-MySQL 8.0 Data Dictionary Structure:

Before the advent of MySQL 8.0, the data dictionary structure was characterized by:

l Dual-layer data dictionary structure: Both the server layer and the engine layer had data dictionaries, and these two needed to be synchronized.

l Information stored in multiple locations: Stored in text files, MyISAM, and InnoDB.

• Server layer:

• Text files: .FRM, .OPT files, etc.

• MyISAM system tables: user, events, etc.

• Storage Engine layer:

• InnoDB internal system tables: sys_tables, sys_indexes.

Why Atomic DDL?

Why is atomic DDL necessary? The main reasons include:

l Inconsistencies in data dictionary information: Frequent discrepancies between data dictionaries at the server layer and InnoDB layer.

l Absence of unified data dictionary management: Utilizing files and the MyISAM engine for storage without a consistent API.

l Lack of atomicity in DDL operations, leading to issues with crash safety and data replication.

l Limited scalability and challenges in system upgrades.

02 In-Depth Look at MySQL 8.0's Atomic DDL Feature

The Data Dictionary in MySQL 8.0:

Key design elements include:

l Unified data dictionary information: Removal of .FRM files and MyISAM system tables, centralizing everything into a unified system of tables within the InnoDB storage engine.

l Unified API for accessing and managing the data dictionary.

l Introduction of a new data dictionary cache.

l Development of a novel set of storage engine APIs to facilitate atomic DDL and ensure crash safety.

l Re-engineering of the information schema as views based on data dictionary tables.

MySQL 8.0 Data Dictionary API Framework:

• Data Dictionary Client: Provides a unified read-write access interface for SQL clients and the storage engine layer.

• DD Shared Cache: A cache for shared dictionary objects.

• Storage Adaptor: An interface encapsulating the InnoDB system tables.

• Storage Layer Interface: The Handler interface.

Implementation Method of Atomic DDL in MySQL 8.0:

• A DDL operation completed by multiple transactions becomes a single DDL Trx transaction, ensuring atomicity in dictionary table operations.

• Recording DDL logs (DDL log) to ensure atomicity of data file operations.

• During crash recovery, the commit status of the DDL transaction is used to decide whether to commit or roll back.

Four Phases of MySQL 8.0 DDL:

• Prepare: Create necessary objects and write DDL log into the system table mysql.innodb_ddl_log.

• Perform: Execute the DDL operation, such as the necessary actions to create a table.

• Commit: Update the data dictionary and commit the DDL Trx.

• Post-DDL: Replay and delete the DDL log.

Example of Atomic DDL (drop table):

Drop table T1;

• Prepare: Locate the dictionary object T1.

• Perform: Remove related information of T1 from the data dictionary tables.

• Commit: Submit the DDL transaction, evict the T1 dictionary object cache.

• Post-DDL: Delete the data file t1.ibd, etc.

The following diagram illustrates the call process for the drop table DDL operation:

The next diagram shows the DDL log content for the Drop table record:

As inferred from the discussion, when a DDL operation successfully completes and commits, the system will delete the data files in the post-DDL phase. In case of a rollback, since the contents of the DDL log will be rolled back, the contents of the DDL log are not executed, and all changes to the system tables are simply rolled back.

03 Klustron's DDL

To begin, let's introduce the core architecture of Klustron, a distributed data solution developed by Zetuo Tech:

Klustron’s Distributed Storage-Compute Architecture

• Compute Layer (Klustron-server): Comprising multiple PostgreSQL instances, the compute nodes are responsible for accepting and validating connection requests from application software, receiving SQL query requests from established connections, executing these requests, and returning the results.

• Storage Layer (Klustron-storage): Three or more MySQL 8.0 instances form storage nodes, creating a storage cluster (also known as a shard). Each shard stores a portion of user tables or table partitions.

• The metadata cluster stores Klustron cluster metadata including topology, node connection information, DDL logs, commit logs, and other cluster management logs.

• The cluster_mgr is responsible for maintaining correct cluster and node states, implementing cluster management, logical backup and recovery, physical backup and recovery, horizontal scaling, and other functions.

Next, let’s discuss the Online DDL feature of Klustron.

Klustron’s Online DDL (Repartition)

Method: Export data from the source table and write it into the target table, then import updates made to the source table during this period into the target table. Detailed steps:

  1. Exporting full table data: node_mgr uses mydumper to dump data from the source table and transfers data files to the compute node server.
  2. Loading full table data: node_mgr uses the kunlun_loader tool to import the dumped full data of the source table into the target table.
  3. Binlog catch-up: node_mgr uses the binlog2sync tool, starting from the recorded binlog starting position on each shard at the time of the dump, to dump binlog events.
  4. Renaming source and target tables: The binlog2sync tool quickly completes the remaining binlog synchronization, then renames the target table to the source table name, resuming normal business operations.

Klustron's DDL Future Prospects:

Klustron has successfully integrated the critical Online DDL feature and is actively working on developing and planning additional DDL-related functionalities, such as:

l Advanced improvements and optimizations for Online DDL (like performance enhancements through parallel processing).

l Transactional DDL (advancing beyond atomicity to fully transactional DDL). and others.

04 Q&A

Q1: What is the difference between transactional DDL and atomic DDL?

Atomicity in DDL operations ensures indivisibility, meaning a DDL either commits (executes successfully) or rolls back (fails), with no intermediate state. However, DDL operations in MySQL still cannot coexist with other transactions. That is, a DDL cannot be included in a standard user transaction to commit or roll back with it. Transactional DDL allows for DDL operations to be part of regular user transactions, leaving the decision to commit or roll back to the user. PostgreSQL supports transactional DDL.

Q2: Where can I try Klustron?

Those interested in Klustron can download and try it from our official website, following the installation documentation for deployment. Additionally, we offer serverless services of Klustron on Amazon's marketplace and Alibaba Cloud, which you can also try if interested.