KunlunBase Financial-Grade High Reliability

KunlunBase's financial-grade high reliability consists of a series of capabilities and technologies that together form a comprehensive technical system. Like solid building blocks, these capabilities and mechanisms combine into a robust financial-grade high reliability system for Klustron.

Klustron's complete financial-grade high reliability capabilities ensure that the Klustron cluster can quickly and automatically recover and maintain correct, complete, and consistent data in the event of various software, hardware, and network failures, without any manual intervention, greatly reducing the workload of DBAs.

Klustron's high reliability system operates automatically in the background, requiring no daily maintenance work from DBAs, nor any development work from application developers. Reading the related documents in this chapter will help you understand the working principles of the various technologies within Klustron's financial-grade high reliability system.

High Availability

First, the storage clusters (storage shards, or shards for short) composed of Klustron-storage instances have high availability. This means that in the event of a primary node failure within a shard, KunlunBase ensures that all successfully committed transaction data is correct, consistent, and not lost (RPO = 0). At the same time, it guarantees that the KunlunBase cluster can continuously provide data read/write access and transaction processing services (RTO < 30s).

In simple terms, the Kunlun-storage nodes use binlog master-slave replication so that the replicas continuously receive data changes from the primary node. The fullsync technology ensures that, for each committing transaction, the primary node confirms the transaction to the KunlunBase compute nodes only after the replicas acknowledge receipt of all of its binlogs. The compute nodes then confirm the transaction commit to the client application, ensuring that any confirmed transaction commit is present on several replicas of the shard. Therefore, during a primary node failover, the new primary node will have all the data updates of all transaction branches.
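
This commit rule can be sketched as follows. The snippet is a minimal illustration in Python, not KunlunBase source code; the class, method, and parameter names are invented for the sketch, and the replica objects simply stand in for binlog shipping over the network.

```python
class FullsyncPrimary:
    """Toy illustration of the fullsync commit rule: the primary confirms a
    commit to the compute node only after enough replicas acknowledge that
    the transaction's binlog has reached their relay logs."""

    def __init__(self, replicas, consistency_level):
        self.replicas = replicas                # each replica exposes ship_binlog(events) -> bool
        self.consistency_level = consistency_level

    def commit(self, binlog_events):
        acks = 0
        for replica in self.replicas:
            # A replica returns True once the events are written to its relay
            # log (and fsynced, when fullsync_relaylog_fsync_ack_level says so).
            if replica.ship_binlog(binlog_events):
                acks += 1
            if acks >= self.consistency_level:
                # Enough replicas hold the binlog: safe to acknowledge the commit.
                return True
        # Quorum not reached: the commit must not be confirmed to the client.
        return False
```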

In the event of a primary node failure, KunlunBase’s fullsync HA technology can promptly detect the failure, elect a new primary node, and allow other replicas to continue replicating data updates from the new primary node.

Klustron storage clusters use Klustron-storage to form a (2 * N + 1)-node master-slave replication cluster. The primary node replicates data changes to the replicas via binlog replication, and the replicas replay the data change events to achieve eventual consistency. Klustron's fullsync technology ensures that the binlog of any committed or prepared transaction is written to the relay logs of a specified number of replicas before success is returned to the corresponding compute nodes. For a shard with 2 * N + 1 nodes, Klustron typically requires that each committed transaction be replicated to at least N replicas, so that even if up to N nodes fail, Klustron can still elect a replica holding all committed and prepared transactions as the new primary node to continue handling data read/write loads.
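
The relationship between shard size and the required number of replica acknowledgements can be written as a small helper. This is only an illustration of the rule stated above; in a real deployment the value is configured through the fullsync_consistency_level variable listed later on this page.

```python
import math

def fullsync_consistency_level(shard_nodes: int) -> int:
    """Smallest positive integer N such that shard_nodes <= 2 * N + 1.

    With this setting a committed transaction exists on the primary plus at
    least N replicas, so even if N of the 2 * N + 1 nodes fail, a surviving
    replica still holds every committed (and prepared) transaction."""
    if shard_nodes <= 1:
        raise ValueError("a shard must have more than one node")
    return max(1, math.ceil((shard_nodes - 1) / 2))

# Examples matching the rule described in this section:
assert fullsync_consistency_level(3) == 1
assert fullsync_consistency_level(4) == 2
assert fullsync_consistency_level(5) == 2
```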

After a primary node failure, compute nodes can automatically failover the storage cluster's primary node without manual intervention.

For the working principles of Fullsync and Fullsync HA, see Klustron Storage Cluster Fullsync Mechanism and Introduction of Klustron Fullsync HA Mechanism.

Klustron-Storage XA Transaction Processing

In kunlun-storage, the Klustron team has fixed a series of XA transaction recovery defects and filled gaps present in the community version of MySQL. This ensures that kunlun-storage can recover without data loss when a node fails and restarts while executing XA transactions. See this article to learn about the work we have done.

Fault Recovery Capability of Distributed Transactions

Klustron's distributed transaction processing has robust and complete fault recovery capabilities (crash safety & fault tolerance), ensuring that cluster node failures or network failures do not cause service interruption, data loss, or inconsistency. If a compute node fails, there is no data loss or inconsistency; only the client connections of the application software are disconnected, and the ongoing transactions automatically roll back. As long as at least one compute node is running, the KunlunBase cluster can still perform normal read/write operations. Additionally, users can add more compute nodes at any time, and the new compute nodes will automatically update metadata and start working normally.
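
From the application's point of view, a compute-node failure therefore looks like a dropped connection whose transaction was rolled back: the client reconnects (to the same or another compute node) and retries. A minimal client-side sketch, assuming psycopg2 and illustrative host names, ports, database name, and user:

```python
import time
import psycopg2

# Illustrative endpoints; in practice these would be your cluster's
# compute-node addresses (or a load balancer in front of them).
COMPUTE_NODES = ["compute1.example:47001", "compute2.example:47001"]

def run_with_retry(sql, params=(), attempts=3):
    """Retry a statement after a compute-node failure: the failed attempt has
    been rolled back on the server, and a surviving compute node takes over."""
    last_error = None
    for attempt in range(attempts):
        host, port = COMPUTE_NODES[attempt % len(COMPUTE_NODES)].split(":")
        try:
            with psycopg2.connect(host=host, port=int(port),
                                  dbname="postgres", user="app_user") as conn:
                with conn.cursor() as cur:
                    cur.execute(sql, params)
                return                      # committed when the block exits
        except psycopg2.OperationalError as err:
            last_error = err                # connection dropped: transaction rolled back
            time.sleep(1)                   # back off, then try another compute node
    raise last_error
```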

When a database server starts, all KunlunBase nodes, including compute nodes, storage nodes, and metadata cluster nodes, are automatically started by the KunlunBase cluster management component, and data recovery completes automatically, restoring the data to the consistent state it was in before the node's last exit.

Through two-phase commit technology, Klustron preserves the ACID properties of client-initiated distributed transactions during node and network failures, ensuring that the data updates of committed transactions are never lost or damaged and never cause errors.
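
The two-phase flow that a compute node drives against the storage shards can be illustrated with MySQL XA statements. This is a greatly simplified sketch, not the actual compute-node implementation: the real coordinator also records the commit decision in the metadata cluster and retries interrupted second phases during recovery. The connections are assumed to be DB-API connections to each shard's primary, such as pymysql connections, which interpolate the parameters client-side.

```python
def two_phase_commit(shard_conns, xid, statements_per_shard):
    """Illustrative coordinator: prepare the transaction branch on every shard
    first; only when every PREPARE succeeds is the global commit applied."""
    prepared = []
    try:
        # Phase 1: run each branch's statements and prepare the branch.
        for conn, stmts in zip(shard_conns, statements_per_shard):
            with conn.cursor() as cur:
                cur.execute("XA START %s", (xid,))
                for sql in stmts:
                    cur.execute(sql)
                cur.execute("XA END %s", (xid,))
                cur.execute("XA PREPARE %s", (xid,))
            prepared.append(conn)
    except Exception:
        # No commit decision was made yet: roll back the branches prepared so far.
        for conn in prepared:
            with conn.cursor() as cur:
                cur.execute("XA ROLLBACK %s", (xid,))
        raise

    # Phase 2: every branch is prepared (and, with fullsync, replicated),
    # so the commit decision can be safely applied on all shards.
    for conn in prepared:
        with conn.cursor() as cur:
            cur.execute("XA COMMIT %s", (xid,))
```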

Distributed DDL Transaction Processing and DDL Replication

A Klustron cluster can contain any number of compute nodes, which can be added or removed as needed. Klustron’s distributed DDL transaction processing and DDL replication technology ensures that when a client connected to any compute node in the cluster executes a DDL statement, the statement is automatically executed on all other compute nodes in the cluster. The execution order of all DDL statements is identical across all compute nodes, ensuring identical metadata. The execution of each DDL statement transaction involves compute nodes, metadata nodes, and storage nodes. Node failures will not cause errors in DDL execution and replication playback, ensuring transaction ACID properties.
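
The idea behind DDL replication can be sketched as an ordered log that every compute node replays in the same global order. The table and column names below are purely hypothetical and do not reflect Klustron's actual metadata schema; the connections are again assumed to be DB-API connections.

```python
def replay_ddl_log(meta_conn, local_conn, last_applied_id):
    """Apply DDL statements recorded in the metadata cluster in global order,
    so that every compute node converges to identical metadata."""
    with meta_conn.cursor() as cur:
        # Hypothetical log table: (id, ddl_text); id defines the global order.
        cur.execute("SELECT id, ddl_text FROM ddl_log WHERE id > %s ORDER BY id",
                    (last_applied_id,))
        entries = cur.fetchall()
    for ddl_id, ddl_text in entries:
        with local_conn.cursor() as cur:
            cur.execute(ddl_text)        # execute the DDL on this compute node
        last_applied_id = ddl_id         # remember progress so replay can resume
    return last_applied_id
```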

Multi-Data Center High Availability

Starting from Klustron-1.2, Klustron not only provides high availability at the cluster level, so that the cluster continues to provide services without losing or damaging data when servers fail, but also achieves high availability at the data center level. This means that even if an entire data center fails, the Klustron cluster can still work normally and continue to provide services without data loss or damage. This requires an appropriate multi-data-center installation and deployment of the Klustron cluster. See Multi-Data Center High Availability and Introduction to Klustron Cluster RCR Feature and Use Cases.

Essential Key Variable Settings

The key variable settings required for Klustron’s financial-grade high reliability system are already set in the configuration template files of the compute nodes and storage nodes. Users should not modify these settings. Modifying these variables may result in a loss of high reliability and high availability capabilities. The key variables and their correct values are as follows:

Storage Nodes and Metadata Nodes

innodb_doublewrite = 1
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
enable_fullsync = true
fullsync_consistency_level = N (If a storage shard has 3 nodes, then N = 1. If a shard has 4 or 5 nodes, then N = 2. N is the smallest positive integer that satisfies shard node count <= 2 * N + 1, and the shard node count must be greater than 1.)
fullsync_relaylog_fsync_ack_level = 2
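
To confirm that a storage or metadata node still carries these settings, the variables can be queried over the MySQL protocol. The snippet below is a sketch using pymysql with illustrative connection parameters; fullsync_consistency_level is omitted because its correct value depends on the shard's node count, as explained above.

```python
import pymysql

# Reliability-critical settings from the storage/metadata node template.
# Boolean variables may be reported as ON/OFF or 1/0 depending on the build,
# so both spellings are accepted.
STORAGE_EXPECTED = {
    "innodb_doublewrite": {"1", "ON"},
    "innodb_flush_log_at_trx_commit": {"1"},
    "sync_binlog": {"1"},
    "enable_fullsync": {"1", "ON", "TRUE"},
    "fullsync_relaylog_fsync_ack_level": {"2"},
}

def check_storage_node(host, port, user, password):
    """Print any reliability-critical variable that deviates from the template."""
    conn = pymysql.connect(host=host, port=port, user=user, password=password)
    try:
        with conn.cursor() as cur:
            for name, allowed in STORAGE_EXPECTED.items():
                cur.execute("SHOW GLOBAL VARIABLES LIKE %s", (name,))
                row = cur.fetchone()
                value = row[1].upper() if row else None
                if value not in allowed:
                    print(f"{name}: expected one of {sorted(allowed)}, got {value}")
    finally:
        conn.close()
```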

Compute Nodes

fsync = on
synchronous_commit = on
full_page_writes = on
wal_writer_flush_after = 0
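
Similarly, the compute-node settings can be checked over the PostgreSQL protocol. This sketch uses psycopg2 with illustrative connection parameters:

```python
import psycopg2

# Reliability-critical settings from the compute-node template.
COMPUTE_EXPECTED = {
    "fsync": "on",
    "synchronous_commit": "on",
    "full_page_writes": "on",
    "wal_writer_flush_after": "0",
}

def check_compute_node(host, port, user, dbname="postgres"):
    """Print any compute-node setting that deviates from the template."""
    with psycopg2.connect(host=host, port=port, user=user, dbname=dbname) as conn:
        with conn.cursor() as cur:
            for name, expected in COMPUTE_EXPECTED.items():
                cur.execute("SELECT setting FROM pg_settings WHERE name = %s", (name,))
                row = cur.fetchone()
                value = row[0] if row else None
                if value != expected:
                    print(f"{name}: expected {expected}, got {value}")
```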

Contents of This Chapter: