
Global MVCC Mechanism


Introduction

Klustron is a distributed database built to support workloads that demand strong consistency, such as finance and securities, so consistent reads are an indispensable feature. Global MVCC is the global consistency mechanism designed to solve the read-consistency problem in distributed environments. It assigns a global data version number to each distributed transaction and uses these numbers to give the current transaction a snapshot, which makes reads consistent across the whole cluster.

In this Tech Talk, we will delve into the Global MVCC feature and its technical implementation, aiming to enhance understanding of this key feature while offering an in-depth look at Klustron.

Key Takeaway: Klustron's Global MVCC feature creates a snapshot of the current transaction by setting a global data version number, thereby achieving global data read consistency.

To facilitate communication within our team and beyond, we are excited to announce the launch of the Klustron BBS forum. We invite everyone to join us there! (Link: https://forum.klustron.com/)

The forum is currently in beta and may be unstable at times. We encourage you to share information there and ask for your understanding should any issues occur.

01 Klustron Introduction

First, let's briefly introduce Klustron, the distributed database product from Zetuo Technology. The following figure illustrates the overall architecture of Klustron. As depicted, Klustron is a distributed database product with separate storage and computation.

As a comprehensive distributed database solution, Klustron is equipped with several robust features:

  • Scalable compute and storage abilities

    • Data partitioning: hash, range, list

      • Support for any number and type of partition columns
    • Data distribution

      • Including auto, random, mirror, and table grouping
    • Scaling is automatic, flexible, non-disruptive, non-intrusive to the business, and transparent to end users

  • Financial-grade high reliability

    • Automatically handles software, hardware, network failures, and complete data center outages

      • Ensures data integrity and continuous service
      • Aims for RTO < 30 seconds & RPO=0
    • Automatic detection of primary node failures with primary/standby switching

  • HTAP: Harmonious OLTP & OLAP without interference

    • OLTP as the primary focus: for application software, equivalent to using MySQL or PostgreSQL

    • OLAP as a secondary focus: High-performance through multi-level parallel queries

    • Flexible computing with multi-language stored procedures: ML, privacy computing

  • Ecosystem compatibility

    • Supports PostgreSQL and MySQL connection protocols and SQL syntax

    • Compatible with common MySQL DDL syntax

    • Supports JDBC, ODBC, and common programming language connectors for PostgreSQL and MySQL clients

  • Comprehensive multi-level security

    • Encrypted storage and transmission
    • Multi-level access control mechanisms

02 WHY

Why is Global MVCC Necessary?

Let's explore the issue of read consistency in distributed transactions. Illustrated below is a typical scenario:

  • Ongoing distributed transaction GT1

    • Writes to multiple shards (shard1 GT1.t1 & shard2 GT1.t2)
    • Employs a two-phase commit process
  • Active SELECT query (GT2)

    • Reads the update GT1.t1 in shard1
    • Fails to read the update GT1.t2 in shard2

The result is an inconsistent read: GT2 sees only part of the data written by GT1.
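
The scenario above can be reproduced with a toy model. This is only an illustrative sketch: each shard is a plain Python dict, "applying the commit" just overwrites a value, and all names are hypothetical rather than part of Klustron.

```python
# Toy model of the anomaly. Each shard is a dict; there is no global snapshot.
shard1 = {"t1": "old"}
shard2 = {"t2": "old"}


def read_both():
    """GT2: a SELECT that reads one row from each shard."""
    return shard1["t1"], shard2["t2"]


# GT1 has prepared on both shards. Phase two of the two-phase commit
# reaches shard1 first and shard2 slightly later.
shard1["t1"] = "new"        # commit applied on shard1
seen_by_gt2 = read_both()   # GT2 runs in the gap between the two commits
shard2["t2"] = "new"        # commit applied on shard2

print(seen_by_gt2)  # ('new', 'old'): only part of GT1 is visible
```

GT2 observes GT1's update on shard1 but not on shard2, which is exactly the partial read described above.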

To address this issue, Klustron has implemented Global MVCC. The principle behind Global MVCC is to establish a global snapshot that captures the visible data for the current transaction, thereby ensuring read consistency across the distributed environment.

03 Principles and Implementation of Global MVCC

The implementation of Global MVCC requires modifications at various levels, including the upper compute nodes and metadata clusters, as well as the lower storage nodes.

Starting with the upper compute nodes, as shown in the following figure, it is essential to assign a global version number to all distributed transactions. Then, this global version number is used to establish a global snapshot.

At the lower storage nodes, modifications are made to the MySQL InnoDB storage engine to support the global snapshot, as depicted in the subsequent figure. Key changes were made to InnoDB's transaction visibility judgment process.

Global Visibility Judgment Algorithm: Specifically for XA Transaction Updates

  • Begin with the local visibility assessment

  • Local visibility does not necessarily imply global visibility

    • Exception: transactions with a version number less than local_xmin are definitely visible
  • Local invisibility does not necessarily imply global invisibility

    • Exception: transactions that had not started when the snapshot was acquired are definitely not visible
  • In the remaining cases, compare global version numbers

  • What if the row version is globally invisible?

    • Use the undo log to generate an older version of the row
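
The judgment steps listed above can be sketched in Python. This is a simplified model under stated assumptions, not Klustron's InnoDB code: `local_xmax`, the `tgvc` dict layout, and the `<=` comparison against the snapshot are all hypothetical, and a transaction with no global version number yet is simply treated as invisible (a real reader might wait for the number to be set instead).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ReadView:
    local_xmin: int       # writers below this id are definitely visible
    local_xmax: int       # hypothetical: ids at or above this started after the snapshot
    global_snapshot: int  # global version number carried by the reading statement


def globally_visible(trx_id: int, view: ReadView, tgvc: Dict[int, int]) -> bool:
    """Decide whether a row version written by trx_id is visible to view.

    tgvc maps a local transaction id to the global version number it
    received at commit (assumed layout).
    """
    if trx_id < view.local_xmin:
        return True                      # definitely visible
    if trx_id >= view.local_xmax:
        return False                     # not started at snapshot time
    gvno = tgvc.get(trx_id)
    if gvno is None:
        return False                     # simplification: no version number yet
    return gvno <= view.global_snapshot  # global version number comparison


def read_row(versions: List[Tuple[int, str]], view: ReadView,
             tgvc: Dict[int, int]) -> Optional[str]:
    """Walk versions newest first; the list stands in for the undo log chain."""
    for trx_id, value in versions:
        if globally_visible(trx_id, view, tgvc):
            return value
    return None


view = ReadView(local_xmin=10, local_xmax=20, global_snapshot=100)
tgvc = {12: 90, 15: 120}
versions = [(15, "v3"), (12, "v2"), (5, "v1")]
print(read_row(versions, view, tgvc))  # v2
```

In this example the newest version (written by transaction 15) carries global version 120, which is later than the snapshot, so the reader falls back to the older version written by transaction 12.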

After these modifications, let's compare the processes before and after the changes, as shown in the next figure. With the global version number and global transaction snapshot, consistency issues in transactions can be avoided.
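
As a rough illustration of the effect, the sketch below tags every row version with the global version number of the transaction that wrote it and lets a reader filter by its snapshot. The data layout and the `<=` rule are assumptions made for this example. It also does not model a reader whose snapshot already covers a transaction that one shard has not finished committing.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (global version number of the writer, value)


def read(shard_versions: List[Version], snapshot: int) -> Optional[str]:
    """Return the newest value whose writer's global version is <= snapshot."""
    for gvno, value in shard_versions:  # newest first
        if gvno <= snapshot:
            return value
    return None


# Version 0 stands for data committed long ago. GT1 receives global version 101.
shard1 = [(101, "new"), (0, "old")]  # GT1's commit already applied here
shard2 = [(0, "old")]                # GT1's commit not applied yet

# GT2 took its snapshot (100) before GT1 was assigned version 101.
before = (read(shard1, 100), read(shard2, 100))

shard2.insert(0, (101, "new"))       # the commit reaches shard2

# A later reader with snapshot 101 sees GT1 on both shards.
after = (read(shard1, 101), read(shard2, 101))

print(before, after)  # ('old', 'old') ('new', 'new')
```

Either way the reader sees all of GT1 or none of it, never a mix.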

Finally, let's analyze the performance cost of Global MVCC. Because some of its key steps cost time and resources, some performance degradation is expected. Based on our tests and analysis, the performance loss is between 5% and 10%, which we consider acceptable.

Comprehensive Analysis:

Compute Nodes:

  • No new time overhead added

  • Assigning GVNO (Global Version Number): An integer is issued with the XA COMMIT statement and stored in the tgvc_cache's tgvc

    • This is considered negligible
  • Acquiring Global Snapshot: Involves network transmission overhead

    • Occurs once for each SELECT statement (RC) or per transaction (RR)
    • The current value is obtained from the metadata cluster sequence with select currval('global_mvcc_seq')
  • Allocating Global Snapshot: An integer is sent with each SELECT statement

    • This is also considered negligible
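
The difference between acquiring a snapshot once per SELECT (RC) and once per transaction (RR) can be sketched as follows. The `Session` and `FakeMetadataSeq` classes are hypothetical stand-ins: the fake sequence replaces the round trip that would run select currval('global_mvcc_seq') against the metadata cluster, and it counts how many such round trips occur.

```python
from typing import Optional, Tuple


class FakeMetadataSeq:
    """Stand-in for the metadata cluster sequence; counts simulated round trips."""

    def __init__(self, start: int) -> None:
        self.value = start
        self.round_trips = 0

    def currval(self) -> int:
        self.round_trips += 1
        return self.value


class Session:
    """Hypothetical compute-node session that decides when to fetch a snapshot."""

    def __init__(self, isolation: str, seq: FakeMetadataSeq) -> None:
        if isolation not in ("RC", "RR"):
            raise ValueError("isolation must be 'RC' or 'RR'")
        self.isolation = isolation
        self.seq = seq
        self.txn_snapshot: Optional[int] = None

    def begin(self) -> None:
        self.txn_snapshot = None

    def snapshot_for_select(self) -> int:
        if self.isolation == "RC":
            return self.seq.currval()              # one round trip per SELECT
        if self.txn_snapshot is None:
            self.txn_snapshot = self.seq.currval()  # one round trip per transaction
        return self.txn_snapshot


def two_selects(isolation: str) -> Tuple[int, int, int]:
    seq = FakeMetadataSeq(100)
    session = Session(isolation, seq)
    session.begin()
    first = session.snapshot_for_select()
    seq.value += 5  # other transactions commit between the two SELECTs
    second = session.snapshot_for_select()
    return first, second, seq.round_trips


print(two_selects("RC"))  # (100, 105, 2)
print(two_selects("RR"))  # (100, 100, 1)
```

Under RC the second SELECT picks up a newer snapshot at the cost of a second round trip; under RR the transaction reuses its first snapshot.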

Storage Nodes:

  • Management of tgvc (Transaction Global Version Control): Negligible

  • Global MVCC Visibility Judgment Logic: Involves integer comparisons

    • A READ may wait briefly for a transaction's global version number to be set: typically < 20 ms
  • Covered Index Searches: Relating to the max_trx_id at the page header, which indicates the last transaction that updated the page

    • Previously: If the readview was visible for all rows on the page (rv.m_up_limit_id > max_trx_id), the index row was directly returned.

    • Now: The above condition must be met, and if max_trx_id > local_xmin, a table lookup is required.

      • This results in a slightly increased proportion of table lookups
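
The change to the covered index shortcut can be written as two predicates. This is a sketch that follows the conditions as stated above. The function and parameter names are hypothetical, and the behavior at the exact boundary (max_trx_id equal to local_xmin) is an assumption.

```python
def index_row_directly_returnable_before(up_limit_id: int, page_max_trx_id: int) -> bool:
    """Old rule: every row on the page is visible to the read view."""
    return up_limit_id > page_max_trx_id


def index_row_directly_returnable_now(up_limit_id: int, page_max_trx_id: int,
                                      local_xmin: int) -> bool:
    """New rule: the old condition must hold, and the page must not have been
    touched by a transaction newer than local_xmin; otherwise a table lookup
    is needed."""
    return up_limit_id > page_max_trx_id and not (page_max_trx_id > local_xmin)


# A page last updated by transaction 40, read view with up_limit_id 50.
print(index_row_directly_returnable_before(50, 40))   # True
print(index_row_directly_returnable_now(50, 40, 45))  # True
print(index_row_directly_returnable_now(50, 40, 30))  # False: table lookup required
```

The third call shows the extra table lookups: the page passes the old check but fails the new one because its last writer is newer than local_xmin.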

Purge: With Global MVCC, undo logs are retained until global_xmin has increased far enough that they are no longer needed

04 Q&A

Q1: In what scenarios should Global MVCC be enabled?

A1: Global MVCC is particularly beneficial in scenarios where data consistency is a high priority, such as in finance and securities. Since enabling this feature can lead to a performance decrease of approximately 5% - 10%, it's important to weigh the need for this level of consistency against the potential impact on performance. Deciding whether to enable Global MVCC should be based on the specific requirements of your application scenario.

Q2: How can one try out Klustron?

A2: Those interested in Klustron can download a trial version from our official website and deploy it according to the installation guide. Additionally, we offer Klustron's serverless services on Amazon's marketplace and Alibaba Cloud, which are also available for trial.

We invite everyone to download and install the Klustron database cluster for free use (no registration code required).

Download the complete Klustron software package here:

http://downloads.klustron.com/

For purchases, please reach out to us at sales_vip@klustron.com. For further inquiries, our assistant is available on WeChat for support.
