Overview and Advantages of Klustron (formerly KunlunBase)

Value of Klustron for DBAs and Application Development Teams

Zetuo Kunlun's Klustron is a distributed database system that addresses a series of technical challenges in the storage, management, analysis, and utilization of massive data. It supports high-concurrency, high-load transaction processing, delivering exceptional performance with high throughput and low latency. Built on the relational data model, Klustron seamlessly integrates GIS, JSON, text, and vector data management and query retrieval. This significantly simplifies application architecture design and development complexity, reduces backend system maintenance complexity and hardware resource costs, and offers standard interoperability and compatibility with upper-layer applications and other data processing components, helping users achieve a pluggable, standardized, component-based IT system architecture.

Klustron offers comprehensive distributed database features, including automatic data partitioning, horizontal elastic scaling, distributed transaction processing, distributed parallel query processing, high availability, strong consistency, automatic fault recovery, physical and logical data backup and recovery, disaster recovery, and data streaming export (CDC). It also has the capability to integrate multiple data models, as well as in-database computing and machine learning capabilities, making it an AI-ready scalable data infrastructure.

Technical challenges in the storage, management, and analysis of massive data include:

The limited computing and storage resources of a single server cannot handle continuously growing data volumes and access loads.
Software and hardware faults can cause server node and network failures, which makes it hard to maintain long-term reliability and continuous read/write service while ensuring data durability and consistency.
Maintaining consistently high throughput and low latency under elastic and fluctuating data read/write loads is technically challenging but essential for a smooth and seamless end-user experience.
Applications need compatibility with existing MySQL and PostgreSQL interfaces. This compatibility offers tremendous value because it lets users keep the vast ecosystem of applications built on these databases.
Managing relational data, JSON, GIS, text, and vector data using multiple databases increases maintenance burden and application system development complexity.

1. The Headache of Database Sharding Middleware

To address the first challenge, many MySQL users have turned to database sharding middleware or implemented sharding logic within their applications. These "makeshift solutions" have numerous significant drawbacks and fail to address the second, third, and fourth challenges. Using these methods requires business systems to handle data management functions on a case-by-case basis, including transaction processing and fault tolerance. This is an almost impossible task for most application development teams, leading to severe issues with reliability, stability, and maintainability. Development becomes more complex, timelines become unpredictable, project delays become more likely, and labor costs increase. Furthermore, these solutions cannot achieve automatic elastic scaling because the sharding logic is tightly coupled with the application.

Klustron is designed to address all of these challenges. It encapsulates all database system functions, allowing application developers to focus solely on implementing business logic. Regardless of the volume of data to be stored and managed or the online access load, users (DBAs, application developers, and architects) can rely on Klustron to manage data. DBAs can add or remove database server hardware as needed, and Klustron will automatically handle the elastic scaling to accommodate varying loads. This greatly enhances developer productivity, significantly reduces the workload and technical difficulty of application development, helps ensure the quality, stability, and reliability of the software, and shortens project timelines while lowering costs.

For application developers, using Klustron is just like using MySQL and PostgreSQL. Klustron supports JDBC, ODBC, Hibernate, MyBatis, and client libraries for all common programming languages. Applications written in these languages can connect to Klustron and execute all standard SQL statements, as well as MySQL and PostgreSQL-specific DML statements, without any modifications. Additionally, Klustron supports full and incremental data import from all common relational databases, making it easy for users to migrate to Klustron or from Klustron to other databases at any time.

Klustron allows developers to focus on creating and implementing application logic and functional requirements without needing to handle data management within the application system. This significantly enhances their productivity and the reliability and user experience of their business systems and products, while reducing IT system costs for hardware, software, and personnel, and ensuring predictable and controlled online business system deployment timelines.

Let's analyze the issues with using application-layer sharding or sharding middleware in detail. Problems with row-routing middleware like mysql_proxy, mysql_router, and mycat include:

1.1 They Do Not Support Comprehensive Distributed Query Processing

These middleware solutions cannot handle advanced SQL functionalities, such as multi-table joins, subqueries, CTEs, window functions, and aggregations. As a result, numerous tools and algorithms within the SQL ecosystem, such as low-code tools, ORM middleware (e.g., Hibernate), machine learning algorithms, and various data analysis tools, cannot interact or collaborate with these middleware solutions.

When using application-layer sharding, developers must write business code to query data segments from different storage clusters and then assemble the final result in the application code. This process essentially implements query processing and execution for each specific SQL statement at the application layer. If the "query statement" needs modification, it requires significant changes to the application code, making maintenance highly costly.

This task, which could be accomplished by sending a SQL statement to a distributed database, becomes a significant workload without a distributed database. Particularly, such query processing code may need repeated modifications due to changes in business logic requirements, making this development work much larger and more complex than directly modifying SQL statements.
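To make the cost concrete, here is a minimal sketch of what application-layer query processing looks like for a single aggregate query. It assumes a hypothetical `orders` table split across two shards, with in-memory SQLite databases standing in for the storage clusters; the table, column, and function names are illustrative and not part of Klustron.

```python
import sqlite3

def make_shard(rows):
    """Create an in-memory SQLite database standing in for one storage shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

# Two hypothetical shards, each holding part of the orders table.
shards = [
    make_shard([("alice", 10.0), ("bob", 30.0)]),
    make_shard([("alice", 50.0), ("carol", 20.0), ("bob", 10.0)]),
]

def avg_amount_per_customer(shards):
    """Hand-written equivalent of:
    SELECT customer, AVG(amount) FROM orders GROUP BY customer ORDER BY customer
    """
    partial = {}  # customer -> [running sum, running count]
    for conn in shards:
        # Per-shard averages cannot be merged directly, so fetch SUM and COUNT.
        query = "SELECT customer, SUM(amount), COUNT(*) FROM orders GROUP BY customer"
        for customer, total, count in conn.execute(query):
            acc = partial.setdefault(customer, [0.0, 0])
            acc[0] += total
            acc[1] += count
    # Final merge step: compute the global average and sort in the application.
    return sorted((c, s / n) for c, (s, n) in partial.items())

result = avg_amount_per_customer(shards)
print(result)
```

Every change to the query (a new filter, a join, a different aggregate) means rewriting this merge logic by hand, whereas with a distributed database only the SQL text changes.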

1.2 They Do Not Support Reliable Disaster Recovery for Distributed Transactions

Many developers do not realize the business risks of not using two-phase commit for distributed transactions, remaining unaware of the potential issues. A few developers acknowledge the risk but are unable to resolve it, leading to a laissez-faire attitude.

Some developers recognize and can address the problem but only on a case-by-case basis. For example, to ensure reliable transfer functionality, they need to design a technique at the business layer to handle disaster recovery for transfer scenarios. Other scenarios would require new designs and implementations of algorithms.

This significantly raises the technical barriers and workload of application development, posing substantial risks and uncertainties to product reliability and stability. Project delays become more likely, and development costs increase. Although some middleware uses MySQL's XA feature for two-phase commit, it cannot reliably ensure ACID properties during node failures, network issues, or timeouts.
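For readers unfamiliar with two-phase commit, the following is a simplified sketch of the protocol for the transfer scenario mentioned above. The `Shard` class and `transfer` function are hypothetical, in-memory stand-ins written for illustration; they are not Klustron's implementation, and they assume the two accounts live on two distinct shards.

```python
class Shard:
    """Toy in-memory stand-in for one storage shard holding account balances."""

    def __init__(self, balances, fail_on_prepare=False):
        self.balances = dict(balances)
        self.fail_on_prepare = fail_on_prepare  # simulates an unavailable shard
        self.pending = None  # change staged by prepare(), not yet visible

    def prepare(self, account, delta):
        """Phase 1: validate and stage the change; vote True (yes) or False (no)."""
        if self.fail_on_prepare or self.balances[account] + delta < 0:
            return False
        self.pending = (account, delta)
        return True

    def commit(self):
        """Phase 2: make the staged change visible."""
        account, delta = self.pending
        self.balances[account] += delta
        self.pending = None

    def rollback(self):
        """Phase 2: discard any staged change."""
        self.pending = None


def transfer(src, src_acct, dst, dst_acct, amount):
    """Move amount between accounts on two distinct shards using two-phase commit."""
    participants = [(src, src_acct, -amount), (dst, dst_acct, amount)]
    votes = [shard.prepare(acct, delta) for shard, acct, delta in participants]
    if all(votes):
        for shard, _, _ in participants:
            shard.commit()
        return True
    for shard, _, _ in participants:
        shard.rollback()
    return False


a = Shard({"alice": 100})
b_down = Shard({"bob": 0}, fail_on_prepare=True)
failed = transfer(a, "alice", b_down, "bob", 40)  # aborted: neither side changes

b_up = Shard({"bob": 0})
succeeded = transfer(a, "alice", b_up, "bob", 40)  # both sides change together
```

This sketch deliberately omits the hard part: durably logging the commit decision and recovering correctly when the coordinator or a shard crashes between the two phases. That recovery logic is what the text above means by reliable disaster recovery for distributed transactions.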

A common issue with these solutions is that developers need to know where each table is stored across different storage clusters to correctly manage data and queries. This further ties data management to business logic, which goes against the primary purpose of database systems—to encapsulate data management and let developers focus solely on business logic without worrying about data storage details.

1.3 They Cannot Achieve Automatic Horizontal Elastic Scaling

Scaling requires manual intervention by DBAs and necessitates service downtime (e.g., several hours), which severely impacts business continuity and user experience.

2. Application Sharding: Approach in the Stone Age

An even more primitive method to address the challenges of large-scale data storage and heavy access loads is application-layer data partitioning. This approach is considered more primitive because it not only suffers from all the aforementioned complex issues but also introduces a range of serious problems, to the extent that we can say these companies' products and services are still in the Stone Age. Surprisingly, there are still quite a few companies that operate in this archaic manner.

Unique Issues with Application-Level Sharding:

2.1 Hard-Coded Sharding Logic

This method requires implementing similar functionality for each table, resulting in a high development burden and complexity. In particular, if multiple applications or web services need to use the same set of data tables (a common scenario), every one of these programs must implement and maintain identical sharding rules, so the development workload and complexity grow much faster than the number of programs.

Even if you try to be smart by using configuration files like the aforementioned middleware, you still face problems. Implementing sharding logic greatly increases the business development workload. Ultimately, you end up with a mediocre middleware solution, used only by your company or team and possibly only suitable for your specific business scenarios. This further binds data management tightly to application logic, which is a very poor system design.

2.2 Horizontal Scaling is Even More Difficult

With hard-coded sharding logic, elastic scaling becomes nearly impossible because you need to modify business code to implement new data partitioning rules for scaling, which is a nightmare for developers and DBAs.
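A small calculation shows why rescaling is so painful under a typical hard-coded rule. The sketch below assumes integer keys routed by `key % shard_count`, which is a common choice but only an assumption here; the function names are illustrative.

```python
def shard_for(key, shard_count):
    """Hard-coded routing rule: integer key modulo the number of shards."""
    return key % shard_count

def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)

keys = range(12000)
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of rows must move when going from 4 to 5 shards")
# prints "80% of rows must move when going from 4 to 5 shards"
```

Adding a single shard relocates most of the data under this rule, and every application that embeds the rule must be redeployed with the new shard count at the same moment.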

Therefore, we decided to develop a truly distributed database product to rescue users from the frustration of traditional methods and the "Stone Age" of application-level sharding, and to give them the benefits of a modern distributed database.

From now on, they will no longer have to rack their brains to design and implement distributed data management and query programs. They can simply send SQL statements to initiate and commit distributed transactions and execute distributed queries to directly get the results.

This effectively separates data management from application software, abstracting data management away from application logic. This was the original intention of the database theory and technology pioneers 50 years ago. They learned the hard way, investing countless man-months and dollars to develop independent database systems for data management. Such systems separate application development from general data management logic, thereby maximizing software reuse and simplifying application development. This greatly enhances developer productivity and the reliability of business logic and products, lowers the technical barriers for building business systems, reduces development costs, and makes the launch time of online business systems controllable and predictable.

3. Klustron's Integration Capabilities

Klustron leverages and enhances the extensive plugins supported by the PostgreSQL community. Among these plugins, the Klustron team has extended the capabilities of the PostGIS and PGVector plugins, while other plugins can be directly used by compiling the community version source code with Klustron headers. Through the PostGIS and PGVector plugins, Klustron gains GIS and vector data management capabilities, making it a distributed PostGIS and a distributed PGVector. Klustron can manage and query JSON data and provide text data storage and retrieval, particularly the combined search capabilities of text and vector data. Users can query target data within a single SQL statement using relational data predicates, JSON content fragments, spatial relationships, vector distances, and text keywords. For more details, see section 5 of this chapter.
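As an illustration of such a combined query, the sketch below holds one SQL statement in a Python string. The `places` table and its columns are hypothetical, the `%(name)s` placeholders follow the psycopg parameter style, and the operators shown are the standard PostgreSQL, PostGIS, and pgvector ones. Whether Klustron accepts each of them exactly as written is an assumption to verify against the Klustron documentation.

```python
# Hypothetical table, for illustration only:
#   places(id, name, city, attrs JSONB, location GEOGRAPHY,
#          description TEXT, embedding VECTOR(3))
MULTI_MODEL_QUERY = """
SELECT id, name
FROM places
WHERE city = %(city)s                                   -- relational predicate
  AND attrs->>'category' = %(category)s                 -- JSON content fragment
  AND ST_DWithin(
        location,
        ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
        %(radius_m)s)                                   -- spatial relationship
  AND to_tsvector('english', description)
      @@ plainto_tsquery('english', %(keywords)s)       -- text keywords
ORDER BY embedding <-> %(query_vec)s::vector            -- vector distance
LIMIT 10
"""

# Example parameter values; all of them are made up for this sketch.
params = {
    "city": "Shenzhen",
    "category": "cafe",
    "lon": 114.06,
    "lat": 22.54,
    "radius_m": 2000,
    "keywords": "quiet wifi",
    "query_vec": "[0.12, 0.80, 0.33]",
}
```

With a PostgreSQL-compatible driver, the statement and parameters would be passed together to the cursor's `execute` call; no application-side merging of results from separate GIS, vector, and text systems is needed.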

Below, you can read the main contents of this chapter to gain a detailed understanding of Klustron's architecture, basic concepts, and technical advantages, preparing you for using Klustron.

1. Klustron System Architecture

2. Klustron Core Capabilities

3. Klustron Technical Advantages

4. KunlunBase FAQ

5. Integration of Multiple Data Models
