Klustron Cross-City High Availability: Two Cities and Three Data Centers
Note:
Unless otherwise stated, any version number mentioned in this document can be replaced with any released version number. For the list of released versions, see: http://doc.klustron.com/zh/Release_notes.html.
Objective:
When a Klustron cluster is deployed in the multi-data-center model across several data centers in the same city, Klustron's high availability capabilities ensure that if an entire data center fails, the primary nodes of the cluster automatically switch to another data center. The cluster thus recovers from the failure automatically, with no data loss or corruption, RPO = 0, and RTO < 30 seconds.
01 Background
Klustron's fullsync synchronous replication technology and fullsync HA high availability technology ensure that when the primary node of a storage shard (a storage cluster composed of kunlun-storage nodes) fails, Klustron automatically and promptly detects the failure and elects a new primary node to continue providing read and write services, without losing or corrupting user data. For most application scenarios that require high reliability, and given the resources typically available, this level of reliability is sufficient.
However, for the highest levels of financial-grade reliability, this is still not enough. If an entire data center (IDC) fails (due to a power outage, earthquake, fire, flood, etc.) and all nodes of a Klustron cluster are deployed in that data center, they all become unavailable at the same time: user data is lost and database services are interrupted.
To achieve IDC-level high availability, the Klustron team developed data center (IDC) level high availability technology, which Klustron supports starting from version 1.2.
02 Feature Overview
Klustron IDC disaster recovery technology ensures that if the primary IDC in the main city fails, Klustron automatically detects the failure and switches the primary node of each storage shard to the standby primary node in a backup IDC within the same city; this is referred to as promoting the backup IDC to the primary IDC. If all IDCs in the main city fail, the DBA receives an alert and can manually switch the primary node of each shard to the standby primary node in the backup city's IDC, either through the Klustron XPanel GUI or by calling the Klustron cluster_mgr API (this scenario may require minimal manual intervention); this is referred to as promoting the backup city's IDC to the primary IDC.
Both operations essentially switch all primary nodes of the cluster's storage shards and metadata cluster to their standby primary nodes in another data center (IDC).
03 Two Cities and Three Centers Architecture
The two cities and three centers architecture deploys two data centers (IDCs) in one city and a third data center in a backup city to achieve cross-regional, IDC-level high availability and disaster recovery. The topology of each shard in a Klustron cluster is illustrated in the architecture document linked below. Users can deploy any number of compute nodes (Klustron-server) and clusterManager nodes in each IDC as needed, and all compute nodes continuously synchronize user metadata updates from the metadata cluster. This architecture provides automatic disaster recovery for either IDC in the main city and, if the whole main city fails, manual recovery to the backup city's IDC, with Klustron providing tools to assist users with the switchover.
For more information, please visit: https://doc.klustron.com/zh/Klustron_idc_high_availability_architecture.html
04 Configuring IDC Management
Open a browser on a machine that can access 192.168.56.112 and enter the address: http://192.168.56.112:18080/KunlunXPanel/#/login?redirect=%2Fdashboard
After logging in, the homepage will display as follows:
Next, add an IDC in the IDC Management section.
4.1 Click "IDC Management" and then "Data Center List." On the Data Center List page, click the "Add" button.
4.2 Add a new IDC, specify the IDC name "IDC1," the city "GuangDong/ShenZhen," and then click "Confirm."
4.3 The new data center "IDC1" is now added.
4.4 Repeat steps 4.1 and 4.2 to add data centers "IDC2" and "IDC3." After adding them, check the IDC data center configuration.
Next, bind computer resources to the IDCs in the Computer Management section.
4.5 Click "Computer Management" and then "Computer List." On the Computer List page, click the "Update IDC" button.
4.6 Select the appropriate computer resources to bind to the IDC1 data center.
4.7 Click "Confirm" and check the completion status of the cluster creation task.
4.8 Use the same method to bind computer resources to data centers IDC2 and IDC3. After binding the IDCs, check the IDC binding status.
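For reference, the machine-to-IDC bindings used in the remainder of this walkthrough (they can be read back from the cluster list in step 6.4) are:
192.168.56.113  ->  IDC1 (ShenZhen, primary IDC)
192.168.56.114  ->  IDC2 (ShenZhen, same-city backup IDC)
192.168.56.112  ->  IDC3 (BeiJing, cross-city backup IDC); this machine also hosts the XPanel console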
05 Creating a Cross-City IDC HA Cluster
5.1 Click "Cluster Management" and then "Cluster List." On the Cluster List page, click the "Add" button.
5.2 Add a new cluster, specify the business name "IDC_Cluster," and select the cluster type "IDC Cluster."
5.3 In the purchase type for the IDC cluster, select "Cross-City Purchase."
5.4 For the primary-city IDCs of the cross-city IDC cluster, select the city where IDC1 and IDC2 are located, "ShenZhen," then choose IDC1 as the primary IDC and IDC2 as the backup IDC. For the backup-city IDC, select the city where IDC3 is located, "BeiJing."
5.5 Select the compute nodes for the cross-city IDC cluster.
5.6 Set the cluster configuration parameters; the default values are used here.
5.7 Review the overview of the new cluster.
5.8 Click "Confirm" and check the completion status of the cluster creation task.
5.9 After creating and configuring the cross-city IDC cluster, check the operating status of the IDC cluster "IDC_Cluster" as follows:
06 Cross-City IDC HA Cluster: Same-City Primary-Backup IDC Switch Test
6.1 Prepare test data by creating a test table in the database and inserting test data.
[kunlun@kunlun1 ~]$ psql -h 192.168.56.112 -p 47001 -U abc postgres
postgres=# create table prod_part (id int primary key, name char(8)) partition by hash(id);
postgres=# create table prod_part_p1 partition of prod_part for values with (modulus 6, remainder 0);
postgres=# create table prod_part_p2 partition of prod_part for values with (modulus 6, remainder 1);
postgres=# create table prod_part_p3 partition of prod_part for values with (modulus 6, remainder 2);
postgres=# create table prod_part_p4 partition of prod_part for values with (modulus 6, remainder 3);
postgres=# create table prod_part_p5 partition of prod_part for values with (modulus 6, remainder 4);
postgres=# create table prod_part_p6 partition of prod_part for values with (modulus 6, remainder 5);
postgres=# insert into prod_part select i,'text'||i from generate_series(1,300) i;
postgres=# analyze prod_part;
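Optionally, confirm that the 300 rows were inserted and spread across the six hash partitions. This is a quick sanity check using plain counts; the exact per-partition counts depend on the hash distribution:
postgres=# select count(*) from prod_part;      -- expect 300
postgres=# select count(*) from prod_part_p1;   -- roughly 300/6 rows per partition
postgres=# select count(*) from prod_part_p6;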
6.2 Prepare a Python script, pyprod.py, to continuously operate on the database during the IDC HA cluster primary-backup switch. The script is as follows:
import psycopg2.extras
from psycopg2 import DatabaseError
import time
import datetime

# Connect to a Klustron compute node.
conn = psycopg2.connect(database='postgres', user='abc',
                        password='abc', host='192.168.56.112', port='47001')

select_sql = ''' select * from prod_part where id=%s; '''
i = 1

try:
    # Read one row per second, cycling through ids 1..1000.
    while i <= 1000:
        cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
        cursor.execute(select_sql, [i])
        res = cursor.fetchall()
        # Only ids 1..300 exist in prod_part, so guard against an empty result.
        if res:
            print(dict(res[0]))
        current_datetime = datetime.datetime.now()
        print("Current date and time:", current_datetime)
        if i == 1000:
            i = 1
        else:
            i = i + 1
        cursor.close()
        conn.commit()
        time.sleep(1)
except (Exception, DatabaseError) as e:
    # The connection or an in-flight statement may be interrupted during the
    # IDC switch; report the error, let the operator confirm the failover has
    # completed, then reconnect to the same compute node and resume reading.
    print(e)
    input('Press any key and Enter to continue ~!')
    conn = psycopg2.connect(database='postgres', user='abc',
                            password='abc', host='192.168.56.112', port='47001')
    while i <= 1000:
        cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
        cursor.execute(select_sql, [i])
        res = cursor.fetchall()
        if res:
            print(dict(res[0]))
        current_datetime = datetime.datetime.now()
        print("Current date and time:", current_datetime)
        if i == 1000:
            i = 1
        else:
            i = i + 1
        cursor.close()
        conn.commit()
        time.sleep(1)
finally:
    conn.close()
6.3 Run the pyprod.py script to continuously operate on the database.
[kunlun@kunlun1 ~]$ python pyprod.py
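With the test data in place, the script prints one row per second. The output should look roughly like the following (timestamps are illustrative; the trailing spaces come from the char(8) column):
{'id': 1, 'name': 'text1   '}
Current date and time: 2024-05-20 10:15:01.123456
{'id': 2, 'name': 'text2   '}
Current date and time: 2024-05-20 10:15:02.234567
...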
6.4 Check the current primary and backup IDC of the cross-city IDC HA cluster. View the "Cluster List Information"; the primary node is "192.168.56.113," the same-city (IDC2) backup node is "192.168.56.114," and the cross-city (IDC3) backup node is "192.168.56.112." Then click the "Settings" button.
6.5 In the "Cluster Settings" interface, click the "IDC Switch" button to switch the primary and backup IDC.
6.6 In the IDC switch interface, select the same-city IDC name "IDC2."
6.7 Click "Confirm" and check the completion of the IDC switch task.
6.8 The primary-backup IDC switch is completed. View the "Cluster List Information"; the primary node is now "192.168.56.114," the same-city (IDC1) backup node is "192.168.56.113," and the cross-city (IDC3) backup node is "192.168.56.112."
6.9 After the primary-backup IDC switch, the application continues to access and operate on the data table without interruption.
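As an additional spot check that no data was lost during the switchover (RPO = 0), the row count can be rerun from any psql session; it should still match the 300 rows inserted in step 6.1:
[kunlun@kunlun1 ~]$ psql -h 192.168.56.112 -p 47001 -U abc postgres
postgres=# select count(*) from prod_part;   -- expect 300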
07 Cross-City IDC HA Cluster: Cross-City Primary-Backup IDC Switch Test
7.1 Check the current primary and backup IDC of the IDC HA cluster. View the "Cluster List Information"; the primary node (IDC1) is "192.168.56.113," the same-city backup node (IDC2) is "192.168.56.114," and the cross-city backup node (IDC3) is "192.168.56.112."
7.2 Before performing the cross-city IDC switch, run the pyprod.py script to continuously operate on the database.
[kunlun@kunlun1 ~]$ python pyprod.py
7.3 In the "Cluster Settings" interface, click the "IDC Switch" button to switch the primary and backup IDC.
7.4 In the IDC switch interface, select the cross-city IDC name "IDC3."
7.5 Click "Confirm" and check the completion of the cross-city IDC switch task.
7.6 The cross-city IDC switch is completed. View the "Cluster List Information"; the primary node is now "192.168.56.112," and the backup nodes are "192.168.56.113" and "192.168.56.114."
7.7 After the cross-city IDC switch, the application continues to access and operate on the data table without interruption.
08 Cross-City IDC HA Cluster: Forced Cross-City Switch Test
8.1 Check the current primary and backup IDC of the IDC HA cluster. View the "Cluster List Information"; the primary node (IDC1) is "192.168.56.113," the same-city backup node (IDC2) is "192.168.56.114," and the cross-city backup node (IDC3) is "192.168.56.112."
8.2 Before performing the cross-city IDC switch, run the pyprod.py script to continuously operate on the database.
[kunlun@kunlun1 ~]$ python pyprod.py
8.3 In the "Cluster Settings" interface, click the "Force Cross-City Switch" button to perform the cross-city primary-backup switch.
8.4 In the forced cross-city switch interface, select the backup city "BeiJing."
8.5 Click "Confirm" and check the completion of the forced cross-city switch task.
8.6 After the forced cross-city switch, the application continues to access and operate on the data table without interruption.
8.7 The forced cross-city switch of the IDC HA cluster is completed. View the "Cluster List Information"; the primary node is now "192.168.56.112," and the backup nodes are "192.168.56.113" and "192.168.56.114."
This completes the creation and configuration of the cross-city IDC HA cluster for two cities and three centers, the same-city primary-backup IDC switch test, the cross-city primary-backup IDC switch test, and the forced cross-city switch test.