Klustron Cluster Logical Backup and Restore Operations
Note:
Unless otherwise specified, version numbers in this document can be replaced with the version numbers of any released version. For a list of all released versions, please refer to the Release Notes.
Document Objectives:
This document assumes that the user is already familiar with creating clusters in the Klustron database and knows how to perform routine operations on Klustron database cluster instances through the web interface provided by the XPanel service.
In the content of this document, we will demonstrate how to perform logical backups of tables, schemas, and databases within a cluster instance, and then restore them to another cluster instance. We will also show how to achieve the same backup and restore functionality through APIs, with detailed step-by-step instructions.
Furthermore, logical backups can be performed online. This document will also demonstrate a scenario where continuous data is added to a table in an online application while backup operations are performed. We will validate this capability by restoring the data to another cluster instance.
Environment Details:
In this document, the XPanel service is installed on a server with IP address 192.168.0.152. Open a web browser and enter the following address: http://192.168.0.152:40180/KunlunXPanel/#/login?redirect=%2Fdashboard (The initial login username and password are: super_dba/super_dba, and you need to change the super_dba password on first login).
After logging in, check the "Cluster List." Here, two clusters have been prepared in advance:
Cluster name: "cluster1." The IP address of the computing node in this cluster is 192.168.0.153, and the service port is 47001. It has 1 storage shard.
Cluster name: "cluster2." The IP address of the computing node in this cluster is 192.168.0.155, and the service port is 47001. It also has 1 storage shard.
Before performing logical backups, you need to configure the backup storage target in XPanel. Prior to this, you should prepare the HDFS service. For the specific configuration process of the HDFS service, please refer to "Klustron HDFS Backup Storage Configuration." In the environment referred to in this document, the HDFS service is configured on node 192.168.0.152. Later, you can add it to the system through "Backup Storage Target Management" in XPanel, as shown in the following interface:
To demonstrate the table-level backup functionality mentioned in this document, we have prepared some test data tables (testtable1, testtable2, testtable3) in the cluster cluster1, as shown below:
The detailed data of the test tables are as follows:
Note: Below is the process and script for preparing the test data:
SSH into 192.168.0.153 as the kunlun user.
Modify /kunlun/env.sh and change envtype="${envtype:-no}" to envtype="all". Save and exit.
Source /kunlun/env.sh.
Connect to the PostgreSQL database using the following command:
psql -h 192.168.0.153 -p 47001 -U abc postgres
Create the test tables and insert data using the following SQL commands:
create table testtable1 (id int primary key);
insert into testtable1 select generate_series(1,10);
create table testtable2 (id int primary key);
insert into testtable2 select generate_series(11,20);
create table testtable3 (id int primary key);
insert into testtable3 select generate_series(21,30);
1 Backup Tables Using XPanel and API
1.1 Click the "Settings" button on the right side of cluster1.
1.2 Select "Logical Backup" from the left menu.
1.3 In the backup type, select "table." In the "Backup Table" section, choose the table you want to back up; in this example, select "testtable1." Then select the "Backup Time Range" from 17:00 to 18:00 (Note: choose a window that falls after the test data table was created. This range means the system will automatically perform a logical backup of testtable1 every day, with the actual run scheduled by the system somewhere between 17:00 and 18:00).
If you want to back up multiple tables in a single backup operation, click the "+" icon on the right, select the additional tables, and define a backup time range for each.
1.4 Click "Save" to start the backup operation.
1.5 After a successful backup, you will receive the following notification:
1.6 For successfully completed logical backup tasks, the backup data for the relevant tables will be stored in the HDFS backup target previously configured. To view the backups, log in to the HDFS backup target IP mentioned in this document, 192.168.0.152, and execute the following command:
hdfs dfs -ls /kunlun/logicalbackup/cluster_1684593444_000002
Subsequent logical restore tasks for tables will select appropriate backup sets based on the parameters specified in the restore operation.
1.7 Perform table backup for single or multiple tables using the API. In the request body, the timestamp value can be generated by running "date +%s" on the client, cluster_id is the source cluster's ID as shown in the XPanel console, and backup_time specifies the daily time window in which each table should be backed up:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_backup",
"timestamp":"1684678534", # You can obtain this timestamp by running "date +%s" in a Linux command prompt
"user_name":"super_dba",
"paras":{
"cluster_id":"2", # Source cluster's cluster ID, obtained from XPanel console
"backup_type": "table",
"backup":[{
"db_table":"postgres_$$_public.testtable2",
"backup_time":"22:00:00-23:00:00" # Specify when to automatically perform backup on this table
},{
"db_table":"postgres_$$_public.testtable3",
"backup_time":"22:00:00-23:00:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":null,"error_code":"0","error_info":"","job_id":"46","status":"accept","version":"1.0"}
In the output, the key information is the error_code. If it's 0, it means the backup task has been accepted by the database. You can then retrieve the task execution result by using job_id as a parameter in the following call:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"46",
"job_type":"get_status",
"timestamp":"1684679054",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":{},"error_code":"0","error_info":"OK","job_id":"46","job_type":"","s
tatus":"done","version":"1.0"}
With error_code 0 and status "done," the multi-table logical backup has been successfully executed. The backup results can be seen on HDFS:
This successfully completes the backup task for testtable2 and testtable3 using the API.
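For scripted use, the submit and status calls shown above can be combined into a small shell helper that fills in the timestamp with "date +%s", captures the returned job_id, and polls get_status until the job finishes. The following is a minimal sketch built on the same endpoint and request fields as the examples above; the use of jq for JSON parsing and the 10-second polling interval are assumptions about the client host, not part of the Klustron API.
#!/bin/bash
# Minimal sketch: submit a table-level logical backup job and poll until it is done.
# The endpoint and request fields follow the examples above; jq is assumed to be installed.
API="http://192.168.0.152:58000/HttpService/Emit"

RESP=$(curl -s -X POST "$API" -d "{
  \"version\":\"1.0\",
  \"job_id\":\"\",
  \"job_type\":\"logical_backup\",
  \"timestamp\":\"$(date +%s)\",
  \"user_name\":\"super_dba\",
  \"paras\":{
    \"cluster_id\":\"2\",
    \"backup_type\":\"table\",
    \"backup\":[{\"db_table\":\"postgres_\$\$_public.testtable2\",\"backup_time\":\"22:00:00-23:00:00\"}]
  }
}")
JOB_ID=$(echo "$RESP" | jq -r '.job_id')
echo "submitted job_id=$JOB_ID"

# Poll get_status until the job reports "done" (error handling omitted in this sketch).
STATUS=""
while [ "$STATUS" != "done" ]; do
  sleep 10
  STATUS=$(curl -s -X POST "$API" -d "{
    \"version\":\"1.0\",
    \"job_id\":\"$JOB_ID\",
    \"job_type\":\"get_status\",
    \"timestamp\":\"$(date +%s)\",
    \"user_name\":\"super_dba\",
    \"paras\":{}
  }" | jq -r '.status')
  echo "job $JOB_ID status: $STATUS"
done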
2 Restore Tables Using XPanel and API
2.1 Click the "Settings" button on the right side of cluster1.
2.2 Select "Logical Restore" from the left menu.
2.3 In the restore type, choose "table." Under "Target Table Cluster," select cluster2 (in this document, it's cluster_1684593523_000003). In the "Backup Table" dropdown, choose the previously backed-up table: postgres_$$_public.testtable1(2023-05-21 17:32:30). Set the "Restore Start Time" as 2023-05-21 17:45:00 (which is after the last backup time).
2.4 Click "Save" to initiate the logical restore operation.
2.5 After successful table restoration, you will receive the following confirmation message:
2.6 To confirm that the table data for testtable1 has been successfully restored to the target cluster cluster2, use the kunlun user to SSH into the computing node of that cluster (IP: 192.168.0.155) and perform the following checks:
Confirm that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all", then run:
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc postgres
postgres=# \dt
postgres=# select * from testtable1 ;
You can see that testtable1 exists, and its data content and row count are correct. This successfully completes the logical restore operation for the testtable1 table.
2.7 To restore single or multiple tables using the API:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_restore",
"timestamp":"1684763090",
"user_name":"super_dba",
"paras":{
"src_cluster_id":"2",
"dst_cluster_id":"3",
"restore_type":"table",
"restore":[{
"db_table":"postgres_$$_public.testtable2",
"restore_time":"2023-05-21 22:30:00"
},{
"db_table":"postgres_$$_public.testtable3",
"restore_time":"2023-05-21 22:30:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":null,"error_code":"0","error_info":"","job_id":"55","status":"accept","version":"1.0
"}
In the output, the key information is the error_code. If it's 0, it means the restore task has been accepted by the database. You can then retrieve the task execution result by using job_id as a parameter in the following call:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"55",
"job_type":"get_status",
"timestamp":"1684679705",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":{"done_dts":"postgres_$$_public.testtable2,postgres_$$_public.testtable3,"},"error
_code":"0","error_info":"","job_id":"","job_type":"","status":"done","version":"1.0"}
With error_code 0 and status "done," the multi-table logical restore has been successfully executed. To validate the results of the restore operation, perform the following operations in the target cluster:
Confirm that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all", then run:
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc postgres
postgres=# \dt
postgres=# select * from testtable2 ;
postgres=# select * from testtable3 ;
From the output, you can see that testtable2 and testtable3 exist, and their data content and row count are correct. This successfully completes the logical restore operation for tables testtable2 and testtable3.
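If you prefer to run this check non-interactively, for example from a verification script, the same queries can be issued with psql's -c option. This is just a convenience sketch using the connection parameters already shown for cluster2's computing node (it assumes env.sh has been sourced as in step 2.6):
# Row-count spot checks on the target cluster after the restore.
psql -h 192.168.0.155 -p 47001 -U abc postgres -c "select count(*), min(id), max(id) from testtable2;"
psql -h 192.168.0.155 -p 47001 -U abc postgres -c "select count(*), min(id), max(id) from testtable3;"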
For schema-level backups, we have prepared test schemas and data tables in cluster1, as follows:
Note: The process and scripts for preparing test data are as follows:
SSH into 192.168.0.153 as the kunlun user.
Modify /kunlun/env.sh to ensure envtype="${envtype:-no}" is replaced with envtype="all", then save and exit.
Source /kunlun/env.sh.
Run the following commands:
psql -h 192.168.0.153 -p 47001 -U abc postgres
show schemas;
create schema test1 ;
create table test1.testtable4 (id int primary key);
insert into test1.testtable4 select generate_series(1,3);
create schema test2;
create table test2.testtable5 (id int primary key);
insert into test2.testtable5 select generate_series(1,3);
create schema test3 ;
create table test3.testtable6 (id int primary key);
insert into test3.testtable6 select generate_series(1,3);
create schema test4;
create table test4.testtable7 (id int primary key);
insert into test4.testtable7 select generate_series(1,3);
3 Backup Schemas Using XPanel and API
3.1 Click the "Settings" button on the right side of cluster1.
3.2 Select "Logical Backup" from the left menu.
3.3 In the backup type, choose "schema." Select the schemas you want to back up; in this example, select test1 and test2. Then, in the "Backup Time Range," choose 10:30 – 11:30 (Note: choose a window that falls after the test data tables were created in these schemas. This range means the system will automatically perform a daily logical backup of the data tables in test1 and test2, with the actual run scheduled by the system somewhere between 10:30 and 11:30).
3.4 Click "Save" to initiate the backup operation.
3.5 After successful backup, you will receive the following confirmation message:
3.6 For successfully completed logical backup tasks, the backup data of the respective schemas will be saved in the previously configured HDFS backup target. Log in to the HDFS backup target IP: 192.168.0.152 as follows and execute the command to view the backup:
hdfs dfs -ls /kunlun/logicalbackup/cluster_1684593444_000002
Subsequent schema logical restore tasks will select the appropriate backup sets based on the parameters specified in the restore operation.
3.7 To back up schemas using the API:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_backup",
"timestamp":"1684811103",
"user_name":"super_dba",
"paras":{
"cluster_id":"2",
"backup_type": "schema",
"backup":[{
"db_table":"postgres_$$_test3",
"backup_time":"10:30:00-11:30:00"
},{
"db_table":"postgres_$$_test4",
"backup_time":"10:30:00-11:30:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":null,"error_code":"0","error_info":"","job_id":"62","status":"accept","version":"1.0
"}
In the output, the key information is the error_code. If it is 0, it means the backup task has been accepted by the database. You can then retrieve the task execution result by using job_id 62 as a parameter in the following call:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"62",
"job_type":"get_status",
"timestamp":"1684811103",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":{},"error_code":"0","error_info":"OK","job_id":"62","job_type":"","status":"done"
,"version":"1.0"}
The error_code is 0, and the status is "done," indicating the successful execution of the multi-schema logical backup task. The following figure shows the backup results as viewed in HDFS:
Thus, the API-based backup operation for schemas test3 and test4 has been successfully completed.
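To inspect what a particular backup run wrote, the cluster's backup directory on HDFS can also be listed recursively. This is a convenience sketch; the exact subdirectory layout under the cluster path depends on the backup sets actually present:
hdfs dfs -ls -R /kunlun/logicalbackup/cluster_1684593444_000002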
4 Restore Schemas Using XPanel and API
4.1 Click the "Settings" button on the right side of cluster1.
4.2 Select "Logical Restore" from the left menu.
4.3 In the restore type, choose "schema." Under "Target Table Cluster," select cluster2 (in this document, it's cluster_1684593523_000003). In the "Backup Table" dropdown, choose the previously backed-up schemas: postgres_$$_test1(2023-05-23 10:57:15) and postgres_$$_test2(2023-05-23 10:57:15). Set the "Start Time" to 2023-05-23 12:00:00 (which is after the last backup time).
4.4 Click "Save" to initiate the schema logical restore operation.
4.5 After successful schema restoration, you will receive the following confirmation message:
4.6 To confirm that the tables under two schemas have been successfully restored to the target cluster cluster2, use the kunlun user to SSH into the computing node of that cluster (IP: 192.168.0.155) and perform the following checks:
Confirm that /kunlun/env.sh has envtype="${envtype:-no}" replaced with envtype="all".
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc postgres
postgres=# select * from test1.testtable4 ;
postgres=# select * from test2.testtable5 ;
From the output, you can see that the tables in both schemas already exist, and their data content and row count are correct. This successfully completes the logical restore operation for tables under multiple schemas.
4.7 To restore schemas using the API
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_restore",
"timestamp":"1684815815",
"user_name":"super_dba",
"paras":{
"src_cluster_id":"2",
"dst_cluster_id":"3",
"restore_type":"schema",
"restore":[{
"db_table":"postgres_$$_test3",
"restore_time":"2023-05-23 12:00:00"
},{
"db_table":"postgres_$$_test4",
"restore_time":"2023-05-23 12:00:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":null,"error_code":"0","error_info":"","job_id":"66","status":"accept","version":"1.0"}
In the output, the key information is the error_code. If it is 0, it means the restore task has been accepted by the database. You can then retrieve the task execution result by using job_id 66 as a parameter in the following call:
kunlun@kunlun1:~$ curl -d '
{
"version":"1.0",
"job_id":"66",
"job_type":"get_status",
"timestamp":"1684679705",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output will look like:
{"attachment":{"done_dts":"postgres_$$_test3.testtable6,postgres_$$_test4.testtable7,"},"error_c
ode":"0","error_info":"","job_id":"","job_type":"","status":"done","version":"1.0"}
The error_code is 0, and the status is "done," indicating that the task has been successfully completed. To verify the results of the recovery operation in the target cluster, we perform the following steps:
Confirm that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all".
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc postgres
postgres=# select * from test3.testtable6 ;
postgres=# select * from test4.testtable7 ;
From the output, it can be seen that the tables under the two schemas exist, and the data content and row count are correct. This signifies the successful completion of the logical recovery operation via API for schemas.
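A quick way to confirm that the restored schemas themselves were created on the target cluster is to list schemas with psql's \dn meta-command. A minimal sketch, using the same connection details as above:
# List schemas on the target cluster; test3 and test4 should appear after the restore.
psql -h 192.168.0.155 -p 47001 -U abc postgres -c "\dn"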
Regarding the database-level backup functionality, we have prepared the corresponding databases and tables in cluster1 as shown below:
Note: The following is the process and script for preparing test data:
SSH into 192.168.0.153 with the username "kunlun."
Modify /kunlun/env.sh to ensure envtype="${envtype:-no}" has been replaced with envtype="all", and save the changes.
Source /kunlun/env.sh.
Execute the following commands:
psql -h 192.168.0.153 -p 47001 -U abc postgres
create database testdb1 with owner abc;
psql -h 192.168.0.153 -p 47001 -U abc testdb1
create table testtable8 (id int primary key);
insert into testtable8 select generate_series(1,3);
select * from testtable8 ;
create database testdb2 with owner abc ;
psql -h 192.168.0.153 -p 47001 -U abc testdb2
create table testtable9 (id int primary key);
insert into testtable9 select generate_series(1,3);
select * from testtable9 ;
5 Backup Databases via XPanel and API
5.1 Click on the "Settings" button on the right of cluster1.
5.2 Click on "Logical Backup" in the left menu bar.
5.3 Select "db" in the backup type. Choose the database to perform logical backup on, in this example, we select "testdb1." Then, in the "Backup Time Range," select 22:00 – 23:00. (Note: You can choose the time after the test data tables have been created in this database. This time range signifies that the system will automatically perform daily logical backups of the data tables in testdb1, and the backup execution period will be determined by the system between 22:00 and 23:00.)
5.4 Click "Save" to initiate the backup operation.
5.5 Upon successful backup, you will see the following message:
5.6 For successfully completed logical backup tasks, the backup data for the respective database will be stored in the previously configured HDFS backup destination. Log in to the specified HDFS backup destination IP: 192.168.0.152 and execute the following command to view the backups:
hdfs dfs -ls /kunlun/logicalbackup/cluster_1684593444_000002
Subsequent logical recovery tasks for the database will choose the appropriate backup set based on the specified parameters in the recovery operation.
5.7 Backup Databases via API
curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_backup",
"timestamp":"1684851125",
"user_name":"super_dba",
"paras":{
"cluster_id":"2",
"backup_type": "db",
"backup":[{
"db_table":"testdb2",
"backup_time":"22:00:00-23:00:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output is as follows:
{"attachment":null,"error_code":"0","error_info":"","job_id":"71","status":"accept","v
ersion":"1.0"}
In the above output, the key information is error_code. If it is 0, it indicates that the backup task has been accepted by the database. Afterward, we can use job_id 71 as a parameter to obtain the task execution results.
curl -d '
{
"version":"1.0",
"job_id":"71",
"job_type":"get_status",
"timestamp":"1684811103",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output is as follows:
{"attachment":{},"error_code":"0","error_info":"OK","job_id":"71","job_type":"","status":"done"
,"version":"1.0"}
With error_code 0 and status "done," the database's logical backup task has been successfully executed. Below is the backup result information found in HDFS:
This concludes the successful completion of the API-based backup task for testdb2.
6 Restore Databases via XPanel and API
6.1 Click on the "Settings" button on the right of cluster1.
6.2 Click on "Logical Restore" in the left menu bar.
6.3 Select "db" in the restore type. Choose "cluster2" in the "Target Table Cluster" (in this document, it is cluster_1684593523_000003). In "Backup Tables," select the previously backed up database: testdb1 (2023-05-23 22:08:16). For "Start Time," select 2023-05-23 22:30:00 (which is after the time of the last backup).
6.4 Click "Save" to begin the database's logical restore.
6.5 Upon successful database restoration, you will see the following message:
6.6 To confirm that the tables under testdb1 have been successfully restored to the target cluster cluster2, SSH into the cluster's computing node (IP: 192.168.0.155) as the "kunlun" user and perform the following checks:
Ensure that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all".
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc testdb1
testdb1=# select * from testtable8 ;
From the output, it can be seen that the table "testtable8" in testdb1 exists in the target cluster instance, and the data content and row count are correct. This signifies the successful completion of the logical recovery operation at the database level.
6.7 Restore Databases via API
curl -d '
{
"version":"1.0",
"job_id":"",
"job_type":"logical_restore",
"timestamp":"1684852254",
"user_name":"super_dba",
"paras":{
"src_cluster_id":"2",
"dst_cluster_id":"3",
"restore_type":"db",
"restore":[{
"db_table":"testdb2",
"restore_time":"2023-05-23 22:30:00"
}]
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output is as follows:
{"attachment":null,"error_code":"0","error_info":"","job_id":"73","status":"accept","version":"1.0
"}
In the above output, the key information is error_code. If it is 0, it indicates that the restore task has been accepted by the database. Afterward, we can use job_id 73 as a parameter to obtain the task execution results.
curl -d '
{
"version":"1.0",
"job_id":"73",
"job_type":"get_status",
"timestamp":"1684852254",
"user_name":"super_dba",
"paras":{
}
}
' -X POST http://192.168.0.152:58000/HttpService/Emit
The output is as follows:
{"attachment":{"done_dts":"testdb2_$$_public.testtable9,"},"error_code":"0","error_info":"","job
_id":"","job_type":"","status":"done","version":"1.0"}
With error_code 0 and status "done," the task has been successfully completed. We can perform the following operations in the target cluster to verify that the restore operation results are as expected:
Ensure that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all".
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc testdb2
testdb2=# select * from testtable9 ;
From the output, it can be seen that the tables under testdb2 exist, and the data content and row count are correct. This signifies the successful completion of the API-based logical restore operation at the database level.
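Similarly, you can confirm from the command line that the restored database itself exists on the target cluster. A minimal sketch, assuming the abc user can connect to the target computing node as in the steps above:
# List databases on the target computing node; testdb2 should appear after the restore.
psql -h 192.168.0.155 -p 47001 -U abc -l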
To demonstrate the online table backup and restore capability mentioned in this document (Note: schema- and db-level backup and restore support the same capability), we have prepared the corresponding data tables in cluster1 as shown below:
Note: The following is the process and script for preparing test data:
SSH into 192.168.0.153 with the username "kunlun."
Modify /kunlun/env.sh to ensure envtype="${envtype:-no}" has been replaced with envtype="all", and save the changes.
Source /kunlun/env.sh.
Execute the following commands:
psql -h 192.168.0.153 -p 47001 -U abc postgres
create table testtable10 (id int primary key);
insert into testtable10 select generate_series(1,1000000);
select max(id) from testtable10;
Note: Through the above operations, there are now one million records in the testtable10 table.
7 Online Backup and Restore of Tables
7.1 Click on the "Settings" button on the right of cluster1.
7.2 Click on "Logical Backup" in the left menu bar.
7.3 Select "table" in the backup type. Choose "postgres/public/testtable10" in the backup table, then select "16:00 – 17:00" in the "Backup Time Range."
7.4 On 192.168.0.153, the "kunlun" user creates a shell script with the following content:
kunlun@kunlun2:~$ more gen_data.sh
#!/bin/bash
# KL Connection Information
KL_USER="abc"
KL_HOST="192.168.0.153"
KL_PORT="47001"
KL_DATABASE="postgres"
KL_TABLE="testtable10"
# Loop to Insert Records
i=1000001
while true
do
# Execute SQL Statement
psql -h $KL_HOST -p $KL_PORT -U $KL_USER $KL_DATABASE -c "INSERT INTO $KL_TABLE (id) VALUES ('$i')"
echo $i
i=`expr $i + 1`
# Wait for 1 second
sleep 1
done
Note: The script's purpose is to add a new record to testtable10 every second, starting from ID 1000001. This simulates continuous operations on the database table by a frontend application.
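Since step 7.8 will require stopping the generator after the backup completes, one convenient option (a sketch, not required by the procedure) is to run it in the background and record its PID:
# Optional: run the generator in the background and keep its PID for step 7.8.
nohup sh gen_data.sh > gen_data.log 2>&1 &
echo $! > gen_data.pid
# Later, after the backup has completed:
kill "$(cat gen_data.pid)"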
7.5 Click "Save" to initiate the backup operation.
7.6 Execute the previously created script on 192.168.0.153: sh gen_data.sh to continuously add records to the database table. The script's output will look like this:
7.7 After successful backup, you will see the following message:
7.8 After receiving the successful backup message, forcibly stop the gen_data.sh script and check the data in testtable10. The results are as follows:
The relevant commands are:
psql -h 192.168.0.153 -p 47001 -U abc postgres
select max(id) from testtable10;
Note: Between the start of the backup task and the point at which the script was stopped (after the backup had completed), 12 new records were inserted into testtable10 by the script's transactions.
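To see exactly how many rows the script added around the backup window, you can count the ids above the original one million directly from the client. A minimal sketch against the source cluster:
# Count the rows added by gen_data.sh (ids above the original 1,000,000) on the source cluster.
psql -h 192.168.0.153 -p 47001 -U abc postgres -c "select count(*) as new_rows, max(id) from testtable10 where id > 1000000;"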
7.9 Click on the "Settings" button on the right of cluster1.
7.10 Click on "Logical Restore" in the left menu bar.
7.11 In the restore type, select "table." In "Target Table Cluster," choose "cluster2" (in this document, it is cluster_1684593523_000003). In "Backup Tables," select the previously backed up table: postgres_$$_public.testtable10 (2023-05-27 16:16:35). For "Start Time," select 2023-05-27 16:30:00 (which is after the time of the last backup).
7.12 Click "Save" to begin the table's logical restore.
7.13 After successful table restoration, you will see the following message:
7.14 As the "kunlun" user, SSH into cluster2's computing node (IP: 192.168.0.155) and verify the restored table:
Ensure that in /kunlun/env.sh, envtype="${envtype:-no}" has been replaced with envtype="all".
source env.sh
psql -h 192.168.0.155 -p 47001 -U abc postgres
postgres=# select max(id) from testtable10;
From the output, it can be seen that testtable10 in the target cluster instance contains just over one million records: the original one million rows plus any rows committed before the backup's consistent snapshot was taken. Even though new data was continuously inserted after the backup task was initiated, the backup completed at the scheduled time, unaffected by the concurrent transactions, ensuring a consistent backup.