Klustron Cluster Elastic Scaling
Note:
Unless otherwise stated, any version number mentioned here can be replaced with that of any released version. For a list of all released versions, see: Release_notes
Objective:
This document shows how to create a cluster in the Klustron database, create partitioned tables in the cluster database, add a storage shard node to the cluster, move some partitions of a partitioned table onto the newly added shard node, and then view the distribution of all partitions in the database after the expansion. It also covers adding compute nodes to the cluster. All tests are performed through the XPanel console and by connecting to the cluster with the PostgreSQL client.
This article uses servers that have already been bootstrapped. If you need to start from a brand-new server that has not yet been initialized for a Klustron cluster, refer to the bootstrap manual.
The XPanel service is installed on the machine with IP address 192.168.56.112. On any machine that can reach 192.168.56.112, open a web browser and enter the following URL: http://192.168.56.112:18080/KunlunXPanel/#/login?redirect=%2Fdashboard
For the first login, the credentials are username super_dba and password super_dba. The super_dba password must be changed on the first login.
After logging in, the home page appears as shown below:
01 Setting Up the Cluster
1.1 Navigate to “Cluster Management” and select “Cluster List”. In the subsequent screen, click the “Add New” button.
1.2 Enter the cluster parameters as illustrated below:
1.3 Click “Confirm” and monitor the progress of the cluster creation task.
1.4 Once all statuses indicate “Running”, the cluster has been successfully created.
02 Preparing Test Data
2.1 Open an SSH terminal window to a compute node of the cluster kunlundb_cluster (compute node IP: 192.168.56.112). Log in to the host as the kunlun user and then connect to the cluster database using the PostgreSQL client.
[root@kunlun1 ~]# su - kunlun
[kunlun@kunlun1 ~]$ source /kunlun/env.sh
[kunlun@kunlun1 ~]$ psql -h 192.168.56.112 -p 47001 -U abc postgres
2.2 Within the database, create a test table and insert some sample data.
postgres=# create table prod_part (id int primary key, name char(8)) partition by hash(id);
postgres=# create table prod_part_p1 partition of prod_part for values with (modulus 6, remainder 0);
postgres=# create table prod_part_p2 partition of prod_part for values with (modulus 6, remainder 1);
postgres=# create table prod_part_p3 partition of prod_part for values with (modulus 6, remainder 2);
postgres=# create table prod_part_p4 partition of prod_part for values with (modulus 6, remainder 3);
postgres=# create table prod_part_p5 partition of prod_part for values with (modulus 6, remainder 4);
postgres=# create table prod_part_p6 partition of prod_part for values with (modulus 6, remainder 5);
postgres=# insert into prod_part select i,'text'||i from generate_series(1,300) i;
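Before looking at where the rows landed, it can help to confirm they were all inserted. The following quick check is not part of the original steps, just a sanity query against the table created above:
postgres=# select count(*) from prod_part;
The count should be 300, matching the generate_series range used in the insert.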
2.3 Observe the data distribution.
postgres=# analyze prod_part;
postgres=# select relname table_name, reltuples num_rows, name shard_name from pg_class t1, pg_shard t2 where t1.relshardid = t2.id and t1.reltype <> 0 and t1.relname like 'prod_part%';
The output shows that all six partitions of the partitioned table prod_part (prod_part_p1 through prod_part_p6) are stored on the storage shard shard_1.
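If a per-shard summary is preferred over one row per partition, a query along these lines can be used; it is only a convenience sketch that reuses the pg_class and pg_shard catalog columns already shown above:
postgres=# select t2.name shard_name, count(*) num_partitions from pg_class t1, pg_shard t2 where t1.relshardid = t2.id and t1.reltype <> 0 and t1.relname like 'prod_part_p%' group by t2.name;
At this stage it should report all six prod_part partitions on shard_1.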
03 Cluster Expansion with Additional Storage Shard Nodes
3.1 Add Storage Shard Nodes
3.1.1 In the cluster list view, click on the “Setting” or “+” button for the cluster you wish to add storage shard nodes to.
3.1.2 Under “Cluster Setting”, click on “Shard List” and then select the “Add Shard” button.
3.1.3 Enter the shard node parameters as depicted below.
3.1.4 Click “Confirm” and monitor the progress of the shard node addition task.
3.1.5 Verify that the storage shard node, shard_2, has been successfully added to the cluster.
3.2 Expand Storage Shard Node
This section simulates a scenario where tables are still being written to while they are being relocated during a scaling operation. It demonstrates that Klustron can accurately migrate these tables and ensure that any data updates made during the migration are also synchronized to the tables in the destination shard. Users do not need to perform the actions described in this section when actually expanding their setup.
3.2.1 Prepare a Python script, pyprod.py. While the cluster is expanding the storage shard node, this script continuously operates on the cluster database (it uses the psycopg2 driver, which must be installed on the host). The content of pyprod.py is as follows:
import psycopg2.extras
from psycopg2 import DatabaseError
import time
import datetime

# Connect to the cluster through the compute node.
conn = psycopg2.connect(database='postgres', user='abc',
                        password='abc', host='192.168.56.112', port='47001')

select_sql = ''' select * from prod_part where id=%s; '''
i = 1

try:
    # Query the partitioned table once per second, cycling through id values.
    while i <= 1000:
        cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
        cursor.execute(select_sql, [i])
        res = cursor.fetchall()
        if res:  # the test table only holds ids 1-300, so higher ids return no rows
            print(dict(res[0]))
        current_datetime = datetime.datetime.now()
        print("Current date and time:", current_datetime)
        if i == 1000:
            i = 1
        else:
            i = i + 1
        cursor.close()
        conn.commit()
        time.sleep(1)
except (Exception, DatabaseError) as e:
    # A query fails briefly while the partition it touches is being moved;
    # print the error, wait for the operator, then reconnect and resume.
    print(e)
    input('Press any key and Enter to continue ~!')
    conn = psycopg2.connect(database='postgres', user='abc',
                            password='abc', host='192.168.56.112', port='47001')
    while i <= 1000:
        cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
        cursor.execute(select_sql, [i])
        res = cursor.fetchall()
        if res:
            print(dict(res[0]))
        current_datetime = datetime.datetime.now()
        print("Current date and time:", current_datetime)
        if i == 1000:
            i = 1
        else:
            i = i + 1
        cursor.close()
        conn.commit()
        time.sleep(1)
finally:
    conn.close()
3.2.2 Run the pyprod.py script so that it operates on the database continuously.
[kunlun@kunlun1 scripts]$ python pyprod.py
3.2.3 In the cluster list, click the "Expand" button for the cluster whose partitions need to be transferred between shards.
3.2.4 On the cluster expansion page, choose the database to be expanded from the "Select Database" drop-down list, then click the "Confirm" button. In this example, the postgres database is chosen.
3.2.5 Select the tables to be moved from shard_1 to shard_2; here the partitions prod_part_p4, prod_part_p5, and prod_part_p6 are chosen. Choose "No" for retaining the original table, select "shard_2" as the target shard, and then click "Submit".
3.2.6 On the cluster expansion confirmation interface, click the "Confirm" button to start the cluster expansion.
3.2.7 The expansion completes successfully.
3.2.8 During the expansion, if the application operates on a table that is currently being moved, a "table does not exist" error is reported and the table is briefly inaccessible. Once the move is complete, the application can continue accessing and operating on these tables.
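As an illustrative check (any of the moved partitions would do), once the move has finished you can query one of the relocated partitions directly from the psql session opened earlier to confirm that access has resumed:
postgres=# select count(*) from prod_part_p4;
The query should succeed again and return the partition's row count.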
3.3 Data Verification after Expansion
3.3.1 Open an SSH terminal window connected to the cluster kunlundb_cluster (compute node IP: 192.168.56.112). Log in as the kunlun user, then connect to the cluster using the PostgreSQL client and run the data-distribution query.
[root@kunlun1 ~]# su - kunlun
[kunlun@kunlun1 ~]$ source /kunlun/env.sh
[kunlun@kunlun1 ~]$ psql -h 192.168.56.112 -p 47001 -U abc postgres
3.3.2 Observe the data distribution.
postgres=# analyze prod_part;
postgres=# select relname table_name, reltuples num_rows, name shard_name from pg_class t1, pg_shard t2 where t1.relshardid = t2.id and t1.reltype <> 0 and t1.relname like 'prod_part%';
From the above, we can see that in the partitioned table prod_part, partitions prod_part_p1, prod_part_p2, and prod_part_p3 remain on storage shard shard_1, while partitions prod_part_p4, prod_part_p5, and prod_part_p6 have been moved to storage shard shard_2.
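As an additional check that is not part of the original procedure, you can confirm that no rows were lost during the move; nothing was deleted during the test, so the total should still be 300:
postgres=# select count(*) from prod_part;
postgres=# select count(*) from prod_part_p4;
The second query reads the relocated partition directly and should return its share of the rows.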
The expansion of the cluster storage shard node is now complete.
04 Cluster Expansion with Additional Compute Nodes
4.1 In the cluster list, for the cluster that needs to add compute nodes, click the "Setting" button.
4.2 In "Cluster Settings", click on the "Compute Node List", and then click the "Add Compute Node" button.
4.3 Set the parameters for the new compute node as shown in the figure.
4.4 Click "Confirm" and check the progress of the task to add the compute node.
4.5 Review the compute node list; the new compute node has been added to the cluster.
4.6 Open an SSH terminal window to the cluster kunlundb_cluster (compute node IP: 192.168.56.113). Log in as the kunlun user, and then connect to the cluster database using the PostgreSQL client, accessing the database through the newly added compute node (IP: 192.168.56.113).
[root@kunlun2 ~]# su - kunlun
[kunlun@kunlun2 ~]$ source /kunlun/env.sh
[kunlun@kunlun2 ~]$ psql -h 192.168.56.113 -p 47001 -U abc postgres
postgres=# select relname table_name, reltuples num_rows, name shard_name from pg_class t1, pg_shard t2 where t1.relshardid = t2.id and t1.reltype <> 0 and t1.relname like 'prod_part%';
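Optionally, as a simple sanity check of the new compute node that is not part of the original steps, you can write and read a test row through it and then remove it; the id value 301 is assumed to be unused, since the test data only covers ids 1 to 300:
postgres=# insert into prod_part values (301, 'text301');
postgres=# select * from prod_part where id = 301;
postgres=# delete from prod_part where id = 301;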
The expansion of the cluster compute node is now complete.