跳至主要內容

Klustron表重分布使用示例

Klustron大约 4 分钟

Klustron表重分布使用示例

注意

如无特别说明,文中的版本号可以使用任何已发布版本的版本号代替。所有已发布版本详见:Release_notes

本文目标:

通过命令行工具和XPanel演示Klustron 表重分布的功能,指导客户如何将普通表转换为分区表。模拟了一个在线应用的场景下进行表重分布的案例,测试了表重分布对在线应用的影响以及验证了数据的完整性。

在现实的业务场景中,常常会出现随着业务需求和表的数据量变化,原先创建的表的方式可能不再适合了。例如本文例子的sales_order表,在创建的初期由于业务量不大,创建的是非分区表,但是随着公司业务的发展,表中记录的数量越来越多,因此需要将其转换成分区表,方便后期的数据归档和加快应用的访问速度。

具体环境信息如下:

节点类型IP端口
计算节点192.168.40.15247002
Shard1主节点192.168.40.15257003
Shard2主节点192.168.26.15357005
XPanel192.168.40.15118080

01 环境准备

通过PG客户端登录计算节点,创建用户和数据库。

psql -h 192.168.40.152 -p 47001 postgres
create user kunlun_test with password 'kunlun';
create database test_db with owner kunlun_test encoding utf8 template template0;
\q
psql -h 192.168.40.152 -p 47001 -U kunlun_test test_db

创建sales_order表,并通过存储过程装载数据

create table sales_order
(
   order_number         INT NOT NULL AUTO_INCREMENT,
   customer_number      INT NOT NULL,
   product_code         INT NOT NULL,
   order_date           DATETIME NOT NULL,
   entry_date           DATETIME NOT NULL,
   order_amount         DECIMAL(18,2) NOT NULL,
   primary key(order_number,order_date)
)  ;

create or replace procedure generate_order_data()
AS $$
DECLARE
  v_customer_number integer;
  v_product_code integer;
  v_order_date date;
  v_amount integer;
  start_date date := to_date('2021-01-01','yyyy-mm-dd');
  i integer :=1;
BEGIN
	while i<=10000 loop
		v_customer_number := FLOOR(1+RANDOM()*6);
		v_product_code := FLOOR(1+RANDOM()*500);
		v_order_date := to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT);
		v_amount := FLOOR(1000+RANDOM()*9000);
		INSERT INTO sales_order VALUES (i,v_customer_number,v_product_code,v_order_date,v_order_date,v_amount);
		commit;
		i := i+1;
	end loop;
END; $$
LANGUAGE plpgsql;
set statement_timeout=0;
call generate_order_data();
test_db=> select count(*) from sales_order;
 count 
-------
 10000
(1 row)

创建目标表sales_order_new, 根据订单时间的Range分区表。

create table sales_order_new
(
   order_number         INT NOT NULL AUTO_INCREMENT,
   customer_number      INT NOT NULL,
   product_code         INT NOT NULL,
   order_date           DATETIME NOT NULL,
   entry_date           DATETIME NOT NULL,
   order_amount         DECIMAL(18,2) NOT NULL,
   primary key(order_number,order_date)
) partition by range(order_date);
create table sales_order_p0 partition of sales_order_new
for values from ('2021-01-01') to ('2021-05-01');
create table sales_order_p1 partition of sales_order_new
for values from ('2021-05-01') to ('2021-09-01');
create table sales_order_p2 partition of sales_order_new
for values from ('2021-09-01') to ('2022-01-01');

02 准备一个Python程序kinsert.py,模拟一直运行在sales_order表上的插入应用。

import psycopg2.extras
import time

conn = psycopg2.connect(database='test_db',user='kunlun_test',
                 password='kunlun',host='192.168.40.152',port='47001')


cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

# insert sales_order table
insert_sql = '''INSERT INTO sales_order VALUES (%s,
FLOOR(1+RANDOM()*6), FLOOR(1+RANDOM()*3),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
FLOOR(1000+RANDOM()*9000));
;'''

#print("Press Enter to continue.")
input('Press any key and Enter to continue ~!')
i = 10001
while (i < 20001) :
  cursor.execute(insert_sql,[i])
  conn.commit()
  print("order_number:",i," is inserted.")
  i = i+1

cursor.close()
conn.close()

应用模拟从order_number 10001开始插入,程序运行后会等待输入才能执行真正的插入逻辑。在XPanel界面开始重新分布任务后,立刻恢复程序的运行,

[klbase@server-0 ~]$ python kinsert.py 
Press any key and Enter to continue ~!

03 在XPanel界面进行表重分布的操作

在“集群列表信息”页面,点击“设置”

在左边栏位点击“表重分布”

选择“目标表集群”,“源表”,“目标表”以及删除源表的策略。如果策略选择“自动”,则会默认保留源表7天;如果选择“手动”,则用户自行删除源表。

点击“提交”。与此同时,切换到运行Python程序kinsert.py的终端,输入Enter继续插入记录。

[klbase@server-0 ~]$ python kinsert.py 
Press any key and Enter to continue ~!
order_number: 10001  is inserted.
order_number: 10002  is inserted.
order_number: 10003  is inserted.
order_number: 10004  is inserted.
order_number: 10005  is inserted.
order_number: 10006  is inserted.
order_number: 10007  is inserted.
order_number: 10008  is inserted.
order_number: 10009  is inserted.
order_number: 10010  is inserted.
order_number: 10011  is inserted.
order_number: 10012  is inserted.
order_number: 10013  is inserted.
order_number: 10014  is inserted.
order_number: 10015  is inserted.
order_number: 10016  is inserted.

稍等片刻会出现插入错误,此时源表sales_order被重命名了。

order_number: 10281  is inserted.
order_number: 10282  is inserted.
order_number: 10283  is inserted.
order_number: 10284  is inserted.
order_number: 10285  is inserted.
order_number: 10286  is inserted.
order_number: 10287  is inserted.
order_number: 10288  is inserted.
order_number: 10289  is inserted.
order_number: 10290  is inserted.
order_number: 10291  is inserted.
order_number: 10292  is inserted.
order_number: 10293  is inserted.
order_number: 10294  is inserted.
order_number: 10295  is inserted.
order_number: 10296  is inserted.
order_number: 10297  is inserted.
Traceback (most recent call last):
  File "kinsert.py", line 22, in <module>
    cursor.execute(insert_sql,[i])
  File "/usr/local/lib64/python3.6/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "sales_order" does not exist
LINE 1: INSERT INTO sales_order VALUES (10298,

此时在线应用插入的order_number为10297,当表重分布执行的时侯sales_order会被重命名,所以应用此时插入失败了。

03 检查目标表和源表的情况

回到PG客户端检查目标表和源表的情况。发现源表已经变成了分区表,目标表根据“删除源表”策略修改为__sales_order$$tb_repartition_13

命名规则为:__[源表名]$$tb_repartition_[任务号]。等到7天之后,系统会自动删除表__sales_order$$tb_repartition_13。

查询sales_order中的数据,数据量正好和应用错误之前插入点是吻合的,在开始进行表重新分布的之后,在线应用仍然插入了297行记录。

test_db=> select count(*) from sales_order;
 count 
-------
 10297
(1 row)

END