Klustron表重分布使用示例
Klustron表重分布使用示例
注意:
如无特别说明,文中的版本号可以使用任何已发布版本的版本号代替。所有已发布版本详见:Release_notes
本文目标:
通过命令行工具和XPanel演示Klustron 表重分布的功能,指导客户如何将普通表转换为分区表。模拟了一个在线应用的场景下进行表重分布的案例,测试了表重分布对在线应用的影响以及验证了数据的完整性。
在现实的业务场景中,常常会出现随着业务需求和表的数据量变化,原先创建的表的方式可能不再适合了。例如本文例子的sales_order表,在创建的初期由于业务量不大,创建的是非分区表,但是随着公司业务的发展,表中记录的数量越来越多,因此需要将其转换成分区表,方便后期的数据归档和加快应用的访问速度。
具体环境信息如下:
节点类型 | IP | 端口 |
---|---|---|
计算节点 | 192.168.40.152 | 47002 |
Shard1主节点 | 192.168.40.152 | 57003 |
Shard2主节点 | 192.168.26.153 | 57005 |
XPanel | 192.168.40.151 | 18080 |
01 环境准备
通过PG客户端登录计算节点,创建用户和数据库。
psql -h 192.168.40.152 -p 47001 postgres
create user kunlun_test with password 'kunlun';
create database test_db with owner kunlun_test encoding utf8 template template0;
\q
psql -h 192.168.40.152 -p 47001 -U kunlun_test test_db
创建sales_order表,并通过存储过程装载数据
create table sales_order
(
order_number INT NOT NULL AUTO_INCREMENT,
customer_number INT NOT NULL,
product_code INT NOT NULL,
order_date DATETIME NOT NULL,
entry_date DATETIME NOT NULL,
order_amount DECIMAL(18,2) NOT NULL,
primary key(order_number,order_date)
) ;
create or replace procedure generate_order_data()
AS $$
DECLARE
v_customer_number integer;
v_product_code integer;
v_order_date date;
v_amount integer;
start_date date := to_date('2021-01-01','yyyy-mm-dd');
i integer :=1;
BEGIN
while i<=10000 loop
v_customer_number := FLOOR(1+RANDOM()*6);
v_product_code := FLOOR(1+RANDOM()*500);
v_order_date := to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT);
v_amount := FLOOR(1000+RANDOM()*9000);
INSERT INTO sales_order VALUES (i,v_customer_number,v_product_code,v_order_date,v_order_date,v_amount);
commit;
i := i+1;
end loop;
END; $$
LANGUAGE plpgsql;
set statement_timeout=0;
call generate_order_data();
test_db=> select count(*) from sales_order;
count
-------
10000
(1 row)
创建目标表sales_order_new, 根据订单时间的Range分区表。
create table sales_order_new
(
order_number INT NOT NULL AUTO_INCREMENT,
customer_number INT NOT NULL,
product_code INT NOT NULL,
order_date DATETIME NOT NULL,
entry_date DATETIME NOT NULL,
order_amount DECIMAL(18,2) NOT NULL,
primary key(order_number,order_date)
) partition by range(order_date);
create table sales_order_p0 partition of sales_order_new
for values from ('2021-01-01') to ('2021-05-01');
create table sales_order_p1 partition of sales_order_new
for values from ('2021-05-01') to ('2021-09-01');
create table sales_order_p2 partition of sales_order_new
for values from ('2021-09-01') to ('2022-01-01');
02 准备一个Python程序kinsert.py,模拟一直运行在sales_order表上的插入应用。
import psycopg2.extras
import time
conn = psycopg2.connect(database='test_db',user='kunlun_test',
password='kunlun',host='192.168.40.152',port='47001')
cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
# insert sales_order table
insert_sql = '''INSERT INTO sales_order VALUES (%s,
FLOOR(1+RANDOM()*6), FLOOR(1+RANDOM()*3),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
to_date('2021-01-01','yyyy-mm-dd') + CAST(FLOOR(RANDOM()*365) AS INT),
FLOOR(1000+RANDOM()*9000));
;'''
#print("Press Enter to continue.")
input('Press any key and Enter to continue ~!')
i = 10001
while (i < 20001) :
cursor.execute(insert_sql,[i])
conn.commit()
print("order_number:",i," is inserted.")
i = i+1
cursor.close()
conn.close()
应用模拟从order_number 10001开始插入,程序运行后会等待输入才能执行真正的插入逻辑。在XPanel界面开始重新分布任务后,立刻恢复程序的运行,
[klbase@server-0 ~]$ python kinsert.py
Press any key and Enter to continue ~!
03 在XPanel界面进行表重分布的操作
在“集群列表信息”页面,点击“设置”
在左边栏位点击“表重分布”
选择“目标表集群”,“源表”,“目标表”以及删除源表的策略。如果策略选择“自动”,则会默认保留源表7天;如果选择“手动”,则用户自行删除源表。
点击“提交”。与此同时,切换到运行Python程序kinsert.py的终端,输入Enter继续插入记录。
[klbase@server-0 ~]$ python kinsert.py
Press any key and Enter to continue ~!
order_number: 10001 is inserted.
order_number: 10002 is inserted.
order_number: 10003 is inserted.
order_number: 10004 is inserted.
order_number: 10005 is inserted.
order_number: 10006 is inserted.
order_number: 10007 is inserted.
order_number: 10008 is inserted.
order_number: 10009 is inserted.
order_number: 10010 is inserted.
order_number: 10011 is inserted.
order_number: 10012 is inserted.
order_number: 10013 is inserted.
order_number: 10014 is inserted.
order_number: 10015 is inserted.
order_number: 10016 is inserted.
稍等片刻会出现插入错误,此时源表sales_order被重命名了。
order_number: 10281 is inserted.
order_number: 10282 is inserted.
order_number: 10283 is inserted.
order_number: 10284 is inserted.
order_number: 10285 is inserted.
order_number: 10286 is inserted.
order_number: 10287 is inserted.
order_number: 10288 is inserted.
order_number: 10289 is inserted.
order_number: 10290 is inserted.
order_number: 10291 is inserted.
order_number: 10292 is inserted.
order_number: 10293 is inserted.
order_number: 10294 is inserted.
order_number: 10295 is inserted.
order_number: 10296 is inserted.
order_number: 10297 is inserted.
Traceback (most recent call last):
File "kinsert.py", line 22, in <module>
cursor.execute(insert_sql,[i])
File "/usr/local/lib64/python3.6/site-packages/psycopg2/extras.py", line 236, in execute
return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "sales_order" does not exist
LINE 1: INSERT INTO sales_order VALUES (10298,
此时在线应用插入的order_number为10297,当表重分布执行的时侯sales_order会被重命名,所以应用此时插入失败了。
03 检查目标表和源表的情况
回到PG客户端检查目标表和源表的情况。发现源表已经变成了分区表,目标表根据“删除源表”策略修改为__sales_order$$tb_repartition_13
命名规则为:__[源表名]$$tb_repartition_[任务号]。等到7天之后,系统会自动删除表__sales_order$$tb_repartition_13。
查询sales_order中的数据,数据量正好和应用错误之前插入点是吻合的,在开始进行表重新分布的之后,在线应用仍然插入了297行记录。
test_db=> select count(*) from sales_order;
count
-------
10297
(1 row)