Klustron vs. Greenplum: TPC-H Results
Objective of this article:
We compare the performance of Klustron and Greenplum using the industry-standard TPC-H benchmark and draw conclusions from the results.
Test Environment One: (Greenplum)
Greenplum version: 6.24.5-rhel8-x86_64
Node Type | IP | Port |
---|---|---|
Master | 192.168.0.20 | 5432 |
Seg1 Node | 192.168.0.21 | 6000 |
Seg2 Node | 192.168.0.22 | 6000 |
Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, RAM: 128 GB, Storage: 2 TB M.2 NVMe SSD (PCIe 4.0)
Test Environment Two: (Klustron)
Node Type | IP | Port |
---|---|---|
Computing Node | 192.168.0.20 | 47001 |
Shard1 Master Node | 192.168.0.21 | 57003 |
Shard2 Master Node | 192.168.0.22 | 57005 |
XPanel | 192.168.0.20 | 40580 |
Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, RAM: 128 GB, Storage: 2 TB M.2 NVMe SSD (PCIe 4.0)
Test Environment Three: (TPC-H)
TPC-H: v3.0.1
Deployment Mode: Single Node
Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, RAM: 128 GB, Storage: 2 TB M.2 NVMe SSD (PCIe 4.0)
IP: 192.168.0.20
Note: The TPC-H test runs in single-user mode against the target database, so no separate load-generation host is required; in both test scenarios, 192.168.0.20 is used to issue the TPC-H queries.
01 Greenplum Performance Testing
1.1 Test Preparation
1.1.1 Installing Greenplum
Skipped
1.1.2 Adjusting Parameters (Execute on Master Node)
gpconfig -c shared_buffers -v 32GB
gpconfig -c statement_timeout -v 6000000
gpconfig -c lock_timeout -v 1200000
gpconfig -c log_min_duration_statement -v 1200000
gpconfig -c effective_cache_size -v 64GB
gpconfig -c work_mem -v 8MB
gpconfig -c wal_buffers -v 64MB
gpconfig -c autovacuum -v false
gpstop -M fast
gpstart
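To confirm the changes took effect after the restart, the current values can be spot-checked with gpconfig's show option, for example:
gpconfig -s shared_buffers
gpconfig -s work_mem
gpconfig -s autovacuum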
1.1.3 Creating Test Users and Database
psql -h 192.168.0.20 -p5432 -d gpdw # The gpdw database was created when Greenplum was installed
create user tpch with password 'tpch';
create database tpch with owner tpch ;
alter user tpch with superuser;
1.1.4 Accessing the TPC-H Software Working Directory
cd /home/gpadmin/tpch301/dbgen
1.1.5 Creating TPC-H Related Test Objects
cat << EOF > dss.ddl
-- Sccsid: @(#)dss.ddl 2.1.8.1
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL)
DISTRIBUTED BY (O_ORDERKEY);
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL)
DISTRIBUTED BY (PS_PARTKEY,PS_SUPPKEY);
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL)
DISTRIBUTED BY (C_CUSTKEY);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL)
DISTRIBUTED BY (P_PARTKEY);
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL)
DISTRIBUTED BY (S_SUPPKEY);
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152))
DISTRIBUTED BY (N_NATIONKEY);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152))
DISTRIBUTED BY (R_REGIONKEY);
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL)
DISTRIBUTED BY (L_LINENUMBER, L_ORDERKEY);
EOF
psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./dss.ddl
cat << EOF > tpch-index
-- indexes on the foreign keys
CREATE INDEX IDX_SUPPLIER_NATION_KEY ON SUPPLIER (S_NATIONKEY);
CREATE INDEX IDX_PARTSUPP_PARTKEY ON PARTSUPP (PS_PARTKEY);
CREATE INDEX IDX_PARTSUPP_SUPPKEY ON PARTSUPP (PS_SUPPKEY);
CREATE INDEX IDX_CUSTOMER_NATIONKEY ON CUSTOMER (C_NATIONKEY);
CREATE INDEX IDX_ORDERS_CUSTKEY ON ORDERS (O_CUSTKEY);
CREATE INDEX IDX_LINEITEM_ORDERKEY ON LINEITEM (L_ORDERKEY);
CREATE INDEX IDX_LINEITEM_PART_SUPP ON LINEITEM (L_PARTKEY,L_SUPPKEY);
CREATE INDEX IDX_NATION_REGIONKEY ON NATION (N_REGIONKEY);
-- additional indexes
CREATE INDEX IDX_LINEITEM_SHIPDATE ON LINEITEM (L_SHIPDATE, L_DISCOUNT, L_QUANTITY);
CREATE INDEX IDX_ORDERS_ORDERDATE ON ORDERS (O_ORDERDATE);
EOF
psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./tpch-index
cat << EOF > tpch-pkeys
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);
EOF
psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./tpch-pkeys
1.1.6 Configuring Parameters and Compiling the Test Tool
cp makefile.suite makefile
vi makefile
--------------------------------
CC = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
make
Note: Ensure that the gcc-related packages are installed
1.1.7 Generating and Preprocessing Test Data
./dbgen -vf -s 1 # Generate 1GB test data
vi trim-data.sh
----------------------------------
for i in `ls *.tbl`
do
sed 's/|$//' $i > ${i}.csv;
done
sh trim-data.sh
split -l 100000 -d lineitem.tbl.csv lineitem.tbl.csv-
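The sed in trim-data.sh strips the trailing '|' that dbgen appends to every row; without this, COPY with delimiter '|' would see an extra empty column. A quick spot check of the preprocessing output (optional):
head -1 region.tbl # raw dbgen output, each row ends with a trailing '|'
head -1 region.tbl.csv # trailing '|' removed
ls lineitem.tbl.csv-* | wc -l # number of 100000-line chunks produced by split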
1.1.8 Loading Test Data
psql -h 192.168.0.20 -p5432 -Utpch -d tpch
COPY customer(C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/customer.tbl.csv' delimiter '|' ;
COPY nation(N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/nation.tbl.csv' delimiter '|' ;
COPY orders(O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/orders.tbl.csv' delimiter '|' ;
COPY partsupp(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_SUPPLYCOST,PS_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/partsupp.tbl.csv' delimiter '|' ;
COPY part(P_PARTKEY,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_SIZE,P_CONTAINER,P_RETAILPRICE,P_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/part.tbl.csv' delimiter '|' ;
COPY region(R_REGIONKEY,R_NAME,R_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/region.tbl.csv' delimiter '|' ;
COPY supplier(S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/supplier.tbl.csv' delimiter '|' ;
vi load_lineitem.sh
-----------------------------------
for i in `ls lineitem.tbl.csv-*`
do
PGPASSWORD=tpch psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -c "\copy lineitem from '/home/gpadmin/tpch301/dbgen/${i}' with delimiter as '|' NULL ' ';" ;
done
sh load_lineitem.sh
Note: The dss.tgz file must first be uploaded to the dbgen directory and extracted.
1.1.9 Test Data Overview
The dbgen command generates 1GB of test data; after it is loaded into the Greenplum database, the row counts of the 8 tables are as follows:
Customer: 150000
Orders: 1500000
Partsupp: 800000
Part: 200000
Region: 5
Supplier: 10000
Nation: 25
Lineitem: 6001215
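An optional sanity check that the load matches these counts:
psql -h 192.168.0.20 -p5432 -Utpch -d tpch
select count(*) from lineitem; -- expect 6001215
select count(*) from orders; -- expect 1500000
select count(*) from customer; -- expect 150000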
1.1.10 Generating TPC-H Test SQL
mkdir query
more gen-sql.sh
---------------------------
for q in `seq 1 22`
do
DSS_QUERY=dss/templates ./qgen ${q} >> query/${q}.sql
sed 's/^select/explain select/' query/${q}.sql > query/${q}.explain.sql
cat query/${q}.sql >> query/${q}.explain.sql;
done
sh gen-sql.sh
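Before running the full loop, it can be useful to execute one generated query by hand to confirm that qgen produced valid SQL, for example:
PGPASSWORD=tpch psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -f query/1.sql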
1.1.11 Analyzing the Loaded Objects
ana.sh
----------------------------
for i in `cat ./dss.ddl | grep 'CREATE TABLE' | grep -v partition | awk '{print $3}'`
do
echo "select * from $i limit 1;" > a.sql
column=`psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -f a.sql | sed -n 1p | sed 's/|/,/g'`
analyze="analyze $i($column)"
echo $analyze
echo $analyze > tmp.sql
psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -f tmp.sql
done
sh ana.sh
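If per-column statistics are not required, a database-wide ANALYZE is a simpler alternative to ana.sh:
psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -c "ANALYZE;"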
1.2 Running the TPC-H Test
mkdir run_log
more run_tpch.sh
--------------------------
total_cost=0
for i in {1..22}
do
echo "begin run Q${i}, query/$i.sql , `date`"
begin_time=`date +%s.%N`
PGPASSWORD=tpch psql -h 192.168.0.20 -p 5432 -d tpch -U tpch -f query/${i}.sql > ./run_log/log_q${i}.out
rc=$?
end_time=`date +%s.%N`
cost=`echo "$end_time-$begin_time"|bc`
total_cost=`echo "$total_cost+$cost"|bc`
if [ $rc -ne 0 ] ; then
printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
else
printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
fi
done
sh run_tpch.sh
Note: The node should have the 'bc' package installed: yum install bc
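To keep the per-query timings used for the results table below, the runner's output can be captured, for example:
sh run_tpch.sh | tee run_log/summary.txt
grep 'cost:' run_log/summary.txt # one result line per query with its cost and the running total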
1.3 TPC-H Test Results
SQL Number | Running Time (Seconds) | Note |
---|---|---|
1 | 3.10 | |
2 | 0.65 | |
3 | 1.12 | |
4 | 0.99 | |
5 | 1.21 | |
6 | 0.12 | |
7 | 1.10 | |
8 | 1.14 | |
9 | 1.48 | |
10 | 1.01 | |
11 | 0.27 | |
12 | 0.95 | |
13 | 0.53 | |
14 | 0.11 | |
15 | 0.22 | |
16 | 0.21 | |
17 | 1.96 | |
18 | 1.98 | |
19 | 0.16 | |
20 | 0.87 | |
21 | 1.98 | |
22 | 0.30 | |
02 Klustron Performance Testing
2.1 Test Preparation
2.1.1 Installing Klustron
Skipped
Klustron version: 1.2.1
The parameters used for the test cluster are given below.
2.1.2 Modifying Parameters of Computing Node and Storage Node
Key Parameters of the Computing Node (PG):
psql -h 192.168.0.20 -p 47001 -U abc postgres
alter system set shared_buffers='32GB';
alter system set statement_timeout=6000000;
alter system set mysql_read_timeout=1200;
alter system set mysql_write_timeout=1200;
alter system set lock_timeout=1200000;
alter system set log_min_duration_statement=1200000;
alter system set effective_cache_size = '64GB';
alter system set work_mem = '8MB';
alter system set wal_buffers='64MB';
alter system set autovacuum=false;
alter system set log_min_messages='DEBUG1'; -- enable only when SQL debugging is needed
alter system set enable_sql_log=on; -- enable only when SQL debugging is needed
pg_ctl restart -D /nvme2/kunlun/server_datadir/47001
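After the restart, the effective values can be verified from psql with standard PostgreSQL commands, for example:
psql -h 192.168.0.20 -p 47001 -U abc postgres
show shared_buffers;
show work_mem;
select name, setting from pg_settings where name in ('statement_timeout','lock_timeout');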
Key Parameters of the Storage Node (MySQL):
innodb_buffer_pool_size=32*1024*1024*1024
lock_wait_timeout=1200
innodb_lock_wait_timeout=1200
fullsync_timeout=1200000
enable_fullsync=false
innodb_flush_log_at_trx_commit=2
sync_binlog=0
max_binlog_size=1*1024*1024*1024
mysql -h 192.168.0.21 -P 57003 -upgx -ppgx_pwd
mysql -h 192.168.0.22 -P 57005 -upgx -ppgx_pwd
set global innodb_buffer_pool_size=32*1024*1024*1024;
set global lock_wait_timeout=1200;
set global innodb_lock_wait_timeout=1200;
set global fullsync_timeout=1200000;
set global enable_fullsync=false;
set global innodb_flush_log_at_trx_commit=2;
set global sync_binlog=0;
set global max_binlog_size=1*1024*1024*1024;
set global enable_fullsync=off; # Disable full synchronization on all shard master nodes
set enable_fullsync=off; # Disable full synchronization on all shard master nodes
set global general_log=on ; #Activate when SQL debugging is needed
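The effective values on the shard master nodes can be verified with standard MySQL statements, for example (run on both 57003 and 57005):
show global variables like 'innodb_buffer_pool_size';
show global variables like '%fullsync%';
show global variables like 'sync_binlog';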
Note: To prevent a master-node anomaly from triggering a failover during data loading, set the failover timeout to 10800 seconds in XPanel.
Adjusting the parameters of the three MySQL RBR nodes of the metadata cluster:
export LD_LIBRARY_PATH=/nvme2/kunlun/instance_binaries/storage/57001/Klustron-storage-1.2.1/lib:$LD_LIBRARY_PATH ;
/nvme2/kunlun/instance_binaries/storage/57001/Klustron-storage-1.2.1/bin/mysql --connect-expired-password -S/nvme2/kunlun/storage_logdir/57001/mysql.sock -uroot -proot
set global innodb_buffer_pool_size=1*1024*1024*1024; (the default is 128MB)
2.1.3 Creating Test Database and Users in the Klustron Cluster
psql -h 192.168.0.20 -p 47001 -U abc postgres
create user tpch with password 'tpch';
create database tpch with owner tpch ;
grant pg_read_server_files to tpch ;
2.1.4 Accessing the TPC-H Working Directory
cd /home/kunlun/tpch301/dbgen
2.1.5 Creating TPC-H Related Test Objects
cat << EOF > dss.ddl
-- Sccsid: @(#)dss.ddl 2.1.8.1
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL,
N_NAME CHAR(25) NOT NULL,
N_REGIONKEY INTEGER NOT NULL,
N_COMMENT VARCHAR(152)) partition by hash(N_NATIONKEY);
create table NATION_0 partition of NATION for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table NATION_1 partition of NATION for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table NATION_2 partition of NATION for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table NATION_3 partition of NATION for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table NATION_4 partition of NATION for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table NATION_5 partition of NATION for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL,
R_NAME CHAR(25) NOT NULL,
R_COMMENT VARCHAR(152)) partition by hash(R_REGIONKEY);
create table REGION_0 partition of REGION for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table REGION_1 partition of REGION for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table REGION_2 partition of REGION for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table REGION_3 partition of REGION for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table REGION_4 partition of REGION for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table REGION_5 partition of REGION for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL,
P_NAME VARCHAR(55) NOT NULL,
P_MFGR CHAR(25) NOT NULL,
P_BRAND CHAR(10) NOT NULL,
P_TYPE VARCHAR(25) NOT NULL,
P_SIZE INTEGER NOT NULL,
P_CONTAINER CHAR(10) NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT VARCHAR(23) NOT NULL ) partition by hash(P_PARTKEY);
create table PART_0 partition of PART for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table PART_1 partition of PART for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table PART_2 partition of PART for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table PART_3 partition of PART for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table PART_4 partition of PART for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table PART_5 partition of PART for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL,
S_NAME CHAR(25) NOT NULL,
S_ADDRESS VARCHAR(40) NOT NULL,
S_NATIONKEY INTEGER NOT NULL,
S_PHONE CHAR(15) NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT VARCHAR(101) NOT NULL) partition by hash(S_SUPPKEY) ;
create table SUPPLIER_0 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table SUPPLIER_1 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table SUPPLIER_2 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table SUPPLIER_3 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table SUPPLIER_4 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table SUPPLIER_5 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL,
PS_SUPPKEY INTEGER NOT NULL,
PS_AVAILQTY INTEGER NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT VARCHAR(199) NOT NULL ) partition by hash(PS_PARTKEY,PS_SUPPKEY) ;
create table PARTSUPP_0 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table PARTSUPP_1 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table PARTSUPP_2 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table PARTSUPP_3 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table PARTSUPP_4 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table PARTSUPP_5 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL) partition by hash(C_CUSTKEY) ;
create table CUSTOMER_0 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table CUSTOMER_1 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table CUSTOMER_2 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table CUSTOMER_3 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table CUSTOMER_4 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table CUSTOMER_5 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL,
O_CUSTKEY INTEGER NOT NULL,
O_ORDERSTATUS CHAR(1) NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE DATE NOT NULL,
O_ORDERPRIORITY CHAR(15) NOT NULL,
O_CLERK CHAR(15) NOT NULL,
O_SHIPPRIORITY INTEGER NOT NULL,
O_COMMENT VARCHAR(79) NOT NULL) partition by hash(O_ORDERKEY);
create table ORDERS_0 partition of ORDERS for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table ORDERS_1 partition of ORDERS for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table ORDERS_2 partition of ORDERS for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table ORDERS_3 partition of ORDERS for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table ORDERS_4 partition of ORDERS for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table ORDERS_5 partition of ORDERS for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL) partition by hash(L_LINENUMBER, L_ORDERKEY);
create table LINEITEM_0 partition of LINEITEM for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table LINEITEM_1 partition of LINEITEM for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table LINEITEM_2 partition of LINEITEM for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table LINEITEM_3 partition of LINEITEM for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table LINEITEM_4 partition of LINEITEM for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table LINEITEM_5 partition of LINEITEM for values with(MODULUS 6, REMAINDER 5) with (shard = 2);
EOF
psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./dss.ddl
cat << EOF > tpch-index
-- indexes on the foreign keys
CREATE INDEX IDX_SUPPLIER_NATION_KEY ON SUPPLIER (S_NATIONKEY);
CREATE INDEX IDX_PARTSUPP_PARTKEY ON PARTSUPP (PS_PARTKEY);
CREATE INDEX IDX_PARTSUPP_SUPPKEY ON PARTSUPP (PS_SUPPKEY);
CREATE INDEX IDX_CUSTOMER_NATIONKEY ON CUSTOMER (C_NATIONKEY);
CREATE INDEX IDX_ORDERS_CUSTKEY ON ORDERS (O_CUSTKEY);
CREATE INDEX IDX_LINEITEM_ORDERKEY ON LINEITEM (L_ORDERKEY);
CREATE INDEX IDX_LINEITEM_PART_SUPP ON LINEITEM (L_PARTKEY,L_SUPPKEY);
CREATE INDEX IDX_NATION_REGIONKEY ON NATION (N_REGIONKEY);
-- additional indexes
CREATE INDEX IDX_LINEITEM_SHIPDATE ON LINEITEM (L_SHIPDATE, L_DISCOUNT, L_QUANTITY);
CREATE INDEX IDX_ORDERS_ORDERDATE ON ORDERS (O_ORDERDATE);
EOF
psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./tpch-index
cat << EOF > tpch-pkeys
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);
EOF
psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./tpch-pkeys
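To confirm that the partitioned tables were created as intended, psql's \d+ meta-command lists the partitions of a table, for example:
psql -h 192.168.0.20 -p 47001 -U tpch tpch
\d+ lineitem
The output should list the six hash partitions lineitem_0 through lineitem_5 created above.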
2.1.6 Configuring Parameters and Compiling the Test Tool
cp makefile.suite makefile
vi makefile
--------------------------------
CC = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
make
Note: Ensure that the gcc-related packages are installed
2.1.7 Generating and Preprocessing Test Data
./dbgen -vf -s 1 # Generate 1GB test data
vi trim-data.sh
----------------------------------
for i in `ls *.tbl`
do
sed 's/|$//' $i > ${i}.csv;
done
sh trim-data.sh
split -l 100000 -d lineitem.tbl.csv lineitem.tbl.csv-
Note: To keep the test data identical, the data files generated by dbgen for the Greenplum test were reused directly for the Klustron test.
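Since the TPC-H working directories for both tests are on 192.168.0.20, the generated files can simply be copied over, assuming the kunlun user can read the gpadmin directory (otherwise adjust permissions or use scp):
cp /home/gpadmin/tpch301/dbgen/*.tbl.csv* /home/kunlun/tpch301/dbgen/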
2.1.8 Loading Test Data
psql -h 192.168.0.20 -p 47001 -U tpch -d tpch
COPY customer(C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) FROM '/home/kunlun/tpch301/dbgen/customer.tbl.csv' delimiter '|' ;
COPY nation(N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT) FROM '/home/kunlun/tpch301/dbgen/nation.tbl.csv' delimiter '|' ;
COPY orders(O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT) FROM '/home/kunlun/tpch301/dbgen/orders.tbl.csv' delimiter '|' ;
COPY partsupp(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_SUPPLYCOST,PS_COMMENT) FROM '/home/kunlun/tpch301/dbgen/partsupp.tbl.csv' delimiter '|' ;
COPY part(P_PARTKEY,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_SIZE,P_CONTAINER,P_RETAILPRICE,P_COMMENT) FROM '/home/kunlun/tpch301/dbgen/part.tbl.csv' delimiter '|' ;
COPY region(R_REGIONKEY,R_NAME,R_COMMENT) FROM '/home/kunlun/tpch301/dbgen/region.tbl.csv' delimiter '|' ;
COPY supplier(S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT) FROM '/home/kunlun/tpch301/dbgen/supplier.tbl.csv' delimiter '|' ;
vi load_lineitem.sh
-----------------------------------
for i in `ls lineitem.tbl.csv-*`
do
PGPASSWORD=tpch psql -h 192.168.0.20 -p 47001 -U tpch -d tpch -c "\copy lineitem from '/home/kunlun/tpch301/dbgen/${i}' with delimiter as '|' NULL ' ';" ;
done
sh load_lineitem.sh
Note: The dss.tgz file must first be uploaded to the dbgen directory and extracted.
2.1.9 Generating TPC-H Test SQL
mkdir query
more gen-sql.sh
---------------------------
for q in `seq 1 22`
do
DSS_QUERY=dss/templates ./qgen ${q} >> query/${q}.sql
sed 's/^select/explain select/' query/${q}.sql > query/${q}.explain.sql
cat query/${q}.sql >> query/${q}.explain.sql;
done
sh gen-sql.sh
2.1.10 Analyzing the Loaded Objects
ana.sh
----------------------------
for i in `cat ./dss.ddl | grep 'CREATE TABLE' | grep -v partition | awk '{print $3}'`
do
echo "select * from $i limit 1;" > a.sql
column=`psql -h 192.168.0.20 -p 47001 -U tpch -d tpch -f a.sql | sed -n 1p | sed 's/|/,/g'`
analyze="analyze $i($column)"
echo $analyze
echo $analyze > tmp.sql
psql -h 192.168.0.20 -p 47001 -U tpch -d tpch -f tmp.sql
done
sh ana.sh
2.2 Running the TPC-H Test
mkdir run_log
more run_tpch.sh
--------------------------
total_cost=0
for i in {1..22}
do
echo "begin run Q${i}, query/$i.sql , `date`"
begin_time=`date +%s.%N`
PGPASSWORD=tpch psql -h 192.168.0.20 -p 47001 -d tpch -U tpch -f query/${i}.sql > ./run_log/log_q${i}.out
rc=$?
end_time=`date +%s.%N`
cost=`echo "$end_time-$begin_time"|bc`
total_cost=`echo "$total_cost+$cost"|bc`
if [ $rc -ne 0 ] ; then
printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
else
printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
fi
done
sh run_tpch.sh
Note: The node should have the 'bc' package installed: yum install bc
2.3 TPC-H Test Results
SQL Number | Running Time (Seconds) | Note |
---|---|---|
1 | 2.98 | |
2 | 104.96 | |
3 | 1.35 | |
4 | 0.81 | |
5 | 2.56 | |
6 | 0.31 | |
7 | 1.19 | |
8 | 2.34 | |
9 | 81.32 | |
10 | 0.93 | |
11 | 0.49 | |
12 | 0.84 | |
13 | 1.23 | |
14 | 0.24 | |
15 | 0.59 | |
16 | 1.16 | |
17 | 2.99 | |
18 | 11.18 | |
19 | 0.69 | |
20 | 14.44 | |
21 | 365.06 | psql:query/21.sql:43: ERROR: Kunlun-db: Connection with MySQL storage node (2147483648, 2147483651) is gone: 2006, Server has gone away. Resend the statement. DETAIL: Disconnected all connections to MySQL storage nodes. |
22 | 0.74 | |
Appendix: The tests above were run with the tables partitioned. Before that, a separate round of tests was run without partitioning, i.e., Klustron was free to decide which shard each table was stored on. Those results are as follows:
SQL Number | Running Time (Seconds) | Note |
---|---|---|
1 | 3.77 | |
2 | 0.42 | |
3 | 1.36 | |
4 | 2.43 | |
5 | 1.41 | |
6 | 0.35 | |
7 | 0.67 | |
8 | 2.00 | |
9 | 8.29 | |
10 | 2.82 | |
11 | 0.43 | |
12 | 1.19 | |
13 | 2.15 | |
14 | 0.23 | |
15 | 0.91 | |
16 | 1.61 | |
17 | 14.49 | |
18 | 4.28 | |
19 | 0.52 | |
20 | 1.95 | |
21 | 1.15 | |
22 | 0.43 | |
03 Klustron vs. Greenplum TPC-H Test Results Summary
SQL Number | Non-partitioned Klustron Running Time (Seconds) | Partitioned Klustron Running Time (Seconds) | Greenplum Running Time (Seconds) |
---|---|---|---|
1 | 3.77 | 2.98 | 3.10 |
2 | 0.42 | 104.96 | 0.65 |
3 | 1.36 | 1.35 | 1.12 |
4 | 2.43 | 0.81 | 0.99 |
5 | 1.41 | 2.56 | 1.21 |
6 | 0.35 | 0.31 | 0.12 |
7 | 0.67 | 1.19 | 1.10 |
8 | 2.00 | 2.34 | 1.14 |
9 | 8.29 | 81.32 | 1.48 |
10 | 2.82 | 0.93 | 1.01 |
11 | 0.43 | 0.49 | 0.27 |
12 | 1.19 | 0.84 | 0.95 |
13 | 2.15 | 1.23 | 0.53 |
14 | 0.23 | 0.24 | 0.11 |
15 | 0.91 | 0.59 | 0.22 |
16 | 1.61 | 1.16 | 0.21 |
17 | 14.49 | 2.99 | 1.96 |
18 | 4.28 | 11.18 | 1.98 |
19 | 0.52 | 0.69 | 0.16 |
20 | 1.95 | 14.44 | 0.87 |
21 | 1.15 | 365.06 | 1.98 |
22 | 0.43 | 0.74 | 0.30 |
Result Analysis
In section 2.3, the parallel query-related parameters of Klustron's computing nodes were left at their default values, so at most two worker processes execute the same query. We plan to rerun the TPC-H test with 32 or 64 worker processes and add the results to this document.
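For reference, if the computing node follows stock PostgreSQL in this area (an assumption, not something verified in this test), the relevant knobs would look like the following; the value 32 is purely illustrative:
psql -h 192.168.0.20 -p 47001 -U abc postgres
show max_parallel_workers_per_gather; -- default is 2 in stock PostgreSQL
alter system set max_parallel_workers_per_gather = 32; -- illustrative value for a future rerun
select pg_reload_conf();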
For most queries, the time difference between Klustron and Greenplum is small; for a few queries, Klustron's performance still has significant problems, which we will resolve in future versions.
Since all TPC-H queries are read-only, the shards were deployed without replicas and the following configuration parameters were changed: enable_fullsync=false, sync_binlog=0, innodb_flush_log_at_trx_commit=2. These changes do not affect the validity of the test results.