Skip to main content

Klustron vs. Greenplum: TPCH Results

KlustronAbout 13 min

Klustron vs. Greenplum: TPCH Results

Objective of this article:

We aim to conduct performance comparison tests on Klustron and Greenplum respectively using the popular industry tool TPC-H (TPCH), and derive conclusions from the results.

Test Environment One: (Greenplum)

Greenplum version: 6.24.5-rhel8-x86_64

Node TypeIPPort
Master192.168.0.205432
Seg1 Node192.168.0.216000
Seg2 Node192.168.0.226000

Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, MEM: 128G, Storage: M.2 NVME SSD PCIE 4 Solid State 2T

Test Environment Two: (Klustron)

Node TypeIPPort
Computing Node192.168.0.2047001
Shard1 Master Node192.168.0.2157003
Shard2 Master Node192.168.0.2257005
XPanel192.168.0.2040580

Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, MEM: 128G, Storage: M.2 NVME SSD PCIE 4 Solid State 2T

Test Environment Three: (TPC-H)

TPC-H: v3.0.1

Deployment Mode: Single Node

Operating Environment: CentOS 8.5 Linux 64-bit, CPU: AMD Ryzen 9 7950X 16-Core Processor, MEM: 128G, Storage: M.2 NVME SSD PCIE 4 Solid State 2T

IP: 192.168.0.20

Note: As the TPC-H test is conducted in a single user mode to stress the target library and does not need to form testing pressure, in both test scenarios, 192.168.0.20 is used to initiate TPCH related SQL queries.

01 Greenplum Performance Testing

1.1. Test Preparation

1.1.1. Installing Greenplum

Skipped

1.1.2 Adjusting Parameters (Execute on Master Node)

gpconfig -c shared_buffers -v 32GB
gpconfig -c statement_timeout -v 6000000
gpconfig -c lock_timeout -v 1200000
gpconfig -c log_min_duration_statement -v 1200000
gpconfig -c effective_cache_size  -v  64GB
gpconfig -c work_mem   -v  8MB
gpconfig -c wal_buffers -v 64MB
gpconfig -c autovacuum -v false
gpstop -M fast
gpstart

1.1.3 Creating Test Users and Database

psql -h 192.168.0.20 -p5432 -d gpdw   #Greenplum安装完成时创建了gpdw数据库
create user tpch with password 'tpch';
create database tpch with owner tpch ;
alter user tpch with superuser;

1.1.4 Accessing the TPC-H Software Working Directory

cd /home/gpadmin/tpch301/dbgen

1.1.5 Creating TPC-H Related Test Objects

cat << EOF > dss.ddl
-- Sccsid:     @(#)dss.ddl      2.1.8.1
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,
                           O_CLERK          CHAR(15) NOT NULL,
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL)
                          DISTRIBUTED BY (O_ORDERKEY);
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL) 
                           DISTRIBUTED BY (PS_PARTKEY,PS_SUPPKEY);
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL) 
                            DISTRIBUTED BY (C_CUSTKEY);
CREATE TABLE PART ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL) 
                        DISTRIBUTED BY (P_PARTKEY);
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL)
                            DISTRIBUTED BY (S_SUPPKEY);
CREATE TABLE NATION ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152)) 
                           DISTRIBUTED BY (N_NATIONKEY);
CREATE TABLE REGION ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152)) 
                           DISTRIBUTED BY (R_REGIONKEY);
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL)  
                            DISTRIBUTED BY (L_LINENUMBER, L_ORDERKEY);
EOF

psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./dss.ddl 

cat << EOF > tpch-index
-- indexes on the foreign keys
CREATE INDEX IDX_SUPPLIER_NATION_KEY ON SUPPLIER (S_NATIONKEY);
CREATE INDEX IDX_PARTSUPP_PARTKEY ON PARTSUPP (PS_PARTKEY);
CREATE INDEX IDX_PARTSUPP_SUPPKEY ON PARTSUPP (PS_SUPPKEY);
CREATE INDEX IDX_CUSTOMER_NATIONKEY ON CUSTOMER (C_NATIONKEY);
CREATE INDEX IDX_ORDERS_CUSTKEY ON ORDERS (O_CUSTKEY);
CREATE INDEX IDX_LINEITEM_ORDERKEY ON LINEITEM (L_ORDERKEY);
CREATE INDEX IDX_LINEITEM_PART_SUPP ON LINEITEM (L_PARTKEY,L_SUPPKEY);
CREATE INDEX IDX_NATION_REGIONKEY ON NATION (N_REGIONKEY);
-- aditional indexes
CREATE INDEX IDX_LINEITEM_SHIPDATE ON LINEITEM (L_SHIPDATE, L_DISCOUNT, L_QUANTITY);
CREATE INDEX IDX_ORDERS_ORDERDATE ON ORDERS (O_ORDERDATE);
EOF

psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./tpch-index

cat << EOF > tpch-pkeys
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);
EOF

psql -h 192.168.0.20 -p5432 -Utpch -d tpch -f ./tpch-pkeys

1.1.6 Configuring Parameters and Compiling the Test Tool

cp makefile.suite makefile

vi makefile
--------------------------------
CC      = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH

make

Note: Ensure that gcc related packages are installed

1.1.7 Generating and Preprocessing Test Data

./dbgen -vf -s 1   # Generate 1GB test data

vi trim-data.sh
----------------------------------
for i in `ls *.tbl`
do
  sed 's/|$//' $i > ${i}.csv;
done

sh trim-data.sh

split -l 100000 -d lineitem.tbl.csv lineitem.tbl.csv-

1.1.8 Loading Test Data

psql -h 192.168.0.20 -p5432 -Utpch -d tpch 

COPY customer(C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/customer.tbl.csv' delimiter '|' ;
COPY nation(N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT) FROM   '/home/gpadmin/tpch301/dbgen/nation.tbl.csv' delimiter '|' ;
COPY orders(O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT) FROM   '/home/gpadmin/tpch301/dbgen/orders.tbl.csv' delimiter '|' ;
COPY partsupp(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_SUPPLYCOST,PS_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/partsupp.tbl.csv' delimiter '|' ;
COPY part(P_PARTKEY,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_SIZE,P_CONTAINER,P_RETAILPRICE,P_COMMENT) FROM     '/home/gpadmin/tpch301/dbgen/part.tbl.csv' delimiter '|' ;
COPY region(R_REGIONKEY,R_NAME,R_COMMENT) FROM   '/home/gpadmin/tpch301/dbgen/region.tbl.csv' delimiter '|' ;
COPY supplier(S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT) FROM '/home/gpadmin/tpch301/dbgen/supplier.tbl.csv' delimiter '|' ;

vi load_lineitem.sh
-----------------------------------
for i in `ls lineitem.tbl.csv-*`
do
PGPASSWORD=tpch psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -c "\copy lineitem from '/home/gpadmin/tpch301/dbgen/${i}' with delimiter as '|' NULL ' ';" ;
done

sh load_lineitem.sh

Note: Here you need to upload the dss.tgz file to the dbgen directory and unzip it

1.1.9 Explaining Test Data

The dbgen command generates 1GB of test data, and after loading the data into the Greenplum database, the number of data records in the 8 tables is as follows:

Customer:150000
Orders:  1500000
Partsupp: 800000
Part:  200000
Region:  5
Supplier:  10000
Nation:  25
Lineitem:  6001215

1.1.10 Generating TPC-H Test SQL

mkdir query
more gen-sql.sh
---------------------------
for q in `seq 1 22`
do
    DSS_QUERY=dss/templates ./qgen ${q} >> query/${q}.sql
    sed 's/^select/explain select/' query/${q}.sql > query/${q}.explain.sql
    cat query/${q}.sql >> query/${q}.explain.sql;
done

sh gen-sql.sh

1.1.11 Analyzing the Loaded Objects

ana.sh
----------------------------
for i in `cat ./dss.ddl | grep 'CREATE TABLE' | grep -v partition | awk '{print $3}'`
do
        echo "select * from $i limit 1;" > a.sql
        clomun=`psql -h 192.168.0.20 -p 5432 -U tpch -d tpch  -f a.sql | sed -n 1p | sed 's/|/,/g'`
        analyze="analyze $i($clomun)"
        echo $analyze
        echo $analyze > tmp.sql
        psql -h 192.168.0.20 -p 5432 -U tpch -d tpch -f tmp.sql
done

sh ana.sh 

1.2 Running the TPC-H Test

mkdir run_log
more run_tpch.sh
--------------------------
for i in {1..22}
do
        echo "begin run Q${i}, query/$i.sql , `date`"
        begin_time=`date +%s.%N`
        PGPASSWORD=tpch psql -h 192.168.0.20 -p 5432-d tpch -U tpch  -f query/${i}.sql > ./run_log/log_q${i}.out
        rc=$?
        end_time=`date +%s.%N`
        cost=`echo "$end_time-$begin_time"|bc`
        total_cost=`echo "$total_cost+$cost"|bc`
        if [ $rc -ne 0 ] ; then
              printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         else
              printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         fi
done

sh run_tpch.sh  

Note: The node should have the 'bc' package installed: yum install bc

1.3 TPC-H Test Results

SQL NumberRunning Time (Seconds)Note
13.10
20.65
31.12
40.99
51.21
60.12
71.10
81.14
91.48
101.01
110.27
120.95
130.53
140.11
150.22
160.21
171.96
181.98
190.16
200.87
211.98
220.30

02 Performance Testing on Klustron

2.1. Test Preparation

2.1.1 Installing Klustron

Skipped

Using Klustron-1.2.1 version

Parameters of the cluster created for the test are attached

2.1.2 Modifying Parameters of Computing Node and Storage Node

Key Parameters of the Computing Node (PG):

psql -h 192.168.0.20 -p 47001 -U abc postgres
alter system set shared_buffers='32GB';
alter system set statement_timeout=6000000;
alter system set mysql_read_timeout=1200;
alter system set mysql_write_timeout=1200;
alter system set lock_timeout=1200000;
alter system set log_min_duration_statement=1200000;
alter system set effective_cache_size = '64GB';
alter system set work_mem  = '8MB';
alter system set wal_buffers='64MB';
alter system set autovacuum=false;

alter system set log_min_messages='DEBUG1' ;  #Activate when SQL debugging is needed
alter system set enable_sql_log=on;  #Activate when SQL debugging is needed

pg_ctl restart -D /nvme2/kunlun/server_datadir/47001

Key Parameters of the Storage Node (MYSQL):

innodb_buffer_pool_size=32*1024*1024*1024
lock_wait_timeout=1200    
innodb_lock_wait_timeout=1200    
fullsync_timeout=1200000 
enable_fullsync=false
innodb_flush_log_at_trx_commit=2
sync_binlog=0
max_binlog_size=1*1024*1024*1024

mysql -h 192.168.0.21 -P 57003 -upgx -ppgx_pwd
mysql -h 192.168.0.22 -P 57005 -upgx -ppgx_pwd
set global innodb_buffer_pool_size=32*1024*1024*1024;
set global lock_wait_timeout=1200;
set global innodb_lock_wait_timeout=1200;    
set global fullsync_timeout=1200000; 
set global enable_fullsync=false;
set global innodb_flush_log_at_trx_commit=2;
set global sync_binlog=0;
set global max_binlog_size=1*1024*1024*1024;

set global enable_fullsync=off;  #All shard master nodes turn off full synchronization
set enable_fullsync=off;    #All shard master nodes turn off full synchronization

set global general_log=on ;   #Activate when SQL debugging is needed

Note: Avoid master node anomalies causing node switching during data loading by setting Failover to 10800 seconds (Xpanel)

Adjustment of MySQL RBR 3 Node Parameters of Metadata Cluster:

export LD_LIBRARY_PATH=/nvme2/kunlun/instance_binaries/storage/57001/Klustron-storage-1.2.1/lib:$LD_LIBRARY_PATH ; 
/nvme2/kunlun/instance_binaries/storage/57001/Klustron-storage-1.2.1/bin/mysql --connect-expired-password -S/nvme2/kunlun/storage_logdir/57001/mysql.sock -uroot -proot 
set global innodb_buffer_pool_size=1*1024*1024*1024;     (Default 128M)

2.1.3 Creating Test Database and Users in the Klustron Cluster

psql -h 192.168.0.20 -p 47001 -U abc postgres

create user tpch with password 'tpch';
create database tpch with owner tpch ;
grant pg_read_server_files to tpch ;

2.1.4 Accessing the TPC-H Working Directory

cd /home/kunlun/tpch301/dbgen

2.1.5 Creating TPC-H Related Test Objects

cat << EOF > dss.ddl
-- Sccsid:     @(#)dss.ddl      2.1.8.1
CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152)) partition by hash(N_NATIONKEY);
create table NATION_0 partition of NATION for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table NATION_1 partition of NATION for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table NATION_2 partition of NATION for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table NATION_3 partition of NATION for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table NATION_4 partition of NATION for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table NATION_5 partition of NATION for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152)) partition by hash(R_REGIONKEY);
create table REGION_0 partition of REGION for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table REGION_1 partition of REGION for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table REGION_2 partition of REGION for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table REGION_3 partition of REGION for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table REGION_4 partition of REGION for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table REGION_5 partition of REGION for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL ) partition by hash(P_PARTKEY);
create table PART_0 partition of PART for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table PART_1 partition of PART for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table PART_2 partition of PART for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table PART_3 partition of PART for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table PART_4 partition of PART for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table PART_5 partition of PART for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL) partition by hash(S_SUPPKEY)  ;
create table SUPPLIER_0 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table SUPPLIER_1 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table SUPPLIER_2 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table SUPPLIER_3 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table SUPPLIER_4 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table SUPPLIER_5 partition of SUPPLIER for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL ) partition by hash(PS_PARTKEY,PS_SUPPKEY) ;
create table PARTSUPP_0 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table PARTSUPP_1 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table PARTSUPP_2 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table PARTSUPP_3 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table PARTSUPP_4 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table PARTSUPP_5 partition of PARTSUPP for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL) partition by hash(C_CUSTKEY) ;
create table CUSTOMER_0 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table CUSTOMER_1 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table CUSTOMER_2 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table CUSTOMER_3 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table CUSTOMER_4 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table CUSTOMER_5 partition of CUSTOMER for values with(MODULUS 6, REMAINDER 5) with (shard = 2);


CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,
                           O_CLERK          CHAR(15) NOT NULL,
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL) partition by hash(O_ORDERKEY);
create table ORDERS_0 partition of ORDERS for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table ORDERS_1 partition of ORDERS for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table ORDERS_2 partition of ORDERS for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table ORDERS_3 partition of ORDERS for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table ORDERS_4 partition of ORDERS for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table ORDERS_5 partition of ORDERS for values with(MODULUS 6, REMAINDER 5) with (shard = 2);

CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL) partition by hash(L_LINENUMBER, L_ORDERKEY);
create table LINEITEM_0 partition of LINEITEM for values with(MODULUS 6, REMAINDER 0) with (shard = 1);
create table LINEITEM_1 partition of LINEITEM for values with(MODULUS 6, REMAINDER 1) with (shard = 1);
create table LINEITEM_2 partition of LINEITEM for values with(MODULUS 6, REMAINDER 2) with (shard = 1);
create table LINEITEM_3 partition of LINEITEM for values with(MODULUS 6, REMAINDER 3) with (shard = 2);
create table LINEITEM_4 partition of LINEITEM for values with(MODULUS 6, REMAINDER 4) with (shard = 2);
create table LINEITEM_5 partition of LINEITEM for values with(MODULUS 6, REMAINDER 5) with (shard = 2);

EOF

psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./dss.ddl

cat << EOF > tpch-index
-- indexes on the foreign keys
CREATE INDEX IDX_SUPPLIER_NATION_KEY ON SUPPLIER (S_NATIONKEY);
CREATE INDEX IDX_PARTSUPP_PARTKEY ON PARTSUPP (PS_PARTKEY);
CREATE INDEX IDX_PARTSUPP_SUPPKEY ON PARTSUPP (PS_SUPPKEY);
CREATE INDEX IDX_CUSTOMER_NATIONKEY ON CUSTOMER (C_NATIONKEY);
CREATE INDEX IDX_ORDERS_CUSTKEY ON ORDERS (O_CUSTKEY);
CREATE INDEX IDX_LINEITEM_ORDERKEY ON LINEITEM (L_ORDERKEY);
CREATE INDEX IDX_LINEITEM_PART_SUPP ON LINEITEM (L_PARTKEY,L_SUPPKEY);
CREATE INDEX IDX_NATION_REGIONKEY ON NATION (N_REGIONKEY);
-- aditional indexes
CREATE INDEX IDX_LINEITEM_SHIPDATE ON LINEITEM (L_SHIPDATE, L_DISCOUNT, L_QUANTITY);
CREATE INDEX IDX_ORDERS_ORDERDATE ON ORDERS (O_ORDERDATE);
EOF

psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./tpch-index

cat << EOF > tpch-pkeys
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);
EOF


psql -h 192.168.0.20 -p 47001 -U tpch tpch -f ./tpch-pkeys

2.1.6 Configuring Parameters and Compiling the Test Tool

cp makefile.suite makefile

vi makefile
--------------------------------
CC      = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH

make

Note: Ensure that gcc related packages are installed

2.1.7 Generating and Preprocessing Test Data

./dbgen -vf -s 1   # Generate 1GB test data

vi trim-data.sh
----------------------------------
for i in `ls *.tbl`
do
  sed 's/|$//' $i > ${i}.csv;
done

sh trim-data.sh

split -l 100000 -d lineitem.tbl.csv lineitem.tbl.csv-

Note: To ensure consistency of test data, the test data files generated by dbgen in the previous Greenplum test were used directly when obtaining the TPCH test data for Klustron.

2.1.8 Loading Test Data

psql -h 192.168.0.20 -p 47001 -U tpch -d tpch 

COPY customer(C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT) FROM '/home/kunlun/tpch301/dbgen/customer.tbl.csv' delimiter '|' ;
COPY nation(N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT) FROM   '/home/kunlun/tpch301/dbgen/nation.tbl.csv' delimiter '|' ;
COPY orders(O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT) FROM   '/home/kunlun/tpch301/dbgen/orders.tbl.csv' delimiter '|' ;
COPY partsupp(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_SUPPLYCOST,PS_COMMENT) FROM '/home/kunlun/tpch301/dbgen/partsupp.tbl.csv' delimiter '|' ;
COPY part(P_PARTKEY,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_SIZE,P_CONTAINER,P_RETAILPRICE,P_COMMENT) FROM     '/home/kunlun/tpch301/dbgen/part.tbl.csv' delimiter '|' ;
COPY region(R_REGIONKEY,R_NAME,R_COMMENT) FROM   '/home/kunlun/tpch301/dbgen/region.tbl.csv' delimiter '|' ;
COPY supplier(S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT) FROM '/home/kunlunn/tpch301/dbgen/supplier.tbl.csv' delimiter '|' ;

vi load_lineitem.sh
-----------------------------------
for i in `ls lineitem.tbl.csv-*`
do
PGPASSWORD=tpch psql -h 192.168.0.20 -p 47001 -U tpch -d tpch -c "\copy lineitem from '/home/kunlun/tpch301/dbgen/${i}' with delimiter as '|' NULL ' ';" ;
done

sh load_lineitem.sh

Note: Here you need to upload the dss.tgz file to the dbgen directory and unzip it.

2.1.9 Generating TPC-H Test SQL

mkdir query
more gen-sql.sh
---------------------------
for q in `seq 1 22`
do
    DSS_QUERY=dss/templates ./qgen ${q} >> query/${q}.sql
    sed 's/^select/explain select/' query/${q}.sql > query/${q}.explain.sql
    cat query/${q}.sql >> query/${q}.explain.sql;
done

sh gen-sql.sh

2.1.10 Analyzing the Loaded Objects

ana.sh
----------------------------
for i in `cat ./dss.ddl | grep 'CREATE TABLE' | grep -v partition | awk '{print $3}'`
do
        echo "select * from $i limit 1;" > a.sql
        clomun=`psql -h 192.168.0.20 -p 47001  -U tpch -d tpch  -f a.sql | sed -n 1p | sed 's/|/,/g'`
        analyze="analyze $i($clomun)"
        echo $analyze
        echo $analyze > tmp.sql
        psql -h 192.168.0.20 -p 47001 -U tpch -d tpch -f tmp.sql
done

sh ana.sh 

2.2 Running the TPC-H Test

mkdir run_log
more run_tpch.sh
--------------------------
for i in {1..22}
do
        echo "begin run Q${i}, query/$i.sql , `date`"
        begin_time=`date +%s.%N`
        PGPASSWORD=tpch psql -h 192.168.0.20 -p 47001 -d tpch -U tpch  -f query/${i}.sql > ./run_log/log_q${i}.out
        rc=$?
        end_time=`date +%s.%N`
        cost=`echo "$end_time-$begin_time"|bc`
        total_cost=`echo "$total_cost+$cost"|bc`
        if [ $rc -ne 0 ] ; then
              printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         else
              printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         fi
done

sh run_tpch.sh 

Note: The node should have the 'bc' package installed: yum install bc

2.3 TPC-H Test Results

SQL NumberRunning Time (Seconds)Note
12.98
2104.96
31.35
40.81
52.56
60.31
71.19
82.34
981.32
100.93
110.49
120.84
131.23
140.24
150.59
161.16
172.99
1811.18
190.69
2014.44
21365.06psql:query/21.sql:43: ERROR: Kunlun-db: Connection with MySQL storage node (2147483648, 2147483651) is gone: 2006, Server has gone away. Resend the statement. DETAIL: Disconnected all connections to MySQL storage nodes.
220.74

Appendix: The above tests were carried out with database objects partitioned. Before this, a separate round of tests was carried out without partitioning the tables, i.e., Klustron was free to determine the shard where the table was stored. The test results are as follows:

SQL NumberRunning Time (Seconds)Note
13.77
20.42
31.36
42.43
51.41
60.35
70.67
82.00
98.29
102.82
110.43
121.19
132.15
140.23
150.91
161.61
1714.49
184.28
190.52
201.95
211.15
220.43

03 Klustron vs Greenplum TPC-H Test Results Summary

SQL NumberNon-partitioned Klustron Running Time (Seconds)Partitioned Klustron Running Time (Seconds)Greenplum Running Time (Seconds)
13.772.983.10
20.42104.960.65
31.361.351.12
42.430.810.99
51.412.561.21
60.350.310.12
70.671.191.10
82.002.341.14
98.2981.321.48
102.820.931.01
110.430.490.27
121.190.840.95
132.151.230.53
140.230.240.11
150.910.590.22
161.611.160.21
1714.492.991.96
184.2811.181.98
190.520.690.16
201.9514.440.87
211.15365.061.98
220.430.740.30

Result Analysis

  1. In section 2.3, the parallel query-related parameters of Klustron's computing nodes use default values, so at most two working processes execute the same query. In the future, we will run the TPCH test again using 32 or 64 processes and add the results to this document.

  2. For most test cases, the time difference between Klustron and Greenplum is small; for a few test cases, Klustron's performance still has significant problems, which we will resolve in future versions.

  3. Since all TPC-H tests are read-only, there is no replica machine installed on the shard, and the following configuration parameters have been modified: enable_fullsync=false, sync_binlog=0, innodb_flush_log_at_trx_commit = 2. These changes have no effect on the test results.

END