
Use Ti-dumpling + TiCDC to synchronize TiDB data to Klustron

This article describes how to synchronize the full data of a running TiDB cluster to a Klustron cluster, and then continuously stream incremental updates to the Klustron cluster.

Set the gc_life_time parameter

  • Enter TiDB

    mysql -h 172.0.0.132 -uroot -proot -P40002
    set global tidb_gc_life_time = '24h';
    
  • If this parameter is not set, TiDB keeps the default of 10 minutes, i.e. expired data is cleaned up and the GC safe point is advanced every 10 minutes. Set the value according to how long the full export of your data will take

    • If the Pos value in the metadata file exported by dumpling is older (smaller) than the GC safe point, the TiCDC task cannot be started and the following error appears. For details on the GC mechanism, see https://docs.pingcap.com/zh/tidb/stable/garbage-collection-overview

      Error: [CDC:ErrMetaListDatabases]meta store list databases: [tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2022-11-16 16:29:54.329 +0800 CST, GC safe point is 2022-11-16 16:45:36.778 +0800 CST
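
    • To verify that the new value took effect and to check the current GC safe point before exporting, you can query the mysql.tidb system table, which exposes the tikv_gc_* status rows. A minimal check, assuming the same client access as above:

      mysql -h 172.0.0.132 -uroot -proot -P40002 -e \
        "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME LIKE 'tikv_gc%';"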
      

Use ti-dumpling to fully export data

cd ~/.tiup
./bin/tiup install dumpling
./bin/tiup dumpling -u root -proot -P 40002 -h 172.0.0.132 \
-o /nvme2/compare/tidb/dumpling_data/ -B tpcc
  • -o: the directory where the exported data is stored
  • -B: the database to export
  • For detailed usage, please refer to https://docs.pingcap.com/zh/tidb/stable/dumpling-overview
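
  • After the export finishes, dumpling writes a metadata file into the output directory. Its Pos value is the TSO snapshot at which the export was taken, and it is needed later as --start-ts when the TiCDC task is created. A sketch of its contents (the comment lines are illustrative; the Pos shown is the example value reused in the TiCDC step below):

    cat /nvme2/compare/tidb/dumpling_data/metadata
    # Started dump at: ...
    # SHOW MASTER STATUS:
    #         Log: tidb-binlog
    #         Pos: 437427747125198853
    # Finished dump at: ...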

Use the ddl2kunlun-linux tool to create tables

  • Download the tool: wget http://downloads.klustron.com/kunlun-utils/1.1.1/ddl2kunlun-linux

  • This tool converts table definitions from other databases into table definitions that Klustron can use

  • Example of use

    ./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" \
    -sourceType="mysql" -database="tpcc" -table="tablename" > a.sql
    
    • -host / -port / -user / -password: the connection information of the source database;

    • -sourceType: the type of the source database; defaults to mysql;

    • -database / -table: the database and table to be exported;

    • You can view the help documentation through ./ddl2kunlun-linux --help;

    • Then use psql or mysql to import the exported table definition into Klustron to generate a table.

      psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
      
  • Run the tool in a for loop

  • Because the tool can only convert one table at a time, invoking it by hand for every table is tedious

  • You can refer to the following script to convert every table definition and import it into Klustron automatically; a quick verification follows the script

    echo "show tables;" > test.sql
    for i in `mysql -h 172.0.0.132 -uroot -proot -P40002 tpcc < test.sql | grep -v Tables_in_`
    do 
      ./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" -sourceType="mysql" -database="tpcc" -table="$i" > a.sql
      echo create table $i
      psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
    done
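
  • When the loop finishes, you can confirm that the table definitions landed in Klustron, for example with psql's \dt meta-command (a quick check, assuming the tpcc schema created above):

    psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c '\dt tpcc.*'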
    

Process the data exported by dumpling

  • The exported data cannot be used by Klustron directly, so it needs to be processed first

  • A TiDB database maps to a schema under the postgres database in Klustron, so the table name in each exported data file must be prefixed with the schema name

  • The minimum date value Klustron supports is '1970-01-01', so zero dates ('0000-00-00') must be replaced

    # cd to the directory where dumpling exported the data
    cd /nvme2/compare/tidb/dumpling_data
    for i in `ls . | grep -v schema | grep sql`
    do
       table=`echo $i | awk -F. '{print $2}'`
       db=`echo $i | awk -F. '{print $1}'`
       sed -i "s/\`$table\`/${db}.$table/" $i
       sed -i 's/0000-00-00/1970-01-01/' $i
    done
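
  • A quick sanity check after the rewrite (a sketch; tpcc.warehouse.000000000.sql is just an example of dumpling's db.table.NNNNNNNNN.sql data file naming):

    # the INSERT target should now be schema-qualified
    grep -m1 "INSERT INTO" tpcc.warehouse.000000000.sql
    # the data files should no longer contain zero dates
    ls . | grep -v schema | grep sql | xargs grep -l "0000-00-00" || echo "no zero dates left"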
    

    ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/1.png)

    ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/2.png)

Import data to Klustron

  • Only import the SQL files that contain data, not table definitions.

  • The table definitions have already been imported, so files whose names contain schema are skipped (grep -v schema in the script).

  • The metadata file will be used later when the TiCDC task is started; it is not needed in this step.

    cd /nvme2/compare/tidb/dumpling_data
    for i in `ls . | grep -v schema | grep sql`
    do
           echo begin $i
           psql -h 172.0.0.132 -p 35001 -U abc -d postgres -f $i
           echo $i done
    done
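
  • After the import finishes, spot-checking row counts on both sides is a cheap way to confirm the full load (a sketch; tpcc.warehouse is an example table):

    mysql -h 172.0.0.132 -uroot -proot -P40002 -e "SELECT COUNT(*) FROM tpcc.warehouse;"
    psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c "SELECT COUNT(*) FROM tpcc.warehouse;"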
    

Start the TiCDC task

  • It is recommended that the TiCDC version match the version of the TiDB cluster

  • Example of use

    cd ~/.tiup
    ./bin/tiup cdc cli changefeed create --pd=http://172.0.0.132:2379 --sink-uri="mysql://abc:abc@172.0.0.132:47001/?time-zone=" --changefeed-id="simple-replication-task" --sort-engine="unified" --start-ts=437427747125198853
    
    • For details on how to use TiCDC, please refer to https://docs.pingcap.com/zh/tidb/stable/manage-ticdc
    • --pd: the address of the TiDB cluster's PD service; when there is more than one, separate them with commas;
    • --sink-uri: the connection address of the downstream database; mysql, kafka, and tidb are supported. For Klustron, use the MySQL-protocol port of a compute node;
      ■ You can log in to the PostgreSQL port of a Klustron compute node and get the MySQL port number through show mysql_port;
    • --changefeed-id: the name of the TiCDC task;
    • --sort-engine: the sorting engine used by the changefeed; supports unified (default) / memory / file;
    • --start-ts: the position from which the TiCDC task starts; its value is the Pos value in the metadata file exported by dumpling.
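    • After creating the changefeed, you can check that it is running with the cdc cli (a sketch, against the same PD endpoint):

      ./bin/tiup cdc cli changefeed list --pd=http://172.0.0.132:2379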

Set the gc_life_time parameter back

  • Enter TiDB

    mysql -h 172.0.0.132 -uroot -proot -P40002

  • Set the gc_life_time parameter back to the default of 10 minutes, i.e. clean up expired data every 10 minutes

    set global tidb_gc_life_time = '10m';
      

      ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/3.png)
