Use Dumpling + TiCDC to synchronize TiDB data to Klustron
This article describes how to synchronize the full data of a running TiDB cluster to a Klustron cluster, and then continuously replicate incremental changes to the Klustron cluster.
Set the tidb_gc_life_time parameter
Connect to TiDB:
mysql -h 172.0.0.132 -uroot -proot -P40002
set global tidb_gc_life_time = '24h';
If this parameter is not set, TiDB by default cleans up expired data and advances the GC safe point every 10 minutes. Set the value according to how long the full export of the data is expected to take.
If the Pos value in the metadata file exported by Dumpling is smaller than the GC safe point, TiCDC cannot be started and the following error message appears. For the GC mechanism, please refer to
https://docs.pingcap.com/zh/tidb/stable/garbage-collection-overview
Error: [CDC:ErrMetaListDatabases]meta store list databases: [tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2022-11-16 16:29:54.329 +0800 CST, GC safe point is 2022-11-16 16:45:36.778 +0800 CST
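As a quick check, you can confirm that the new value has taken effect before starting the export (a minimal sketch, using the same connection details as above):
mysql -h 172.0.0.132 -uroot -proot -P40002 -e "show variables like 'tidb_gc_life_time';"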
Use Dumpling to export the full data
cd ~/.tiup
./bin/tiup install dumpling
./bin/tiup dumpling -u root -proot -P 40002 -h 172.0.0.132 \
-o /nvme2/compare/tidb/dumpling_data/ -B tpcc
- -o: the directory where the exported data is stored
- -B: the database to export
- For detailed usage, please refer to https://docs.pingcap.com/zh/tidb/stable/dumpling-overview
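The export directory also contains a metadata file whose Pos value is needed later when starting the TiCDC task. Its content looks roughly like the following (the values here are illustrative, not from a real run):
Started dump at: 2022-11-16 16:29:54
SHOW MASTER STATUS:
        Log: tidb-binlog
        Pos: 437427747125198853
Finished dump at: 2022-11-16 16:35:12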
Use the ddl2kunlun-linux tool to create tables
Download the tool and make it executable:
wget http://downloads.klustron.com/kunlun-utils/1.1.1/ddl2kunlun-linux
chmod +x ddl2kunlun-linux
This tool converts table definitions from other databases into table definitions that Klustron can use.
Example of use
./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" \
  -sourceType="mysql" -database="tpcc" -table="tablename" > a.sql
- -host / -port / -user / -password: connection information of the source database
- -sourceType: type of the source database, default mysql
- -database / -table: the database and table to be exported
- You can view the help documentation through ./ddl2kunlun-linux --help
Then use psql or mysql to import the exported table definition into Klustron to generate a table.
psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
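To confirm that the table was created, you can list it from the Klustron side (a minimal check, assuming the converted definition places the table in a tpcc schema under postgres, matching the mapping described in the data-processing step below):
psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c '\dt tpcc.*'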
Use a for loop script to convert all tables
Because the tool can only convert one table at a time, running it by hand is tedious when there are many tables.
You can refer to the following script to automatically import the converted table definitions into Klustron:
echo "show tables;" > test.sql
for i in `mysql -h 172.0.0.132 -uroot -proot -P40002 tpcc < test.sql | grep -v Tables_in_`
do
  ./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" \
    -sourceType="mysql" -database="tpcc" -table="$i" > a.sql
  echo "create table $i"
  psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
done
Process the exported data
The data exported by Dumpling cannot be used by Klustron directly, so it needs to be processed first.
A TiDB database maps to a schema under the postgres database in Klustron, so the table names in the exported data files must be qualified with the schema name.
Klustron's date type has a minimum value of '1970-01-01', so zero dates ('0000-00-00') must be replaced.
# cd to the directory where Dumpling exported the data
cd /nvme2/compare/tidb/dumpling_data
for i in `ls . | grep -v schema | grep sql`
do
  table=`echo $i | awk -F. '{print $2}'`
  db=`echo $i | awk -F. '{print $1}'`
  sed -i "s/\`$table\`/${db}.$table/" $i
  sed -i 's/0000-00-00/1970-01-01/' $i
done
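For illustration, given a data file named tpcc.warehouse.000000000.sql (a hypothetical name following Dumpling's db.table.N.sql pattern), the two sed commands rewrite lines like this:
# before
INSERT INTO `warehouse` VALUES (1,'0000-00-00');
# after
INSERT INTO tpcc.warehouse VALUES (1,'1970-01-01');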


Import data to Klustron
Only import the sql files that contain data, not table definitions.
Because the table definitions have already been imported, there is no need to import the files with schema in their names.
The metadata file will be used later when the TiCDC task is started; it is not needed in this step.
cd /nvme2/compare/tidb/dumpling_data
for i in `ls . | grep -v schema | grep sql`
do
  echo begin $i
  psql -h 172.0.0.132 -p 35001 -U abc -d postgres -f $i
  echo $i done
done
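After the import finishes, it is worth spot-checking row counts on both sides. A minimal sketch, assuming a tpcc.warehouse table is among the imported tables (any imported table works):
mysql -h 172.0.0.132 -uroot -proot -P40002 -e "select count(*) from tpcc.warehouse;"
psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c "select count(*) from tpcc.warehouse;"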
Start the TiCDC task
It is recommended that the TiCDC version be consistent with the TiDB cluster version.
Example of use
cd ~/.tiup
./bin/tiup cdc cli changefeed create --pd=http://172.0.0.132:2379 \
  --sink-uri="mysql://abc:abc@172.0.0.132:47001/?time-zone=" \
  --changefeed-id="simple-replication-task" --sort-engine="unified" \
  --start-ts=437427747125198853
- For details on how to use TiCDC, please refer to https://docs.pingcap.com/zh/tidb/stable/manage-ticdc
- --pd: address of the PD service of the TiDB cluster; multiple addresses are separated by commas
- --sink-uri: connection address of the downstream database; mysql, kafka, and tidb are supported. For Klustron, use the MySQL-protocol port of a compute node; you can log in to the pg port of a Klustron compute node and get the MySQL port number with show mysql_port;
- --changefeed-id: name of the TiCDC task
- --sort-engine: sorting engine used by the changefeed; supports unified (default), memory, and file
- --start-ts: the position from which the TiCDC task starts; use the Pos value from the metadata file exported by Dumpling
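The start-ts value can be taken straight from the metadata file produced during the export, for example:
grep Pos /nvme2/compare/tidb/dumpling_data/metadata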
Reset the tidb_gc_life_time parameter
Connect to TiDB:
mysql -h 172.0.0.132 -uroot -proot -P40002
Set the tidb_gc_life_time parameter back to the default of 10 minutes, that is, clean up expired data every 10 minutes:
set global tidb_gc_life_time = '10m';
