
Use Ti-dumpling + TiCDC to synchronize TiDB data to Klustron

This article describes how to synchronize the full data of a running TiDB cluster to a Klustron cluster, and then continuously stream incremental updates to the Klustron cluster.

Set the gc_life_time parameter

  • Enter TiDB

    mysql -h 172.0.0.132 -uroot -proot -P40002
    set global tidb_gc_life_time = '24h';
    
  • If this parameter is not set, TiDB keeps the default of 10 minutes, i.e. expired data is cleaned up and the GC safe point is advanced every 10 minutes. Set the value according to how long the full export of your data will take

    • If the Pos value in the metadata file exported by dumpling is older (smaller) than the GC safe point, the TiCDC task cannot be started and the following error appears. For details on the GC mechanism, see https://docs.pingcap.com/zh/tidb/stable/garbage-collection-overview

      Error: [CDC:ErrMetaListDatabases]meta store list databases: [tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2022-11-16 16:29:54.329 +0800 CST, GC safe point is 2022-11-16 16:45:36.778 +0800 CST
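
    • To verify that the new value took effect and to check the current GC safe point before exporting, you can query the mysql.tidb system table, which exposes the tikv_gc_* status rows. A minimal check, assuming the same client access as above:

      mysql -h 172.0.0.132 -uroot -proot -P40002 -e \
        "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME LIKE 'tikv_gc%';"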
      

Use ti-dumpling to fully export data

cd ~/.tiup
./bin/tiup install dumpling
./bin/tiup dumpling -u root -proot -P 40002 -h 172.0.0.132 \
-o /nvme2/compare/tidb/dumpling_data/ -B tpcc
  • -o: the directory where the exported data is stored
  • -B: the database to export
  • For detailed usage, please refer to https://docs.pingcap.com/zh/tidb/stable/dumpling-overview
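
  • After the export finishes, dumpling writes a metadata file into the output directory. Its Pos value is the TSO snapshot at which the export was taken, and it is needed later as --start-ts when the TiCDC task is created. A sketch of its contents (the comment lines are illustrative; the Pos shown is the example value reused in the TiCDC step below):

    cat /nvme2/compare/tidb/dumpling_data/metadata
    # Started dump at: ...
    # SHOW MASTER STATUS:
    #         Log: tidb-binlog
    #         Pos: 437427747125198853
    # Finished dump at: ...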

Use the ddl2kunlun-linux tool to create tables

  • Download the tool: wget http://downloads.klustron.com/kunlun-utils/1.1.1/ddl2kunlun-linux

  • This tool converts table definitions from other databases into table definitions that Klustron can use

  • Example of use

    ./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" \
    -sourceType="mysql" -database="tpcc" -table="tablename" > a.sql
    
    • -host / -port / -user / -password: the connection information of the source database;

    • -sourceType: the type of the source database; defaults to mysql;

    • -database / -table: the database and table to be exported;

    • You can view the help documentation through ./ddl2kunlun-linux --help;

    • Then use psql or mysql to import the exported table definition into Klustron to generate a table.

      psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
      
  • Run the tool in a for loop

  • Because the tool can only convert one table at a time, invoking it by hand for every table is tedious

  • You can refer to the following script to convert every table definition and import it into Klustron automatically; a quick verification follows the script

    echo "show tables;" > test.sql
    for i in `mysql -h 172.0.0.132 -uroot -proot -P40002 tpcc < test.sql | grep -v Tables_in_`
    do 
      ./ddl2kunlun-linux -host="172.0.0.132" -port="40002" -user="root" -password="root" -sourceType="mysql" -database="tpcc" -table="$i" > a.sql
      echo create table $i
      psql -h 172.0.0.132 -p 35001 -U abc -d postgres < a.sql
    done
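
  • When the loop finishes, you can confirm that the table definitions landed in Klustron, for example with psql's \dt meta-command (a quick check, assuming the tpcc schema created above):

    psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c '\dt tpcc.*'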
    

Process the data exported by dumpling

  • The exported data cannot be used by Klustron directly, so it needs to be processed first

  • A TiDB database maps to a schema under the postgres database in Klustron, so the table name in each exported data file must be prefixed with the schema name

  • The minimum date value Klustron supports is '1970-01-01', so zero dates ('0000-00-00') must be replaced

    # cd to the directory where dumpling exported the data
    cd /nvme2/compare/tidb/dumpling_data
    for i in `ls . | grep -v schema | grep sql`
    do
       table=`echo $i | awk -F. '{print $2}'`
       db=`echo $i | awk -F. '{print $1}'`
       sed -i "s/\`$table\`/${db}.$table/" $i
       sed -i 's/0000-00-00/1970-01-01/' $i
    done
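
  • A quick sanity check after the rewrite (a sketch; tpcc.warehouse.000000000.sql is just an example of dumpling's db.table.NNNNNNNNN.sql data file naming):

    # the INSERT target should now be schema-qualified
    grep -m1 "INSERT INTO" tpcc.warehouse.000000000.sql
    # the data files should no longer contain zero dates
    ls . | grep -v schema | grep sql | xargs grep -l "0000-00-00" || echo "no zero dates left"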
    

    ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/1.png)

    ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/2.png)

Import data to Klustron

  • Only import the SQL files that contain data, not table definitions.

  • The table definitions have already been imported, so files whose names contain schema are skipped (grep -v schema in the script).

  • The metadata file will be used later when the TiCDC task is started; it is not needed in this step.

    cd /nvme2/compare/tidb/dumpling_data
    for i in `ls . | grep -v schema | grep sql`
    do
           echo begin $i
           psql -h 172.0.0.132 -p 35001 -U abc -d postgres -f $i
           echo $i done
    done
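
  • After the import finishes, spot-checking row counts on both sides is a cheap way to confirm the full load (a sketch; tpcc.warehouse is an example table):

    mysql -h 172.0.0.132 -uroot -proot -P40002 -e "SELECT COUNT(*) FROM tpcc.warehouse;"
    psql -h 172.0.0.132 -p 35001 -U abc -d postgres -c "SELECT COUNT(*) FROM tpcc.warehouse;"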
    

Start the TiCDC task

  • It is recommended that the TiCDC version match the version of the TiDB cluster

  • Example of use

    cd ~/.tiup
    ./bin/tiup cdc cli changefeed create --pd=http://172.0.0.132:2379 --sink-uri="mysql://abc:abc@172.0.0.132:47001/?time-zone=" --changefeed-id="simple-replication-task" --sort-engine="unified" --start-ts=437427747125198853
    
    • For details on how to use TiCDC, please refer to https://docs.pingcap.com/zh/tidb/stable/manage-ticdc
    • --pd: the address of the TiDB cluster's PD service; when there is more than one, separate them with commas;
    • --sink-uri: the connection address of the downstream database; mysql, kafka, and tidb are supported. For Klustron, use the MySQL-protocol port of a compute node;
      ■ You can log in to the PostgreSQL port of a Klustron compute node and get the MySQL port number through show mysql_port;
    • --changefeed-id: the name of the TiCDC task;
    • --sort-engine: the sorting engine used by the changefeed; supports unified (default) / memory / file;
    • --start-ts: the position from which the TiCDC task starts; its value is the Pos value in the metadata file exported by dumpling.
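    • After creating the changefeed, you can check that it is running with the cdc cli (a sketch, against the same PD endpoint):

      ./bin/tiup cdc cli changefeed list --pd=http://172.0.0.132:2379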

Set the gc_life_time parameter back

  • Enter TiDB

    mysql -h 172.0.0.132 -uroot -proot -P40002

  • Set the gc_life_time parameter back to the default of 10 minutes, i.e. clean up expired data every 10 minutes

    set global tidb_gc_life_time = '10m';
      

      ![img](使用Ti-dumpling + Ticdc 同步 TiDB 数据到 KunlunBase/3.png)
