Klustron Database Server Cluster Initialization Guide
01 Basic Process for Installing Klustron Cluster
Before installing a Klustron cluster on a group of computer servers, the servers must first be initialized (bootstrapped). This step installs the Klustron components on each server and ensures that these components can work together, as well as configures and starts the relevant components responsible for cluster management. After initialization, users can easily install multiple Klustron clusters on these database servers and manage them using the GUI tool XPanel provided by us. Each database server only needs to perform the bootstrap operation once.
This guide mainly introduces how to initialize the database servers and applies to all released versions of Klustron unless otherwise specified. If users need to perform specific operations for a specific version, there will be special instructions in the text. For information on using XPanel for cluster management operations, please refer to the XPanel User Manual
- The initialization of the database server includes the following steps:
- Necessary environment settings for the target server and installation of several common external components
- Download the bootstrap script setup_cluster_manager.py
- Download the Klustron component package released by us
- Use the configuration template file to fill in the configuration file (in JSON format) required by setup_cluster_manager.py
- Run the setup_cluster_manager.py script
Figure of using setup_cluster_manager.py to initialize the database server
Figure of using XPanel for cluster management and operation
Cluster management and operation tasks include creating clusters, adding/deleting computing nodes and storage nodes, adding/deleting storage clusters, cluster physical/logical backup and recovery, cluster scaling, and online DDL and repartitioning. All of these operations can also be completed by calling the cluster_mgr API; in fact, XPanel itself calls the cluster_mgr API to complete these cluster management operations.
Furthermore, users can use the Prometheus and Grafana installed by Klustron to monitor the status of cluster nodes, and use the Elasticsearch and Kibana installed by users themselves (Klustron will automatically configure and associate the Elasticsearch component with the various Klustron components) to view and search the running logs of cluster nodes. This part is discussed in other documents and will not be repeated here.
02 Basic Concepts
- Cluster Management System (CMS): A system that can receive cluster instance operation requests from the client (XPanel) through the cluster manager node, in order to perform various cluster operations on the cluster instance. The CMS consists of a core part and a non-core part.
- The core part (CMS core) includes the meta group and the cluster manager group.
- The non-core part includes numerous node managers managed by the core part.
- Meta Group (MG): A replication group composed of multiple metadata nodes, used to store the CMS's own metadata rather than user business data. The MG runs as a replication group at runtime.
- Cluster Instance (CS): A node set created by the user for processing user business data requests and storing user business data, containing multiple computing nodes and multiple data shards, with multiple storage nodes in each data shard.
- Data Node (DN, a.k.a. storage node): A running instance of the Klustron-storage component used to store user business data. Multiple data nodes form a data shard and run as a master or slave node in the data shard.
- Data Shard (a.k.a. storage shard, or shard): A Klustron cluster includes one or more shards, each storing a part of user data, and different shards storing user data with no intersection. Each shard has a master node and multiple slave nodes, with high availability achieved through binlog replication between master and slave nodes.
- Computing Node (CN): A node used to process user business data requests. The computing node receives user business data requests, processes them accordingly, sends the content that needs to be stored to the data node, and returns the processing results to the user. A computing node is currently an instance of the Klustron-Server component.
- Replication Group (RG): A cluster composed of two or more storage nodes, with data synchronization between storage nodes within the group achieved through Klustron FullSync (RBR) to ensure data consistency within the group.
- Cluster Manager (CM): Receives cluster instance operation requests from the client and completes the request by sending the actual action to various node managers.
- Cluster Manager Group (CMG): A high-availability group composed of at least three cluster managers, with the cluster manager with the LEADER role actually handling cluster instance operation requests from the client.
- Node Manager (NM): A local command execution node that needs to be deployed on each working machine. The node receives requests from the cluster manager and performs various operations on data nodes, computing nodes, and other Klustron components on the local machine, including but not limited to installation, deletion, stoppage, and backup.
- Working Machine: Can be a physical or virtual machine, on which metadata nodes, data nodes, or computing nodes will run.
- XPanel: A web application running in a Docker image. Users can use a browser to connect to XPanel to perform cluster management operations.
03 Function Description
The setup_cluster_manager.py script has four types of execution actions: install, stop, start, and clean. Each run of the script performs one of these actions, and multiple tasks can be completed in a single execution:
Install action (install): This action can install the CMS core and one or more node managers. One or more objects of either type can be installed in a single run, as long as the following requirement is met:
- The CMS core must be executed in the first installation action and can only be installed once.
Start action (start): This action can start the CMS core and node managers. It is typically used after a machine restart or a power outage, or when the system was manually stopped and needs to be restarted. Similar to the stop action, the start action has the following characteristics and suggestions:
- The node manager will automatically start all the metadata nodes, data nodes, and computing nodes it manages. Therefore, to start the entire system, all node managers and cluster managers need to be started. There is no need to start each cluster instance one by one.
Stop action (stop): This action can stop node managers and the CMS core. The stop action has the following characteristics and suggestions:
- Stopping the node manager will stop all data nodes, computing nodes, and metadata nodes it manages on that working machine. Therefore, stopping the node manager is generally used together with stopping the cluster core or when the machine needs maintenance. Otherwise, it may cause the cluster instance or the cluster core to malfunction.
- Stopping the CMS core will cause all cluster instances to be inoperable. Therefore, stopping the CMS core is generally used when it is necessary to maintain all cluster instances.
Clean action (clean): This action can clean up node managers and the entire cluster management system. The clean action has the following characteristics:
- Cleaning the node manager will clean up all metadata nodes, data nodes, and computing nodes it manages on the corresponding machine. Therefore, cleaning the node manager is generally used to clean up the entire system.
- Cleaning the CMS core should only be executed when cleaning up the entire system.
- The clean action is also commonly used to handle installation failures: after an installation fails, determine the cause (such as an occupied port or insufficient disk space), use clean to remove the remnants left by the failed installation, fix the problem (for example, modify the configuration file or free up disk space), and then run the install action again.
This script is designed for the following scenarios:
Installing the CMS core
After installing the CMS core, or during the installation, the following tasks can be performed multiple times:
- Initializing working machines, including installing Klustron node managers and other components
- When necessary, stopping the node managers to perform maintenance operations on the corresponding working machines, and restarting the node managers after maintenance is complete
Starting the CMS
- For example, in the event of a large-scale power outage in the data center
When all cluster instances and the CMS are no longer needed, cleaning up the entire CMS and all cluster computing and storage nodes on the database server
- Be extremely cautious when performing this step, as there are no permission checks or validations, and there is no way to undo the action once executed
All machines involved in the cluster installation are divided into two categories:
Working machines: machines that install various nodes of the system and cluster instances, and receive instructions from the control machine to complete the assigned tasks. There are usually multiple working machines.
Control machine: the machine on which the installation script runs. On this machine, the installation script is downloaded from Gitee and the relevant software packages are downloaded from the Klustron download site (downloads.klustron.com). The control machine then issues various commands to the working machines to install the nodes. In an actual deployment, one of the working machines is usually also used as the control machine, although a machine outside of the working machines can be used instead.
04 Server Suggested Configuration
4.1 Development and Test Environment
For development and testing environments, it is recommended to use at least three machines, and each machine can deploy multiple types of components. The following are the specific requirements for each component:
4.2 Production Environment
For production environments, it is recommended to have at least four machines. If machine resources are sufficient, it is recommended to use eight machines. The following are the specific requirements for each component:
05 Preparation
Pre-installation check - Working machines (see diagram above)
All machines where nodes need to be installed (core nodes and non-core nodes) are working machines.
Either turn off the firewall or set firewall rules to allow all listening ports (including default ports specified in the configuration file) for all types of nodes installed on the machine. For example, run the following command as root to turn off the firewall:
systemctl stop firewalld
systemctl disable firewalld
Python2 needs to be installed on each working machine, and Python2 should be found in PATH after non-interactive login with the specified username.
- For CentOS 7, run the command: yum install python
- For CentOS 8/Kylin V10, run the command: yum install python2
- For Ubuntu 20.04 and above, run the command: apt install python2
- For openSUSE 15, run the command: zypper install -y python2
- For other systems, please refer to the Python 2 installation documentation on the official Python website or the website of the operating system vendor.
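To confirm that python2 will be visible in a non-interactive session under the working user (which is how the control machine will invoke it), a quick check such as the following can be run from the control machine once passwordless ssh has been configured (see the control machine section below); the address and username are examples:
ssh kunlun@192.168.0.125 'which python2 && python2 -V'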
Install the aio library on each machine to ensure that all components can start normally. For example, on Centos7/8, run the following command:
[root@kunlun-test2]# yum install -y libaio libaio-devel
- Create a user on each machine (can be different on each machine, but it is recommended to use the same name "klustron" on all machines) to run all related programs, and set a password for this user. This user is referred to as the working user, and the username corresponds to the user property in the subsequent machine configuration:
[root@kunlun-test3 /]# useradd --create-home --shell /bin/bash kunlun
[root@kunlun-test3 /]# passwd kunlun
- Add passwordless sudo permission for this working user. For example, as root, perform the following operations to grant passwordless sudo to the newly created kunlun user:
vi /etc/sudoers
#Add the following line at the end:
kunlun ALL=(ALL) NOPASSWD: ALL
- It is recommended to create an empty directory (recommended to be /klustron) on each machine and set the owner of the directory to the working username created earlier. This directory corresponds to the basedir property in the subsequent machine configuration.
[root@kunlun-test3 /]# mkdir -p /klustron
[root@kunlun-test3 /]# chown -R kunlun:kunlun /klustron
- Ensure that the maximum number of open files for the corresponding user is set appropriately so that the programs will not misbehave due to insufficient file handle resources. The recommended value is at least 65536. You can use ulimit -n on the machine to display the current user's open file limit. If the displayed value is less than 65536, it is generally recommended to add the following lines to /etc/security/limits.conf:
* soft nofile 65536
* hard nofile 200000
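The new limit only takes effect for newly created sessions. To verify it, log in again as the working user and check the current limit; it should print 65536 or higher:
ulimit -n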
- For machines where XPanel or Elasticsearch/Kibana is installed, in addition to Python2, Docker (version 20 or above) also needs to be installed. Regarding Docker installation, you can refer to the official documentation of Docker engine installation: https://docs.docker.com/engine/install. For example, to install Docker on CentOS 7, you can run the following commands:
$ sudo yum remove docker docker-client docker-client-latest docker-common docker-latest \
docker-latest-logrotate docker-logrotate docker-engine
$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin
- After installing Docker, you need to start Docker and set it to start automatically with the operating system. You can do this with the following commands:
systemctl enable docker
systemctl start docker
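To verify that Docker is installed and running, a simple check such as the following can be used (the hello-world test requires outbound network access to pull the image):
sudo docker --version
sudo docker run --rm hello-world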
Pre-installation check - Control machine (see diagram above)
- The machine needs to have python2 installed. Refer to the previous section for instructions on installing Python2.
- git/wget needs to be installed. For example, on CentOS7/8, run the following command:
yum -y install git wget
Select an existing user on the control machine or create a new user as the control user. All subsequent control operations will be performed using this user. Although it is allowed, it is generally not recommended to use root as the control user. When the control machine itself is also a working machine, it is generally recommended to use the working user created earlier as the control user.
Under the control user, be able to use the specified account (the user created on the working machine, such as kunlun mentioned earlier and specified in the configuration file) to log in to all the working machines involved without a password through ssh (including the local machine if it is also a working machine). For example, on CentOS7/8, do the following:
# Generate an RSA public-private key pair. This step can be skipped if the key pair already exists in $HOME/.ssh.
[kunlun@kunlun-test12 ~]# ssh-keygen -t rsa
... Omit command output, when prompted to enter passphrase, just press Enter ...
# Set the current user on the current machine (here it is kunlun) to ssh login without a password to all the working machines involved, such as kunlun@192.168.0.125 (note that this step requires the corresponding login password).
# Note that when the target machine is the local machine, use the IP address of the local machine in the cluster instead of localhost or 127.0.0.1, unless its address in the cluster is localhost or 127.0.0.1.
[kunlun@kunlun-test12 ~]# ssh-copy-id -i .ssh/id_rsa.pub kunlun@192.168.0.125
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
kunlun@192.168.0.125's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'kunlun@192.168.0.125'"
and check to make sure that only the key(s) you wanted were added.
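After ssh-copy-id has been run for every working machine, passwordless login can be verified in a single pass from the control user. The loop below is a sketch using the example addresses from this guide; replace them with the addresses of your own working machines:
for h in 192.168.0.110 192.168.0.111 192.168.0.100; do
    ssh -o BatchMode=yes kunlun@$h 'hostname && which python2'
done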
Log in to the control machine, switch to the control user, pull the installation script from Gitee, and then download the software package.
cd $HOME
git clone https://gitee.com/zettadb/cloudnative.git
cd cloudnative/cluster
mkdir -p clustermgr
cd clustermgr
wget http://zettatech.tpddns.cn:14000/thirdparty/hadoop-3.3.1.tar.gz
wget http://zettatech.tpddns.cn:14000/thirdparty/jdk-8u131-linux-x64.tar.gz
wget http://zettatech.tpddns.cn:14000/thirdparty/mysql-connector-python-2.1.3.tar.gz
wget http://zettatech.tpddns.cn:14000/thirdparty/prometheus.tgz
wget http://zettatech.tpddns.cn:14000/thirdparty/haproxy-2.5.0-bin.tar.gz
wget http://zettatech.tpddns.cn:14000/thirdparty/efk/filebeat-7.10.1-linux-x86_64.tar.gz
# Here we are using the 1.1.2 release package.
VERSION=1.1.2
wget http://downloads.klustron.com/releases/$VERSION/release-binaries/kunlun-cluster-manager-$VERSION.tgz
wget http://downloads.klustron.com/releases/$VERSION/release-binaries/kunlun-node-manager-$VERSION.tgz
wget http://downloads.klustron.com/releases/$VERSION/release-binaries/Klustron-server-$VERSION.tgz
wget http://downloads.klustron.com/releases/$VERSION/release-binaries/Klustron-storage-$VERSION.tgz
wget http://downloads.klustron.com/releases/$VERSION/release-binaries/kunlun-proxysql-$VERSION.tgz
# If you need to use Elasticsearch/Kibana to collect and display node log information, you also need the following two packages:
wget http://zettatech.tpddns.cn:14000/thirdparty/efk/elasticsearch-7.10.1.tar.gz
wget http://zettatech.tpddns.cn:14000/thirdparty/efk/kibana-7.10.1.tar.gz
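Before running the installation, it is worth confirming that all packages were downloaded completely and are non-empty (the exact file list depends on the version and the optional components you chose to download):
ls -lh *.tgz *.tar.gz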
06 Installation Example
Here is an example of the installation of the simplest cluster management system using three machines (192.168.0.110/111/100) as working machines. The specific deployment is:
- One metadata node is deployed on each machine, and RBR (FullSync, a Klustron proprietary technology) is used as the replication method for the metadata group.
- Each machine has a node manager and a cluster manager deployed.
- XPanel is deployed on the first machine (192.168.0.110).
- ElasticSearch/Kibana is deployed on the second machine (192.168.0.111).
- The third working machine (192.168.0.100) also serves as the control machine.
- The working user and control user are both kunlun, so it is necessary to configure ssh login without password from the kunlun user on 192.168.0.100 to kunlun@192.168.0.110, kunlun@192.168.0.111, and kunlun@192.168.0.100.
6.1 Edit the Configuration File
Note: The detailed specification of the configuration file will be explained later. Here, only the overall structure is outlined.
The configuration contains the following parts:
Machine configuration: used to set the working directory and connection information on the working machine, and there should be corresponding entries for all working machines.
Relevant configurations of the core of the cluster management system, mainly including:
- meta: Used to set the metadata group, which serves as the data core of the cluster management system, storing all data in the management system and shared by all cluster instances in the system. At least three should be configured to ensure high availability. In this example, 3 are configured.
- cluster_manager: Configuration of the cluster manager. If there are at least three machines, it is generally recommended to configure at least three nodes to ensure high availability. Otherwise, one is generally configured. In this example, 3 are configured.
node_manager: Configuration of the node manager. An entry should be configured for every working machine involved in the cluster instances. In this example, nodes are installed on three machines, so 3 node managers are configured.
xpanel: Configuration of XPanel. The final XPanel address is http://${xpanel.ip}:${xpanel.port}/KunlunXPanel. Here, xpanel.port uses the default value of 18080.
elasticsearch: Configuration of ElasticSearch/Kibana, where ElasticSearch uses the default port 9200 and Kibana uses the default port 5601.
Elements and attributes in the configuration file usually have default values. If the property or element is not set, the default value is generally used.
Based on the specific deployment instructions and configuration file specifications, with most elements using default values, the final configuration file content can be as follows (i.e. the cluster_and_node_mgr.json file):
{
"machines":[
{
"ip":"192.168.0.110",
"basedir":"/klustron",
"user":"Klustron"
},
{
"ip":"192.168.0.111",
"basedir":"/klustron",
"user":"Klustron"
},
{
"ip":"192.168.0.100",
"basedir":"/klustron",
"user":"Klustron"
}
],
"meta":{
"ha_mode": "rbr",
"nodes":[
{
"ip":"192.168.0.110"
},
{
"ip":"192.168.0.111"
},
{
"ip":"192.168.0.100"
}
]
},
"cluster_manager": {
"nodes": [
{
"ip": "192.168.0.110"
},
{
"ip": "192.168.0.111"
},
{
"ip": "192.168.0.100"
}
]
},
"node_manager": {
"nodes": [
{
"ip": "192.168.0.110"
},
{
"ip": "192.168.0.111"
},
{
"ip": "192.168.0.100"
}
]
},
"xpanel": {
"ip": "192.168.0.110",
"image": "registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2"
},
"elasticsearch": {
"ip": "192.168.0.111"
}
}
6.2 Perform Specific Actions
Note: It is generally recommended to use the same configuration file for actions that operate on the same group of objects.
Installation
cd $HOME/cloudnative/cluster
# Use code package version 1.1.2
python setup_cluster_manager.py --autostart --config=cluster_and_node_mgr.json --product_version=1.1.2 --action=install
bash -e clustermgr/install.sh
Operate the cluster using XPanel, i.e., open the following URL in a web browser (firefox/edge/chrome):
http://192.168.0.110:18080/KunlunXPanel
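If the page does not load, a quick reachability check from the control machine can help distinguish a network or firewall problem from an installation problem (a sketch; it assumes curl is installed):
curl -I http://192.168.0.110:18080/KunlunXPanel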
If an error occurs and a reinstallation is required after the issue has been identified and fixed, the clean command is:
cd $HOME/cloudnative/cluster
# Use code package version 1.1.2
python setup_cluster_manager.py --autostart --config=cluster_and_node_mgr.json --product_version=1.1.2 --action=clean
bash clustermgr/clean.sh
6.3 View Installation Error Log
When using the above json file for installation, if an installation error occurs, in addition to the log information displayed during the execution of the clustermgr/install.sh script, for cases where a node installation error occurs, it is usually necessary to view the log file of that node to know the specific reason for the installation failure.
For metadata nodes, the log file path is BASEDIR/storage_logdir/PORT/mysqld.err on the machine where the installation failed. Here, BASEDIR is the basedir attribute of the corresponding machine entry in machines, and PORT is the listening port configured for the node (the port attribute). For example, if the installation of the second metadata node fails and its port is 6002, the corresponding log file path on machine 192.168.0.111 is /klustron/storage_logdir/6002/mysqld.err.
For cluster managers, the log file directory is BASEDIR/klustron-cluster-manager-$VERSION/log, where BASEDIR is as described above and $VERSION represents the product version number mentioned earlier when downloading the software package. This directory contains several log files, and the different file name prefixes indicate different types of logs.
For node managers, the log file directory is BASEDIR/klustron-node-manager-$VERSION/log, where BASEDIR and $VERSION have the same meaning as described above. This directory contains several log files, and the different file name prefixes indicate different types of logs.
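For example, with the configuration used in this guide (basedir /klustron, version 1.1.2), the most recent errors can be inspected on the failing machine with commands like the following; the metadata node port 6002 is only an illustrative value, and the directory names follow the patterns described above:
# Metadata/storage node error log
tail -n 100 /klustron/storage_logdir/6002/mysqld.err
# Cluster manager and node manager log directories
ls /klustron/klustron-cluster-manager-1.1.2/log
ls /klustron/klustron-node-manager-1.1.2/log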
Since Elasticsearch/Kibana is also installed in this example, all node log information can also be viewed through Kibana at http://${elasticsearch.ip}:${elasticsearch.kibana_port}, where elasticsearch.kibana_port uses the default value of 5601.
07 Additional Content
7.1 Command-line Parameters of setup_cluster_manager.py
- The current version of setup_cluster_manager.py supports the following parameters:
usage: setup_cluster_manager.py [-h] --config CONFIG
[--action {install,clean,start,stop,service,gen_cluster_config}]
[--defuser DEFUSER] [--defbase DEFBASE]
[--sudo] [--verbose]
[--product_version PRODUCT_VERSION]
[--localip LOCALIP] [--small] [--autostart]
[--setbashenv]
[--defbrpc_raft_port_clustermgr DEFBRPC_RAFT_PORT_CLUSTERMGR]
[--defbrpc_http_port_clustermgr DEFBRPC_HTTP_PORT_CLUSTERMGR]
[--defpromethes_port_start_clustermgr DEFPROMETHES_PORT_START_CLUSTERMGR]
[--defbrpc_http_port_nodemgr DEFBRPC_HTTP_PORT_NODEMGR]
[--deftcp_port_nodemgr DEFTCP_PORT_NODEMGR]
[--defstorage_portrange_nodemgr DEFSTORAGE_PORTRANGE_NODEMGR]
[--defserver_portrange_nodemgr DEFSERVER_PORTRANGE_NODEMGR]
[--defprometheus_port_start_nodemgr DEFPROMETHEUS_PORT_START_NODEMGR]
[--outfile OUTFILE]
[--cluster_name CLUSTER_NAME]
Specify the arguments.
optional arguments:
-h, --help show this help message and exit
--config CONFIG The config path
--action {install,clean,start,stop,service}
The action
--defuser DEFUSER the default user
--defbase DEFBASE the default basedir
--sudo whether to use sudo
--verbose verbose mode, to show more information
--product_version PRODUCT_VERSION
kunlun version
--small whether to use small template
--autostart whether to start the cluster automaticlly
--setbashenv whether to set the user bash env
--defbrpc_raft_port_clustermgr DEFBRPC_RAFT_PORT_CLUSTERMGR
default brpc_raft_port for cluster_manager
--defbrpc_http_port_clustermgr DEFBRPC_HTTP_PORT_CLUSTERMGR
default brpc_http_port for cluster_manager
--defpromethes_port_start_clustermgr DEFPROMETHES_PORT_START_CLUSTERMGR
default prometheus starting port for cluster_manager
--defbrpc_http_port_nodemgr DEFBRPC_HTTP_PORT_NODEMGR
default brpc_http_port for node_manager
--deftcp_port_nodemgr DEFTCP_PORT_NODEMGR
default tcp_port for node_manager
--defstorage_portrange_nodemgr DEFSTORAGE_PORTRANGE_NODEMGR
default port-range for storage nodes
--defserver_portrange_nodemgr DEFSERVER_PORTRANGE_NODEMGR
default port-range for server nodes
--defprometheus_port_start_nodemgr DEFPROMETHEUS_PORT_START_NODEMGR
default prometheus starting port for node_manager
Parameters are described below:
--config CONFIG: Path to the parameter file. Required.
--action {install,clean,start,stop}: Specify the action; currently install, clean, start, and stop are supported. Required.
--defuser DEFUSER: Specify the default username for connecting to machines. If not specified, the default is klustron.
--defbase DEFBASE: Specify the default working directory. If the corresponding machine is not set, the default working directory is used. If not specified, the default is /klustron, and it must be an absolute path.
--sudo: Whether to use sudo to run necessary commands. If the user does not have permission to create the corresponding directory, sudo is required. The default is false.
--verbose: Whether to display more detailed information during execution. The default only displays periodic information and does not display every command executed.
--product_version PRODUCT_VERSION: Product version. If not specified, the default is 1.2.1.
--small: Whether to use template-small.cnf to install metadata nodes. The default is [false].
--autostart: Whether to allow auto-start. This option will add a configuration item to systemd, so that relevant services can automatically start after the machine starts up. This option enables the --sudo option. If the user specified on the remote machine does not have sudo privileges, it will fail. The default is [false].
--setbashenv: Whether to modify the .bashrc file in the user's $HOME directory to assist DBAs with manual operations. Using this option may affect the bash environment variables under the corresponding user on the working machine, and is generally not set. The default is [false].
--defbrpc_raft_port_clustermgr DEFBRPC_RAFT_PORT_CLUSTERMGR: Default brpc_raft_port for cluster_mgr nodes. The default is [58001].
--defbrpc_http_port_clustermgr DEFBRPC_HTTP_PORT_CLUSTERMGR: Default brpc_http_port for cluster_mgr nodes. The default is [58000].
--defpromethes_port_start_clustermgr DEFPROMETHES_PORT_START_CLUSTERMGR: Default prometheus port for cluster_mgr nodes. The default is [59010].
--defbrpc_http_port_nodemgr DEFBRPC_HTTP_PORT_NODEMGR: Default brpc_http_port for node_mgr nodes. The default is [58002].
--deftcp_port_nodemgr DEFTCP_PORT_NODEMGR: Default tcp_port for node_mgr nodes. The default is [58003].
--defstorage_portrange_nodemgr DEFSTORAGE_PORTRANGE_NODEMGR: Port range for storage nodes used for automatic port assignment. The default is [57000-58000].
--defserver_portrange_nodemgr DEFSERVER_PORTRANGE_NODEMGR: Port range for computing nodes used for automatic port assignment. The default is [47000-48000].
--defprometheus_port_start_nodemgr DEFPROMETHEUS_PORT_START_NODEMGR: Default prometheus port for node_mgr nodes. The default is [58010].
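As an illustration of how these parameters combine, the following hypothetical invocation installs with a non-default user, working directory, and node manager ports; all values here are examples only and should be adapted to your environment:
python2 setup_cluster_manager.py --autostart --config=my_config.json --product_version=1.1.2 \
    --defuser=dbuser --defbase=/Klustron \
    --defbrpc_http_port_nodemgr=35000 --deftcp_port_nodemgr=35001 \
    --action=install
bash -e clustermgr/install.sh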
7.2 About Autostart
When passing the --autostart option to setup_cluster_manager.py, the relevant processes will be installed into the system as services with the following characteristics and requirements:
After a machine reboot, the relevant processes will be automatically started.
This option should be used consistently throughout all steps:
- If install was run with the --autostart option, all other actions (start/stop/clean) should also use the --autostart option in order to work properly.
- Conversely, if install was run without the --autostart option, none of the other actions should use the --autostart option.
7.3 About Setting User Login Environment Variables
When passing --setbashenv to setup_cluster_manager.py, environment variables such as PATH and LD_LIBRARY_PATH in the user's login environment will be set to include the paths of Java and Hadoop. This way, Hadoop will be in PATH when the user uses interactive bash. When using this option, the following should be noted:
This option will modify the environment variables of interactive bash. All users who log in to the machine with this account will be affected, especially if Java and Hadoop have already been set, which will be overwritten by this option.
This option should be used consistently throughout all steps:
- If install was run with the --setbashenv option, clean should also use the --setbashenv option so that the environment variables set by install can be cleaned up.
- Conversely, if install was run without the --setbashenv option, clean should not use the --setbashenv option either.
- Between a pair of install and clean actions on the same machine, there should not be any other install or clean actions from other installations; otherwise, the environment variables set during this installation may be overwritten or cleared.
7.4 Configuration File Explanation
7.4.1 The configuration file is used to declare the objects that need to be operated, and it is a JSON object file with the following first-level attributes:
machines: an array that specifies the connection information and working directory of the working machines. This is a JSON array, and each member is used to identify a working machine. It is recommended to set up a member for each working machine. Each member has the following properties:
ip: a string that is required and represents the address of the working machine. It can be an IPv4 address, an IPv6 address, or a DNS address. This address will serve as the unique identifier of the working machine. If multiple addresses are used on the same machine (such as IPv4 and IPv6 addresses), they are considered to be multiple machines.
basedir: a string representing the working directory on the working machine. It must be an absolute path, and it will be used to store various types of node packages and various action support files. The default working directory is "/klustron", and it can be changed by passing the --defbase option to setup_cluster_manager.py.
sshport: an integer representing the listening port of the SSH server on the machine. The default value is 22.
user: specifies the username used by the control machine to connect to the working machine. This username must exist on the working machine. All subsequent actions, including installing various types of nodes, are executed using this user, so this user needs sufficient permissions. The permission requirements have the following characteristics and requirements:
- If the --sudo and --autostart options are not passed when running setup_cluster_manager.py, the execution action will not use sudo. Therefore, this user needs complete permissions for all directories in the configuration file. It is best for this user to be the owner of these directories. If a directory does not exist, the user is required to have the permission to create the directory.
- If the --sudo or --autostart option is passed when running setup_cluster_manager.py, sudo is used to execute the command if necessary. In this case, the root user is required to have the permission to create these directories. Generally, this should not be a problem as long as it is not a special location directory. This requires the user to have the permission to execute sudo without a password.
- The default value of this attribute is "klustron", which can be changed by passing the --defuser option to setup_cluster_manager.py.
- Since the computing nodes cannot be started with the root user, this property value cannot be "root".
If an address of a working machine appears at other locations in the configuration file but there is no corresponding entry in machines, the script will create an entry with default values for user and basedir using that address during runtime. Therefore, if the working directory and user on this machine are different from the default values, the corresponding entry must be set so that this script can work properly.
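For example, a machines entry for a working machine that uses a non-default SSH port, user, and working directory might look like the following; all values are illustrative:
"machines": [
    {
        "ip": "192.168.0.21",
        "sshport": 2222,
        "user": "dbuser",
        "basedir": "/data/klustron"
    }
]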
meta: an object that specifies the installation information of the metadata group, mainly with the following properties:
- ha_mode: string type, specifies the replication mode between metadata nodes, it is recommended to explicitly set the mode rather than rely on the default value. In versions 1.0 and earlier, it is generally recommended to set it to 'mgr', and from version 1.1 onwards, it is generally recommended to set it to 'rbr'. The current default value is 'rbr'.
- group_seeds: string type, specifies the address of the installed metadata group. This value will be generated when setup_cluster_manager.py executes the installation action for the first time, and will be used for subsequent operations on other batches of objects. When operating on metadata groups (such as installation), this property does not need to be set, and setting it will be ignored.
- nodes: array type, specifies the detailed configuration of each metadata node in the metadata group. It must be set when operating on the metadata group (such as during installation); otherwise it must not be set. Each array member is a Klustron-Storage object; see the Klustron-Storage object description later for details.
- When operating on the CMS core, the nodes property must be set. Otherwise, the group_seeds property must be set.
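To illustrate the two forms, the meta section differs depending on whether the CMS core is being installed or an already-installed metadata group is being referenced; the addresses and ports below are illustrative. When installing the CMS core, the nodes are listed explicitly:
"meta": {
    "ha_mode": "rbr",
    "nodes": [
        { "ip": "192.168.0.2", "port": 6001 },
        { "ip": "192.168.0.3", "port": 6001 },
        { "ip": "192.168.0.4", "port": 6001 }
    ]
}
For later operations, only the address of the installed metadata group is needed:
"meta": {
    "group_seeds": "192.168.0.2:6001,192.168.0.3:6001,192.168.0.4:6001"
}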
node_manager: an object that specifies the relevant information of the node manager, the script will install and start the node manager process on the specified working machine according to this information, and appropriately set it so that these working machines can be managed by the cluster management system. This property is a json object, which currently has only one attribute:
nodes: an array, where each member represents the node manager information corresponding to a working machine. The current node manager has the following attributes:
- ip: a string type used to represent the working machine's address of the node manager, which is required.
- brpc_http_port: an integer type that specifies the http port that the node manager listens on, with a default value of 58002. This value can be changed by passing the --defbrpc_http_port_nodemgr option to the setup_cluster_manager.py script. Generally, this value does not need to be set as long as the default port is not occupied on the working machine and is allowed by the firewall. Otherwise, this value needs to be set.
- tcp_port: an integer type that specifies the tcp port that the node manager listens on, with a default value of 58003. This value can be changed by passing the --deftcp_port_nodemgr option to the setup_cluster_manager.py script. Generally, this value does not need to be set as long as the default port is not occupied on the working machine and is allowed by the firewall. Otherwise, this value needs to be set.
- storage_datadirs: a string type that is a comma-separated list of directories, where each directory must be an absolute path. This value is used to select the data directory of the storage node when the node manager installs the storage node. The default value is BASEDIR/storage_datadir, where BASEDIR is the working machine's default working directory.
- storage_logdirs: a string type that is a comma-separated list of directories, where each directory must be an absolute path. This value is used to select the error log directory of the storage node when the node manager installs the storage node. The default value is BASEDIR/storage_logdir, where BASEDIR is the working machine's default working directory.
- storage_waldirs: a string type that is a comma-separated list of directories, where each directory must be an absolute path. This value is used to select the redo log directory of the storage node when the node manager installs the storage node. The default value is BASEDIR/storage_waldir, where BASEDIR is the working machine's default working directory.
- server_datadirs: a string type that is a comma-separated list of directories, where each directory must be an absolute path. This value is used to select the data directory of the computing node when the node manager installs the computing node. The default value is BASEDIR/server_datadir, where BASEDIR is the working machine's default working directory.
- For the four directories storage_datadirs, storage_logdirs, storage_waldirs, and server_datadirs, if multiple disk devices are required to improve the IO performance in a production environment or performance and stress testing environment, these four attributes usually need to be set to different disks. However, this is generally not necessary for non-production environments and default values can be used.
- storage_portrange: a string type that represents the port range used for storage nodes. When the metadata node does not specify a port, the script selects a port within this range. Similarly, cluster_manager selects an available port for storage nodes within this range. The default value is "57000-58000", which can be changed by passing the --defstorage_portrange_nodemgr option.
- server_portrange: a string type that represents the port range used for computing nodes. Cluster_manager selects a port for computing nodes within this range. The default value is "47000-48000", which can be changed by passing the --defserver_portrange_nodemgr option.
- total_cpu_cores: an integer type that specifies the number of CPUs on the machine. The default value is 8 and should be adjusted according to actual needs.
- total_mem: an integer type that specifies the amount of memory in the machine, measured in MB. The default value is 16384 and should be adjusted according to actual needs.
- nodetype: a string type that specifies the intended use of the machine. There are four available options: 'none' indicates that the machine will not deploy any nodes, 'storage' indicates that the machine will be used for deploying storage nodes, 'server' indicates that the machine will be used for deploying computing nodes, and 'both' indicates that the machine will be used for deploying both storage and computing nodes. This value must be set correctly, otherwise installation of the cluster through the cluster_manager API may fail. The default value is 'both'.
- prometheus_port_start: an integer type that specifies the Prometheus port of node_mgr. The default value is 58010 and can be modified using the --defprometheus_port_start_nodemgr parameter.
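As an example, a node_manager entry for a production machine that spreads data, redo logs, and error logs across separate disks might look like the following; the mount points and hardware figures are illustrative:
"node_manager": {
    "nodes": [
        {
            "ip": "192.168.0.8",
            "total_cpu_cores": 32,
            "total_mem": 131072,
            "nodetype": "both",
            "storage_datadirs": "/data1/datadir,/data2/datadir",
            "storage_waldirs": "/wal1/waldir",
            "storage_logdirs": "/log1/logdir",
            "server_datadirs": "/data3/server_datadir"
        }
    ]
}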
cluster_manager: an object that specifies the information related to the cluster manager. This object has only one attribute:
nodes: array type, specifies the nodes of the cluster manager group, each array member represents a cluster manager node object, and each cluster manager node has the following attributes:
- ip: a string type used to represent the working machine address of the cluster manager, and must be set.
- brpc_http_port: an integer type that specifies the http port that the cluster manager listens on, with a default value of 58000, which can be changed by passing --defbrpc_http_port_clustermgr to setup_cluster_manager.py. Normally, as long as the default port on the working machine is not occupied and allowed by the firewall, there is generally no need to set it. Otherwise, this value needs to be set.
- brpc_raft_port: an integer type that specifies the raft port that the cluster manager listens on, with a default value of 58001, which can be changed by passing --defbrpc_raft_port_clustermgr to setup_cluster_manager.py. Normally, as long as the default port on the working machine is not occupied and allowed by the firewall, there is generally no need to set it. Otherwise, this value needs to be set.
- prometheus_port_start: an integer type used to specify the Prometheus port of the cluster manager, with a default value of 59010, which can be changed by passing --defpromethes_port_start_clustermgr.
backup: an object that specifies some backup-related information, which when correctly set, enables automatic backup of cluster instances. Currently, two backup methods are supported: ssh and hdfs.
hdfs: an object type, specifies information for using Hadoop File System (HDFS) as the backup medium.
- ip: a string type that specifies the address of the HDFS server.
- port: an integer type that specifies the connection port of the HDFS server. A service must be listening to this port on the machine specified by the ip property.
- Currently, HDFS federation server address format is not supported.
ssh: an object type that specifies information for using ssh/scp to perform replication backup, with the following primary properties:
- ip: a string type, specifies the IP address of the target machine for ssh backup. This property is required.
- port: an integer type, specifies the listening port of the ssh server on the target machine. The default value is 22.
- user: the username for connecting to the target machine. It is required that the specified user on all working machines of the storage node has the configuration for using this username to connect to the target machine without a password.
- targetDir: the backup folder on the target machine. The user must have read and write permissions for this directory. If the directory does not exist, the user must have the permission to create it.
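A backup section that configures both supported media might look like the following; the addresses, ports, and directory are illustrative, and either hdfs or ssh alone is also valid:
"backup": {
    "hdfs": {
        "ip": "192.168.0.30",
        "port": 9000
    },
    "ssh": {
        "ip": "192.168.0.31",
        "port": 22,
        "user": "klustron",
        "targetDir": "/backup/klustron"
    }
}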
xpanel: an object that specifies information for installing and starting xpanel. The script will start xpanel on the specified machine based on this configuration so that users can operate the cluster through a web browser. The current properties are:
- ip: a string that specifies the address of the machine where XPanel is installed
- port: an integer that specifies the port number used by xpanel on the machine. The default value is 18080.
- image: a URL for the image, which is required. It can be a local or remote image URL, but a remote image URL is typically used. The remote image URL corresponding to version 1.1.2 is:
registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2
- imageType: a string that specifies the type of image. Currently, "url" and "file" are supported, and the default value is "url". When the type is "url", the image must be accessible. When the type is "file", the script loads the image from the specified file (see below), but the user must download the image file to the clustermgr directory in advance.
- imageFile: a string that specifies the image file name. It takes effect when imageType is "file", and the default value is "kunlun-xpanel-$VERSION.tar.gz", where $VERSION is the version specified by the --product_version parameter. After this file is loaded using docker load, it must produce the image URL specified in the image. The image file generated by Klustron meets this requirement.
- The final XPanel address is http://${xpanel.ip}:${xpanel.port}/KunlunXPanel. For example, if the IP is 192.168.0.110 and the port is 18000, the address is http://192.168.0.110:18000/KunlunXPanel.
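For environments without access to the remote registry, XPanel can be loaded from a local image file instead. A sketch of such a configuration is shown below; it assumes the image file has already been placed in the clustermgr directory:
"xpanel": {
    "ip": "192.168.0.110",
    "port": 18080,
    "imageType": "file",
    "imageFile": "kunlun-xpanel-1.1.2.tar.gz",
    "image": "registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2"
}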
elasticsearch: an object that specifies installation information for Elasticsearch/Kibana. The script will start Elasticsearch and Kibana on the specified machine based on this information (currently Elasticsearch and Kibana are deployed on the same machine), so that users can view the log information of various nodes through Kibana. The current properties are:
- ip: a string that specifies the IP address of the machine where Elasticsearch/Kibana is installed
- port: an integer that specifies the listening port of Elasticsearch. The default value is 9200.
- kibana_port: an integer that specifies the listening port of Kibana. The default value is 5601.
- The logs can be viewed through Kibana at http://${elasticsearch.ip}:${elasticsearch.kibana_port}. For example, if the IP is 192.168.0.111 and kibana_port uses the default value, the address is http://192.168.0.111:5601.
Regarding the first-level attributes and subsequent sub-level attributes, the corresponding objects only need to be set if they need to be operated on or if the object information is required to assist in operating other objects and the object has attribute values that differ from the defaults. Otherwise, they do not need to be set, and the script will use default values to create necessary auxiliary objects to complete the task at runtime. For example:
- The core of the cluster management system must be executed in the first installation action, so meta.nodes and cluster_manager.nodes must be correctly set. For subsequent cluster instance installation actions, only meta.group_seeds needs to be set, and the cluster_manager top-level attribute is not needed. For meta.ha_mode, it only needs to be set for the first installation.
7.4.2 Klustron-Storage object is used in the configuration file, and a detailed explanation will be provided here
Klustron-Storage object represents an instance installed and running using the Klustron-storage-$VERSION.tgz software package, developed based on MySQL-8.0.26. This object has the following properties:
is_primary: a Boolean value indicating whether the Klustron-Storage instance is the initial master node in the replication group during installation. The default value is false. There can only be one master node in the group, and when there is no master node, the first node becomes the master node.
ip: a string specifying the IP address of the working machine where the Klustron-Storage instance is located.
port: an integer specifying the port on which the Klustron-Storage instance is listening for connections from computing nodes. If not specified, the script automatically selects one within the specified range.
xport: an integer specifying the port used by the Klustron-Storage instance for cloning. This property is only required in replication mode 'mgr', and if not specified, the script automatically selects one within the specified range.
mgr_port: an integer specifying the port used by the Klustron-Storage instance for XCom communication. This property is only required in replication mode 'mgr', and if not specified, the script automatically selects one within the specified range.
innodb_buffer_pool_size: a string specifying the size of the Innodb buffer pool for the Klustron-Storage instance. For metadata nodes, the default value is 128MB, which may be too small for production environments. It is generally recommended to set this value according to your needs. For more information, see Configuring InnoDB Buffer Pool Size.
election_weight: an integer value between 1-100, with a default of 50, used for fault-tolerant leader election in mgr replication mode. This property is only effective in replication mode 'mgr', and is ignored in other replication modes.
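Putting these properties together, a metadata node entry (a member of meta.nodes) with explicit settings might look like the following; the values are illustrative, and xport/mgr_port/election_weight only matter in 'mgr' replication mode:
{
    "is_primary": true,
    "ip": "192.168.0.2",
    "port": 6001,
    "innodb_buffer_pool_size": "1024MB",
    "election_weight": 50
}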
7.5 An Example of Installing Cluster Management System in Steps
The machines for the cluster management system do not need to be determined all at once, and the system can be installed in parts (including the core) and gradually add new machines to the system. Here is an example of installing the cluster management system in steps.
Note that the --autostart option is passed to setup_cluster_manager.py to make all related processes have the function of being automatically started after restarting. Also, because version 1.1.2 is used here, the --product_version=1.1.2 option is passed in all steps.
This example mainly includes the following steps:
- Install the CMS core and set up several initial working machines.
- Add several machines to the CMS to expand the resource pool.
- Continue to add several machines to the CMS to expand the resource pool.
- Some machines are used for other purposes and are removed from the resource pool.
- Remove the entire system.
7.5.1 Install the CMS core and set up several initial working machines.
Step 1: Install the core of the cluster management system on three machines (192.168.0.2/3/4), including three metadata nodes and three cluster manager nodes. Also set up three machines (192.168.0.5/6/7) as backup machines for the deployment of data and computing nodes. Assume that each machine has default 8 cores and 16GB memory.
- Each of the first three machines has a metadata node that uses port 6001 as the listening port. The rest of the ports are chosen by the script itself, and the metadata group uses rbr (i.e., FullSync) as the replication mode. All other settings use default values.
- Since each of the first three machines has a metadata node, each machine also needs a node manager, and the corresponding node manager uses the default 5800X port. All directories use the default settings.
- Each of the first three machines also has a cluster manager node, which uses the default 5800X port series.
- The other three machines need to be set up, so each one needs a node manager. Here, it is assumed that the directories and ports use default settings.
- Backup information is not set up, and no cluster instances are installed at this step.
- XPanel is installed on the first machine, using the remote image: registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2. Assume that the port 18080 cannot be used, so port 18000 is used instead.
- Elasticsearch/Kibana is installed on the second machine, with Elasticsearch using the default listening port 9200 and Kibana using the default listening port 5601.
- All machines have the necessary software installed, the username "klustron" has been created, and the default working directory "/klustron" is used. Therefore, no entries are needed in the "machines" section. However, there are two scenarios regarding the "/klustron" directory:
- a. The "/klustron" directory has already been created manually and the "klustron" user has sufficient permissions on it, so no special permission options need to be passed to setup_cluster_manager.py.
- b. The "/klustron" directory has not been created, so the script will attempt to create it. In this case, the "--sudo" option must be passed to setup_cluster_manager.py, and the "klustron" user on these machines must have passwordless sudo permission.
According to the above instructions, the resulting configuration file (assumed to be named system.json) is:
{
"meta":{
"ha_mode": "rbr",
"nodes":[
{
"ip":"192.168.0.2",
"port":6001
},
{
"ip":"192.168.0.3",
"port":6001
},
{
"ip":"192.168.0.4",
"port":6001
}
]
},
"cluster_manager": {
"nodes": [
{
"ip": "192.168.0.2"
},
{
"ip": "192.168.0.3"
},
{
"ip": "192.168.0.4"
}
]
},
"xpanel": {
"ip": "192.168.0.2",
"port": 18000,
"image": "registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2"
},
"elasticsearch": {
"ip": "192.168.0.3"
},
"node_manager": {
"nodes": [
{
"ip": "192.168.0.2"
},
{
"ip": "192.168.0.3"
},
{
"ip": "192.168.0.4"
},
{
"ip": "192.168.0.5"
},
{
"ip": "192.168.0.6"
},
{
"ip": "192.168.0.7"
}
]
}
}
- If the /klustron directory has already been created on all of the working machines and the klustron user has sufficient permissions, the installation command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=system.json --action=install --product_version=1.1.2
bash -e clustermgr/install.sh
- Otherwise, if you want the script to create the directory, you need to pass the --sudo option and ensure that the klustron user on these machines has passwordless sudo access. The installation command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=system.json --action=install --sudo --product_version=1.1.2
bash -e clustermgr/install.sh
- The setup_cluster_manager.py script will print out the metaseeds after installing the core, which need to be recorded for later installation of node managers. The printed content is:
metaseeds:192.168.0.2:6001,192.168.0.3:6001,192.168.0.4:6001
- After the installation is complete, you can use XPanel to send commands to the cluster instances through the cluster managers, including but not limited to creation, deletion, modification, backup, and restoration. The XPanel address is: http://192.168.0.2:18000/KunlunXPanel.
7.5.2 Add several machines to the CMS to expand the resource pool.
Assume that a large business needs to be added to the system to meet its data storage and processing needs, so six new machines (192.168.0.8-13) are added. These six machines need to be configured so that the CMS can deploy cluster instance data nodes and computing nodes on them. Assume that each of these machines has 16 cores and 64GB of memory.
- These six machines are new and need to be added to the resource pool, so each machine needs a node manager. Suppose each machine places data on a separate disk to improve I/O efficiency (assuming the mount path is /data1), so storage_datadirs is set to /data1/datadir. The error logs, redo logs, and computing node data directories use the default paths due to their small volume.
- Assume that the /klustron working directory has already been created on these machines and the klustron account has been established. Since these match the default values, no separate entries are required in machines.
- The metadata nodes have already been installed, so only their address information is needed, which is the metaseeds displayed earlier. The cluster manager has been installed in the first step, so it is no longer needed.
- Therefore, the final configuration file (add1.json) is:
{
"meta":{
"ha_mode": "rbr",
"group_seeds": "192.168.0.2:6001,192.168.0.3:6001,192.168.0.4:6001"
},
"node_manager": {
"nodes": [
{
"ip": "192.168.0.8",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.9",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.10",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.11",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.12",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.13",
"total_cpu_cores": 16,
"total_mem": 65536,
"storage_datadirs": "/data1/datadir"
}
]
}
}
- The install command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=add1.json --action=install --product_version=1.1.2
bash -e clustermgr/install.sh
- After successful installation, these 6 machines are set up and XPanel can be used to send commands through the cluster manager to operate cluster instances.
7.5.3 Continue to add several machines to the CMS to expand the resource pool.
Assume that the next step is to add a small business, deploying its cluster instances on three machines (192.168.0.14/15/16) that are already running other workloads (and do not yet belong to this system). The script only adds these machines to the cluster management system; creating the cluster instances is left to the cluster manager.
- Since these three machines did not belong to the management system before, a node manager needs to be installed on each machine. Assuming that these machines have the default configuration of 8 cores and 16GB memory.
- The configuration of the metadata group and cluster manager is the same as before.
- Since these three machines were originally running other businesses, some of them may already use the klustron account, and there may be code packages and data for other businesses in the /klustron directory. To ensure that the database runs stably and is not affected, it is generally recommended to use an independent account and an independent directory, so the default user and working directory cannot be used. Assume that the default user is changed to dbuser (already created) and the default working directory is changed to /Klustron (already created). There are two ways to do this:
- Create an entry for each machine in machines and specify the username and working directory in the entry.
- Do not create an entry for each machine, but pass the --defuser=dbuser and --defbase=/Klustron options to setup_cluster_manager.py.
- Similarly, assuming that the range of ports 58000-58010 is occupied on these three machines, the two default ports of the node manager cannot be used. So the brpc_http_port is changed to 35000, and the tcp_port is changed to 35001. There are two ways to do this:
- Set the brpc_http_port and tcp_port attributes for each node manager.
- Do not set these two attributes, but pass the --defbrpc_http_port_nodemgr=35000 and --deftcp_port_nodemgr=35001 options to setup_cluster_manager.py.
- On the surface, passing options is simpler. However, when multiple machines each have their own customized attributes, it is more appropriate to set the entries explicitly. This example chooses to set the entries explicitly, so the final configuration file (add2.json) is:
{
"machines":[
{
"ip":"192.168.0.14",
"basedir":"/Klustron",
"user":"dbuser"
},
{
"ip":"192.168.0.15",
"basedir":"/Klustron",
"user":"dbuser"
},
{
"ip":"192.168.0.16",
"basedir":"/Klustron",
"user":"dbuser"
}
],
"meta":{
"ha_mode": "rbr",
"group_seeds": "192.168.0.2:6001,192.168.0.3:6001,192.168.0.4:6001"
},
"node_manager": {
"nodes": [
{
"ip": "192.168.0.14",
"brpc_http_port": 35000,
"tcp_port": 35001
},
{
"ip": "192.168.0.15",
"brpc_http_port": 35000,
"tcp_port": 35001
},
{
"ip": "192.168.0.16",
"brpc_http_port": 35000,
"tcp_port": 35001
}
]
}
}
- The install command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=add2.json --action=install --product_version=1.1.2
bash -e clustermgr/install.sh
- After the installation is successful, the 3 machines are ready for use, and requests can be sent to the cluster manager to install cluster instances on these machines.
7.5.4 Some machines are used for other purposes and are removed from the resource pool.
Assume that the small business has stopped running and the three machines added for it will be used for other purposes after the necessary data has been backed up. They need to be removed from the management system by running a clean action, which removes the related data, programs, and processes.
The configuration file is still add2.json.
The clean command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=add2.json --action=clean --product_version=1.1.2
bash clustermgr/clean.sh
7.5.5 Remove the entire system.
Assuming that all cluster instances in the cluster management system have completed their missions, it is necessary to delete the entire system.
It is necessary to integrate all configuration files into a new configuration file that includes all metadata nodes, cluster managers, and node managers. Note that the three machines added in step 7.5.3 were already removed in the previous step, so they do not need to be included here.
The resulting configuration file (clean.json) is:
{
"meta":{
"ha_mode": "rbr",
"nodes":[
{
"ip":"192.168.0.2",
"port":6001
},
{
"ip":"192.168.0.3",
"port":6001
},
{
"ip":"192.168.0.4",
"port":6001
}
]
},
"cluster_manager": {
"nodes": [
{
"ip": "192.168.0.2"
},
{
"ip": "192.168.0.3"
},
{
"ip": "192.168.0.4"
}
]
},
"xpanel": {
"ip": "192.168.0.2",
"port": 18000,
"image": "registry.cn-hangzhou.aliyuncs.com/kunlundb/kunlun-xpanel:1.1.2"
},
"elasticsearch": {
"ip": "192.168.0.3"
},
"node_manager": {
"nodes": [
{
"ip": "192.168.0.2"
},
{
"ip": "192.168.0.3"
},
{
"ip": "192.168.0.4"
},
{
"ip": "192.168.0.5"
},
{
"ip": "192.168.0.6"
},
{
"ip": "192.168.0.7"
},
{
"ip": "192.168.0.8",
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.9",
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.10",
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.11",
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.12",
"storage_datadirs": "/data1/datadir"
},
{
"ip": "192.168.0.13",
"storage_datadirs": "/data1/datadir"
}
]
}
}
- The clean command is:
cd cloudnative/cluster
python2 setup_cluster_manager.py --autostart --config=clean.json --action=clean --product_version=1.1.2
bash clustermgr/clean.sh