Klustron Cluster Resource Isolation Functionality

In a multi-tenant environment, effective resource isolation is essential for keeping user instances available. This article introduces Klustron's resource isolation mechanism and how to use it.

1. Working Principle

Resource isolation ensures that each user's instances run only within a preset resource pool, so that instances belonging to different users do not interfere with one another. The resources in question are CPU, IO, and physical disk space.

As of Klustron-1.1.1, CPU resource isolation is implemented. IO and physical disk space isolation will be implemented in subsequent versions. The principle behind CPU resource isolation is briefly introduced below.

1.1 Cgroup Mechanism

Linux provides the cgroup mechanism for constraining the resource usage of processes. "cgroup" stands for "control group"; each group contains a set of processes together with all of their descendant processes. The resource isolation described in this article is built on control groups.

Control groups form a hierarchy: a child group draws its resources from its parent group and can only subdivide what the parent has been given.
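
A minimal Python model of this inheritance, assuming quota-mode limits only: a cgroup can never use more CPU than any of its ancestors allows, so its effective cap is the smallest cap on the path to the root. The class CpuGroup and the group names are hypothetical, and the model deliberately ignores that sibling groups also compete for their parent's budget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuGroup:
    """Hypothetical, simplified model of a cgroup with a CFS quota."""
    name: str
    quota_us: int                      # -1 means "no limit", as in cpu.cfs_quota_us
    period_us: int = 100_000           # kernel default period (100 ms)
    parent: Optional["CpuGroup"] = None

    def own_cpus(self) -> float:
        """CPUs allowed by this group's own setting (infinite if unlimited)."""
        return float("inf") if self.quota_us < 0 else self.quota_us / self.period_us

    def effective_cpus(self) -> float:
        """Walk up to the root; the tightest ancestor limit wins."""
        limit = self.own_cpus()
        node = self.parent
        while node is not None:
            limit = min(limit, node.own_cpus())
            node = node.parent
        return limit
```

For example, an unlimited instance group placed under a pool limited to 8 CPUs (quota 800000 with a 100000 period) is still capped at 8 CPUs, while a child with its own 4-CPU quota stays at 4.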

Linux exposes cgroup settings through a file-system interface (/sys/fs/cgroup/**), and Klustron builds its instance-level resource isolation on this interface. cluster_mgr receives users' resource isolation requests and forwards them to the node_mgr on the corresponding physical machine, which invokes the cgroup2kunlun tool to apply the isolation parameters.
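
To illustrate what setting parameters through the file-system interface means, here is a minimal Python sketch that writes CPU parameters into a cgroup directory. It does not reproduce cgroup2kunlun, whose internals are not described here; the helper name apply_cpu_params, the group name instance_a, and the cgroup v1 path /sys/fs/cgroup/cpu/<group> are assumptions for illustration.

```python
import os
from typing import Mapping


def apply_cpu_params(cgroup_dir: str, params: Mapping[str, int]) -> None:
    """Create the cgroup directory if needed, then write each parameter file.

    Hypothetical helper. On a cgroup v1 host, cgroup_dir would be something
    like /sys/fs/cgroup/cpu/<group>; creating that directory creates the
    cgroup, and the kernel fills it with control files such as
    cpu.cfs_quota_us. Writing to them requires root privileges.
    """
    os.makedirs(cgroup_dir, exist_ok=True)
    for name, value in params.items():
        # Each parameter is its own pseudo-file; writing a value applies it.
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(f"{value}\n")
```

On a real host, calling apply_cpu_params("/sys/fs/cgroup/cpu/instance_a", {"cpu.cfs_period_us": 1000000, "cpu.cfs_quota_us": 200000}) would limit the hypothetical group instance_a to 0.2 CPU.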

The key parameters of CPU resource isolation (as named by the cgroup v1 cpu controller) are introduced below.

cpu.cfs_period_us

Specifies the length of the CPU accounting period, in microseconds. At the start of each new period, the cgroup's CPU time quota is replenished.

For example, to let a cgroup use one CPU for 0.2 seconds out of every second, configure the parameters as follows:

cpu.cfs_quota_us=200000 cpu.cfs_period_us=1000000

The valid range of cfs_period_us is 1000 us (1 ms) to 1000000 us (1 s); the default is 100000 us (100 ms).
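
The arithmetic behind quota mode is quota = CPUs × period. The sketch below computes cfs_quota_us for a desired number of CPUs and checks the kernel's 1 ms to 1 s period range and its 1 ms minimum quota. The helper name cfs_quota_for is hypothetical.

```python
PERIOD_MIN_US = 1_000       # 1 ms, kernel lower bound for cfs_period_us
PERIOD_MAX_US = 1_000_000   # 1 s, kernel upper bound for cfs_period_us
QUOTA_MIN_US = 1_000        # 1 ms, kernel lower bound for cfs_quota_us


def cfs_quota_for(cpus: float, period_us: int = 100_000) -> int:
    """Return the cpu.cfs_quota_us that lets a cgroup use `cpus` CPUs per period.

    Hypothetical helper; period_us defaults to the kernel default of 100 ms.
    """
    if not PERIOD_MIN_US <= period_us <= PERIOD_MAX_US:
        raise ValueError(
            f"cfs_period_us must be between {PERIOD_MIN_US} and {PERIOD_MAX_US}"
        )
    quota_us = round(cpus * period_us)  # quota = CPUs x period
    if quota_us < QUOTA_MIN_US:
        raise ValueError(f"cfs_quota_us must be at least {QUOTA_MIN_US}")
    return quota_us
```

cfs_quota_for(0.2, 1_000_000) returns 200000, matching the example above, and cfs_quota_for(4) returns 400000, which is four cores' worth of time per default 100 ms period.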

cpu.cfs_quota_us (quota mode)

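Specifies the total CPU time, in microseconds, that all tasks in a cgroup together may use within one period (cpu.cfs_period_us). A value of -1 means no limit. The quota may exceed the period: for example, a quota of 400000 with a period of 100000 lets the group use up to 4 CPUs' worth of time per period.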

Once the tasks exhaust the quota within the current period, they are throttled: they receive no more CPU time for the rest of that period and resume only when the next period begins.
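
To make the throttling behavior concrete, here is a small simulation under a simplifying assumption: a single-threaded, CPU-bound job that starts at a period boundary. The function name run_with_quota is hypothetical.

```python
from typing import List, Tuple


def run_with_quota(work_us: int, quota_us: int, period_us: int) -> Tuple[List[int], int]:
    """Model how a CPU-bound job of work_us progresses under a CFS quota.

    Simplified single-threaded model (an assumption): in each period the job
    runs for at most quota_us, is then throttled for the rest of the period,
    and resumes when the next period refills the quota. Returns the CPU time
    used in each period and the total wall-clock time.
    """
    if quota_us <= 0 or period_us <= 0:
        raise ValueError("quota_us and period_us must be positive")
    used_per_period: List[int] = []
    remaining = work_us
    while remaining > 0:
        ran = min(remaining, quota_us, period_us)  # one thread fits in one period
        used_per_period.append(ran)
        remaining -= ran
    if not used_per_period:
        return used_per_period, 0
    # Every period except the last elapses in full (running, then throttled).
    wall_us = (len(used_per_period) - 1) * period_us + used_per_period[-1]
    return used_per_period, wall_us
```

With the 0.2-CPU setting from the cfs_period_us example, run_with_quota(1_000_000, 200_000, 1_000_000) returns five periods of 200000 us and a wall-clock time of 4200000 us: one second of CPU work takes about 4.2 seconds.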

cpu.shares (share mode)

The value is an integer weight that determines the proportion of CPU time the cgroup receives relative to the other cgroups competing for CPU (the default is 1024). For example, if two cgroups have cpu.shares values of 100 and 200 and the tasks in both are CPU-busy, they receive about 33% and 67% of the system's CPU time, respectively.

There are two points to be clarified here:

  1. When only one cgroup is busy and the others are idle, the busy cgroup can use all of the machine's CPU.

  2. As new cgroups with share values are added, the proportion allocated to existing cgroups is diluted. In practice, it is therefore advisable to cap the number of cgroups that can be created in a resource pool, so that unlimited growth does not erode the allocation of existing cgroups.
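
Both points follow from how shares are divided. The sketch below assumes a simplified model: all cgroups are siblings, none has a quota, and only busy cgroups compete, so idle cgroups' shares are ignored. The function name share_split and the group names are hypothetical.

```python
from typing import Dict, Iterable


def share_split(shares: Dict[str, int], busy: Iterable[str]) -> Dict[str, float]:
    """Fraction of CPU time each busy cgroup gets under cpu.shares.

    Simplified model (an assumption): sibling cgroups with no quota, where
    the busy ones want as much CPU as they can get. Idle cgroups do not
    compete, so their shares do not count toward the total.
    """
    competing = list(busy)
    total = sum(shares[name] for name in competing)
    if total == 0:
        return {}
    return {name: shares[name] / total for name in competing}
```

share_split({"a": 100, "b": 200}, ["a", "b"]) gives roughly 0.33 and 0.67; with only "b" busy it gets 1.0 (point 1); and adding a busy group "c" with 300 shares drops "a" to about 0.17 (point 2).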

Multi-threading

  • A process's ID must be written to the cgroup's cgroup.procs file. This moves all of the process's threads into the cgroup, and threads it creates later start there as well, so the operating system can control the resource usage of the whole process.
  • Before a cgroup is deleted, the PIDs listed in its cgroup.procs must be written to the cgroup.procs file of the root (global) cgroup. Otherwise, the deletion fails with "Device or resource busy". In other words, every live process must first be moved to another cgroup, with none left behind.
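
The two rules above can be sketched in Python as follows. This is a minimal illustration assuming cgroup v1 directory layout, with hypothetical helper names (attach_process, drain_cgroup, remove_cgroup); on a real host the directories live under /sys/fs/cgroup and the calls need root privileges.

```python
import os
from typing import List


def attach_process(cgroup_dir: str, pid: int) -> None:
    """Move a process, with all of its threads, into the cgroup.

    Writing a PID to cgroup.procs moves the whole thread group; threads the
    process creates afterwards start in the same cgroup.
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "a") as f:
        f.write(f"{pid}\n")


def drain_cgroup(cgroup_dir: str, target_dir: str) -> List[int]:
    """Move every process listed in cgroup_dir into target_dir (e.g. the root cgroup)."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        pids = [int(line) for line in f if line.strip()]
    moved = []
    for pid in pids:
        try:
            attach_process(target_dir, pid)  # cgroup.procs takes one PID per write
            moved.append(pid)
        except ProcessLookupError:
            pass  # the process exited after the list was read; nothing to move
    return moved


def remove_cgroup(cgroup_dir: str, target_dir: str) -> None:
    """Drain the cgroup, then delete it; rmdir fails with EBUSY while tasks remain."""
    drain_cgroup(cgroup_dir, target_dir)
    os.rmdir(cgroup_dir)
```

A process that forks between the drain and the rmdir can still cause EBUSY, so a production tool would likely retry the drain before giving up.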

In Klustron, there are two ways to set CPU resource isolation for instances.

The first option is to specify the initial CPU resource configuration when purchasing an instance. If none is specified, quota mode with 4 cores is used by default.

The second option is to adjust resources dynamically while the instance is running, either through the API or manually in XPanel. For details, refer to the Klustron API Documentation and the XPanel User Manual.

END