On April 6, 2022, Koordinator, Alibaba's cloud-native hybrid deployment technology, was officially open-sourced. According to the announcement, Koordinator has been used in Alibaba's large-scale production systems for many years and played a vital role in the 2021 November 11 Online Shopping Festival, reducing computing expenses by nearly 50 percent.
In this article, we invited three Alibaba technology professionals, Yi Chuan, Zeng Fansong, and Yang Guodong, to introduce the technical foundation, implementation experience, and future evolution of the hybrid-deployment technology behind Koordinator. We dive deep into this large-scale practical project, review its development history, technical concepts, and open-source plan, and discuss how Alibaba's internal hybrid-deployment technology addresses resource problems in extreme circumstances such as the November 11 Online Shopping Festival.
The concept of hybrid deployment
In the simplest terms, hybrid deployment can be understood from two perspectives. From the node's perspective, it means deploying different kinds of containers (online containers, offline containers, and mixed online/offline containers) or applications on the same node, not just offline containers in the traditional sense. Here, a "node" is the smallest unit capable of running containers; it can be a physical machine or a single ECS instance.
From the cluster's perspective, hybrid deployment means automatically deploying multiple applications within one cluster and using predictive analysis of application characteristics to balance the actual load across businesses. In a nutshell, the objective of hybrid deployment is to complete the same work with fewer resources, lowering costs while improving efficiency.
Based on the definition given, there are two primary forms of hybrid deployment:
1. Node granularity
Container isolation technology: including kernel-dependent capabilities such as Kangaroo, runC, etc.
Single-host scheduling technology: including single-host load-awareness, container policy and threshold settings, container priority and scaling control, cpushare, etc.
Central scheduling technology: support for single-host granular load feedback, scheduling policies, scheduling performance, etc.
Differentiated resource SLO technology: differentiated SLO settings, priority settings, and cooperation between single-host and central scheduling based on differentiated SLOs.
2. Cluster granularity: implementation based on node granularity
Central scheduling technologies: high-performance scheduling, resource view, scheduling collaboration of multiple loads, GPU topology awareness, etc.
Single-host scheduling technology: high-frequency start/stop control of tasks, single-host multi-environment, hardware adaptation, CPU normalization (see the sketch after this list), etc.
Optimization and support for Kubernetes ecosystem frameworks: cluster scale breakthroughs, stability improvements, operations support capacity, interface capacity, business connection, etc.
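To make "CPU normalization" concrete: the idea is to scale the physical cores of heterogeneous CPU models by a benchmark-derived performance ratio so that the scheduler can compare capacity across machine generations. The sketch below only illustrates that idea; the model names and ratios are invented and do not reflect Koordinator's actual implementation.

```go
// A minimal sketch of the idea behind CPU normalization: heterogeneous CPU
// models are scaled by a benchmark-derived performance ratio so the scheduler
// can compare "normalized cores" across machine generations.
// The model names and ratios below are invented for illustration only.
package main

import "fmt"

// hypothetical performance ratios relative to a baseline CPU model
var cpuPerfRatio = map[string]float64{
	"baseline-model": 1.0,
	"older-model":    0.8,
	"newer-model":    1.25,
}

// normalizedCores converts physical cores on a given CPU model into
// baseline-equivalent cores.
func normalizedCores(model string, physicalCores float64) float64 {
	ratio, ok := cpuPerfRatio[model]
	if !ok {
		ratio = 1.0 // unknown models fall back to the baseline
	}
	return physicalCores * ratio
}

func main() {
	fmt.Println(normalizedCores("newer-model", 32)) // 40 baseline-equivalent cores
	fmt.Println(normalizedCores("older-model", 32)) // 25.6 baseline-equivalent cores
}
```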
As one of the industry's hybrid-deployment pioneers, Alibaba began exploring container technology in 2011 and started research and development on hybrid deployment in 2016. The technology has gone through several rounds of architectural iteration and has evolved into today's cloud-native hybrid-deployment architecture, which directly adopts Kubernetes-based cloud-native unified scheduling to achieve collaborative awareness of multiple tasks and workloads across the whole hybrid deployment. It is also Alibaba's most proven architecture and best engineering implementation so far.
Many companies in this sector are looking at hybrid deployment and are eager to begin reaping the benefits as soon as possible. The first option is to use a public cloud or a commercial provider that supports hybrid deployment and elastic capabilities by default. The second, more recommended option is open-source co-building, which achieves a result similar to building in-house while also providing access to the experience the open-source community has accumulated across all layers of the technology stack.
Koordinator: Alibaba's cloud-native hybrid-deployment system
Formally open-sourced on April 6, 2022, Koordinator (https://koordinator.sh) is a hybrid-deployment system built by Alibaba whose technological underpinnings and ideas come from the practical expertise the company has accumulated in-house over several years; both Koordinator itself and the design of the entire system grew out of this experience. Through open source, Alibaba intends to advance the standardization of the whole hybrid-deployment ecosystem in the hope that more users will be able to adopt it and profit from it.
At the moment, Koordinator provides three sets of capabilities:
Differentiated service level objectives (SLO): abstracts a set of QoS-oriented resource scheduling mechanisms on top of Kubernetes, dividing workloads into different QoS classes within each priority and guaranteeing the resource characteristics of each priority and QoS (see the sketch after this list).
Fine-grained resource scheduling: based on Alibaba's best practices, including CPU topology awareness, performance-optimized scheduling, resource reservation, interactive preemption, defragmentation, resource preview, GPU-shared scheduling, and computing power normalization.
Task scheduling: schedules big data and AI-related tasks with features such as gang scheduling, batch scheduling, priority preemption, and elastic quota (inter-queue borrowing) to make better use of the whole cluster's resources.
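As a concrete illustration of differentiated SLOs, the sketch below builds a best-effort pod that declares its QoS class and priority through a pod label and a priority class. The label key `koordinator.sh/qosClass`, the value `BE`, and the priority class name `koord-batch` follow conventions described in Koordinator's documentation, but they are assumptions here and should be checked against the release you deploy.

```go
// A minimal sketch of how a workload might declare a differentiated SLO to a
// Koordinator-style scheduler: a QoS class and a priority expressed on the pod.
// The label key/value and priority class name are assumptions for illustration.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "batch-worker",
			Labels: map[string]string{
				// best-effort QoS: may be throttled or evicted to protect online services
				"koordinator.sh/qosClass": "BE",
			},
		},
		Spec: corev1.PodSpec{
			// a lower priority class than online services; the name is illustrative
			PriorityClassName: "koord-batch",
			Containers: []corev1.Container{{
				Name:  "worker",
				Image: "example.com/batch-job:latest",
			}},
		},
	}
	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```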
In terms of overall architecture, the technologies involved in Koordinator can be classified into three distinct modules. At the very bottom is the Koordlet, which is in charge of the technical capabilities on a single host, including feature awareness, single-host interference monitoring, quality-of-service management, and single-host resource isolation.
Koordinator components extend specific runtime management duties on the single host to adapt to various runtimes while avoiding disruptive changes to the components underneath.
The first of the two central components is the Koordinator Scheduler, which provides both scheduling and descheduling features. Although scheduling happens far more often than descheduling, descheduling remains an essential part of hybrid deployment because it keeps the operational quality of the entire cluster's resources high.
The second central component is the Koordinator Manager, whose primary function is to manage the policies governing hybrid deployment, including how workloads are admitted. The Koordinator Manager also integrates a resource profiling module to better support the scheduler and spread resources more evenly, avoiding localized hotspots on machines that could impair services.
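To illustrate how resource profiling can help the scheduler avoid hotspots, the toy sketch below scores nodes by their recent CPU utilization so that new pods drift toward lightly loaded machines. The types and the scoring formula are invented for illustration and are not Koordinator's actual scheduler plugin API.

```go
// A toy illustration of load-aware spreading: nodes with lower recent CPU
// utilization score higher, so new pods move away from hotspots.
// Types and formula are invented for illustration only.
package main

import "fmt"

type nodeProfile struct {
	Name        string
	CPUUtilized float64 // recent average utilization, 0.0 - 1.0
}

// score maps utilization to a 0-100 score, preferring lightly loaded nodes.
func score(n nodeProfile) int {
	return int((1.0 - n.CPUUtilized) * 100)
}

func main() {
	nodes := []nodeProfile{
		{"node-a", 0.72}, // a local hotspot
		{"node-b", 0.35},
		{"node-c", 0.18},
	}
	for _, n := range nodes {
		fmt.Printf("%s -> score %d\n", n.Name, score(n))
	}
}
```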
Above, we have provided an overview of Koordinator's general architecture. The primary objective of these modules is to help resolve the two main issues of hybrid deployment:
1) How to admit hybrid-deployment workloads, so that all kinds of tasks can plug into the Koordinator framework at the lowest cost;
2) How to make all different kinds of tasks run well in a hybrid deployment so that computational tasks can get the required computing power and online tasks can run without delay.
Koordinator is a relatively mature hybrid-deployment system, and one of its distinguishing features is "double zero intrusion."
First, Koordinator is zero-intrusive to application workload management. Koordinator invests a lot of effort in helping users open up the hybrid-deployment links of typical computational workloads: users only need to do simple configuration, and Koordinator automatically applies the YAML changes that turn computational tasks into hybrid-deployment tasks. This helps users bring hybrid deployment into on-premises scenarios faster.
The second zero intrusion is that Koordinator does not intrude on Kubernetes itself. Users can gain similar capabilities by installing the open-source components into their Kubernetes clusters, provided the Kubernetes version is one that Koordinator supports. Kubernetes scales well on the central side but not as well on the node side, so Koordinator places a hook manager between the kubelet and the underlying container pool to support policy scaling. This allows users to avoid making intrusive changes to the kubelet and the underlying containerd or Docker.
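The sketch below illustrates the general pattern of hooking the request path between the kubelet and the container runtime: an interceptor adjusts resource parameters (here, the CPU weight of a best-effort container) before the request reaches containerd or Docker. The interface and types are hypothetical and only show the shape of the idea, not Koordinator's actual runtime-proxy API.

```go
// A conceptual sketch of a hook between kubelet and the container runtime:
// an interceptor tweaks container resource parameters before the request is
// forwarded to containerd or Docker. Interface and types are hypothetical.
package main

import "fmt"

// ContainerSpec is a simplified stand-in for a runtime create request.
type ContainerSpec struct {
	PodQoS      string
	CPUShares   int64
	MemoryLimit int64
}

// RuntimeHook is invoked on the request path between kubelet and the runtime.
type RuntimeHook interface {
	PreCreateContainer(spec *ContainerSpec) error
}

// beThrottleHook lowers CPU weight for best-effort pods so they yield to
// latency-sensitive workloads on the same node.
type beThrottleHook struct{}

func (beThrottleHook) PreCreateContainer(spec *ContainerSpec) error {
	if spec.PodQoS == "BE" {
		spec.CPUShares = 2 // minimal weight under CFS
	}
	return nil
}

func main() {
	spec := &ContainerSpec{PodQoS: "BE", CPUShares: 1024, MemoryLimit: 4 << 30}
	var hook RuntimeHook = beThrottleHook{}
	_ = hook.PreCreateContainer(spec)
	fmt.Printf("%+v\n", spec)
}
```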
The resource model is the most important component of hybrid deployment. The core idea is to build differentiated capabilities so that resources can be overcommitted and a higher utilization ratio achieved; the logic chain effectively runs backward from the desired output to the required input. The resource model Koordinator has established is quite comprehensive and can support online services, real-time computing, AI training tasks, batch computing jobs, and even certain testing tasks.
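This backward-running logic chain is easiest to see in a simplified formula: the capacity that can be lent to lower-priority tasks on a node is roughly the node's allocatable capacity minus a safety reservation and the actual usage of high-priority workloads. The sketch below illustrates that arithmetic with assumed numbers; it is not Koordinator's exact batch-resource algorithm.

```go
// A simplified sketch of the overcommitment math behind a hybrid-deployment
// resource model: capacity that online (high-priority) workloads have
// requested but are not actually using can be lent to batch workloads,
// minus a safety reservation. Numbers and formula are illustrative only.
package main

import "fmt"

// reclaimableMilliCPU estimates how much CPU (in millicores) can be offered to
// low-priority batch tasks on one node.
func reclaimableMilliCPU(allocatable, onlineUsage, reserved int64) int64 {
	r := allocatable - reserved - onlineUsage
	if r < 0 {
		return 0
	}
	return r
}

func main() {
	// 64-core node: 64000m allocatable, online pods using 18000m,
	// 10% held back as a safety buffer.
	fmt.Println(reclaimableMilliCPU(64000, 18000, 6400)) // 39600m available for batch
}
```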
In addition, Koordinator offers resource isolation strategies in several dimensions, including CPU cpuset, some LLC (last-level cache) isolation capabilities, and priority preemption provided by the operating system. These isolation features will land progressively in the Koordinator community, and interference detection capabilities will be added for various resource characteristics.
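As a hint of what CPU-set isolation looks like at the operating-system level, the sketch below pins a container's cgroup to a specific range of cores by writing to the cpuset controller. It assumes cgroup v1 mounted at /sys/fs/cgroup/cpuset and an invented cgroup path; it illustrates the mechanism only and is not how the Koordlet is implemented.

```go
// A minimal sketch of the OS mechanism behind cpuset isolation: restricting a
// container's cgroup to specific cores by writing to the cpuset controller.
// The cgroup path is invented; this shows the mechanism, not Koordinator code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pinCPUs restricts the given cgroup to the listed CPUs, e.g. "0-3,8".
func pinCPUs(cgroupPath, cpus string) error {
	f := filepath.Join(cgroupPath, "cpuset.cpus")
	return os.WriteFile(f, []byte(cpus), 0o644)
}

func main() {
	// Example: keep a best-effort container off the cores reserved for
	// latency-sensitive services.
	err := pinCPUs("/sys/fs/cgroup/cpuset/besteffort/pod-1234", "8-15")
	if err != nil {
		fmt.Println("failed to set cpuset:", err)
	}
}
```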
The Kubernetes community serves as the foundation for Koordinator, and Koordinator does not alter Kubernetes itself. Users who adopt hybrid deployment install Kubernetes first and then install the Koordinator components on top of it; Kubernetes always comes before Koordinator in this order.
There may be some gaps between the capabilities offered by the Koordinator community and the requirements of a real enterprise environment. Because the Koordinator community iterates rapidly, users can keep acquiring new features from it and only need to maintain their own company-specific components in internal versions.
At this point, Alibaba has achieved nearly full hybrid-deployment coverage and has demonstrated through its own experience that hybrid deployment brings significant value. Through open source, Alibaba intends to encourage the standardization of the hybrid-deployment ecosystem so that more users can adopt hybrid deployment easily and reap its benefits. Everyone is invited to join the Koordinator community's efforts to define new roles in the future and to keep contributing to the open-source project.
Perspectives on the development and practice of hybrid deployment
Alibaba Group has two hybrid-deployment scenarios. One is the "normal" scenario, where the primary target is the spare capacity reserved for disaster recovery; Alibaba uses this part of its resources for big data, improving overall resource utilization.
The other is the November 11 Online Shopping Festival, during which Alibaba's internet traffic is around ten times the daily level. Preparing resources the conventional way would require provisioning many more machines. Instead, Alibaba massively downgrades offline duties such as MaxCompute, batch processing, and other non-real-time tasks to release more resources for online commerce.
Based on Alibaba's technological experience, the environments suitable for hybrid deployment can be broken down into several conditions:
1) The workload of businesses should include a variety of tasks;
2) Only when a business's traffic and volume reach a certain scale does resource elasticity become a real demand, along with a stronger need to cut costs and improve efficiency. For preliminary or limited-scale scenarios, ECS elasticity should be sufficient.
Hybrid deployment is a piece of systems engineering, spanning hardware, scheduling, and the operating system, and it requires collaboration across all of them to achieve optimal performance. Raising resource utilization from 10 to 30 percent has a relatively low technical threshold: the overall load can be improved with Kubernetes' built-in scheduling plus essential hybrid-deployment capabilities, which applies to most businesses.
The major obstacle arrives when the hybrid-deployment ratio climbs to between 30 and 50 percent. Along the way, Alibaba has made various enhancements to its compute, storage, and networking capabilities, but these alone are not sufficient to meet the hybrid-deployment and flexibility requirements of the resource model. At that point, the application architecture itself must improve. For instance, Alibaba eliminated local disks from the business architecture, turning the I/O isolation problem into a network isolation problem. By collaborating with the application side and using these end-to-end approaches, we can improve resource efficiency across the whole process.
In addition, the requirements for cluster-level hybrid deployment are relatively high. A single type of task can be scheduled against a standard specification, such as 4 cores or 8 cores with 16 GB of memory. But once the hybrid-deployment workload becomes more diverse, the variety of resource requirements turns allocation into an NP-hard problem. Solving it requires more intricate algorithms, closer to decision intelligence, and the hybrid-deployment process must build realistic application profiles.
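To see why diverse resource requests make allocation an NP-hard problem, consider plain bin packing: the sketch below runs a first-fit-decreasing heuristic over mixed CPU requests on fixed-size nodes. Real schedulers score many more dimensions (memory, topology, load, affinity); this toy example only shows why heuristics, and ultimately richer application profiles, become necessary.

```go
// A toy illustration of why heterogeneous resource requests turn allocation
// into bin packing: a first-fit-decreasing pass over CPU requests.
// Only one dimension (CPU cores) is modeled; this is a sketch, not a scheduler.
package main

import (
	"fmt"
	"sort"
)

// firstFitDecreasing packs requests (in cores) onto nodes of equal capacity.
func firstFitDecreasing(requests []int, nodeCapacity int) [][]int {
	sort.Sort(sort.Reverse(sort.IntSlice(requests)))
	var nodes [][]int
	var free []int
	for _, r := range requests {
		placed := false
		for i := range nodes {
			if free[i] >= r {
				nodes[i] = append(nodes[i], r)
				free[i] -= r
				placed = true
				break
			}
		}
		if !placed {
			nodes = append(nodes, []int{r})
			free = append(free, nodeCapacity-r)
		}
	}
	return nodes
}

func main() {
	// mixed 4-core, 8-core, and odd-sized requests packed onto 16-core nodes
	fmt.Println(firstFitDecreasing([]int{8, 4, 4, 6, 10, 3, 5}, 16))
}
```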
As can be seen, hybrid deployment is a significant piece of systems engineering. Today, through open source, Alibaba is sharing the lessons and experience accumulated along the way, in the hope that more partners will join in the future to reap the benefits of hybrid deployment while exploring deeper technologies and building a more mature hybrid-deployment ecosystem.