Introduction
According to a plan jointly approved by the National Development and Reform Commission and three other central departments, China will build eight national integrated computing hubs in regions including the Guangdong-Hong Kong-Macao Greater Bay Area, the Chengdu-Chongqing economic circle, the Yangtze River Delta, and the Beijing-Tianjin-Hebei region, which will serve as the backbone nodes of the country's computing network.
What can computing networks do?
Living in a digital world, we have to recognize that computing is an essential resource underpinning a wide range of modern devices and facilities, from mobile phones and PCs to supercomputers and data centers.
However, despite its popularity, computing power remains poorly utilized; according to some statistics, the utilization rates of various computing terminals are even lower than 15%.
If this is hard to picture, take the PC as an example. Many families own more than one computer, yet most of those machines sit idle. The situation can be even worse for organizations: the private data centers of enterprises and the supercomputing centers of research institutions waste large amounts of computing power, just as we leave our PCs unused at home.
In addition, as the Internet of Things matures, smart cities, smart homes, and other IoT applications have entered our daily lives, generating enormous quantities of data from the "Internet of Everything" and placing ever higher demands on computing resources and capacity.
As a software and hardware platform that can be accessed and used on demand and expanded elastically, cloud computing was once the main supporting technology that fully met the resource requirements of IoT terminals.
However, as the market and technology have developed and the number of IoT terminals has surged, demand for computing on the cloud has grown as well. Devices keep generating large volumes of real-time data, placing ever more storage and processing responsibility on the cloud, yet the construction of cloud computing data centers clearly lags far behind the amount of data to be processed.
This is what the market looks like at the moment: smart terminals lack the memory, CPU, bandwidth, and other ICT resources needed to process data in real time, so they cannot support new data processing technologies such as artificial intelligence (AI) that demand large amounts of computing power.
As a result, computing power has become a "luxury" even for many R&D personnel in research institutions and enterprises: expensive to obtain and difficult to use.
In the field of Computational Fluid Dynamics (CFD), for example, simulating an engine blade may require 1,000 cores running for a week, a requirement that far exceeds what most organizations can provision on their own. Even if an organization decided to build a suitable computing platform today, it could take years to see results, a cost in time and money that many cannot afford.
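To put that figure in perspective, here is a back-of-the-envelope calculation in Python. The 1,000-core, one-week workload is the example cited above, while the per-core-hour price is purely an illustrative assumption.

```python
# Back-of-the-envelope cost of the CFD example above.
# The per-core-hour price is an illustrative assumption, not a quoted rate.
cores = 1_000
hours = 24 * 7                        # one week of wall-clock time
core_hours = cores * hours            # 168,000 core-hours per simulation run
assumed_price_per_core_hour = 0.05    # assumed unit price, for illustration only
print(core_hours, core_hours * assumed_price_per_core_hour)
# 168000 8400.0 -> roughly 8,400 (in the assumed currency) for a single run
```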
This shows that traditional cloud computing alone can no longer meet the diversified and intelligent needs of IoT development, which is why a new generation of hierarchical computing architecture, represented by edge computing, has emerged.
Compared with centrally deployed cloud computing, which sits far from the user, so-called edge computing is a platform that emphasizes deploying computing resources close to the customer's business to achieve efficient local processing.
At the core of this hierarchical architecture, data processing is dispersed among devices at each tier of the network rather than centralized in a cloud computing data center.
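As a purely illustrative sketch of this tiered idea, the snippet below routes a task to the lowest-latency tier that can handle it; the tier names, latency figures, and capacities are assumptions for illustration, not values from any real deployment.

```python
# Illustrative only: place a task at the nearest tier with enough free capacity.
# Latency and capacity numbers are made up for the sketch.
TIERS = [
    {"name": "device", "latency_ms": 1,  "free_cores": 4},
    {"name": "edge",   "latency_ms": 10, "free_cores": 64},
    {"name": "cloud",  "latency_ms": 50, "free_cores": 10_000},
]

def place_task(required_cores: int) -> str:
    """Return the lowest-latency tier that can run the task."""
    for tier in sorted(TIERS, key=lambda t: t["latency_ms"]):
        if tier["free_cores"] >= required_cores:
            return tier["name"]
    return "reject"   # no tier has enough free capacity

print(place_task(2))     # 'device'
print(place_task(32))    # 'edge'
print(place_task(5000))  # 'cloud'
```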
However, as extending from cloud computing to edge computing becomes a key development philosophy in the industry, a tension between scale and cost emerges in practical applications.
Economies of scale are critical to the conventional business model of cloud computing: service providers share infrastructure and reduce data center PUE (Power Usage Effectiveness) by continuously expanding their computing pools, building centrally, using customized equipment, and operating intelligently, thereby cutting construction and maintenance costs and gaining an edge in fierce market competition.
According to incomplete statistics, the unit computing cost of a mega resource pool is only 10% to 30% of that of an ordinary one, so it is common to see resources flowing toward the top companies in the cloud market. In China's domestic market, the top cloud service providers already account for nearly 50% of total market share, and that share keeps growing year on year.
However, in the field of edge computing, which emphasizes distribution, the scale of nodes is severely limited.
Most edge computing nodes sit at the edge of the network close to users and are scattered across diverse environments, such as telecom operators' access rooms, power companies' substations, and vacant rooms in residential areas. These nodes are limited in both space and computing resources and have little room for continuous expansion, which makes it impossible to reduce costs through economies of scale.
In terms of maintenance mechanisms, cloud computing nodes can adopt many intelligent operation approaches due to the high concentration of equipment. For example, robots can be used for server room inspections, significantly reducing labor costs and improving operational efficiency.
However, such a scheme does not suit edge computing nodes. Deploying large numbers of intelligent operation systems across many scattered edge rooms yields little benefit, and the investment can even exceed the cost of the computing equipment housed in those rooms. Moreover, the intelligent operation systems themselves require careful maintenance and fail somewhat more often than other types of equipment.
In the short term, the only practical solution is to hire large numbers of people to carry out daily inspections of edge rooms, which is why the number of operations engineers at telecom operators and at cloud computing service providers differs by an order of magnitude.
Therefore, for edge computing, which involves a large number of scattered nodes, it is not practical to copy the construction and operation model of cloud computing. New business models and technology systems are needed so that more stakeholders can participate in providing and trading computing resources.
The good news is that with the development of technologies such as 5G, AON (all-optical networks), and SDN (Software-Defined Networking), the network is no longer a bottleneck and can connect users and resource pools on demand.
This opens the door to a new solution: distribute information about computing resources through the network and build a trading platform between computing resource providers and consumers. That, in essence, is the computing network.
Computing network ≠ Cloud-network convergence
To some people, the computing network is just the same as cloud-network convergence or cloud-network collaboration.
In fact, not exactly.
From the perspective of resource matching, both can match computing resource information with the network to achieve joint optimization of multiple types of resources.
For instance, under the existing cloud-network collaboration scheme, users can select a cloud service node first and then choose the best path based on the network conditions between that node and their access node; alternatively, they can select a suitable cloud service node according to the network conditions and then choose the connection path. Roughly speaking, what the computing network and cloud-network collaboration do is not so different, yet the two differ clearly in nature.
The core of cloud-network collaboration is cloud-centric: the network connection is adjusted to the characteristics of the cloud services. There are two popular approaches. In the first, the network opens its capabilities to the cloud management system, which schedules computing, storage, and network resources directly; in the second, the cloud management system sends its network requirements to control units such as network orchestrators, which then manage the network according to the cloud business requirements. Either way, the key is to select the cloud service first and then determine the network connection. A cloud service provider can therefore connect to multiple networks and even use technologies like SD-WAN (Software-Defined WAN) to achieve cross-domain connectivity across different network operators.
The computing network, by contrast, approaches the problem from a different angle. Computing pools send information about their free computing resources to the control plane (centralized controllers or distributed routing protocols), which then propagates this information across the network. When a user's business requirements arrive, the most suitable computing pool and network path are selected by analyzing the network and computing information recorded in the routing tables. In other words, the network is chosen first, and the computing pool (a cloud or edge computing service node) second.
If there were only one network service provider and one cloud or computing provider to choose from, there would be little difference between cloud-network collaboration and the computing network. In reality, however, there are many network service providers and even more cloud/computing providers, and that is where the two diverge.
In a cloud-network collaboration solution, users first select a cloud service provider, or even a specific cloud resource pool or edge computing node, and then choose the most suitable network connection and the optimal path among multiple network service providers. In the computing network solution, by contrast, the network service provider is determined first; the most suitable computing node is then chosen from multiple resources based on the service's latency and other requirements, combined with the network conditions.
In short, cloud-network collaboration is "one cloud, multiple networks," while the computing network is "one network, multiple clouds (computing)".
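The contrast can be made concrete with a minimal sketch. The provider names, latencies, and scores below are invented for illustration; the point is only the order in which cloud and network are chosen.

```python
# Illustrative sketch only: contrasts the two selection orders described above.
# All provider names, scores, and latencies are made-up example values.
clouds = [
    {"name": "cloud_A", "compute_score": 90, "links": {"net_1": 30, "net_2": 12}},  # link value = latency in ms
    {"name": "cloud_B", "compute_score": 70, "links": {"net_1": 8,  "net_2": 25}},
]

def cloud_network_collaboration(required_score):
    """'One cloud, multiple networks': pick the cloud first, then the best path to it."""
    cloud = max((c for c in clouds if c["compute_score"] >= required_score),
                key=lambda c: c["compute_score"])
    network = min(cloud["links"], key=cloud["links"].get)   # lowest-latency path to the chosen cloud
    return cloud["name"], network

def computing_network(required_score, chosen_network):
    """'One network, multiple clouds': the network is fixed first, then the best
    reachable computing pool is chosen from the routing information."""
    candidates = [c for c in clouds if c["compute_score"] >= required_score]
    cloud = min(candidates, key=lambda c: c["links"][chosen_network])  # lowest latency over that network
    return cloud["name"], chosen_network

print(cloud_network_collaboration(60))   # ('cloud_A', 'net_2')
print(computing_network(60, "net_1"))    # ('cloud_B', 'net_1')
```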
How long do we have to wait for the realization of computing networks?
Research on computing networks began around 2019 and has raised growing expectations for a promising future. Yet despite nearly three years of continuous work, the computing network is still at an early stage of development.
A computing network does not simply push computing information into the network for distribution. It must also tie in computing trading and network subscription businesses to form a system architecture that solves two problems. The first is resource association: organically integrating computing resources and network resources based on users' diverse requirements. The second is resource trading: enabling users to purchase the most appropriate computing and network resources on a trading platform according to their business needs and budgets.
Therefore, the computing network system should accommodate a diversity of participants, including computing consumers, computing providers, network operators, computing network trading platforms, and computing network control planes.
Computing network system architecture, image from the Internet
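As a minimal sketch of how these participants might interact, the snippet below has providers publish offers and a hypothetical trading platform pick the cheapest offer that satisfies a consumer's request; all names and numbers are invented for illustration.

```python
# Minimal sketch of the trading interaction: providers publish offers,
# the platform picks the cheapest offer meeting the consumer's constraints.
# All names and numbers are invented for illustration.
offers = [  # published by computing providers via the control plane
    {"provider": "edge_pool_1",  "cores": 64,  "latency_ms": 5,  "price_per_hour": 4.0},
    {"provider": "cloud_pool_1", "cores": 512, "latency_ms": 40, "price_per_hour": 2.0},
]

def match(request):
    """Trading platform: cheapest offer that satisfies the consumer's request."""
    feasible = [o for o in offers
                if o["cores"] >= request["cores"]
                and o["latency_ms"] <= request["max_latency_ms"]
                and o["price_per_hour"] <= request["budget_per_hour"]]
    return min(feasible, key=lambda o: o["price_per_hour"], default=None)

print(match({"cores": 32, "max_latency_ms": 10, "budget_per_hour": 5.0}))
# -> the edge pool; the cloud pool is cheaper but too far away
```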
Meanwhile, certain technical breakthroughs are still required before this concept can be put into practice.
Knowledge graph of the computing network technologies proposed by China Mobile, image from the Internet
At present, research on computing networks mainly revolves around the following issues:
(1) Computing measurement. Computing resources currently lack a unified, simple unit of measurement, which makes it difficult to evaluate and compare the size of different types of computing resources (a sketch illustrating this item and item (2) follows the list).
(2) Information distribution. How can information about computing and other resources be made widely available through the network control plane?
(3) Resource view. How can a resource view centered on each user be generated so that users can intelligently choose the best combination of resources?
(4) Trusted transactions. Since the various resources in a computing network belong to different owners, the network, acting as an intermediary platform, must consider how to ensure that resource transactions are authentic and traceable.
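To make items (1) and (2) a little more concrete, here is a minimal sketch; the conversion factors and message fields are illustrative assumptions, not any standardized metric or protocol.

```python
# Minimal, illustrative sketch for items (1) and (2) above.
# The conversion factors and message fields are assumptions, not a standard.

# (1) Computing measurement: collapse heterogeneous resources into one rough score,
# here arbitrarily expressed as "effective TFLOPS".
ASSUMED_TFLOPS = {"cpu_core": 0.05, "gpu": 10.0}   # illustrative values only

def compute_score(cpu_cores: int, gpus: int) -> float:
    """Very rough single-number measure of a node's free computing power."""
    return cpu_cores * ASSUMED_TFLOPS["cpu_core"] + gpus * ASSUMED_TFLOPS["gpu"]

# (2) Information distribution: the advertisement a computing pool might flood
# to the control plane alongside ordinary routing information.
def build_advertisement(node_id: str, cpu_cores: int, gpus: int, price_per_hour: float) -> dict:
    return {
        "node_id": node_id,
        "free_compute_tflops": compute_score(cpu_cores, gpus),
        "price_per_hour": price_per_hour,   # would feed the trading platform (item 4)
    }

print(build_advertisement("edge-room-42", cpu_cores=256, gpus=4, price_per_hour=3.5))
# {'node_id': 'edge-room-42', 'free_compute_tflops': 52.8, 'price_per_hour': 3.5}
```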
All things considered, the underlying technologies of computing networks have yet to achieve real breakthroughs, and it is estimated to take at least another five years for the concept to grow into actual products. Computing networks deserve our attention today, but they also call for caution and calm; that is how we should reasonably treat any emerging technology if we want to use it to create a better life for society as a whole.