Following the release of OpenAI's GPT-3, tech giants such as Huawei, Google, BAAI, Kuaishou, Alibaba, and Nvidia introduced their own large-scale models in 2021, pushing the AI industry into a new phase of intense competition. In the pursuit of generalist AI, large-scale models are a source of original innovation with long-term impact, and they will also serve as a platform for further world-class achievements.
Overview
Since the debut of the large BERT model in 2018, global tech titans have invested heavily in building their own large-scale models, viewing them as the next AI hot spot.
It is well known that being "difficult to implement" has constrained AI's technical level, application scale, and industrial adoption. Research further suggests that high development costs and technical obstacles have created an invisible barrier, detaching the technology chain from the industrial chain and ultimately resulting in "small workshop"-style AI development. Consequently, developers have to repeat time-consuming, challenging, and tedious tasks such as data collection, labeling, and training, which increases the burden on developers and the cost of enterprise applications.
Therefore, as large-scale models begin to appear, an industrialized development mode enters the picture.
Since big models tend to have a high degree of generalization and generality, AI development can be reorganized into a "pre-trained big model + downstream task fine-tuning" pipeline. This pipeline can be effectively repurposed for various application scenarios, and only a modest amount of industry data is required to rapidly construct AI models with greater accuracy and generalizability.
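As a concrete illustration, the minimal sketch below shows what such a "pre-trained big model + downstream task fine-tuning" step can look like in code. It uses the Hugging Face Transformers library as one common implementation; the checkpoint name, labels, and example sentences are placeholder assumptions rather than part of any specific vendor's pipeline.

```python
# Minimal sketch of the "pre-trained large model + downstream fine-tuning" pipeline.
# Checkpoint, labels, and example texts are placeholders, not any vendor's workflow.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-chinese"          # any pre-trained encoder would do
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A handful of labeled samples stands in for the "modest amount of industry data".
texts = ["设备运行正常", "轴承温度异常升高"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                        # a few epochs of fine-tuning
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```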
The current status of large-scale model development
Large-scale models appear to be evolving faster than Moore's Law. In 2021, major academic institutions and technology companies invested extensively in developing their own large-scale models, vastly enriching their capability boundaries and technological approaches.
What we saw in 2021 for large-scale models:
At the beginning of 2021, the world's first trillion-parameter model, Switch Transformer, was released by the Google team.
In March, the non-profit research institute BAAI published Wudao AI 1.0. Three months later, Wudao AI 2.0 was introduced, with a parameter scale exceeding one trillion.
In April, Huawei Cloud released the Pangu pre-trained large models, including the industry's first Chinese language pre-training model with 100 billion parameters. Pangu is not limited to a single area of AI, such as NLP, but is a generalist model family that integrates several popular capabilities.
In July, the Institute of Automation of the Chinese Academy of Sciences introduced Zidong-Taichu, the world's first OPT (Omni-Perception Pre-Trainer). It possesses cross-modal comprehension and generation capabilities and can address problems across three modalities: text, image, and audio.
In August, Peking University and Tencent announced Angel 4.0, a new generation of the distributed deep learning platform built for training on massive data with large-scale model parameters, and revealed that the self-developed deep learning framework Hetu would be integrated into the Angel ecosystem. This represents a significant advance in large-scale deep learning.
In September, Inspur released Yuan 1.0, a large-scale model with 245.7 billion parameters trained on 5 TB of Chinese text. Compared with GPT-3, Yuan 1.0 has a roughly 40% larger parameter scale and a training dataset roughly ten times larger.
In November, Nvidia and Microsoft launched MT-NLG, a model with 530 billion parameters. In the same month, Alibaba DAMO Academy announced the latest version of its multi-modal large-scale model M6, with 10 trillion parameters, making it the world's largest AI pre-trained model in 2021.
Beyond the intuitive comparison of parameter counts, the real test of a large-scale model's strength is its practical performance. At present, many companies are exploring the feasibility of putting such models into production.
To date, the Huawei Cloud Pangu pre-trained large models have been deployed in more than 100 scenarios across the energy, retail, finance, industrial, medical, environmental, and logistics industries, raising the average efficiency of enterprise AI application development by 90%.
In addition, Alibaba DAMO Academy's M6 possesses multi-modal and multi-task capabilities. It has stronger cognitive and creative capabilities than conventional AI models and has been applied to Alipay, Taobao, and Tmall. The model excels at design, writing, and Q&A, and has the potential to be used extensively in e-commerce, manufacturing, literature, art, and scientific research.
Currently, offline applications are more prevalent for large-scale models. Online applications must contend with many complicated issues, such as real-time latency requirements and model compression techniques like knowledge distillation and low-precision quantization.
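To make these terms concrete, here is a hedged sketch in plain PyTorch of the two compression techniques named above, INT8 dynamic quantization and knowledge distillation. The toy teacher and student networks and the dummy batch are illustrative assumptions, not any production model.

```python
# Sketch of two common compression steps for online serving of large models.
# "big_model" and "small_model" are toy placeholders, not any vendor's API.
import torch
import torch.nn.functional as F

big_model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                                torch.nn.Linear(512, 10))

# 1) Low-precision quantization: convert the linear layers to INT8 at inference time.
quantized = torch.quantization.quantize_dynamic(
    big_model, {torch.nn.Linear}, dtype=torch.qint8)

# 2) Knowledge distillation: a smaller student mimics the teacher's soft outputs.
small_model = torch.nn.Linear(512, 10)
x = torch.randn(8, 512)                       # dummy batch
T = 2.0                                       # distillation temperature
with torch.no_grad():
    teacher_logits = big_model(x)
student_logits = small_model(x)
kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                   F.softmax(teacher_logits / T, dim=-1),
                   reduction="batchmean") * T * T
kd_loss.backward()                            # a student optimizer step would follow
```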
Classification of large-scale models
Divided according to model architecture, large-scale models fall into monolithic models and hybrid models:
| Model architecture | Representative models |
| --- | --- |
| Monolithic models | GPT-3 by OpenAI, MT-NLG by Microsoft and Nvidia, Yuan 1.0 by Inspur, etc. |
| Hybrid models | Google's Switch Transformer, Wudao by BAAI, Alibaba's M6, Huawei Cloud Pangu, etc. |
Among them, Google's Switch Transformer adopts a Mixture of Experts (MoE) approach to partition the model, yielding a sparsely activated network that significantly saves computational resources.
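The sketch below illustrates the idea of Switch-style top-1 sparse routing in a Mixture of Experts layer, written from the public description of the technique; the dimensions, expert count, and class name are arbitrary assumptions, not Google's implementation.

```python
# Toy top-1 (Switch-style) Mixture of Experts layer: each token is routed to a
# single expert, so only a fraction of the parameters is activated per token.
import torch
import torch.nn as nn

class SwitchMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)         # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts))

    def forward(self, x):                                     # x: (tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)
        weight, expert_idx = gate.max(dim=-1)                 # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():                                    # only the chosen expert runs
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 256)
print(SwitchMoE()(tokens).shape)                              # torch.Size([16, 256])
```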
Meanwhile, the 1.75 trillion parameters of Wudao 2.0 set a new record for parameter scale. Remarkably, it no longer relies on a single domain-specific model but on a fusion system spanning multiple domains.
In terms of applications, the popular domains for large-scale models include NLP (Chinese language) models, CV (computer vision) models, multi-modal models, scientific computing models, and so on.
GPT-3, MT-NLG, and Yuan 1.0 are currently the most prominent monolithic large-scale models in NLP. The self-supervised pre-training methods that have proved so successful in NLP are also effective for CV tasks.
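As a rough illustration of how the "mask and reconstruct" self-supervised objective from NLP carries over to CV, the toy sketch below masks image patches and trains a small Transformer to reconstruct the missing pixels. The patch size, masking ratio, and model dimensions are arbitrary assumptions and do not reproduce any specific published model.

```python
# Toy masked-patch reconstruction objective: self-supervised pre-training on
# unlabeled images, analogous to masked-token prediction in NLP.
import torch
import torch.nn as nn

patch, dim = 16, 192
to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)    # image -> patch tokens
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(dim, patch * patch * 3)                        # reconstruct raw pixels

images = torch.randn(4, 3, 224, 224)                               # dummy unlabeled batch
tokens = to_patches(images).flatten(2).transpose(1, 2)             # (B, 196, dim)

mask = torch.rand(tokens.shape[:2]) < 0.75                         # hide 75% of patches
masked_tokens = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

recon = decoder(encoder(masked_tokens))                            # predict patch pixels
target = nn.functional.unfold(images, patch, stride=patch).transpose(1, 2)
loss = ((recon - target) ** 2)[mask].mean()                        # loss on masked patches only
loss.backward()
```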
Dilemma of large-scale models
The recent improvement in the performance of large-scale models has coincided with the appearance of new difficulties.
First of all, building large-scale models is a difficult task requiring vast amounts of data, computing power, algorithms, and other hardware and software resources. In the short term, this resource consumption places a heavy burden on enterprises and research institutions and runs counter to international energy-saving and environmental-protection objectives as well as the Chinese government's carbon peak and carbon neutrality targets. Achieving low-power evolution of large-scale models with limited resources remains challenging.
Second, large-scale models still lack standardized evaluation criteria and modular processes.
Although large-scale model research and development is still in its infancy, companies and institutions are competing fiercely for high-quality centralized resources, which ultimately fragments the evaluation system into silo-style judging standards and scattered algorithm and model structures.
In addition, innovation lacks momentum. Large-scale model applications are contingent on their generalizability, and a greater parameter scale does not automatically result in a superior model. Actual performance depends not only on the precision of the data and the network architecture but also on how well the software and hardware fit the industry. At present, the industry overemphasizes research and development on ever-larger parameter counts and model scales while neglecting innovation in network models and collaborative innovation with industry.
Finally, implementation seems slow. There is broad consensus that the most challenging aspect of large-scale AI models is making them available to a wide range of industries and scenarios. As of now, however, big models are used mainly within enterprises' own projects, and it has proved difficult to make large-scale models useful to organizations beyond the developing enterprise.
Next steps for large-scale models
Large-scale parameters still have their advantages
From millions to billions to trillions, growing parameter scales have brought huge models closer to the human level. For some time to come, large-scale models are likely to keep growing in size, with a potential shift from simply increasing processing power toward parallel computing, software and hardware collaboration, and other technologies. For practical reasons, some small-parameter models have also begun to emerge.
Large-scale models become generalists and are utilized in multiple sectors
The original purpose of large-scale models was to give trained models cognitive capabilities across a wide variety of domains along with the ability to generalize and self-evolve. Sharing big models across NLP and CV has been shown to be very effective. Moreover, GPT-3 shows that learning from massive, unlabeled data without limitation is possible, as the recent rise of multi-modal pre-trained big models further demonstrates. Future huge models will require innovation to construct a generic underlying AI architecture, generalizing cognitive ability from single domains to multiple domains, extending to many scenarios, and progressing along a sustainable and evolvable route.
Easy-to-use open-source platform
In recent years, several organizations have committed to fostering the open-source trend for large-scale models. Open-source AI is still a relatively new area for many institutions such as Microsoft, IDEA, and BAAI, so only algorithm packages and training queues are available at the moment. Before long, large-scale model efforts should open up access to algorithm systems, standard systems, underlying platforms, datasets, engineering testing, and more.
A standard and friendly workflow
While "pre-training massive models + fine-tuning" has expedited the pace of AI development, giant models can only excel in more scenarios with a well-suited process. Additionally, the industry will have a standard and mature system to measure a model's generality and ease of use, and this system will be the industry's gold standard when regulating the scalability and generality of models. By then, self-promotional branding would have ceased to be an essential aspect of marketing.
AI large-scale models on-device
As the computing power and storage for large-scale models are increasingly housed on the device, much like chips, it will no longer be necessary to constantly request computing power or data from remote large-scale models. Today, most models are large and complex, requiring a lot of processing power and running time; in the near future, this has the potential to change progressively.
Business patterns
More people are concerned with what business patterns large-scale models will follow in the future. These can be outlined at three levels:
Utilizing large-scale models as a base:
The base can be sold or leased to national innovation centers, government organizations, or joint ventures for upper-layer development.
Open source:
No single corporation can tackle all of the technical issues that arise from large-scale models, so it is preferable to draw on the shared IP and mutual benefits of open source.
Providing huge models to ISVs (independent software vendors):
It is impractical to ask large-scale model developers to step out of their laboratories and directly serve thousands of industrial customers. By giving ISVs access to their models, companies can reach more downstream clients. There are two ways to implement this: billing by traffic or by project, or allowing users free access while generating commercial benefits from advertising.
Conclusion
Strikingly, the current prevalence of large-scale models resembles the early emergence of deep learning. Even so, as high-level studies of cognitive intelligence, large-scale models are far from complete. How well they can improve their capacity for innovation, generalization, and implementation will determine the final results.
Perhaps in the next few years, large-scale models built on massive processing power will deliver an incredible level of intelligence and a constant flow of intelligent services for a broad range of AI applications. This scenario could become a reality, but it will take some time to accomplish.