Introduction
With the widespread implementation of artificial intelligence solutions, the public's expectations for "actual intelligence" will not be restricted to the perception of sight, sound, and touch. The rise of artificial intelligence relies greatly on cognitive intelligence to determine whether it can continue to surpass the ceiling and expand far beyond its current limits.
Cognitive ability gives more realistic settings for AI applications
Cognitive intelligence seeks to imitate the thinking process of the human brain, which can comprehend, infer, explain, summarize, and deduce facts and languages, so that artificial intelligence is genuinely "intelligent". This enables AI to provide more expansive applications of intelligent services, such as smart robotics, self-driving vehicles, drones, augmented reality, virtual reality, and customized recommendation systems.
On the one hand, "intelligence" is transitioning from perception to cognition as perception technologies such as computer vision and voice recognition reach their limits. In image recognition, for instance, we have seen poor flexibility and generalizability. Meanwhile, interactive performance remains weak both in 3D reconstruction for medical imaging and in the interaction between the environment and augmented or virtual reality applications. Additionally, semantic variety in the field of voice recognition contributes to this shift.
On the other hand, research on multi-modal and large pre-trained models has increased for cognitive intelligence technologies such as natural language processing (NLP), intelligent dialogue, and intelligent recommendation.
In addition, several sectors have submitted their practical AI requirements, such as how to accomplish intelligent cost reduction, income growth, efficiency development, and safety.
The number of large-scale pre-trained models has increased dramatically in the past year. Meanwhile, intelligent recommendation, search technologies, brain-computer interfaces, and virtual anchors are all trending subjects and new barometers. Moreover, various tech companies and institutions, such as Emotibot, 4Paradigm, Langboat Technology, Beijing Academy of Artificial Intelligence (BAAI), and Mininglamp Technology, have provided a great deal of inspiration for the commercialization of cognitive intelligence.
More participants in the game
In 2021, the integration of cognitive intelligence development with the digital-intelligence transformation of industry began. Active and significant companies in artificial intelligence have adopted the latest generation of cognitive intelligence technology.
- Leyan Technologies has released "LeYu ZhuRen," a fourth-generation customer service robot system featuring automated responses, deeply trained dialogue, and human-like customer reception. It currently offers intelligent e-commerce customer service solutions, such as automated responses to buyer inquiries, smart suggestions, intelligent marketing, and intelligent quality assurance, to more than 20,000 e-commerce customers.
- In July 2021, Renmin University of China and BAAI collaborated to launch the large-scale pre-trained model Wudao-Wenlan. It possesses excellent vision-language retrieval capacity and a degree of general knowledge comprehension. Based on the multi-modal image-text model Wudao-Wenlan, the R&D team created an AI Mood Radio application that could play music that matched the mood of pictures.
- People's Daily and 4Paradigm inked a partnership in September to jointly develop an algorithm for "new media." As a means of encouraging media sector transformation and innovation in the AI era, this algorithm will increase the transmission of high-quality content in mainstream media while ensuring that tailor-made content perfectly meets individual consumer needs.
- The Feiyu Operating System, introduced by iFLYTEK, incorporates key voice recognition and semantic comprehension technologies that support numerous Internet of Vehicles application scenarios, which customers can modify and build on according to their needs. Logically sound scene interaction designs help users avoid potential hazards. Using technologies such as voiceprint recognition and acoustic source localization, the intelligent speech company can attribute a voice to a specific person or direction, which helps keep users safe. It then uses deep learning and sample data to develop the ontology and fact data of the knowledge graph, while continually adjusting and optimizing based on feedback from the knowledge graph's application.
Graph research and large-scale models will become paradigms
In fundamental research, the knowledge graph is regarded as the most promising area for advancing perceptual intelligence to cognitive intelligence. The knowledge graph may assist businesses in better realizing the acquisition, inheritance, and reuse of information, efficiently resolving the issue of developing and utilizing knowledge assets. Due to its remarkable universality, it may be implemented across sectors and assist businesses in becoming more intelligent, innovative organizations that produce more breakthroughs.
Currently, knowledge graphs are often categorized as either generic knowledge graphs or domain-specific knowledge graphs. In addition to playing a prominent role in application scenarios such as semantic search, recommendation systems, and question-and-answer systems, they are having a growing effect on industries like banking, energy, medical care, manufacturing, and retail.
Knowledge graphs work well in semantic search because of their simple knowledge representation and broad coverage. Enterprises and research organizations, including Google, Alibaba, Tencent, Emotibot, Baidu AI Cloud, Stargraph, PERCENT Technology, and Mininglamp Technology, have developed several applications and conducted extensive research on knowledge graphs.
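To make the "simple knowledge representation" concrete, the sketch below stores facts as (subject, predicate, object) triples, a common way to represent a knowledge graph, and answers two lookup queries. The TripleStore class, its method names, and the predicate labels are hypothetical and chosen only for illustration; the facts themselves are taken from this article. It is a minimal sketch, not a description of any vendor's product.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory knowledge graph of (subject, predicate, object) facts.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        # Index facts from both ends so lookups work in either direction.
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))
        self._by_object[obj].add((predicate, subject))

    def objects(self, subject, predicate):
        """All objects o such that (subject, predicate, o) is a stored fact."""
        return sorted(o for p, o in self._by_subject[subject] if p == predicate)

    def subjects(self, predicate, obj):
        """All subjects s such that (s, predicate, obj) is a stored fact."""
        return sorted(s for p, s in self._by_object[obj] if p == predicate)


kg = TripleStore()
kg.add("Wudao-Wenlan", "developed_by", "BAAI")
kg.add("Wudao-Wenlan", "developed_by", "Renmin University of China")
kg.add("Wudao-Wenlan", "is_a", "pre-trained model")
kg.add("Mengzi", "developed_by", "Langboat Technology")
kg.add("Mengzi", "is_a", "pre-trained model")

# "Which pre-trained models do we know about?"
models = kg.subjects("is_a", "pre-trained model")
# "Who developed Wudao-Wenlan?"
developers = kg.objects("Wudao-Wenlan", "developed_by")
print(models)
print(developers)
```

A semantic search or question-answering system would sit on top of such a store by mapping a user's question to a (predicate, entity) lookup; production systems add entity linking, schemas (ontologies), and far larger indexes.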
Graph Neural Networks (GNNs) extend deep neural networks' ability to analyze standard unstructured input (images, speech, and text sequences) to higher-level structured data (graph structures). Large-scale graph data can convey both broad human knowledge and expert rules comprising logical connections. A graph node represents comprehensible symbolic knowledge, and the graph's irregular topological structure represents how nodes are related, such as dependency, subordination, and logic. Consequently, GNNs appear to be the most crucial implementation path for the intelligent empowerment of machine learning.
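The core idea of a GNN, that each node updates its representation from the nodes it is connected to, can be sketched in a few lines. The example below performs one round of mean-aggregation message passing on a three-node graph. It is a deliberately simplified illustration: real GNN layers also apply learned weight matrices and a non-linearity, which are omitted here, and the function name and toy features are assumptions made for this sketch.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats (all the same length)
    edges: list of (u, v) pairs
    Each node's new vector is the mean of its own vector and its neighbours'.
    Learned weights and activation functions are intentionally left out.
    """
    # Start every neighbourhood with the node itself (a self-loop).
    neighbours = {node: [node] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][i] for n in group) / len(group) for i in range(dim)
        ]
    return updated


# Toy graph: a - b - c (a chain), with made-up 2-dimensional node features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "b"), ("b", "c")]

after_one_step = message_passing_step(features, edges)
for node, vector in after_one_step.items():
    print(node, vector)
```

After one step, node "c" has picked up part of "b"'s signal even though its own features were all zero; stacking more steps lets information travel further along the graph's topology.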
Multi-modal and large-scale pre-trained models are anticipated to become standard in the AI industry's development process. Focusing on the relationship between vision and language, Renmin University of China and BAAI used 650 million Internet-sourced image-text pairs with self-supervised tasks to create Wudao-Wenlan, the largest general-purpose Chinese image-text pre-trained model, in July 2021. This was done to test the feasibility of AI language learning in a multi-modal environment. According to the official introduction, Wenlan 2.0 can generate and comprehend seven languages, setting a new record for a multilingual pre-trained model and achieving a world-leading level in tasks such as image-text retrieval and visual question answering.
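One common way to implement image and text search is to embed both modalities in a shared vector space and rank candidates by cosine similarity. The sketch below illustrates that general idea only; this article does not describe how Wenlan performs retrieval internally, and the file names and three-dimensional vectors are made-up stand-ins for the output of real image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, image_vecs):
    """Return image ids ordered from most to least similar to the query."""
    return sorted(
        image_vecs,
        key=lambda image_id: cosine(query_vec, image_vecs[image_id]),
        reverse=True,
    )


# Hypothetical embeddings; a real system would get these from trained encoders.
image_vecs = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.0, 0.2, 0.9],
}
text_query_vec = [0.8, 0.2, 0.1]  # stand-in for an encoded seaside caption

ranking = retrieve(text_query_vec, image_vecs)
print(ranking)
```

Self-supervised training on image-text pairs is what would place matching images and captions close together in such a space; the retrieval step itself stays this simple.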
Who will be awarded the laurel wreath for technical advancement?
There are a few critical cognitive intelligence implementations in 2021. This year, technologies such as natural language processing, human-computer interaction, and intelligent search suggestions may become the primary focus of top AI businesses.
NLP
NLP technology is one of the most essential in artificial intelligence, with Microsoft, Google, and Tencent, among other industry heavyweights, releasing cutting-edge and pragmatic findings.
Microsoft Research Asia presented six papers at ACL 2021, covering cross-lingual NER, code search, music generation, Hi-Transformer, pre-trained models, and semantic interaction, among other subjects. Hi-Transformer can handle lengthy documents that the standard Transformer cannot because of memory and speed limitations, and its results have drawn the attention of researchers.
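The memory and speed limitation comes from self-attention scoring every token against every other token, so cost grows with the square of document length. The sketch below compares that count with a simplified hierarchical scheme that attends within fixed-length sentences and then across sentence summaries. It is a rough cost model written under stated assumptions (equal-length sentences, one vector per sentence), not the exact Hi-Transformer architecture, and the function names are hypothetical.

```python
def full_attention_pairs(num_tokens):
    """Token pairs scored by one full self-attention layer."""
    return num_tokens * num_tokens


def hierarchical_attention_pairs(num_tokens, sentence_len):
    """Pairs scored when attention runs inside each sentence,
    then across one summary vector per sentence."""
    num_sentences = -(-num_tokens // sentence_len)  # ceiling division
    within = 0
    remaining = num_tokens
    for _ in range(num_sentences):
        n = min(sentence_len, remaining)  # the last sentence may be shorter
        within += n * n
        remaining -= n
    across = num_sentences * num_sentences
    return within + across


# Assumed example: a 4,096-token document split into 32-token sentences.
full = full_attention_pairs(4096)
hier = hierarchical_attention_pairs(4096, 32)
print(full, hier, round(full / hier, 1))
```

Under these assumptions the hierarchical scheme scores roughly a hundredth of the pairs, which is why hierarchical designs are attractive for long documents.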
Meanwhile, Tencent and the University of Alberta researchers have developed a straightforward yet effective pre-training method: Lichee. Lichee is an algorithmic framework for multi-modal content comprehension that incorporates data augmentation, pre-trained engines, standard models, and inference acceleration. In addition, it employs multi-granularity input information to improve the language models' capacity for representation. Lichee has been applied in a variety of commercial contexts, such as Tencent Kandian (information and content service), Tencent Video (video streaming service), and Tencent QQ (instant messaging service), lowering the average number of labeled samples by over 40%. After several cycles of experience, the R&D cycle for content comprehension in information flow may be significantly reduced with increased efficiency.
Separately, Tencent AI Lab and the Chinese University of Hong Kong have found a way to achieve high-performance neural machine translation using monolingual memory. Their paper presents a novel framework that employs monolingual memory and performs learnable cross-lingual memory retrieval. Because it can exploit monolingual data, the approach is also useful in low-resource and domain adaptation settings.
In addition, researchers from the Jarvis Deep Learning Cloud Platform of iQIYI and the Technical University of Munich have proposed I2UV-HandNet, a high-precision hand reconstruction system that reconstructs hands from monocular RGB images. The technique is anticipated to be applied to iQIYI's next-generation VR devices, removing the need for handheld controllers and enabling excellent interaction between the real and virtual worlds, resulting in lighter, quicker, and more comfortable devices. At the same time, the online video streaming company is investigating gesture reconstruction and interaction technologies for its other business scenarios and hardware terminals.
DeepMind and Google researchers have demonstrated that machine learning can be used to automatically generate appropriate heuristics from a set of mixed-integer programming (MIP) instances. This is useful because an application is typically tasked with solving many instances of the same high-level semantic problem with varying parameters.
Many prominent organizations and academic institutions have conducted research on pre-trained models in the field of NLP during the past three years, and the current trend for models is that the bigger they are, the better.
However, bigger models unavoidably incur higher training costs, and serving them places high demands on customers' machines and equipment, preventing many SMEs with limited hardware from using these large-scale pre-trained models. Targeting this pain point, Langboat Technology set out to develop smaller models that train faster and cost less to use, which led to the introduction of its lightweight pre-trained model, Mencius (Mengzi).
Mencius is trained on large-scale corpora without supervision. The resulting language model can capture the semantics of each word, and it can likewise represent a sentence or a sentence fragment, which is useful in applications such as machine translation and question-and-answer search. "Using pre-training as a foundation, Langboat Technology has created a new generation of machine translation, text production, and an industry search engine through industrial cooperation."
As the benefits of large-scale supervised data diminish, the new AI infrastructure requires lower R&D and implementation costs. With pre-trained and self-trained platforms, we will eventually develop standardized, low-cost, and repeatable models, then integrate more deeply with the industry to uncover more innovative applications and cut labor costs.
NLP is in its golden era despite many unresolved issues, and numerous commercial uses have been identified. The development of large-scale language models for NLP will usher in a new phase of the digital revolution.
Intelligent recommendation technology
Alibaba's technology team used deep reinforcement learning and adaptive online learning in the search and recommendation scenarios of the November 11 online shopping event in 2021, resulting in a 10 to 20 percent increase in user click-through rate. A decision engine has been developed through continuous machine learning and model optimization to analyze massive volumes of user behavior and billions of product features in real time. This helps users discover products and brings buyers to merchants, thereby enhancing the efficiency of buyer-product matching and vastly improving the shopping experience.
Bytedance, the parent company of TikTok, opted to harness its cloud capabilities, allowing Volcengine to enhance further the resource ecology's adaptability and the customization of the algorithm impact. It has resolved the issue of local package size to accomplish dynamic resource pulling and utilization. With a robust operating foundation, tailored modification services may suit the requirements of businesses more effectively.
The technology team at 58.com has implemented effective techniques for constructing suggestion search capabilities in the context of classified advertising, including ranking frameworks that link business and multichannel deep learning models. This strategy has significantly increased the click-through rate and diversified user experiences in the current wave of industrialization improvements.
The QQ Browser laboratory at Tencent has developed a pre-trained model called "Shenzhou" with 10 billion parameters that directly supports business scenarios such as search, recommendation, and content comprehension, thereby enhancing the algorithm's performance for natural language comprehension. This model satisfies the QQ Browser business's NLP requirements, such as comment comprehension and search query recommendation, and reduces the amount of labeled data and the corresponding R&D time by more than 40 percent, thereby reducing the cost of labeling and significantly enhancing research and development efficiency.
From click-through rate, conversion rate, and matching efficiency to business linkage, R&D efficiency, and user experience, the evolution of intelligent search and recommendation appears to be reshaping society in every respect.
Intelligent Chatbots
Chatbots are now the most prevalent application of cognitive intelligence technology in the business sector. Cognitive intelligence increasingly enables computers to converse as naturally, fluently, and engagingly as humans do, resulting in the proliferation of smart speakers, intelligent customer service agents, and intelligent companion robots.
In September, Baidu announced the world's largest dialogue generation model, PLATO-XL. In human-computer intelligent dialogue generation, PLATO-XL has outperformed rival models published by Meta, Google, and Microsoft and is now the leader in both Chinese and English dialogue.
Tencent's virtual assistant "Xiaowei" released several digital intelligence products in November at a Cloud Intelligence session of the company's Digital Ecosystem Summit. These products will have the ability to work in different professions and provide customized roles for servicing tourists, bank customers, multilingual anchors, and sirens. With the capabilities of picture expression, recognition, and perceptron understanding, digital intelligence beings are able to distinguish over 34 languages and dialects and store over 460,000 industry-specific buzzwords.
A research team from Harbin Institute of Technology combined a knowledge graph with Winter Olympics content to create an intelligent customer service robot for the event, which can answer questions about event tickets and plan transportation routes.
In addition, OPPO has revealed its virtual assistant, "Xiaobu," which is capable of analyzing the emotional condition of users based on their request messages and providing a humanized answer. For instance, when a user expresses loneliness and melancholy in a text, Xiaobu might comprehend the user's emotions and respond accordingly.
In December, Xinhua News Agency's AI-powered anchor conversed with virtual guests. Dressed in a suit and tie, the male virtual presenter, Xin Xiaohao, conducted an interesting interview with two virtual animation guests, "Unlimited Girl" and "TV Rooster." Xin Xiaohao could speak exceptionally standard Mandarin and perform realistic hand motions.
In addition, brain-computer interfaces and smart connected cars were engaging cognitive intelligence application scenarios in 2021.
There's still a long way to go
The urgent demand for more AI infrastructure
Professor Tang Jie, director of the Knowledge Engineering Group (KEG) at the Tsinghua-CAE Joint Research Center for Knowledge & Intelligence, stated that the current infrastructure is insufficient for cognitive intelligence to make significant advances. For instance, building a universal knowledge graph is a critical undertaking that would need substantial time and labor. Concerning NLP, the formal knowledge system is very deficient, with unstable entity connections, and the performance of deep structured semantic analysis is poor. In a word, systems lack stability despite their vastness, and algorithms, data, and industry-specific specialists are essential. This necessitates pragmatism on the part of specialists in each industry, i.e., not only providing a demo presentation and bragging to the audience with flashy slides, but rather addressing the building and upgrading of cognitive intelligence with a focus on long-term value. Innovation on the application layer is insufficient to transform the cognitive intelligence sector significantly; underlying technology development is required.
Subsector-specific product testing standards will be defined and enhanced
Standardization organizations, businesses, and academic institutions in China and overseas have become more interested in the development of standardized cognitive intelligence in recent years. According to the Development Report on Cognitive Intelligence (2021), conducted jointly by the China Academy of Information and Communications Technology (CAICT) and Emotibot, more than thirty standards in this field have been enacted or are in the process of being established, ranging from industry specifications to national and international standards. Consequently, we may now rely on a handful of generic standards, but application-specific product testing standards remain unclear. For example, scientific urban governance standards are still lacking, as are accurate modeling and efficient deduction technology systems in creating smart cities.
Uncertainty and prejudice in a complicated setting
Cognitive intelligence has several outstanding issues in its practical applications. In complicated metropolitan contexts, cognitive bottlenecks have been seen in scene reactions, intelligent inference, and decision-making processes (misattribution due to empirical data can lead to racist tendencies in algorithms for crime prediction). Existing models cannot adequately understand general principles or objective rules, making it challenging to address inference and decision-making problems in an open, dynamic, and realistic urban environment.
Multidisciplinary synergistic integration to be promoted
Cognitive intelligence is a complete system consisting of ideas, technologies, and applications, whose implementation requires the synergy and cooperative advancement of brain science, psychology, logic, linguistics, and other disciplines. However, there are significant gaps in cross-disciplinary collaboration that must be overcome, and cognitive intelligence cannot be fully developed if our research merely generates information silos.
Future prospects
The following four significant trends in cognitive intelligence are projected to emerge in the future years, ranging from fundamental research to commercial exploration.
The evolution of knowledge graphs toward automation
Currently, the development efficiency of knowledge graph generation is rather low since data collection, cleaning, and comparison are performed manually with little automation. Meanwhile, the building of knowledge graphs continues to rely heavily on the input of expert knowledge, and products in this market have strong industry qualities but weak adaptability, limiting their use in large-scale scenarios.
Consequently, many businesses have begun to investigate platform-based solutions. Gemini is a knowledge engineering platform developed by Emotibot. On this platform, users can construct knowledge graphs for general or industry-specific purposes and conduct knowledge management and search, which significantly reduces the time required for manual text processing and resolves corporate data application issues. 4Paradigm's Sage Knowledge Base is a user-friendly knowledge graph platform that incorporates a vast amount of expert knowledge into its NLP solutions to enable business users to conduct knowledge-driven analysis and decision-making.
Large and small models are interconnected
Artificial intelligence development is currently expanding fast from perception to cognition, with ultra-large-scale pre-trained models being the worldwide focus of R&D and commercial competition in artificial intelligence. In China, Tencent, Sogou, Huawei, Alibaba DAMO Academy, and other significant organizations have taken turns topping the authoritative Chinese Language Understanding Evaluation (CLUE) benchmark. It is worth noting that Langboat Technology's lightweight pre-trained model Mencius, with only 1 billion parameters, has set a new record and topped a CLUE leaderboard otherwise made up of models with 10 billion or 100 billion parameters.
Currently, large-scale models have a long way to go before they can be implemented, and they must be refined and condensed into smaller versions before their distribution. Small models are typically a few tens of gigabytes in size and can only be applied effectively after software and hardware optimization.
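A back-of-the-envelope calculation shows why parameter count matters for deployment. The sketch below estimates the storage needed for a model's weights alone, assuming dense storage at a fixed number of bytes per parameter; it ignores activations, optimizer state, and any compression, and the parameter counts are the round figures mentioned in this article rather than measurements of specific models.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def model_size_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for num_params in (1_000_000_000, 10_000_000_000, 100_000_000_000):
    for dtype in ("fp32", "fp16", "int8"):
        size = model_size_gb(num_params, dtype)
        print(f"{num_params:>15,} params @ {dtype}: {size:,.0f} GB")
```

Under these assumptions a 1-billion-parameter model needs about 4 GB in 32-bit precision, while a 100-billion-parameter model needs about 400 GB, which helps explain why large models are distilled or compressed before distribution.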
As technology advances, this industry will have a tremendous growth spurt due to the high demand from businesses to change into digitalized and intelligent companies.
Multi-modal integration
In the case of chatbots, most current solutions are primarily text-based. To discern emotions and better comprehend user needs, they will need to draw on auditory or visual feature analysis that combines acoustic and textual information. Apple has developed a technology that changes the loudness of the intelligent assistant's response based on the user's voice command. Google, meanwhile, is working on functionality that could determine whether the user is speaking to the intelligent assistant based on the focus of their eyes.
Transition from singular algorithm invention to full-stack innovation
As industrial applications grow steadily more complex, the strategy of improving application results by developing single-point algorithms is becoming ineffective, necessitating a full-process and full-stack approach.
Meanwhile, the complexity of implementation and the diversity of actual demands will require the integration of future intelligent applications into an advanced, vast, and unified network through industrial collaboration and system integration.
In addition to employing technology associated with computational intelligence and perceptual intelligence, synergy and cooperative advancement in numerous areas, such as brain science, psychology, logic, and linguistics, are required.
Conclusion
Arthur C. Clarke wrote that "any sufficiently advanced technology is indistinguishable from magic." Fully developed artificial intelligence may, like magic, endow machines with a human-like consciousness. Consequently, the next generation of information technology, particularly cognitive intelligence, will have a significant effect on all sectors of civilization.
In general, cognitive intelligence is still in its infancy, yet we live in a world where science and technology are advancing at a breathtaking rate. Building a better world will require continuous, synergistic innovation and the combined efforts of multiple industries.