Special Topic: Cultivating New Quality Productive Forces to Support High-Level Self-Reliance and Self-Strengthening in Science and Technology

Evolution of general large models

  • REN Fuji,
  • ZHANG Yanru
  • 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China;
    2. Shenzhen Institute for Advanced Study, UESTC, Shenzhen 518110, China
REN Fuji, professor, member of the Engineering Academy of Japan, member of the European Academy of Sciences, and foreign member of the Russian Academy of Engineering; research interests: advanced intelligence, affective computing, and intelligent robots; e-mail: renfuji@uestc.edu.cn. ZHANG Yanru (corresponding author), professor; research interests: intelligent gaming and decision-making; e-mail: yanruzhang@uestc.edu.cn

Received date: 2024-05-14

  Revised date: 2024-05-28

  Online published: 2024-07-09


Abstract

With the rapid development of artificial intelligence (AI) technology, general large models (GLMs) have become a significant research focus in the AI field. GLMs typically possess an extremely large number of parameters, are trained on massive datasets, and exhibit strong learning and reasoning capabilities. These models demonstrate outstanding performance in a wide range of tasks, including natural language processing, image recognition, and code generation. This paper reviews the evolution of GLMs and their key technology milestones, from early rule-based systems and traditional machine learning models, to the rise of deep learning, the introduction of the Transformer architecture, and the advances of the GPT series and other GLMs in China and abroad. Despite this significant progress, GLMs still face numerous challenges, including high computational resource demands, data bias and ethical issues, and limited model interpretability and transparency. This paper analyzes these challenges and explores five key directions for the future development of GLMs: model optimization, multimodal learning, emotionally intelligent models, data and knowledge dual-driven models, and ethical and societal impacts. By adopting these strategies, GLMs are expected to achieve broader and deeper applications, driving continuous progress in AI technology.

Cite this article

REN Fuji, ZHANG Yanru. Evolution of general large models[J]. Science & Technology Review, 2024, 42(12): 44-50. DOI: 10.3981/j.issn.1000-7857.2024.05.00531
