Special Topic: Cultivating New Quality Productive Forces to Support High-Level Self-Reliance and Self-Strengthening in Science and Technology

Evolution of general large models

  • REN Fuji,
  • ZHANG Yanru
  • 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China;
    2. Shenzhen Institute for Advanced Study, UESTC, Shenzhen 518110, China
REN Fuji, professor, member of the Engineering Academy of Japan, member of the European Academy of Sciences, and foreign member of the Russian Academy of Engineering; research interests: advanced intelligence, affective computing, and intelligent robots; e-mail: renfuji@uestc.edu.cn. ZHANG Yanru (corresponding author), professor; research interests: intelligent gaming and decision-making; e-mail: yanruzhang@uestc.edu.cn

Received date: 2024-05-14

  Revised date: 2024-05-28

  Online published: 2024-07-09


Abstract

With the rapid development of artificial intelligence (AI) technology, general large models (GLMs) have become a significant research focus in the AI field. GLMs typically possess an extremely large number of parameters, are trained on massive datasets, and exhibit strong learning and reasoning capabilities. These models demonstrate outstanding performance in a wide range of tasks, including natural language processing, image recognition, and code generation. This paper reviews the evolution of GLMs and their key technology milestones, from early rule-based systems and traditional machine learning models, to the rise of deep learning, the introduction of the Transformer architecture, and the advances of the GPT series and other GLMs in China and abroad. Despite this significant progress, GLMs still face numerous challenges, including high computational resource demands, data bias and ethical issues, and limited model interpretability and transparency. This paper analyzes these challenges and explores five key directions for the future development of GLMs: model optimization, multimodal learning, emotionally intelligent models, data and knowledge dual-driven models, and ethical and societal impacts. By adopting these strategies, GLMs are expected to achieve broader and deeper applications, driving continuous progress in AI technology.

Cite this article

REN Fuji, ZHANG Yanru. Evolution of general large models[J]. Science & Technology Review, 2024, 42(12): 44-50. DOI: 10.3981/j.issn.1000-7857.2024.05.00531
