Special Topic: A Review of 2023 Science and Technology Hot Topics

Review on hot topics of large generative AI models

  • DENG Jiawen,
  • REN Fuji
  • School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

DENG Jiawen, assistant researcher; research interests include artificial intelligence, large model technology, and affective computing. E-mail: dengjw@uestc.edu.cn

Received date: 2023-12-30

  Revised date: 2024-01-08

  Online published: 2024-04-09

Cite this article

DENG Jiawen, REN Fuji. Review on hot topics of large generative AI models[J]. Science & Technology Review, 2024, 42(1): 266-285. DOI: 10.3981/j.issn.1000-7857.2024.01.017

Abstract

In 2023, large generative AI models developed rapidly and achieved a series of breakthroughs. This paper reviews the key technologies of large models that received significant attention during the year, including the emergent abilities of large language models (LLMs), the development of multimodal LLMs, and alignment and knowledge enhancement techniques for large models. The paper also introduces the vertical applications of generative AI in fields such as healthcare and education, and its role in advancing the development of AI agents and metaverse technologies. Additionally, the challenges and development trends of generative AI are discussed, including issues related to data privacy, biased values, copyright disputes, and fake news propagation.
