Special Topic: Cultivating New Quality Productive Forces to Support High-Level Self-Reliance and Strength in Science and Technology

The research status and development trends of generative artificial intelligence

  • CHE Lu ,
  • ZHANG Zhiqiang ,
  • ZHOU Jinjia ,
  • LI Lei
  • 1. School of Environment and Resource, Southwest University of Science and Technology, Mianyang 621000, China;
    2. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China;
    3. Faculty of Science and Engineering, Hosei University, Tokyo 184-8584, Japan
CHE Lu, Ph.D. candidate; research interest: multi-source data fusion techniques for artificial intelligence; e-mail: chelu1994@swust.edu.cn. ZHOU Jinjia (corresponding author), associate professor; research interest: generative artificial intelligence; e-mail: zhou@hosei.ac.jp

Received date: 2024-01-31

  Revised date: 2024-05-24

  Online published: 2024-07-10

Funding

Postgraduate Innovation Fund of Southwest University of Science and Technology (24ycx3004)

Abstract

With the advent of ChatGPT, generative artificial intelligence (GAI) research has made breakthrough progress in multimodal information processing, spanning text, images, and video, and has attracted wide attention. This paper reviews the research progress of GAI and discusses its future development trends in three parts: first, it traces the development history and research status of GAI in terms of natural language models and image and multimodal models; second, it examines the application prospects of GAI in different fields, focusing on content communication, assisted design, content creation, and personalized customization; third, it analyzes the main challenges facing GAI and its future development trends.

Cite this article

CHE Lu, ZHANG Zhiqiang, ZHOU Jinjia, LI Lei. The research status and development trends of generative artificial intelligence[J]. Science & Technology Review, 2024, 42(12): 35-43. DOI: 10.3981/j.issn.1000-7857.2024.01.00029
