Advances in deep learning and the growth of the digital economy have driven the development of artificial intelligence-generated content (AIGC) technologies such as virtual humans. Cross-domain face synthesis is one of the key technologies in virtual human production, with a wide range of applications in social media, film and television production, and other fields. This paper reviews the origins of cross-domain face synthesis, its typical task types and their difficulties, its technological development and challenges, and its potential applications and associated issues. It then discusses future trends and challenges from three perspectives: self-supervised and weakly supervised cross-domain synthesis, the use of large pre-trained models, and privacy protection.
LIU Qi, WU Haozhan, XIE Tianxin, HAN Hu. Research progress and trend of cross-domain face synthesis technology[J]. Science & Technology Review, 2023, 41(16): 113-123.
DOI: 10.3981/j.issn.1000-7857.2023.16.010