Deep learning: The revival and transformation of multilayer neural networks

SHAN Shiguang; KAN Meina; LIU Xin; LIU Mengyi; WU Shuzhe

doi:10.3981/j.issn.1000-7857.2016.14.007

Science & Technology Review (科技导报), 2016, Vol. 34, Issue 14: 60-70


Special topic article

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

SHAN Shiguang, researcher; research interests: image processing, computer vision, pattern recognition, and human-computer interaction; e-mail: sgshan@ict.ac.cn

Received: 2016-05-30

Revised: 2016-06-30

Published online: 2016-08-18

Abstract

Artificial intelligence (AI) has entered a new period of vigorous development. This wave of AI is driven by three engines: deep learning (DL), big data, and large-scale parallel computing, with DL at the core. This paper reviews the current "revival of deep neural networks" and outlines four commonly used deep models: the deep belief network (DBN), the deep autoencoder network (DAN), the deep convolutional neural network (DCNN), and the long short-term memory recurrent neural network (LSTM-RNN). It then briefly describes the results that deep learning has achieved on several important tasks in speech recognition and computer vision. To make DL easier to apply, it introduces several commonly used open-source deep learning platforms. Finally, it offers open-ended commentary on the insights and changes brought about by deep learning and discusses the open problems and development trends in the field.

Keywords: deep neural network; deep belief network (DBN); deep autoencoder network (DAN); deep convolutional neural network (DCNN); long short-term memory recurrent neural network (LSTM-RNN); speech recognition; computer vision

Cite this article

SHAN Shiguang, KAN Meina, LIU Xin, LIU Mengyi, WU Shuzhe. Deep learning: The revival and transformation of multilayer neural networks[J]. Science & Technology Review, 2016, 34(14): 60-70. DOI: 10.3981/j.issn.1000-7857.2016.14.007

