Deep learning: The revival and transformation of multi layer neural networks

SHAN Shiguang; KAN Meina; LIU Xin; LIU Mengyi; WU Shuzhe

doi:10.3981/j.issn.1000-7857.2016.14.007

Science & Technology Review, 2016, Vol. 34, Issue 14: 60-70

Special Issues

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Received date: 2016-05-30

Revised date: 2016-06-30

Online published: 2016-08-18

Abstract

Artificial intelligence (AI) has entered a new period of vigorous development. The current AI boom is driven by three engines: deep learning (DL), big data, and massively parallel computing, with DL at the core. This article first reviews, from a historical perspective, the background of the current "renaissance of deep neural networks", and then summarizes four commonly used deep models: the deep belief network (DBN), the deep autoencoder network (DAN), the deep convolutional neural network (DCNN), and the long short-term memory recurrent neural network (LSTM-RNN). It then briefly introduces the results deep learning has achieved in speech recognition and computer vision. To facilitate the application of DL, it also introduces several commonly used deep learning platforms. Finally, it comments on the insights and changes that deep learning has brought, and discusses open problems and development trends in the field.

Key words: multilayer neural networks; DBN; DAN; DCNN; LSTM-RNN; speech recognition; computer vision

Cite this article

SHAN Shiguang, KAN Meina, LIU Xin, LIU Mengyi, WU Shuzhe. Deep learning: The revival and transformation of multi layer neural networks[J]. Science & Technology Review, 2016, 34(14): 60-70. DOI: 10.3981/j.issn.1000-7857.2016.14.007
