[1] McCulloch W S, Pitts W. A logical calculus of the ideas immanent in nervous activity[J]. Bulletin of Mathematical Biophysics, 1943, 5(4): 115-133.
[2] Hebb D O. The organization of behavior[M]. New York: Wiley, 1949.
[3] Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain[J]. Psychological Review, 1958, 65(6): 386-408.
[4] Rumelhart D E, Hinton G E, Williams R J. Learning internal representations by error propagation[J]. Nature, 1986, 323, doi: 10.1016/B978-1-4832-1446-7.50035-2.
[5] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators[J]. Neural Networks, 1989, 2(2): 359-366.
[6] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313: 504-507.
[7] Hinton G E, Osindero S, Teh Y. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18: 1527-1554.
[8] Bengio Y, Lamblin P, Popovici D, et al. Greedy layer-wise training of deep networks[M]//Advances in Neural Information Processing Systems 19 (NIPS'06). Cambridge MA: MIT Press, 2007: 153-160.
[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[M]//Advances in Neural Information Processing Systems (NIPS). Cambridge MA: MIT Press, 2012: 1097-1105.
[10] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015: 1-9.
[11] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, June 26-July 1, 2016.
[12] LeCun Y, Boser B, Denker J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[13] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86: 2278-2324.
[14] Fukushima K. Cognitron: A self-organizing multilayered neural network[J]. Biological Cybernetics, 1975, 20: 121-136.
[15] Fukushima K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position[J]. Biological Cybernetics, 1980, 36: 193-202.
[16] Hubel D H, Wiesel T N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex[J]. Journal of Physiology (London), 1962, 160: 106-154.
[17] Salakhutdinov R, Hinton G E. Deep Boltzmann machines[C]. International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater, April 16-18, 2009: 448-455.
[18] Vincent P, Larochelle H, Lajoie I, et al. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion[J]. Journal of Machine Learning Research (JMLR), 2010, 11: 3371-3408.
[19] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]. International Conference on Learning Representations (ICLR), San Diego, CA, May 7-9, 2015.
[20] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015: 3431-3440.
[21] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[22] Gers F A, Schmidhuber J. Recurrent nets that time and count[C]. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, Italy, July 24-27, 2000.
[23] Cho K, van Merrienboer B, Bahdanau D, et al. On the properties of neural machine translation: Encoder-decoder approaches[R]. Semantics and Structure in Statistical Translation, Doha, Qatar, October 25, 2014.
[24] Lee H, Pham P T, Largman Y, et al. Unsupervised feature learning for audio classification using convolutional deep belief networks[C]. Advances in Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, December 7-10, 2009.
[25] Dahl G E, Yu D, Deng L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
[26] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, June 26-July 1, 2016.
[27] Girshick R, Donahue J, Darrell T, et al. Region-based convolutional networks for accurate object detection and semantic segmentation[M]. Redmond: IEEE Computer Society, 2015: 142-158.
[28] Girshick R. Fast R-CNN[C]. IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 13-16, 2015.
[29] Chen L C, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[C]. International Conference on Learning Representations (ICLR), San Diego, CA, May 7-9, 2015.
[30] Zheng S, et al. Conditional random fields as recurrent neural networks[C]. IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 13-16, 2015.
[31] Huang G B, Ramesh M, Berg T, et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments[R]. Technical Report 07-49, Amherst: University of Massachusetts, 2007.
[32] Chen D, Cao X, Wen F, et al. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, June 23-28, 2013.
[33] Taigman Y, Yang M, Ranzato M, et al. DeepFace: Closing the gap to human-level performance in face verification[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, June 24-29, 2014.
[34] Sun Y, Chen Y H, Wang X G, et al. Deep learning face representation by joint identification-verification[C]//Advances in Neural Information Processing Systems (NIPS), 2014: 1988-1996.
[35] Sun Y, Wang X G, Tang X O. Deeply learned face representations are sparse, selective, and robust[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015.
[36] Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015.
[37] Farhadi A, et al. Every picture tells a story: Generating sentences from images[C]//Proceedings of the 11th European Conference on Computer Vision: Part IV. Berlin: Springer-Verlag, 2010: 15-29.
[38] Kulkarni G, et al. BabyTalk: Understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903.
[39] Vinyals O, et al. Show and tell: A neural image caption generator[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015.
[40] Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015.
[41] Donahue J, et al. Long-term recurrent convolutional networks for visual recognition and description[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015.