
AlphaGo's breakthrough and challenges of wargaming

  • HU Xiaofeng,
  • HE Xiaoyuan,
  • TAO Jiuyang
  • 1. Department of Information Operation & Command Training, National Defense University, Beijing 100091, China
  • 2. College of Command Information Systems, Army Engineering University, Nanjing 210007, China

Received date: 2016-09-06

Revised date: 2017-06-18

Online published: 2017-11-16

Abstract

This paper summarizes AlphaGo's principles, new methods, technological breakthroughs, and epistemological significance. It then analyzes the bottlenecks of intelligent wargaming and addresses the importance of intelligent situation awareness in wargaming. Next, an approach to realizing operational situation awareness is proposed. Finally, new challenges that man-machine intelligence poses for wargaming are discussed.

Cite this article

HU Xiaofeng, HE Xiaoyuan, TAO Jiuyang. AlphaGo's breakthrough and challenges of wargaming[J]. Science & Technology Review, 2017, 35(21): 49-60. DOI: 10.3981/j.issn.1000-7857.2017.21.006
