[1] 加里·卡斯帕罗夫. 深度思考——人工智能的终点与人类创造力的起点[M]. 集智俱乐部, 译. 北京: 中国人民大学出版社, 2018.
[2] Campbell M, Hoane A J Jr, Hsu F H. Deep Blue[J]. Artificial Intelligence, 2002, 134(1/2): 57-83.
[3] Müller M. Computer Go[J]. Artificial Intelligence, 2002, 134(1/2): 145-179.
[4] Silver D. Tutorial: Deep reinforcement learning, Google DeepMind[EB/OL]. [2022-08-31]. https://www.davidsilver.uk/wp-content/uploads/2020/03/deep_rl_tutorial_small_compressed.pdf.
[5] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[6] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.
[7] 尹立蒙. 上帝能否创造一块自己推不动的石头: AlphaGo 击败柯洁[M]//社会热点解读(2017). 北京: 北京出版社, 2018: 335-353.
[8] 赵冬斌, 邵坤, 朱圆恒, 等. 深度强化学习综述: 兼论计算机围棋的发展[J]. 控制理论与应用, 2016, 33(6): 701-717.
[9] DeepMind. master-series-60-online-games[EB/OL]. [2022-08-31]. https://www.deepmind.com/research/highlighted-research/alphago/master-series-60-online-games.
[10] Silver D, Hubert T, Schrittwieser J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144.
[11] Schrittwieser J, Antonoglou I, Hubert T, et al. Mastering Atari, Go, chess and shogi by planning with a learned model[J]. Nature, 2020, 588(7839): 604-609.
[12] Stockfish: Strong open source chess engine[EB/OL]. [2022-08-31]. https://stockfishchess.org/.
[13] The 27th World Computer Shogi Championship[EB/OL]. [2022-08-31]. http://www2.computer-shogi.org/wcsc27/index_e.html.
[14] 苏剑波, 陈叶飞, 马哲, 等. 从AlphaGo到BetaGo——基于任务可完成性分析的定性人工智能的定量实现[J]. 控制理论与应用, 2016, 33(12): 1572-1583.
[15] 陶九阳, 吴琳, 胡晓峰. AlphaGo 技术原理分析及人工智能军事应用展望[J]. 指挥与控制学报, 2016, 2(2): 114-120.
[16] Wang F Y, Zhang J J, Zheng X H, et al. Where does AlphaGo go: From Church-Turing thesis to AlphaGo thesis and beyond[J]. IEEE/CAA Journal of Automatica Sinica, 2016, 3(2): 113-120.
[17] 杨小康. 未来人工智能: 从 AlphaGo 到 BetaGo[J]. 科学, 2017, 69(3): 6-8.
[18] 胡晓峰, 贺筱媛, 陶九阳. AlphaGo 的突破与兵棋推演的挑战[J]. 科技导报, 2017, 35(21): 49-60.
[19] Jiang Q, Li K, Du B, et al. DeltaDou: Expert-level Doudizhu AI through self-play[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019: 1265-1271.
[20] Yang X, Duvaud W, Wei P. Continuous control for searching and planning with a learned model[DB/OL]. arXiv preprint: 2006.07430, 2020.
[21] Fawzi A, Balog M, Huang A, et al. Discovering faster matrix multiplication algorithms with reinforcement learning[J]. Nature, 2022, 610(7930): 47-53.
[22] 田渊栋. 阿法狗围棋系统的简要分析[J]. 自动化学报, 2016, 42(5): 671-675.
[23] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[24] 唐振韬, 邵坤, 赵冬斌, 等. 深度强化学习进展: 从 AlphaGo 到 AlphaGo Zero[J]. 控制理论与应用, 2017, 34(12): 1529-1546.
[25] 刘知青, 吴修竹. 解读 AlphaGo 背后的人工智能技术[J]. 控制理论与应用, 2016, 33(12): 1685-1687.
[26] 薛永红, 王洪鹏. 机器下棋的历史与启示: 从“深蓝”到AlphaZero[J]. 科技导报, 2019, 37(19): 87-96.
[27] 陈铭禹. AlphaGo与AlphaZero原理和未来应用研究[J]. 通讯世界, 2019, 26(12): 22-23.
[28] 唐川, 陶业荣, 麻曰亮. AlphaZero 原理与启示[J]. 航空兵器, 2020, 27(3): 27-36.
[29] 李理. 深度学习理论与实战: 提高篇[EB/OL]. (2019-03-14)[2022-08-31]. http://fancyerii.github.io/2019/03/14/dl-book/.
[30] 浅述: 从 Minimax 到 AlphaZero, 完全信息博弈之路[EB/OL]. [2022-08-31]. https://zhuanlan.zhihu.com/p/31809930.
[31] Redflashing. 与 AI 博弈: 从 AlphaGo 到 MuZero[EB/OL]. [2022-08-31]. https://zhuanlan.zhihu.com/p/458714600.
[32] 从 α 到 μ: DeepMind 棋盘游戏 AI 进化史[EB/OL]. [2022-08-31]. https://baijiahao.baidu.com/s?id=1657590770469176607&wfr=spider&for=pc.
[33] CSDN. AI 中的搜索(二)——对抗搜索(最小最大搜索Minimax、Alpha-Beta剪枝搜索、蒙特卡洛树搜索MCTS)[EB/OL]. [2022-08-31]. https://blog.csdn.net/hxxjxw/article/details/105848821.
[34] Amir R, Evstigneev I V. On Zermelo's theorem[J]. Journal of Dynamics and Games, 2017, 4(3): 191-194.
[35] Knuth D E, Moore R W. An analysis of alpha-beta pruning[J]. Artificial Intelligence, 1975, 6(4): 293-326.
[36] Herik H J D, Uiterwijk J W H M, Rijswijck J V. Games solved: Now and in the future[J]. Artificial Intelligence, 2002, 134(1/2): 277-311.
[37] Müller M. Computer Go[J]. Artificial Intelligence, 2002, 134(1/2): 145-179.
[38] Tesauro G, Galperin G. On-line policy improvement using Monte-Carlo search[C]//Proceedings of the 9th International Conference on Neural Information Processing Systems. New York: ACM, 1996: 1068-1074.
[39] Coulom R. Efficient selectivity and backup operators in Monte-Carlo tree search[C]//Proceedings of the 5th International Conference on Computers and Games. New York: ACM, 2006: 72-83.
[40] 王少锋. 棋类运动在人工智能背景下的发展问题研究: AI围棋对弈软件优化技术研究[C]//中国围棋论丛(第6辑). 杭州: 浙江古籍出版社, 2021: 310-339.
[41] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. 2012.
[42] Silver D. Reinforcement learning and simulation-based search in computer Go[D]. Edmonton: University of Alberta, 2009.
[43] Kocsis L, Szepesvári C. Bandit based Monte-Carlo planning[M]//Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006: 282-293.
[44] 28天自制你的 AlphaGo(6): 蒙特卡洛树搜索(MCTS)基础[EB/OL]. [2022-08-31]. https://zhuanlan.zhihu.com/p/25345778.
[45] Monte Carlo tree search: Beginners guide[EB/OL]. [2022-08-31]. https://int8.io/monte-carlo-tree-search-beginners-guide/.
[46] 董豪, 丁子涵, 仉尚航, 等. 深度强化学习基础、研究与应用[M]. 北京: 电子工业出版社, 2021.
[47] Browne C B, Powley E, Whitehouse D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.
[48] Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem[J]. Machine Learning, 2002, 47(2/3): 235-256.
[49] Rosin C D. Multi-armed bandits with episode context[J]. Annals of Mathematics and Artificial Intelligence, 2011, 61(3): 203-230.
[50] Enzenberger M, Müller M. A lock-free multithreaded Monte-Carlo tree search algorithm[M]//Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010: 14-20.
[51] Silver D, Hasselt H V, Hessel M, et al. The predictron: End-to-end learning and planning[C]//Proceedings of the 34th International Conference on Machine Learning - Volume 70. New York: ACM, 2017: 3191-3199.
[52] Oh J, Singh S, Lee H. Value prediction network[C]//Proceedings of the Conference and Workshop on Neural Information Processing Systems. La Jolla: Neural Information Processing Systems (NIPS), 2017.
[53] 围棋人工智能程序 AI 之 Zen 系列[EB/OL]. [2022-08-31]. www.bilibili.com/read/cv13112575/.
[54] Coulom R. Computing Elo ratings of move patterns in the game of Go[J]. ICGA Journal, 2007, 30(4): 198-208.
[55] Baudiš P, Gailly J L. PACHI: State of the art open source Go program[M]//Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012: 24-38.
[56] Müller M, Enzenberger M, Arneson B, et al. Fuego—An open-source framework for board games and Go engine based on Monte-Carlo tree search[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2010, 2(4): 259-270.
[57] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[58] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[59] Clark C, Storkey A. Teaching deep convolutional neural networks to play Go[DB/OL]. arXiv preprint: 1412.3409, 2014.
[60] Tian Y D, Zhu Y. Better computer Go player with neural network and long-term prediction[DB/OL]. arXiv preprint: 1511.06410, 2015.
[61] Werf E, Herik H, Uiterwijk J. Learning to score final positions in the game of Go[J]. Theoretical Computer Science, 2005, 349(2): 168-183.
[62] Sutskever I, Nair V. Mimicking Go experts with convolutional neural networks[M]//Artificial Neural Networks - ICANN 2008. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008: 101-110.
[63] Maddison C J, Huang A, Sutskever I, et al. Move evaluation in Go using deep convolutional neural networks[DB/OL]. arXiv preprint: 1412.6564, 2014.
[64] Yang Y, Wang J. An overview of multi-agent reinforcement learning from game theoretical perspective[DB/OL]. arXiv preprint: 2011.00583, 2020.
[65] KGS 围棋服务器[EB/OL]. [2022-08-31]. https://www.gokgs.com/index.jsp?locale=zh_CN.
[66] Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. 2nd edition. Cambridge: MIT Press, 2018.
[67] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[68] Hussein A, Gaber M M, Elyan E, et al. Imitation learning: A survey of learning methods[J]. ACM Computing Surveys, 2018, 50(2): 1-35.
[69] El-Arini K. Dirichlet processes, a gentle tutorial[EB/OL]. Carnegie Mellon University, 2008. [2022-08-31]. https://www.cs.cmu.edu/~kbe/dp_tutorial.pdf.
[70] Steinitz W. The modern chess instructor[M]. Russell Enterprises, 2017.
[71] Wang T, Bao X, Clavera I, et al. Benchmarking model-based reinforcement learning[DB/OL]. arXiv preprint: 1907.02057, 2019.
[72] Ye W, Liu S, Kurutach T, et al. Mastering Atari games with limited data[DB/OL]. arXiv preprint: 2111.00210, 2021.