With the rapid development of information technology, businesses worldwide are undergoing digital transformation, and almost every industry is trying to exploit the value of big data. To take full advantage of this opportunity while dealing effectively with the challenges that big data brings, academia, industry, and government are actively adjusting their strategic positioning and establishing holistic strategic plans. This paper first introduces the background and dynamics of big data and describes the national policy environment and the state of industry development. It then reviews technical progress in big data, with a comprehensive overview of the technology architecture and its innovative features. Finally, it forecasts development trends and offers suggestions relating to big data visualization, multidisciplinary integration, security and privacy, in-depth analysis, and related topics.