Special Topic Paper

Technological progress and trends of big data

  • CHENG Xueqi,
  • JIN Xiaolong,
  • YANG Jing,
  • XU Jun
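  • CAS Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
CHENG Xueqi, research professor; research interests include network science and social computing, Internet search and mining, and network information security. E-mail: cxq@ict.ac.cn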

Received date: 2016-05-30

  Revised date: 2016-06-28

  Online published: 2016-08-18

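Funding

National Basic Research Program of China (973 Program) (2013CB329602, 2014CB340401); National Natural Science Foundation of China, General Program (61572473); National Science Fund for Distinguished Young Scholars (61425016); National Natural Science Foundation of China, Young Scientists Fund (61303049)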

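Abstract

With the rapid development of information technology, industries worldwide are undergoing an informatization transformation, and almost every industry is working to discover and exploit the value of big data. To take full advantage of the opportunities that big data brings while effectively addressing its challenges, industry, academia, and governments at home and abroad are actively positioning themselves and formulating strategic plans. This paper introduces the background and recent developments of big data, and describes the big data policies and practices of various countries as well as the policy environment and industrial ecosystem for big data development in China. It then reviews the progress of big data technology and outlines its ecosystem and innovative features. Finally, it presents development trends and related recommendations concerning big data visualization, multidisciplinary integration, security and privacy, and deep analysis.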

Cite this article

CHENG Xueqi, JIN Xiaolong, YANG Jing, XU Jun. Technological progress and trends of big data[J]. Science & Technology Review, 2016, 34(14): 49-59. DOI: 10.3981/j.issn.1000-7857.2016.14.006
