Special Topic: Big Data Strategy

Big data visualization technology and applications

  • SHEN Enya
  • School of Software, Tsinghua University; National Engineering Laboratory of Big Data System Software, Beijing 100084, China
SHEN Enya, Ph.D. Research interests: big data, data visualization, visual analytics, and human-computer interaction. E-mail: shenenya@mails.tsinghua.edu.cn

Received date: 2019-11-11

  Revised date: 2020-01-08

  Online published: 2020-04-01

Funding

Beijing Municipal Science and Technology Project (Z111100067311053); National Key Research and Development Program of China (2016YFB0501504); National Natural Science Foundation of China (U1509213)

Abstract

As the volume of data generated by human activity grows, the scale, types, and requirements of the data that visualization must handle have changed significantly, and data visualization faces many new challenges in the big data era. Starting from the characteristics and application requirements of big data, and drawing on the current state of data visualization research, this paper reviews visualization techniques suited to big data, analyzes eight key problems that data visualization must solve under big data conditions, and discusses AutoVis, a data-aware interactive visualization design platform developed in-house for big data visualization needs, together with its applications.

Cite this article

SHEN Enya. Big data visualization technology and applications[J]. Science & Technology Review (科技导报), 2020, 38(3): 68-83. DOI: 10.3981/j.issn.1000-7857.2020.03.004
