Review

Applications research of data mining technology in big data era

  • LIU Ming, LÜ Dan, AN Yongcan
  • School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
LIU Ming, associate professor; research interests: intelligent computing and data mining; e-mail: jlcclm@163.com

Received date: 2017-12-15

  Revised date: 2018-04-09

  Online published: 2018-05-19

Funding

National Natural Science Foundation of China (61503150)

Abstract

In the era of big data, data mining technology is attracting growing attention. This paper introduces the research background and current status of data mining technology, describes related algorithms such as decision trees, support vector machines, and neural networks, analyzes applications of data mining in big data along with future development trends, and discusses the challenges that data mining technology faces in the big data era.

Cite this article

LIU Ming, LÜ Dan, AN Yongcan. Applications research of data mining technology in big data era[J]. Science & Technology Review, 2018, 36(9): 73-83. DOI: 10.3981/j.issn.1000-7857.2018.09.010
