A Pattern Classification Privacy Preserve Algorithm for Sparse Data Based on Primary Component Analysis

YUAN Yongbin; YANG Jing; ZHANG Jianpei; YU Xu

doi:10.3981/j.issn.1000-7857.2014.12.010

Science & Technology Review, 2014, Vol. 32, Issue 12: 68-73

1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
2. College of Electrical Engineering & Automation, Fuzhou University, Fuzhou 350108, China;
3. School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266001, China

Received date: 2014-01-06

Revised date: 2014-03-16

Online published: 2014-05-13

Abstract

Pattern classification learns from the original training samples, which can easily lead to privacy disclosure. To prevent privacy leakage during pattern classification without degrading classification performance, this paper proposes a privacy-preserving pattern classification algorithm based on principal component analysis (PCA). The algorithm extracts the principal components of the original training data and transforms the original training samples into new samples expressed in terms of these principal components. A classification model is then trained on the new samples. Experiments are carried out on the Adult and KDD CUP 99 data sets, with precision and recall as the evaluation metrics. The results show that, because only the principal components of the raw feature attributes are extracted and used, the algorithm avoids leaking the original attributes. In addition, PCA removes some of the noise in the data, so the classifier performs better on the transformed samples than on the original data set. Compared with existing algorithms, the proposed algorithm therefore achieves better pattern classification accuracy and better privacy preservation.
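
The pipeline described in the abstract (extract principal components from the training data, project the samples onto them, train a classifier on the projected samples, then score it with precision and recall) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA is computed here by power iteration with deflation in pure Python, the classifier is a nearest-centroid model used as a hypothetical stand-in (the abstract does not say which classifier is trained), the data are synthetic rather than Adult or KDD CUP 99, and all function names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def mean_vector(X: Matrix) -> List[float]:
    """Column means of the training matrix (one row per sample)."""
    return [sum(col) / len(X) for col in zip(*X)]


def covariance(X: Matrix, mu: Sequence[float]) -> Matrix:
    """Sample covariance matrix of the mean-centered rows of X."""
    d = len(mu)
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j]
    return [[v / (len(X) - 1) for v in r] for r in C]


def top_components(C: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Leading k eigenvectors of a symmetric covariance matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(C)
    A = [r[:] for r in C]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining spectrum is zero
            v = [x / norm for x in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(X: Matrix, mu: Sequence[float], comps: Matrix) -> Matrix:
    """Map each sample onto the principal components; only these scores go to the classifier."""
    return [[sum((x - m) * c for x, m, c in zip(row, mu, comp)) for comp in comps] for row in X]


def fit_centroids(Z: Matrix, y: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical stand-in classifier: per-class mean of the projected training samples."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for z, label in zip(Z, y):
        s = sums.setdefault(label, [0.0] * len(z))
        for i, v in enumerate(z):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def predict(centroids: Dict[int, List[float]], Z: Matrix) -> List[int]:
    """Assign each sample to the class whose centroid is nearest (squared Euclidean)."""
    def dist(z: Sequence[float], c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(z, c))
    return [min(centroids, key=lambda lab: dist(z, centroids[lab])) for z in Z]


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def demo(seed: int = 42) -> Tuple[Matrix, Tuple[float, float]]:
    """Synthetic two-class data (assumption); PCA fitted on the training half only."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (0.0, 0.0, 0.0)), (1, (5.0, 5.0, 0.0))):
        for _ in range(50):
            X.append([c + rng.gauss(0.0, 0.5) for c in center])
            y.append(label)
    X_tr, y_tr, X_te, y_te = X[::2], y[::2], X[1::2], y[1::2]
    mu = mean_vector(X_tr)
    comps = top_components(covariance(X_tr, mu), k=1)
    model = fit_centroids(project(X_tr, mu, comps), y_tr)
    y_hat = predict(model, project(X_te, mu, comps))
    return comps, precision_recall(y_te, y_hat)


if __name__ == "__main__":
    comps, (p, r) = demo()
    print(f"component={comps[0]}, precision={p:.2f}, recall={r:.2f}")
```

The mean and components are estimated from the training split only and then reused to project the test split, which is how a fitted projection would be applied to new samples. Replacing the stand-in classifier (for example with an SVM or a decision tree) would only change `fit_centroids` and `predict`.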

Key words: principal component analysis; pattern classification; privacy-preserving algorithms

Cite this article

YUAN Yongbin, YANG Jing, ZHANG Jianpei, YU Xu. A Pattern Classification Privacy Preserve Algorithm for Sparse Data Based on Primary Component Analysis[J]. Science & Technology Review, 2014, 32(12): 68-73. DOI: 10.3981/j.issn.1000-7857.2014.12.010
