In this paper, a privacy-preserving pattern classification algorithm for large-scale datasets is proposed, based on Parzen window kernel density estimation. First, the probability density is estimated from the original large-scale training set. Then, replacement training samples are constructed from the estimated probability density. Finally, the replacement training samples, rather than the original ones, are published for training the pattern classifier. In this way, the privacy of the original training set is effectively protected. Simulation experiments on the Adult dataset verify the effectiveness of the proposed algorithm.
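The three steps in the abstract (estimate the density, construct replacement samples from it, publish the replacements) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a Gaussian kernel with one shared bandwidth `h`, a separate Parzen estimate for each class label so that the published samples keep their labels, and purely numeric (already encoded) features. The function names `parzen_density` and `parzen_replacement_samples` are hypothetical. Sampling uses the mixture form of the Parzen estimate: drawing from p(x) = (1/n) Σ_i N(x; x_i, h²I) is equivalent to choosing a training point uniformly at random and adding Gaussian noise with standard deviation `h`.

```python
import math
import random
from collections import defaultdict


def parzen_density(x, samples, h):
    """Gaussian Parzen-window estimate of p(x) from a list of d-dimensional samples.

    Assumption: isotropic Gaussian kernel with a single bandwidth h.
    """
    d = len(x)
    norm = (2 * math.pi) ** (d / 2) * h ** d  # normalises each kernel to integrate to 1
    total = 0.0
    for s in samples:
        sq = sum(((xi - si) / h) ** 2 for xi, si in zip(x, s))
        total += math.exp(-0.5 * sq) / norm
    return total / len(samples)


def parzen_replacement_samples(X, y, h, seed=None):
    """Hypothetical helper: draw one replacement sample per original sample.

    Each class gets its own Parzen estimate (an assumption), so the class counts
    of the published set match those of the original training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    X_new, y_new = [], []
    for label, members in by_class.items():
        for _ in members:
            centre = rng.choice(members)  # pick a kernel centre uniformly at random
            # Adding N(0, h^2) noise per dimension yields an exact draw from the
            # Gaussian Parzen mixture of this class.
            X_new.append([c + rng.gauss(0.0, h) for c in centre])
            y_new.append(label)
    return X_new, y_new


# Usage: only X_pub / y_pub would be released for classifier training.
X = [[25.0, 40.0], [38.0, 50.0], [52.0, 45.0], [29.0, 20.0]]
y = [0, 1, 1, 0]
X_pub, y_pub = parzen_replacement_samples(X, y, h=2.0, seed=0)
```

In this sketch the bandwidth `h` governs the trade-off the method depends on: a very small `h` reproduces points almost identical to the originals, while a larger `h` blurs them further away from any individual record at the cost of a smoother, less faithful density. How the paper itself selects the bandwidth is not stated in this abstract.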
YUAN Yongbin, YANG Jing, ZHANG Jianpei, YU Xu. A Pattern Classification Privacy Preservation Algorithm Based on Parzen Window Kernel Density Estimation for Large Data Set[J]. Science & Technology Review, 2014, 32(36): 104-109.
DOI: 10.3981/j.issn.1000-7857.2014.36.017