Research Article

A Privacy-Preserving Method for Pattern Classification on Large-Scale Data Based on Parzen Window Kernel Density Estimation

  • Yuan Yongbin,
  • Yang Jing,
  • Zhang Jianpei,
  • Yu Xu
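  • 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
    2. College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China;
    3. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266001, China
Yuan Yongbin, associate professor; research interests: privacy preservation and machine learning. E-mail: yyb1688@163.com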

Received: 2014-01-09

  Revised: 2014-09-03

  Published online: 2015-01-09

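Funding

National Natural Science Foundation of China (61073041, 61073043, 61370083, 61402126); Natural Science Foundation of Heilongjiang Province (F200901); Natural Science Foundation of Fujian Province (2011J1296); Specialized Research Fund for the Doctoral Program of Higher Education (20112304110011, 20112304110012)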

A Pattern Classification Privacy Preservation Algorithm Based on Parzen Window Kernel Density Estimation for Large Data Set

  • YUAN Yongbin ,
  • YANG Jing ,
  • ZHANG Jianpei ,
  • YU Xu
  • 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
    2. College of Electrical Engineering & Automation, Fuzhou University, Fuzhou 350108, China;
    3. College of Information Science and Technology, Qingdao University of Science & Technology, Qingdao 266001, China

Received date: 2014-01-09

  Revised date: 2014-09-03

  Online published: 2015-01-09

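Abstract

For pattern classification tasks on large-scale datasets, a privacy-preserving pattern classification algorithm based on Parzen window kernel density estimation is proposed. The Parzen window algorithm is used to estimate the probability density that the original large-scale training set follows, and la replacement training samples are constructed from the estimated probability density function, where l is the number of original samples and a is determined by 10-fold cross-validation. Finally, the replacement training samples are published for pattern classification, so that privacy on the original data is preserved. Simulation experiments on the Adult dataset verify the effectiveness of the algorithm.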

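Cite this article

Yuan Yongbin, Yang Jing, Zhang Jianpei, Yu Xu. A privacy-preserving method for pattern classification on large-scale data based on Parzen window kernel density estimation[J]. Science & Technology Review, 2014, 32(36): 104-109. DOI: 10.3981/j.issn.1000-7857.2014.36.017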

Abstract

This paper proposes a privacy-preserving pattern classification algorithm for large-scale datasets based on Parzen window kernel density estimation. First, the probability density is estimated from the original large-scale training set. Then, replacement training samples are constructed from the estimated probability density. Finally, the replacement training samples are published for training the pattern classifier, so that the privacy of the original training set is protected. Simulation experiments on the Adult dataset verify the effectiveness of the proposed algorithm.
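The three steps in the abstract (estimate the density, draw replacement samples, publish them) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the per-class sampling, the fixed bandwidth `h`, the fixed multiplier `a`, the toy data, and the function names `parzen_density` and `sample_replacements` are all assumptions made for illustration. The paper selects `a` by 10-fold cross-validation, which is omitted here.

```python
import numpy as np


def parzen_density(x, samples, h):
    """Gaussian Parzen window estimate of the density at point x (assumed kernel)."""
    n, d = samples.shape
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    return np.mean(np.exp(-sq_dist / (2.0 * h ** 2))) / norm


def sample_replacements(X, y, a, h, rng):
    """Draw round(a * l_c) synthetic points for each class c (l_c = class size).

    Sampling from a Gaussian-kernel Parzen estimate amounts to picking a
    stored sample uniformly at random and adding N(0, h^2 I) noise to it.
    """
    X_new, y_new = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        m = int(round(a * len(X_c)))
        centers = X_c[rng.integers(0, len(X_c), size=m)]
        noise = rng.normal(0.0, h, size=centers.shape)
        X_new.append(centers + noise)
        y_new.append(np.full(m, label))
    return np.vstack(X_new), np.concatenate(y_new)


rng = np.random.default_rng(0)
# Toy stand-in for a private training set: two Gaussian classes in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

# a = 0.5 and h = 0.3 are arbitrary illustrative values, not tuned.
X_pub, y_pub = sample_replacements(X, y, a=0.5, h=0.3, rng=rng)
print(X_pub.shape, y_pub.shape)  # (200, 2) (200,)
```

Only `X_pub` and `y_pub` would be released for classifier training in this sketch. It assumes continuous numeric features; categorical attributes such as those in the Adult dataset would need a suitable encoding or a different kernel.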
