Research Paper

Log mining, behavioral analysis and improvement of government website search system

  • YE Xiaorong,
  • SHAO Qing
  • 1. Institute of Scientific and Technical Information of China, Beijing 100038, China;
    2. KNET Co., Ltd., Beijing 100190, China
YE Xiaorong, senior engineer; research interests: computer software and digital libraries; e-mail: yeelfine@sina.com

Received date: 2014-10-22

  Revised date: 2015-03-30

  Online published: 2015-06-11

Cite this article

YE Xiaorong, SHAO Qing. Log mining, behavioral analysis and improvement of government website search system[J]. Science & Technology Review, 2015, 33(11): 94-102. DOI: 10.3981/j.issn.1000-7857.2015.11.017

Abstract

To improve the search quality of a government website and optimize its content, the site's existing search system was extended through secondary development with three new modules: log mining, behavioral analysis, and system improvement. The log mining module collects, filters, and identifies records of users' search operations. The behavioral analysis module examines these records from three angles (the query process, clustering analysis, and hot query words) to characterize user behavior, and it outputs the webpages whose weights should be adjusted together with the hot query words. The system improvement module adjusts webpage weights so that query results become more precise. Using the hot-query statistics, it also adds new functions such as search hotspots, provides users with personalized webpages, helps optimize the website's content, and exchanges data with the public opinion system. Together, these changes allow the search system and the government website to serve users better.
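The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the log schema, the clustering method, or the weighting formula, so the `LogRecord` fields, the per-session counting of hot queries, and the click-share score used as a weight-adjustment signal are all assumptions made for illustration. Clustering analysis is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """One search-log entry. All field names are hypothetical."""
    session_id: str
    query: str
    clicked_url: Optional[str]  # None when the user clicked no result


def filter_records(records):
    """Log mining (assumed form): normalize queries, drop empty ones."""
    cleaned = []
    for r in records:
        query = r.query.strip().lower()
        if query:
            cleaned.append(LogRecord(r.session_id, query, r.clicked_url))
    return cleaned


def hot_queries(records, top_n=3):
    """Behavioral analysis (assumed form): most frequent queries,
    counting each query at most once per session."""
    seen = {(r.session_id, r.query) for r in records}
    counts = Counter(query for _, query in seen)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def click_share(records):
    """System improvement (assumed signal): for each (query, url) pair,
    the fraction of that query's clicks that went to the url. Pages with
    a high share are candidates for a higher ranking weight."""
    clicks = Counter(
        (r.query, r.clicked_url) for r in records if r.clicked_url
    )
    totals = Counter()
    for (query, _url), n in clicks.items():
        totals[query] += n
    return {key: n / totals[key[0]] for key, n in clicks.items()}
```

A typical use would pass raw records through `filter_records`, then feed the cleaned list to `hot_queries` for a search-hotspot display and to `click_share` for weight-adjustment candidates. In a production system the resulting scores would still have to be mapped onto whatever boost mechanism the underlying search engine exposes.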
