Log Mining and Personalization Improvement for Mobile Search System of Government Websites

  • YE Xiaorong
  • SHAO Qing
  • 1. Institute of Scientific and Technical Information of China, Beijing 100038, China;
    2. KNET Co., Ltd., Beijing 100190, China

Received date: 2014-10-22
Revised date: 2014-11-20
Online published: 2015-01-09


By taking full advantage of the characteristics of mobile search and of government websites, a log mining and personalization system that exploits the strengths of Hadoop in large-scale data processing is designed and developed. First, Flume and HDFS are used to collect and store massive logs, providing the source data and program interface for log mining. Second, MapReduce is used to analyze the logs efficiently, drawing on the labels and navigation bars of search result pages; on this basis, a vector space model of the search result pages and a user interest model are built. Third, building on the user interest model, the K-means clustering algorithm is run on MapReduce to divide users into interest groups according to their interests. Finally, the system computes the distance between a search result page and the user's interest group to determine whether the user is interested in that page, and accordingly reorders the search results and pushes new pages to the user. In this way, personalized search and push functions are implemented.
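The abstract names the processing steps but not their formulas. The Python sketch below shows, under stated assumptions, how a vector space model of result pages, a user interest model, K-means written as map and reduce functions, and distance-based re-ranking can fit together. It runs in a single process rather than on Hadoop. Plain term-frequency weighting, the mean-of-clicked-pages interest model, cosine distance, seeding K-means with the first k users, the `THRESHOLD` value, and all page and user names are hypothetical choices for illustration, not details given by the authors.

```python
import math
from collections import Counter, defaultdict

# Sketch only: a single-process imitation of the pipeline, not the authors' Hadoop code.
# Page terms, user ids and THRESHOLD below are hypothetical illustration values.
THRESHOLD = 0.8  # hypothetical cut-off on cosine distance for "interesting"


def page_vector(terms):
    """VSM vector of a result page from its label/navigation-bar terms (plain TF, L2-normalised)."""
    tf = Counter(terms)
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {t: v / norm for t, v in tf.items()}


def mean_vector(vectors):
    """Component-wise mean of sparse vectors; used for user interest and centroids."""
    acc = defaultdict(float)
    for vec in vectors:
        for t, v in vec.items():
            acc[t] += v
    return {t: v / len(vectors) for t, v in acc.items()}


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors (1.0 if either is empty)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


def kmeans_map(user_id, vec, centroids):
    """Map step: emit (index of nearest centroid, (user_id, vec))."""
    best = min(range(len(centroids)), key=lambda j: cosine_distance(vec, centroids[j]))
    return best, (user_id, vec)


def kmeans_reduce(members):
    """Reduce step: the new centroid is the mean of the member vectors."""
    return mean_vector([vec for _, vec in members])


def kmeans(users, k, iterations=10):
    """Cluster users (id -> interest vector); seeds are the first k ids (a simplification)."""
    ids = sorted(users)
    centroids = [users[u] for u in ids[:k]]
    assign = {}
    for _ in range(iterations):
        groups = defaultdict(list)  # shuffle: group map outputs by centroid key
        for u in ids:
            key, value = kmeans_map(u, users[u], centroids)
            groups[key].append(value)
        new_assign = {u: key for key, members in groups.items() for u, _ in members}
        # An empty cluster keeps its previous centroid.
        centroids = [kmeans_reduce(groups[j]) if groups[j] else centroids[j] for j in range(k)]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
    return centroids, assign


def rerank(results, centroid):
    """Order (page_id, vec) pairs by distance to the group centroid; ties keep search order."""
    scored = sorted(results, key=lambda item: cosine_distance(item[1], centroid))
    order = [pid for pid, _ in scored]
    interesting = [pid for pid, vec in scored if cosine_distance(vec, centroid) <= THRESHOLD]
    return order, interesting


pages = {
    "tax-guide": page_vector(["tax", "finance", "policy"]),
    "tax-form": page_vector(["tax", "form", "download"]),
    "school-news": page_vector(["education", "school", "notice"]),
    "exam-dates": page_vector(["education", "exam", "notice"]),
}
# User interest model: mean vector of the pages each user clicked (an assumption).
users = {
    "u1": mean_vector([pages["tax-guide"], pages["tax-form"]]),
    "u2": mean_vector([pages["tax-guide"]]),
    "u3": mean_vector([pages["school-news"], pages["exam-dates"]]),
    "u4": mean_vector([pages["exam-dates"]]),
}
centroids, assign = kmeans(users, k=2)
results = [(p, pages[p]) for p in ["school-news", "tax-form", "exam-dates"]]
order, interesting = rerank(results, centroids[assign["u2"]])
print(order, interesting)  # ['tax-form', 'school-news', 'exam-dates'] ['tax-form']
```

Here u1 and u2 end up in the "tax" group and u3 and u4 in the "education" group, so for u2 the tax form is moved to the top and flagged as a candidate for push. Because Python's `sorted` is stable, pages at equal distance keep their original search order. On a real Hadoop cluster the map function would run over log splits stored in HDFS and the reduce function once per centroid key, repeated until the assignments stop changing.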

Cite this article

YE Xiaorong, SHAO Qing. Log Mining and Personalization Improvement for Mobile Search System of Government Websites[J]. Science & Technology Review, 2014, 32(36): 110-116. DOI: 10.3981/j.issn.1000-7857.2014.36.018

