Research and Design of Topical Crawl Module Based on Deep Web Search Technology

  • MENG Jing; LIU Shouqiang
  • 1. Guangdong Communication Polytechnic, Guangzhou 510650, China; 2. School of Physics & Telecommunication Engineering, South China Normal University, Guangzhou 510006, China; 3. School of Computer Science & Engineering, South China University of Technology, Guangzhou 510040, China

Received date: 2011-06-29

Revised date: 2011-07-08

Online published: 2011-07-28


As the Web grows rapidly, managing and searching massive volumes of data becomes increasingly important. The heterogeneity and volume of information, together with the dynamic nature of information integration, require Web crawlers to access Web pages automatically so that the data can be processed further. At the same time, an enterprise's confidential internal information must be used only by its internal staff, and this tension between openness and restricted access has become a major bottleneck for enterprise development. To address this problem, some traditional forms of resource sharing are changed and an efficient, convenient, and confidential resource-sharing management platform, the Enterprise Search Engine (ESE), is provided. A design and implementation method is proposed for a Deep Web ESE based on topical crawling, with indexing and enterprise search built on the open-source Java library Lucene. The system was deployed and tested on a Deep Web site in the telecommunications industry; the results met the design targets, and the system plays an important role in that industry. Finally, future work on search accuracy and speed and on countering spam pages and fraud is outlined.

Cite this article

MENG Jing; LIU Shouqiang. Research and Design of Topical Crawl Module Based on Deep Web Search Technology[J]. Science & Technology Review, 2011, 29(21): 31-35. DOI: 10.3981/j.issn.1000-7857.2011.21.004