As the Web grows rapidly, managing and searching massive amounts of data becomes increasingly important. The heterogeneity, volume, and dynamic nature of the information to be integrated require Web crawlers to access Web pages automatically so that the data can be processed further. At the same time, an enterprise's confidential internal information must be available only to its internal staff, and this tension between openness and restriction has become a major bottleneck for enterprise development. To address this, the paper changes some traditional forms of resource sharing and provides an efficient, convenient, and confidential resource-sharing management platform, the Enterprise Search Engine (ESE). It proposes a design and implementation method for a Deep Web ESE based on topical crawling and on an indexed enterprise search system built with the open-source Java library Lucene. After deployment and experiments on a Deep Web site in the telecommunications industry, the results are shown to meet the design targets, and the system plays an important role in that industry. Finally, future work on search accuracy and speed, and on countering spam pages and fraud, is outlined.
MENG Jing, LIU Shouqiang. Research and Design of Topical Crawl Module Based on Deep Web Search Technology[J]. Science & Technology Review, 2011, 29(21): 31-35.
DOI: 10.3981/j.issn.1000-7857.2011.21.004