Web Crawling
Foundations and Trends® in Information Retrieval
Volume 4 Issue 3
DOI: 10.1561/1500000017
Christopher Olston
Yahoo! Research olston@yahoo-inc.com
Marc Najork
Microsoft Research najork@microsoft.com
SUGGESTED CITATION:
Christopher Olston and Marc Najork (2010)
"Web Crawling",
Foundations and Trends® in Information Retrieval: Vol. 4: No. 3, pp. 175-246.
http://dx.doi.org/10.1561/1500000017
Abstract
This is a survey of the science and practice of web crawling.
While at first glance web crawling may appear to be merely an application of
breadth-first search, it in fact poses many challenges, ranging from
systems concerns, such as managing very large data structures, to theoretical
questions, such as how often to revisit evolving content sources. This survey
outlines the fundamental challenges and describes the state-of-the-art models
and solutions. It also highlights avenues for future work.