Federated search (also known as federated information retrieval or distributed information retrieval) is a technique for searching multiple text collections simultaneously. Queries are submitted to the subset of collections most likely to return relevant answers, and the results returned by the selected collections are merged into a single list. Federated search is preferred over centralized search alternatives in many environments. For example, commercial search engines such as Google cannot easily index uncrawlable hidden web collections, while federated search systems can search the contents of hidden web collections without crawling them. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections.
There are three major challenges in federated search. For each query, the subset of collections most likely to return relevant documents must be selected. This is the collection selection problem. To select suitable collections, federated search systems need to acquire some knowledge about the contents of each collection, which creates the collection representation problem. Finally, the results returned from the selected collections must be merged before they are presented to the user. This final step is the result merging problem.
The goal of this work is to provide a comprehensive summary of the previous research on the federated search challenges described above.
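The three challenges described above can be illustrated with a minimal sketch. It is not a method taken from the surveyed research: the toy collections, the function names (`represent`, `select`, `search`, `merge`, `federated_search`), and the term-count scoring are all assumptions made for illustration. In particular, the sketch builds each representation from the full contents of a collection and merges raw scores directly, which assumes cooperative collections whose scores are comparable.

```python
from collections import Counter

# Hypothetical toy collections: each maps a document id to its text.
COLLECTIONS = {
    "news": {"n1": "election results and politics", "n2": "football match results"},
    "medical": {"m1": "flu vaccine trial results", "m2": "heart disease risk factors"},
    "recipes": {"r1": "chocolate cake recipe", "r2": "vegetable soup recipe"},
}


def represent(collection):
    """Collection representation: term counts over all documents (assumed summary)."""
    terms = Counter()
    for text in collection.values():
        terms.update(text.lower().split())
    return terms


def select(query, representations, k):
    """Collection selection: rank by summed query-term counts, keep the top k."""
    q = query.lower().split()
    scores = {name: sum(rep[t] for t in q) for name, rep in representations.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    # Drop collections whose representation shares no term with the query.
    return [name for name in ranked[:k] if scores[name] > 0]


def search(query, collection):
    """Stand-in for a collection's own search engine: score = shared distinct terms."""
    q = set(query.lower().split())
    hits = []
    for doc_id, text in collection.items():
        overlap = len(q & set(text.lower().split()))
        if overlap:
            hits.append((doc_id, overlap))
    return hits


def merge(result_lists):
    """Result merging: one list sorted by score (assumes comparable scores)."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


def federated_search(query, collections, k=2):
    """Run representation, selection, per-collection search, and merging in order."""
    reps = {name: represent(c) for name, c in collections.items()}
    chosen = select(query, reps, k)
    return merge(search(query, collections[name]) for name in chosen)


if __name__ == "__main__":
    print(federated_search("flu vaccine results", COLLECTIONS))
```

For the query "flu vaccine results" with `k=2`, the sketch selects the `medical` and `news` collections, never queries `recipes`, and returns a single merged list with `m1` ranked first.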
Web search has evolved significantly in recent years. For many years, web search engines such as Google and Yahoo! provided search only over text documents. Aggregated search was one of the first steps beyond text search, and it marked the beginning of a new era for information seeking and retrieval. These days, web search engines support aggregated search over a number of verticals and blend different types of documents (e.g. images, videos) in their search results. Moreover, web search engines have started to crawl and search the hidden web. Federated search (also known as federated information retrieval or distributed information retrieval) has played a key role in providing the technology for aggregated search and for crawling the hidden web. The application of federated search is not limited to web search engines. There are many scenarios, such as digital libraries, in which information is distributed across different sources or servers. Peer-to-peer networks and personalized search are two examples in which federated search has been used successfully for searching multiple independent collections. Federated Search provides a comprehensive summary of the research done to date, looks at some of the challenges still to be faced, and suggests some directions for future research on this important and current topic.