Web-scale information extraction in KnowItAll

Etzioni, Oren; Cafarella, Michael; Downey, Doug; Kok, Stanley; Popescu, Ana-Maria; Shaked, Tal; Soderland, Stephen; Weld, Daniel S.; Yates, Alexander

doi:10.1145/988672.988687

Article, May 17, 2004, Closed access

University of Washington

Indexed in Crossref

Abstract

Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with…

Citation impact

Total citations: 751

FWCI: 112.99
Percentile: 100%
References: 37

Authors: 9

Topics & keywords

Keywords

Computer science
Scalability
Information extraction
Search engine
Information retrieval
Process (computing)
Precision and recall
Fuse (electrical)

Funding

National Science Foundation