Open information extraction from the web

Banko, Michele; Cafarella, Michael; Soderland, Stephen; Broadhead, Matt; Etzioni, Oren

Article · Jan 6, 2007 · Closed access

Abstract

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly…

Citation impact

Total citations: 1,323

FWCI: 118.61
Percentile: 100%
References: 94

Topics & keywords

Keywords

Tuple
Computer science
Information extraction
Relationship extraction
Scalability
Set (abstract data type)
Information retrieval
Reduction (mathematics)

UN Sustainable Development Goals

Decent work and economic growth
