A brief survey of web data extraction tools

Laender, Alberto H. F.; Ribeiro‐Neto, Berthier; Silva, Altigran S. da; Teixeira, Juliana S.

doi:10.1145/565117.565137

Article | ACM SIGMOD Record | Jun 1, 2002 | Closed access

Alberto H. F. Laender, Berthier Ribeiro‐Neto, Altigran S. da Silva, Juliana S. Teixeira

Universidade Federal de Minas Gerais

Indexed in Crossref

Abstract

In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which makes a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data…

Citation impact

690 total citations

FWCI: 75.00
Percentile: 100%
References: 40

Topics & keywords

Keywords

Computer science
Data extraction
Information retrieval
Information extraction
Web page
Web modeling
Data Web
Data science

UN Sustainable Development Goals

Quality Education
