Indexing compressed text

Ferragina, Paolo; Manzini, Giovanni

doi:10.1145/1082036.1082039

articleJournal of the ACMJul 1, 2005Closed access

Indexing compressed text

PFPaolo Ferragina GMGiovanni Manzini

University of Pisa · Università degli Studi del Piemonte Orientale “Amedeo Avogadro”

Indexed incrossref

Abstract

We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.Our first compressed data structure retrieves the occ occurrences of a pattern P [1, p ] within a text T [1, n ] in O ( p + occ log 1+ε n ) time for any chosen ε, 0<ε<1. This data structure uses at most 5 n H k ( T ) + o ( n ) bits of storage, where H k ( T ) is the k th order empirical entropy of T . The space usage is Θ( n ) bits in the worst case and o ( n ) bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows--Wheeler Transform, and can be regarded as a…

Citation impact

713

total citations

FWCI: 30.77
Percentile: 100%
References: 49

Citations per year

Authors

2

Topics & keywords

Topics

Keywords

Compressed suffix array
Substring
Suffix array
Data structure
Search engine indexing
Suffix
Computer science
Algorithm

No related works found for this paper.