articleJournal of the ACMJul 1, 2005Closed access

Indexing compressed text

University of Pisa · Università degli Studi del Piemonte Orientale “Amedeo Avogadro”

Indexed incrossref

Abstract

We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.Our first compressed data structure retrieves the occ occurrences of a pattern P [1, p ] within a text T [1, n ] in O ( p + occ log 1+ε n ) time for any chosen ε, 0<ε<1. This data structure uses at most 5 n H k ( T ) + o ( n ) bits of storage, where H k ( T ) is the k th order empirical entropy of T . The space usage is Θ( n ) bits in the worst case and o ( n ) bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows--Wheeler Transform, and can be regarded as a…

Citation impact

713
total citations
FWCI
30.77
Percentile
100%
References
49
Citations per year

Authors

2

Topics & keywords

Keywords
  • Compressed suffix array
  • Substring
  • Suffix array
  • Data structure
  • Search engine indexing
  • Suffix
  • Computer science
  • Algorithm
No related works found for this paper.