Indexing compressed text
University of Pisa · Università degli Studi del Piemonte Orientale “Amedeo Avogadro”
Abstract
We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.Our first compressed data structure retrieves the occ occurrences of a pattern P [1, p ] within a text T [1, n ] in O ( p + occ log 1+ε n ) time for any chosen ε, 0<ε<1. This data structure uses at most 5 n H k ( T ) + o ( n ) bits of storage, where H k ( T ) is the k th order empirical entropy of T . The space usage is Θ( n ) bits in the worst case and o ( n ) bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows--Wheeler Transform, and can be regarded as a…
Citation impact
- FWCI
- 30.77
- Percentile
- 100%
- References
- 49
Authors
2Topics & keywords
- Compressed suffix array
- Substring
- Suffix array
- Data structure
- Search engine indexing
- Suffix
- Computer science
- Algorithm