articleKU ScholarWorks (The University of Kansas)Jan 12, 2003GREEN OA

High-order entropy-compressed text indexes

University of Pisa · Purdue University West Lafayette

Abstract

We present a novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet , where each symbol is encoded by lg|Σ| bits. We show that compressed suffix arrays use just nH_h + O(n lg lg n/lg_|Σ| n) bits, while retaining full text indexing functionalities, such as searching any pattern sequence of length m in O(m lg|Σ| + polylog(n)) time. The term H_h ≤ lg|Σ| denotes the hth-order empirical entropy of the text, which means that our index is nearly optimal in space apart from lower-order terms, achieving asymptotically the empirical…

Citation impact

672
total citations
FWCI
18.26
Percentile
100%
References
20
Citations per year

Authors

3

Topics & keywords

Keywords
  • Compressed suffix array
  • Suffix
  • Search engine indexing
  • Alphabet
  • Entropy (arrow of time)
  • Combinatorics
  • Multiplicative function
  • Suffix array
No related works found for this paper.