HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
Computer Algorithms for Medicine · Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
Abstract
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of \emphdistinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes''), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical…
Citation impact
- FWCI
- 6.06
- Percentile
- 100%
- References
- 28
Authors
4Topics & keywords
- Cardinality (data modeling)
- Estimator
- Algorithm
- Byte
- Mathematics
- Sliding window protocol
- Window (computing)
- Probabilistic logic