Article · Feb 26, 2008 · Closed access

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Data Domain, Inc. (United States) · Princeton University

Abstract

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three…
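To make the lookup problem in the abstract concrete, here is a minimal sketch of segment-level deduplication driven by a fingerprint index. It assumes SHA-1 fingerprints, fixed-size segments (8 bytes in the demo so the effect is visible), and a Python `dict` standing in for the segment index; the names `DedupStore` and `lookups_per_second` are hypothetical. It does not reproduce the three techniques the paper describes. It only counts how many index queries a write triggers, since each query is what would cost a disk access if the index did not fit in RAM.

```python
import hashlib


def fixed_size_segments(data: bytes, size: int) -> list[bytes]:
    # Assumption: fixed-size segmentation for simplicity; production systems
    # commonly use content-defined (variable-size) segments instead.
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy store: a fingerprint index maps segment digests to stored segments."""

    def __init__(self) -> None:
        self.index: dict[bytes, int] = {}  # fingerprint -> slot in self.segments
        self.segments: list[bytes] = []    # unique segment payloads, in arrival order
        self.lookups = 0                   # index queries; on disk, each may be a random read

    def write(self, data: bytes, segment_size: int = 8) -> list[int]:
        """Store data, keeping only new segments; return the recipe of slots."""
        recipe = []
        for seg in fixed_size_segments(data, segment_size):
            fp = hashlib.sha1(seg).digest()
            self.lookups += 1              # one index lookup per input segment
            slot = self.index.get(fp)
            if slot is None:               # unseen segment: append it and index it
                slot = len(self.segments)
                self.segments.append(seg)
                self.index[fp] = slot
            recipe.append(slot)            # duplicates just reference the existing slot
        return recipe

    def read(self, recipe: list[int]) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.segments[s] for s in recipe)


def lookups_per_second(throughput_mb_s: float, avg_segment_kb: float) -> float:
    """Index lookups needed per second at a given ingest rate (1 MB = 1024 KB)."""
    return throughput_mb_s * 1024 / avg_segment_kb


if __name__ == "__main__":
    store = DedupStore()
    data = b"ABCDEFGH" * 4 + b"12345678"
    recipe = store.write(data)
    print(recipe, len(store.segments), store.lookups)  # [0, 0, 0, 0, 1] 2 5
    assert store.read(recipe) == data
    # Assumed 8 KB average segment size at the abstract's 100 MB/sec target.
    print(lookups_per_second(100, 8))                  # 12800.0
```

In the demo, 40 bytes collapse to 2 stored segments, but all 5 input segments still require an index lookup. Under the assumed 8 KB average segment size, sustaining 100 MB/sec means roughly 12,800 lookups per second, far more than a single disk can serve as random reads. That gap is the bottleneck the title refers to.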

Citation impact

  • Total citations: 692
  • FWCI: 46.13
  • Percentile: 100%
  • References: 18

Authors: 3

Topics & keywords

Keywords
  • Data deduplication
  • Computer science
  • Bottleneck
  • Throughput
  • Locality
  • Cache
  • File system
  • Operating system