Avoiding the disk bottleneck in the Data Domain deduplication file system

Zhu, Benjamin; Li, Kai; Patterson, Hugo

Article · Feb 26, 2008 · Closed access

Data Domain (United States) · Princeton University

Abstract

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three…
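As a rough illustration of the idea in the abstract, the sketch below stores each distinct segment once and keeps an ordered list of fingerprints to rebuild the input. It is a minimal sketch under stated assumptions, not the system the paper describes: the fixed 8 KB segment size, the SHA-1 fingerprint, the in-memory dictionary index, and the names `dedup_store` and `restore` are all hypothetical choices made for illustration. The abstract's point is precisely that a low-cost system cannot hold such an index entirely in RAM.

```python
import hashlib


def dedup_store(data: bytes, segment_size: int = 8192):
    """Split data into fixed-size segments and keep one copy per fingerprint.

    Assumptions (not from the source): fixed-size segmentation, SHA-1
    fingerprints, and an index that fits entirely in memory.
    """
    index = {}   # fingerprint -> segment bytes (stands in for the segment index)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fingerprint = hashlib.sha1(segment).hexdigest()
        if fingerprint not in index:
            # First time this segment is seen: store it.
            index[fingerprint] = segment
        # Duplicate or not, record the fingerprint so the data can be restored.
        recipe.append(fingerprint)
    return index, recipe


def restore(index, recipe) -> bytes:
    """Rebuild the original byte stream from the stored segments."""
    return b"".join(index[fingerprint] for fingerprint in recipe)
```

In this toy version every incoming segment costs one dictionary lookup. If the index lived on disk instead, each lookup could become a disk access, which is the bottleneck the title refers to.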

Citation impact

Total citations: 692
FWCI: 46.13
Percentile: 100%
References: 18

Authors: 3

Topics & keywords

Keywords

Data deduplication
Computer science
Bottleneck
Throughput
Locality
Cache
File system
Operating system
