Avoiding the disk bottleneck in the Data Domain deduplication file system
Data Domain (United States) · Princeton University
Abstract
Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, which enables backups to complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to store an index of the stored segments and may be forced to access an on-disk index for every input segment. This paper describes three…
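As a rough illustration of the segment-level deduplication the abstract describes, here is a minimal sketch. It is not the paper's method: it assumes fixed-size segments, SHA-256 fingerprints, and a plain in-memory dictionary standing in for the segment index, and the names `SEGMENT_SIZE`, `deduplicate`, and `restore` are hypothetical. The index lookup inside the loop is the step that would turn into a disk access per input segment once the index no longer fits in RAM.

```python
import hashlib

# Hypothetical, deliberately tiny segment size so the demo is easy to follow.
SEGMENT_SIZE = 8


def deduplicate(data, index):
    """Split data into fixed-size segments and store each unique segment once.

    `index` maps fingerprint -> segment bytes (an in-memory stand-in for
    the segment index). Returns the ordered list of fingerprints needed
    to rebuild `data`.
    """
    recipe = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).digest()
        # Index lookup: cheap here, but a potential disk read per segment
        # when the index is too large to keep in RAM.
        if fingerprint not in index:
            index[fingerprint] = segment
        recipe.append(fingerprint)
    return recipe


def restore(recipe, index):
    """Rebuild the original bytes from the list of fingerprints."""
    return b"".join(index[fp] for fp in recipe)


if __name__ == "__main__":
    index = {}
    data = b"AAAAAAAA" * 3 + b"BBBBBBBB"
    recipe = deduplicate(data, index)
    print(len(recipe), "segments written,", len(index), "segments stored")
    assert restore(recipe, index) == data
```

In this toy input, three of the four segments are identical, so only two segments end up stored while the recipe still lists all four in order.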
Citation impact
- FWCI: 46.13
- Percentile: 100%
- References: 18
Authors
- 3
Topics & keywords
- Data deduplication
- Computer science
- Bottleneck
- Throughput
- Locality
- Cache
- File system
- Operating system