Article · May 1, 2010 · Closed access
The Hadoop Distributed File System
Yahoo (United States) · Yahoo (United Kingdom)
Indexed in Crossref
Abstract
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
Citation impact
- Total citations: 4,823
- FWCI: 396.71
- Percentile: 100%
- References: 22
Authors: 4
Topics & keywords
Keywords
- Computer science
- Petabyte
- Server
- Distributed File System
- Operating system
- Distributed data store
- File system
- File server