Article · Jan 1, 2010 · Closed access

Hive - a petabyte scale data warehouse using Hadoop

Meta (United States)

Indexed in Crossref

Abstract

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation which is being used in companies like Yahoo, Facebook etc. to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into…

Citation impact

Total citations: 923
FWCI: 160.84
Percentile: 100%
References: 2

Authors

9

Topics & keywords

Keywords
  • Petabyte
  • Computer science
  • Data warehouse
  • Database
  • Scripting language
  • SQL
  • NoSQL
  • Online analytical processing
UN Sustainable Development Goals
  • Industry, innovation and infrastructure