Hive - a petabyte scale data warehouse using Hadoop

Thusoo, Ashish; Sarma, Joydeep Sen; Jain, Namit; Shao, Zheng; Chakka, Prasad; Zhang, Ning; Antony, Suresh; Liu, Hao; Murthy, Raghotham

doi:10.1109/icde.2010.5447738

Article; Jan 1, 2010; Closed access

Meta (United States)

Indexed in Crossref

Abstract

The size of data sets collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation used at companies such as Yahoo and Facebook to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in HiveQL, a SQL-like declarative language, which are compiled into…

Citation impact

Total citations: 923

FWCI: 160.84
Percentile: 100%
References: 2

Authors: 9

Topics & keywords

Keywords

Petabyte
Computer science
Data warehouse
Database
Scripting language
SQL
NoSQL
Online analytical processing

UN Sustainable Development Goals

Industry, innovation and infrastructure
