Article · Jun 9, 2008 · Closed access

Pig Latin

Yahoo (United States)

Indexed in Crossref

Abstract

There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of…

Citation impact

  • Total citations: 1,744
  • FWCI: 166.59
  • Percentile: 100%
  • References: 16

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Scalability
  • Implementation
  • Terabyte
  • Programming paradigm
  • SQL
  • Code reuse
  • Reuse
UN Sustainable Development Goals
  • Industry, innovation and infrastructure