Article · May 27, 2015 · Closed access

Spark SQL

Berkeley College · University of California, Berkeley

Indexed in Crossref

Abstract

Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g. machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible optimizer, Catalyst, built using features of the Scala programming…
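The idea of building an optimizer from Scala language features can be sketched with a toy example. This is a minimal sketch built on a hypothetical expression tree: the names `Expr`, `Lit`, `Attr`, `Add`, `transformUp`, and `ConstantFolding` are illustrative assumptions and are not Spark's actual Catalyst classes or API. It shows rule-based tree rewriting. Each optimization rule is a Scala partial function that pattern-matches on tree nodes, and the rule is applied bottom-up so that simplified children can trigger further rewrites in their parents.

```scala
// Hypothetical toy expression tree, loosely in the spirit of Catalyst.
// None of these types are Spark's real classes.
sealed trait Expr {
  // Rewrite children first, then try the rule on the rebuilt node.
  // Nodes the partial function does not match are returned unchanged.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren = this match {
      case Add(l, r) => Add(l.transformUp(rule), r.transformUp(rule))
      case other     => other // leaves (Lit, Attr) have no children
    }
    rule.applyOrElse(withNewChildren, identity[Expr])
  }
}
final case class Lit(value: Int) extends Expr      // integer constant
final case class Attr(name: String) extends Expr   // column reference
final case class Add(left: Expr, right: Expr) extends Expr

object ConstantFolding {
  // Hypothetical rule set, expressed as one partial function over trees.
  val rules: PartialFunction[Expr, Expr] = {
    case Add(Lit(a), Lit(b)) => Lit(a + b) // fold two constants
    case Add(e, Lit(0))      => e          // x + 0  =>  x
    case Add(Lit(0), e)      => e          // 0 + x  =>  x
  }
  def apply(e: Expr): Expr = e.transformUp(rules)
}

object Demo extends App {
  // x + (1 + 2) is rewritten to x + 3
  println(ConstantFolding(Add(Attr("x"), Add(Lit(1), Lit(2))))) // Add(Attr(x),Lit(3))
  // (2 + -2) + z folds the inner sum to 0, which then lets 0 + z simplify to z
  println(ConstantFolding(Add(Add(Lit(2), Lit(-2)), Attr("z")))) // Attr(z)
}
```

Writing rules as partial functions keeps each one small. A node the rule does not match simply passes through, so new rules can be added without touching the traversal code. A production optimizer covers far more node types and rules than this toy.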

Citation impact

Total citations: 1,219
Field-Weighted Citation Impact (FWCI): 175.98
Percentile: 100%
References: 31

Authors: 11

Topics & keywords

Keywords
  • Computer science
  • SQL
  • SPARK (programming language)
  • Programming language
  • Database
  • Scala
  • Language Integrated Query
  • Analytics
