Spark SQL
Berkeley College · University of California, Berkeley
Abstract
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g. machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible optimizer, Catalyst, built using features of the Scala programming…
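The integration the abstract describes, relational operations declared up front and then combined with ordinary procedural code, can be illustrated with a small toy model. The sketch below is plain Python, not Spark: `ToyFrame`, `where`, `select`, and `collect` are hypothetical names chosen for illustration, and the sample rows are made up. It only shows the general idea that declarative steps are recorded as a plan and run later, which is what leaves room for an optimizer to inspect them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: source rows plus a recorded plan."""
    rows: Tuple[Row, ...]
    plan: Tuple[Tuple[str, object], ...] = ()

    def where(self, column: str, predicate: Callable[[object], bool]) -> "ToyFrame":
        # Declarative step: record the filter instead of running it now.
        return ToyFrame(self.rows, self.plan + (("where", (column, predicate)),))

    def select(self, *columns: str) -> "ToyFrame":
        # Declarative step: record the projection.
        return ToyFrame(self.rows, self.plan + (("select", columns),))

    def collect(self) -> List[Row]:
        # Execute the recorded plan in order (no optimization in this toy).
        out = [dict(r) for r in self.rows]
        for op, arg in self.plan:
            if op == "where":
                column, predicate = arg
                out = [r for r in out if predicate(r[column])]
            elif op == "select":
                out = [{c: r[c] for c in arg} for r in out]
        return out


# Made-up sample data, for illustration only.
users = ToyFrame(rows=(
    {"name": "ana", "age": 34},
    {"name": "bo", "age": 19},
    {"name": "cy", "age": 52},
))

# Relational part: nothing runs yet, the steps are only recorded.
adults = users.where("age", lambda a: a >= 21).select("name")

# Procedural part: ordinary Python over the collected result.
names = [row["name"].upper() for row in adults.collect()]
```

Because `adults.plan` is plain data until `collect()` is called, a real system could rewrite it before execution; this toy simply runs the steps in the order they were declared.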
Citation impact
- FWCI: 175.98
- Percentile: 100%
- References: 31
Authors: 11
Topics & keywords
- Computer science
- SQL
- SPARK (programming language)
- Programming language
- Database
- Scala
- Language Integrated Query
- Analytics