Hive on Spark

Scale

06/02/2015 - 11:30 to 12:10

Stage 3

long talk (40 min)

Intermediate

Session abstract:

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only Hadoop execution engine for Hive queries. Today, alternative execution engines such as Apache Spark and Apache Tez are available. The Hive and Spark communities are joining forces to introduce Spark as a new execution engine option for Hive.

In this talk we'll discuss the Hive on Spark project. Topics include the motivations, such as improving the Hive user experience and streamlining operational management for Spark shops; some background on and comparison of MapReduce and Spark; and the technical process of porting a complex real-world application from MapReduce to Spark. A demo will also be presented.

Video:

#bbuzz 2015: Szehon Ho - Hive on Spark

Slide:

szehon_ho-hive_on_spark.pdf

Berlin Buzzwords