Hive on Spark

Scale
06/02/2015 - 11:30 to 12:10
Stage 3
long talk (40 min)
Intermediate

Session abstract: 

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only Hadoop execution engine for Hive queries. Today, alternative execution engines such as Apache Spark and Apache Tez are available. The Hive and Spark communities are joining forces to introduce Spark as a new execution engine option for Hive.

In this talk we'll discuss the Hive on Spark project. Topics include the motivations, such as improving the Hive user experience and streamlining operational management for Spark shops; background on and a comparison of MapReduce and Spark; and the technical process of porting a complex real-world application from MapReduce to Spark. A demo will also be presented.
