Real Time Big Data Analytics with Kafka, Storm & HBase

Scale
06/01/2015 - 15:00 to 15:40
Stage 3
long talk (40 min)
Intermediate

Session abstract: 

Relevance and personalization are crucial to building a personalized local commerce experience at Groupon. We have built infrastructure that processes a real-time stream of user interactions and produces personalized real-time analytics, which are further enhanced to present a relevant, personalized experience to hundreds of millions of Groupon users across the world. This talk covers the use case and our Kafka-Storm-HBase-Redis pipeline, which ingests over 3 million data points per second in real time and in turn brings in millions of dollars in additional revenue. Specifically, we will discuss how we scaled this system for hundreds of millions of users, including solution choices, different techniques and strategies, and traditional and innovative approaches. The solution includes some interesting algorithmic choices to reduce data size, such as Bloom filters and HyperLogLog, as well as the use of big data technologies such as HBase, Kafka and Storm. Attendees can take away lessons from our real-life experience that can help them understand various tuning methods and their tradeoffs, and apply them in their own solutions.
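
To give a sense of how a Bloom filter reduces data size, here is a minimal sketch in Python. It is not the pipeline presented in the talk: the class name, the default sizes, the SHA-256 double-hashing scheme and the "user:deal" key format are all assumptions chosen for illustration. The idea is that a fixed-size bit array answers "have we probably seen this item?" without storing the items themselves, at the cost of occasional false positives and never a false negative.

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size Bloom filter (hypothetical, not Groupon's code)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one SHA-256 digest using
        # double hashing: position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if all derived bits are set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: remember which deals a user has already viewed.
seen = BloomFilter()
seen.add("user42:deal1001")
print("user42:deal1001" in seen)
```

With the defaults assumed here the filter occupies 8 KB no matter how many keys are added; adding more keys raises the false-positive rate rather than the memory footprint, which is the tradeoff that has to be tuned.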
