Computing recommendations at extreme scale with Apache Flink

06/01/2015 - 15:50 to 16:30
Stage 1
long talk (40 min)

Session abstract: 

Recommender Systems are a very successful application of large scale data processing. They are used to recommend new items of interest to users of a service, such as new movies on Netflix, or shopping articles on Amazon. Recommender systems have become an essential part of most web-based services to enhance the user experience. A powerful approach for implementing recommenders are the so called "latent factor models", a special case of the collaborative filtering techniques, which exploit the similarity between user tastes and item characteristics, which are automatically extracted.:

This talk details our experience with implementing three variants of the ALS (Alternating Least Squares) algorithm to train a latent factor model using the Apache Flink system and scaling them to large clusters and data sets. In order to scale these algorithms to extremely large data sets, Flink’s functionality was significantly enhanced to be able to distribute and process very large records efficiently. A preliminary presentation of the results, scaling ALS to 28 billion user ratings can be found here:



Corporate-Design: Extragestaltung, Margarethe Hausstätter
Ilustration: cyan, Berlin