Low latency scalable web crawling on Apache Storm

Search
06/01/2015 - 14:30 to 14:50
Stage 2
short talk (20 min)
Beginner

Session abstract: 

In this talk I will introduce storm-crawler (https://github.com/DigitalPebble/storm-crawler), a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. We will compare it with similar projects such as Apache Nutch and present several use cases where storm-crawler is being used. In particular, we will see how storm-crawler can be used with ElasticSearch and Kibana for crawling and indexing web pages.
