Low latency scalable web crawling on Apache Storm

06/01/2015 - 14:30 to 14:50
In this talk I will introduce Storm-Crawler (https://github.com/DigitalPebble/storm-crawler), a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. We will compare it with similar projects such as Apache Nutch and present several use cases where Storm-Crawler is being used. In particular, we will see how Storm-Crawler can be used with Elasticsearch and Kibana to crawl and index web pages.