Big Data processing with Elasticsearch and Apache Spark

Track: Data Science and Machine Learning
Skill Level: Intermediate
Room: Room A403
Time Slot: Fri 2/24, 10:30 AM
Tags: nosql, elasticsearch, apache spark
Abstract

This talk will explore some of the differentiating and compelling features that Elasticsearch offers, the newer features added in the 2.x versions, and those expected in 5.x. I will share real-world uses of Elasticsearch for processing Big Data in ETL and visualization scenarios. I will also cover integrating Elasticsearch with Apache Spark to process large amounts of data in parallel, along with best practices for building a scale-out, fault-tolerant architecture deployed on AWS.

Roy Russo

Roy Russo is the co-author of Elasticsearch in Action and Vice President of Engineering at Predikto Analytics. Before joining Predikto, Roy was the Chief Architect at AltiSource Labs, a FinTech startup based in Atlanta, GA. Roy was the co-founder and VP of Product Management for the Atlanta-based marketing automation vendor LoopFuse, recently acquired by Atlanta-based SalesFusion, Inc. Roy also helped co-found JBoss Portal, a JSR-168 compliant enterprise Java portal, and represented JBoss on the Java Content Repository, JSR-170.