Devnexus 2026

Big Data processing with Elasticsearch and Apache Spark

Track: Data Science and Machine Learning

Skill Level: Intermediate

Room: Room A403

Time Slot: Fri 2/24, 10:30 AM

Tags: nosql, elasticsearch, apache spark

Abstract

This talk will explore the differentiating features that Elasticsearch offers, the newer features added in the 2.x releases, and those expected in 5.x. I will share real-world uses of Elasticsearch for processing Big Data in ETL and visualization scenarios. I’ll also cover integrating Elasticsearch with Apache Spark to process large amounts of data in parallel, along with best practices for building a scale-out, fault-tolerant architecture deployed on AWS.

Roy Russo

Roy Russo is the co-author of Elasticsearch in Action and Vice President of Engineering at Predikto Analytics. Before joining Predikto, Roy was the Chief Architect at AltiSource Labs, a FinTech startup based in Atlanta, GA. Roy was the co-founder and VP of Product Management of the Atlanta-based marketing automation vendor LoopFuse, recently acquired by Atlanta-based SalesFusion, Inc. Roy also co-founded JBoss Portal, a JSR-168 compliant enterprise Java portal, and represented JBoss on the Java Content Repository, JSR-170.