Can we use big data techniques without big data infrastructure? As Java developers, we deal with data processing all the time: analyzing app logs, extracting data from Excel, and copying tables between databases, to name a few examples. Yet standard Java falls short in data processing capabilities compared to heavier, more complex tools like Spark or Flink.
This talk is about the DataFrame, a two-dimensional in-memory table structure that provides filtering, column and row operations, joins, aggregations, window functions, and more. I will use the open source DFLib library (https://dflib.org) and a Jupyter notebook to demonstrate how to do data processing and visualization in a Java app with DataFrames without much fuss.
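
To give a rough idea of what "a two-dimensional in-memory table with filtering and aggregations" means, here is a toy sketch in plain Java. It is not DFLib's API: the class name TinyFrame, its methods filter and sumBy, and the sample log rows are all hypothetical, made up for illustration only. A real DataFrame library offers far more (typed columns, joins, window functions) with much less code on the caller's side.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy class for illustration only; this is NOT the DFLib API.
final class TinyFrame {
    private final List<String> columns;   // column names, in order
    private final List<Object[]> rows;    // one Object[] per row

    TinyFrame(List<String> columns, List<Object[]> rows) {
        this.columns = List.copyOf(columns);
        this.rows = List.copyOf(rows);
    }

    int height() {
        return rows.size();
    }

    private int indexOf(String column) {
        int i = columns.indexOf(column);
        if (i < 0) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        return i;
    }

    // Filtering: keep the rows whose value in 'column' matches the condition.
    TinyFrame filter(String column, Predicate<Object> condition) {
        int i = indexOf(column);
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            if (condition.test(row[i])) {
                kept.add(row);
            }
        }
        return new TinyFrame(columns, kept);
    }

    // Aggregation: group by 'keyColumn' and sum the numeric 'valueColumn'.
    Map<Object, Double> sumBy(String keyColumn, String valueColumn) {
        int k = indexOf(keyColumn);
        int v = indexOf(valueColumn);
        Map<Object, Double> sums = new LinkedHashMap<>();
        for (Object[] row : rows) {
            sums.merge(row[k], ((Number) row[v]).doubleValue(), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for parsed app logs.
        TinyFrame logs = new TinyFrame(
                List.of("level", "service", "millis"),
                List.of(
                        new Object[]{"ERROR", "auth", 120},
                        new Object[]{"INFO", "auth", 15},
                        new Object[]{"ERROR", "billing", 300},
                        new Object[]{"ERROR", "auth", 80}));

        TinyFrame errors = logs.filter("level", "ERROR"::equals);
        System.out.println(errors.height());                    // prints 3
        System.out.println(errors.sumBy("service", "millis"));  // prints {auth=200.0, billing=300.0}
    }
}
```

Even this small sketch shows the appeal: the table is just data in memory, and each operation returns a new table or a summary, so steps can be chained without any cluster or external infrastructure.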