Large-scale data engineering has gone through a remarkable transformation over the past decade. We have seen wide adoption of Big Data frameworks, from Apache Hadoop and Apache Spark to Apache Flink. Today, Artificial Intelligence (AI) and Machine Learning (ML) have further broadened the scope of data engineering, which now demands faster and more easily integrated systems that can operate on both specialized and commodity hardware.

A data science workflow is a complex, interactive process. It starts with data in large data stores. We create structured data sets from this raw data using ETL (Extract, Transform, Load) tools such as Hadoop or Spark…

Supun Kamburugamuve

High performance data analytics

