PyData Yerevan 2022

Viacheslav Inozemtsev

I am a Software Engineer with 10 years of professional experience in data and backend engineering. During my career I have contributed to the design and development of various data systems, such as data lakes, data meshes, lakehouses, and batch and streaming data pipelines. My interests include software architecture principles, programming patterns, distributed data processing frameworks, distributed storage systems, file and table formats, queueing systems, databases, resource managers, schedulers, metastores, and cloud services. I hold a Specialist degree in Applied Mathematics and Computer Science, and a Master's degree in Computer Science. I speak English, German, and Russian.

Sessions

08-13
14:30
40min
Building a Lakehouse data platform using Delta Lake, PySpark, and Trino
Viacheslav Inozemtsev

In this talk I will present the concept of the Lakehouse, a novel architecture that resolves the problems and combines the capabilities of the classical Data Warehouse and Data Lake. I will talk about the Delta Lake table format, which sits at the core of the Lakehouse. I will demonstrate how Delta Lake integrates with Apache Spark to build data ingestion pipelines, and how it integrates with Trino to provide a fast SQL-based serving layer. Finally, I will bring all these components together to describe how they enable a modern big data platform. This talk will be useful for an intermediate-level audience of data engineers, data analysts, and data scientists.

213W PAB