Date: 2022-03-22
Time: 09:15–10:05
Room: Ballroom
An application with a central database and a series of ETL (Extract, Transform, Load) flows moving data from there to the data warehouse is a familiar pattern in software architecture everywhere. It works very well, but ETLs are usually single-purpose: each additional target means another dedicated flow, and over time these flows can put too much load on the source database and slow things down. A more performant alternative is to use Kafka Connect to pick up database changes and pass them to Apache Kafka. Once the data is in Kafka, it can be reshaped and pushed to several downstream applications without creating additional load on the source system. This open-source data streaming platform integrates with your existing setup and, with a bit of configuration, can replace too-much-of-a-good-thing ETL flows, bringing simplicity and performance to your data pipeline.
This session will show how Apache Kafka operates and how existing data platforms, such as a PostgreSQL database, can be integrated with it as both data source and target. Several Kafka Connect options will be explored to understand their benefits and limitations. The session is intended for everyone who wants to avoid the classic "Spaghetti architecture" and base their data pipeline on Apache Kafka, the leading open-source data streaming technology.
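As a taste of what the session covers, here is a minimal sketch of wiring PostgreSQL into Kafka as both source and target by registering two connectors with a Kafka Connect worker through its REST API. It assumes a worker listening on localhost:8083 with the Debezium PostgreSQL source and Confluent JDBC sink plugins installed; all hostnames, credentials, database names, and table names are illustrative placeholders.

```python
# Minimal sketch: register a Debezium PostgreSQL source and a JDBC sink with a
# Kafka Connect worker via its REST API. Assumes the worker runs on
# localhost:8083 with the Debezium and Confluent JDBC plugins installed;
# hosts, credentials, and table names below are placeholders.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # assumed Connect worker address

# Source: stream row-level changes from PostgreSQL into Kafka
# (one topic per captured table, named "<topic.prefix>.<schema>.<table>").
source = {
    "name": "pg-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.example.com",  # placeholder
        "database.port": "5432",
        "database.user": "cdc_user",                  # placeholder
        "database.password": "secret",                # placeholder
        "database.dbname": "shop",                    # placeholder
        "plugin.name": "pgoutput",                    # PostgreSQL's built-in logical decoding
        "topic.prefix": "pg",                         # Debezium 2.x topic naming
        "table.include.list": "public.orders",
    },
}

# Sink: push a (possibly reshaped) topic into the data warehouse, so the
# source database never sees the extra read load.
sink = {
    "name": "dwh-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://warehouse.example.com:5432/dwh",  # placeholder
        "connection.user": "dwh_user",                # placeholder
        "connection.password": "secret",              # placeholder
        "topics": "pg.public.orders",
        "auto.create": "true",                        # create the target table if missing
    },
}

for connector in (source, sink):
    resp = requests.post(CONNECT_URL, json=connector)
    resp.raise_for_status()
    print(f"created connector {connector['name']}")
```

Every downstream consumer then subscribes to the Kafka topic rather than querying the database directly, which is what keeps the extra load off the source system.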
It is also a good starting point for newcomers who aren't yet familiar with all these buzzwords, technologies, and methods, and want a glimpse of what to expect. In particular, those used to the traditional way of rolling out data pipelines can gain new insights into how to innovate in this area.
The following slides have been made available for this session: