Apache Iceberg
Apache Iceberg is a high-performance table format for huge analytic datasets. It brings database-like reliability (ACID transactions) to data lakes built on object storage such as S3. Crucially, it is designed to handle concurrent writes from streaming engines (like Flink) while batch engines (like Spark) read the same tables.
Why use it in EDA? It enables the "Lakehouse" architecture. You can stream data into your data lake in real time without dirty reads or small-file problems, and your event streams become a permanent, queryable historical record at a fraction of the cost of a data warehouse.
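The "no dirty reads" guarantee comes from Iceberg's snapshot model: a commit adds immutable data files and then atomically swaps the table's metadata pointer, so every reader scans exactly one consistent snapshot. A minimal conceptual sketch of that idea (hypothetical names, not the real PyIceberg API):

```python
# Conceptual sketch of Iceberg's snapshot model. Writers build a new
# snapshot from immutable data files and commit it with a single atomic
# metadata swap; readers pin one snapshot, so a concurrent commit can
# never produce a dirty read. All names here are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of data-file paths in this snapshot


class Table:
    def __init__(self):
        self._snapshots = [Snapshot(0, ())]  # metadata history

    @property
    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def commit_append(self, new_files):
        # New snapshot = old files + new files. Appending it is the only
        # mutation, standing in for Iceberg's atomic metadata-pointer swap.
        cur = self.current
        self._snapshots.append(
            Snapshot(cur.snapshot_id + 1, cur.data_files + tuple(new_files))
        )


table = Table()
reader_view = table.current  # a reader pins the current snapshot
table.commit_append(["s3://lake/events-00001.parquet"])  # concurrent write
print(len(reader_view.data_files))    # → 0: reader's snapshot is unchanged
print(len(table.current.data_files))  # → 1: new readers see the new file
```

Because old snapshots stay readable, this same mechanism also gives you time travel: querying the table as of an earlier snapshot ID.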
How do we use it?
- Streaming Lakehouse: Flink writes events to Iceberg continuously; analysts query them seconds later.
- Compliance: Efficiently handling GDPR "right to be forgotten" deletes in your data lake.
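For the GDPR case, one strategy Iceberg supports is copy-on-write deletes: data files containing affected rows are rewritten without them, and a new snapshot atomically replaces the old files while untouched files are reused as-is. A hedged sketch of that mechanism (illustrative names, not the PyIceberg API):

```python
# Sketch of a copy-on-write delete, assuming data files are modeled as a
# dict of {file_name: [row dicts]}. Files with no matching rows are reused
# unchanged; affected files are rewritten without the deleted user's rows.
def copy_on_write_delete(data_files, user_id):
    """Return the data files of a new snapshot with user_id's rows removed."""
    new_files = {}
    for name, rows in data_files.items():
        kept = [r for r in rows if r["user_id"] != user_id]
        if len(kept) == len(rows):
            new_files[name] = rows  # untouched file, reused as-is
        elif kept:
            # hypothetical naming: a rewritten copy without the user's rows
            new_files[name + ".rewritten"] = kept
        # files emptied by the delete are simply dropped from the snapshot
    return new_files


snapshot = {
    "f1.parquet": [{"user_id": 1, "v": "a"}, {"user_id": 2, "v": "b"}],
    "f2.parquet": [{"user_id": 3, "v": "c"}],
}
new_snapshot = copy_on_write_delete(snapshot, user_id=1)
print(sorted(new_snapshot))  # → ['f1.parquet.rewritten', 'f2.parquet']
```

Note that "forgotten" data still exists in older snapshots until they are expired, so a real GDPR workflow pairs the delete with snapshot expiration.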
