Apache Iceberg
Apache Iceberg is a high-performance table format for huge analytic datasets. It brings database-like reliability (ACID transactions) to data lakes built on object storage such as S3. Crucially, it is designed to handle concurrent writes from streaming engines (like Flink) while batch engines (like Spark) read the same tables.
Why use it in EDA? It enables the "Lakehouse" architecture. You can stream data into your data lake in real time without dirty reads or small-file problems, and your event streams become a permanent, queryable historical record at a fraction of the cost of a data warehouse.
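The "no dirty reads" guarantee comes from Iceberg's snapshot model: a commit adds immutable data files and then atomically swaps the table's metadata pointer, so every reader scans exactly one consistent snapshot. A minimal conceptual sketch of that idea (hypothetical names, not the real PyIceberg API):

```python
# Conceptual sketch of Iceberg's snapshot model. Writers build a new
# snapshot from immutable data files and commit it with a single atomic
# metadata swap; readers pin one snapshot, so a concurrent commit can
# never produce a dirty read. All names here are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of data-file paths in this snapshot


class Table:
    def __init__(self):
        self._snapshots = [Snapshot(0, ())]  # metadata history

    @property
    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def commit_append(self, new_files):
        # New snapshot = old files + new files. Appending it is the only
        # mutation, standing in for Iceberg's atomic metadata-pointer swap.
        cur = self.current
        self._snapshots.append(
            Snapshot(cur.snapshot_id + 1, cur.data_files + tuple(new_files))
        )


table = Table()
reader_view = table.current  # a reader pins the current snapshot
table.commit_append(["s3://lake/events-00001.parquet"])  # concurrent write
print(len(reader_view.data_files))    # → 0: reader's snapshot is unchanged
print(len(table.current.data_files))  # → 1: new readers see the new file
```

Because old snapshots stay readable, this same mechanism also gives you time travel: querying the table as of an earlier snapshot ID.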
How do we use it?
- Streaming Lakehouse: Flink writes events to Iceberg continuously; analysts query them seconds later.
- Compliance: Efficiently handling GDPR "right to be forgotten" deletes in your data lake.
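For the GDPR case, one strategy Iceberg supports is copy-on-write deletes: data files containing affected rows are rewritten without them, and a new snapshot atomically replaces the old files while untouched files are reused as-is. A hedged sketch of that mechanism (illustrative names, not the PyIceberg API):

```python
# Sketch of a copy-on-write delete, assuming data files are modeled as a
# dict of {file_name: [row dicts]}. Files with no matching rows are reused
# unchanged; affected files are rewritten without the deleted user's rows.
def copy_on_write_delete(data_files, user_id):
    """Return the data files of a new snapshot with user_id's rows removed."""
    new_files = {}
    for name, rows in data_files.items():
        kept = [r for r in rows if r["user_id"] != user_id]
        if len(kept) == len(rows):
            new_files[name] = rows  # untouched file, reused as-is
        elif kept:
            # hypothetical naming: a rewritten copy without the user's rows
            new_files[name + ".rewritten"] = kept
        # files emptied by the delete are simply dropped from the snapshot
    return new_files


snapshot = {
    "f1.parquet": [{"user_id": 1, "v": "a"}, {"user_id": 2, "v": "b"}],
    "f2.parquet": [{"user_id": 3, "v": "c"}],
}
new_snapshot = copy_on_write_delete(snapshot, user_id=1)
print(sorted(new_snapshot))  # → ['f1.parquet.rewritten', 'f2.parquet']
```

Note that "forgotten" data still exists in older snapshots until they are expired, so a real GDPR workflow pairs the delete with snapshot expiration.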
