For a decade, the data lake had a dirty secret: it was a collection of Parquet files with no ACID guarantees, no reliable schema enforcement, and no way to do a transactional update. Dropping a bad batch of data and replacing it with a corrected one was a manual, fragile operation. Running analytics while a write job was in progress could return partial or inconsistent results. GDPR deletion requests were nightmares.
Open table formats — Delta Lake, Apache Iceberg, and Apache Hudi — solved these problems. They sit above your Parquet files and add a metadata and transaction layer that gives you ACID semantics, time travel, schema evolution, and more. Understanding what each one is and how they differ is now fundamental data engineering knowledge.
The problem they solve
Object storage (S3, GCS, Azure Blob) treats files as immutable objects. You can PUT a file, GET a file, or DELETE a file — but you can't atomically update a row within a file. There's no locking, no transactions.
This means writing to a Parquet "table" (a directory of files) is not atomic. If a write job fails halfway through, some new files are written but not all. A reader who queries during the write gets partial data. There's no way to know what the "correct" state of the table is.
Open table formats solve this with a transaction log — a separate metadata structure that tracks which files are the authoritative current version of the table at any point in time.
Delta Lake
Delta Lake (created by Databricks, now Linux Foundation) stores its transaction log in a _delta_log/ directory alongside the data files. Each transaction writes a new JSON log entry that records which files were added or removed. Periodically the log is compacted into a Parquet checkpoint so that readers do not have to replay every entry from the beginning.
s3://data-lake/orders/
├── _delta_log/
│   ├── 00000000000000000000.json ← initial write (adds 3 files)
│   ├── 00000000000000000001.json ← second write (adds 2 files)
│   ├── 00000000000000000002.json ← DELETE operation (removes 1 file, adds 1)
│   ├── ...
│   └── 00000000000000000010.checkpoint.parquet ← compacted checkpoint
├── part-00000-abc.parquet
├── part-00001-def.parquet
└── ...
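When a reader queries the table, it reads the log to determine the current valid set of files, then reads only those files. Files that have been "deleted" (replaced by a new version) remain on disk but are excluded by the log. This is how time travel works: an older version of the table is simply an earlier set of files. Those files stay available until a cleanup operation such as VACUUM physically removes them.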
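The replay step can be sketched in a few lines of Python. This is a simplified illustration, not Delta Lake's actual reader: the commit entries below are hypothetical and keep only add and remove actions, whereas real log files carry more action types and fields. The versions mirror the directory listing above (three files added, two more added, then one removed and one added).

```python
# Hypothetical, simplified commit entries keyed by version number.
# Real Delta log entries contain more action types and richer fields.
COMMITS = {
    0: [{"add": {"path": "part-0.parquet"}},
        {"add": {"path": "part-1.parquet"}},
        {"add": {"path": "part-2.parquet"}}],
    1: [{"add": {"path": "part-3.parquet"}},
        {"add": {"path": "part-4.parquet"}}],
    2: [{"remove": {"path": "part-1.parquet"}},
        {"add": {"path": "part-5.parquet"}}],
}


def live_files(commits, as_of_version=None):
    """Replay commits in version order and return the set of live data files."""
    files = set()
    for version in sorted(commits):
        if as_of_version is not None and version > as_of_version:
            break  # time travel: stop replaying at the requested version
        for action in commits[version]:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


current = live_files(COMMITS)
before_delete = live_files(COMMITS, as_of_version=1)
print(sorted(current))
print(sorted(before_delete))
```

Reading an older version is the same replay stopped early, which is why the removed file must still exist on disk for time travel to work.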
Time travel with Delta Lake:
-- Read the table as it was at a specific point in time
SELECT * FROM orders TIMESTAMP AS OF '2024-01-14 06:00:00';
-- Read a specific version
SELECT * FROM orders VERSION AS OF 5;
-- Restore the table to a previous version
RESTORE TABLE orders TO VERSION AS OF 3;
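A timestamp query has to be resolved to a version before any files are read. The sketch below assumes the simplest rule, picking the latest commit at or before the requested time. The commit history is made up for illustration, and the real engine's edge-case handling may differ.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical commit history as (version, commit time) pairs, oldest first.
HISTORY = [
    (0, datetime(2024, 1, 12, 9, 0)),
    (1, datetime(2024, 1, 13, 9, 0)),
    (2, datetime(2024, 1, 14, 5, 30)),
    (3, datetime(2024, 1, 14, 8, 15)),
]


def version_as_of(history, ts):
    """Return the latest version committed at or before ts."""
    times = [committed for _, committed in history]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise ValueError("timestamp is earlier than the first commit")
    return history[idx - 1][0]


resolved = version_as_of(HISTORY, datetime(2024, 1, 14, 6, 0))
print(resolved)
```

With this made-up history, the 06:00 timestamp from the query above falls between versions 2 and 3, so the read is served from version 2.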
Delta Lake is tightly integrated with Databricks and Spark. It has the deepest tooling support within that ecosystem and is the default format on Databricks clusters.
Apache Iceberg
Iceberg (created at Netflix, now Apache) has a richer metadata model than Delta Lake. Rather than a single sequential log, Iceberg maintains a hierarchy: snapshots → manifest lists → manifest files → data files.
Iceberg metadata hierarchy:
Table metadata file (JSON)
└── Snapshot (each write creates a snapshot)
    └── Manifest list (one per snapshot)
        └── Manifest files (track subsets of data files)
            └── Data files (Parquet)
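To make the hierarchy concrete, here is a minimal in-memory model of it. All class and function names are hypothetical and the partition is reduced to a single string; real Iceberg metadata is stored as JSON and Avro files with many more fields. The sketch shows one practical benefit of the layering: a reader can skip a whole manifest when its partition range cannot match the query.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataFile:
    path: str
    partition: str  # simplified: one partition value per file, e.g. "2024-01-14"


@dataclass
class Manifest:
    data_files: list  # list of DataFile

    def partition_bounds(self):
        # Summary a manifest list could keep so readers can skip this manifest.
        parts = [f.partition for f in self.data_files]
        return min(parts), max(parts)


@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list: list  # list of Manifest, possibly shared with older snapshots


@dataclass
class TableMetadata:
    snapshots: dict = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def commit(self, manifests):
        new_id = len(self.snapshots) + 1
        self.snapshots[new_id] = Snapshot(new_id, list(manifests))
        self.current_snapshot_id = new_id
        return new_id


def plan_files(table, partition, snapshot_id=None):
    """Return (matching file paths, number of manifests actually opened)."""
    chosen = table.current_snapshot_id if snapshot_id is None else snapshot_id
    snapshot = table.snapshots[chosen]
    matched, opened = [], 0
    for manifest in snapshot.manifest_list:
        low, high = manifest.partition_bounds()
        if not (low <= partition <= high):
            continue  # pruned using the summary alone
        opened += 1
        matched += [f.path for f in manifest.data_files if f.partition == partition]
    return matched, opened


jan = Manifest([DataFile("a.parquet", "2024-01-13"), DataFile("b.parquet", "2024-01-14")])
feb = Manifest([DataFile("c.parquet", "2024-02-01")])

table = TableMetadata()
table.commit([jan])        # snapshot 1
table.commit([jan, feb])   # snapshot 2 reuses the January manifest

feb_plan = plan_files(table, "2024-02-01")
old_plan = plan_files(table, "2024-02-01", snapshot_id=1)
print(feb_plan, old_plan)
```

Because each snapshot points at its own manifest list, reading an older snapshot is just a matter of starting the walk from a different snapshot ID.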
This layered structure enables several things Delta Lake can't do as cleanly:
Hidden partitioning. Partition values are derived from ordinary columns through transforms recorded in Iceberg metadata (for example, a day transform on a timestamp column), not encoded in file paths or maintained as extra columns. Writers don't need to compute a partition column, and readers filter on the original column while Iceberg prunes files for them.
Partition evolution. Change the partition strategy over time without rewriting data. Old data stays in the old partition layout; new data uses the new one. Queries transparently handle both.
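The two partitioning features can be illustrated together with a toy table. This is a sketch under simplifying assumptions: the Table class and its methods are hypothetical, and the transforms are only loosely modelled on Iceberg's month and day transforms. Writers supply only a timestamp, the table derives the partition value, and each file remembers the transform it was written with so that old and new layouts can be pruned side by side.

```python
from datetime import datetime

# Transforms loosely modelled on Iceberg's month and day partition transforms.
TRANSFORMS = {
    "month": lambda ts: ts.strftime("%Y-%m"),
    "day": lambda ts: ts.strftime("%Y-%m-%d"),
}


class Table:
    """Hypothetical toy table; not the Iceberg API."""

    def __init__(self, transform):
        self.transform = transform  # current partition spec
        self.files = []             # (transform name, partition value, rows)

    def evolve_partitioning(self, transform):
        # Metadata-only change: existing files keep the spec they were written with.
        self.transform = transform

    def append(self, rows):
        # The writer supplies only "ts"; the partition value is derived here.
        groups = {}
        for row in rows:
            key = TRANSFORMS[self.transform](row["ts"])
            groups.setdefault(key, []).append(row)
        for key, grouped in groups.items():
            self.files.append((self.transform, key, grouped))

    def scan_day(self, day):
        """Filter on ts for one day, pruning each file with its own spec."""
        scanned, result = 0, []
        for transform, value, rows in self.files:
            if TRANSFORMS[transform](day) != value:
                continue  # pruned without opening the file
            scanned += 1
            result += [r for r in rows if r["ts"].date() == day.date()]
        return result, scanned


t = Table("month")
t.append([{"id": 1, "ts": datetime(2024, 1, 5)}, {"id": 2, "ts": datetime(2024, 1, 20)}])
t.evolve_partitioning("day")
t.append([{"id": 3, "ts": datetime(2024, 2, 1)}, {"id": 4, "ts": datetime(2024, 2, 2)}])

rows, scanned = t.scan_day(datetime(2024, 2, 1))
print([r["id"] for r in rows], scanned)
```

The January file stays in its monthly layout after the switch to daily partitioning, and a query that filters on the timestamp still prunes correctly across both layouts.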
Row-level deletes without rewrite. Iceberg supports delete files: separate files that record which specific rows in a data file should be treated as deleted. Readers apply them at query time, so a row-level delete does not require rewriting the entire Parquet file. (Delta Lake uses a copy-on-write approach by default, though newer releases also offer deletion vectors as an opt-in alternative.)
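A position-delete file can be pictured as a side list of (data file, row position) pairs that readers apply while scanning. The sketch below uses in-memory stand-ins with hypothetical names. It is not the Iceberg file format, only the idea behind it.

```python
# Hypothetical in-memory stand-ins: a data file is a list of rows, and the
# position deletes map a data file path to the row positions to skip.
data_files = {
    "orders-0.parquet": [{"id": 1}, {"id": 2}, {"id": 3}],
    "orders-1.parquet": [{"id": 4}, {"id": 5}],
}
position_deletes = {}


def delete_where(predicate):
    """Record matching row positions instead of rewriting the data file."""
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if predicate(row):
                position_deletes.setdefault(path, set()).add(pos)


def read_table():
    """Scan every data file, skipping positions listed in the delete set."""
    out = []
    for path, rows in data_files.items():
        deleted = position_deletes.get(path, set())
        out.extend(row for pos, row in enumerate(rows) if pos not in deleted)
    return out


delete_where(lambda row: row["id"] == 2)
visible_ids = [row["id"] for row in read_table()]
print(visible_ids)
print(len(data_files["orders-0.parquet"]))
```

The data file still holds all three rows after the delete. Only the small delete set changed, which is what keeps the write cheap and shifts a little work to readers.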
Iceberg has become the preferred format for multi-engine environments. Most major query engines support it to some degree: Spark, Trino, Flink, Dremio, StarRocks, Snowflake (external tables), BigQuery Omni, Athena, DuckDB.
Apache Hudi
Hudi (created at Uber) was built for a specific problem: frequent row-level updates and deletes on large tables — the kind of workload that CDC streaming generates. Uber needed to apply thousands of row-level updates per second from their CDC streams into a data lake.
Hudi has two table types:
- Copy-on-Write (CoW): on update, the affected file is rewritten entirely. Reads are fast (clean files), writes are expensive (full rewrite per affected file). Good for read-heavy, infrequent-update workloads.
- Merge-on-Read (MoR): updates are appended to small log files and merged with the base files at read time, until a compaction job folds them into new base files. Writes are cheap (small append), reads are slightly more expensive (need to merge). Good for high-update-frequency CDC workloads.
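The trade-off between the two types shows up clearly in a toy model. The classes below are hypothetical illustrations, not Hudi's API: a dict stands in for one base file, and a Python list stands in for the appended log.

```python
class CopyOnWriteTable:
    """Toy CoW table: every upsert rewrites the whole base file."""

    def __init__(self, rows):
        self.base = dict(rows)  # key -> record; stands in for one base file
        self.rewrites = 0

    def upsert(self, key, record):
        new_base = dict(self.base)  # copy the entire file
        new_base[key] = record
        self.base = new_base
        self.rewrites += 1

    def read(self):
        return dict(self.base)  # no merge needed


class MergeOnReadTable:
    """Toy MoR table: upserts append to a log, reads merge base and log."""

    def __init__(self, rows):
        self.base = dict(rows)
        self.log = []  # appended (key, record) updates

    def upsert(self, key, record):
        self.log.append((key, record))  # cheap append, base untouched

    def read(self):
        merged = dict(self.base)
        for key, record in self.log:  # later entries win
            merged[key] = record
        return merged

    def compact(self):
        # Fold the log into a new base file so later reads skip the merge.
        self.base = self.read()
        self.log = []


initial = {"order-1": "pending", "order-2": "pending"}
cow = CopyOnWriteTable(initial)
mor = MergeOnReadTable(initial)

for table in (cow, mor):
    table.upsert("order-1", "shipped")
    table.upsert("order-2", "cancelled")

print(cow.read() == mor.read(), cow.rewrites, len(mor.log))
```

Both tables return the same data, but the CoW table paid for two full rewrites while the MoR table only appended two log entries and deferred the merge cost to reads or compaction.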
Hudi is less commonly adopted outside of Uber's ecosystem and teams with very high CDC update volumes. For most use cases, Iceberg or Delta Lake is a better choice.
Which to choose
Practical guidance for 2024:
- Using Databricks? Use Delta Lake. It's the native format, best supported, and zero friction.
- Using multiple query engines (Spark + Trino + Athena, or Snowflake + Spark)? Use Iceberg. Its broader engine support and hidden partitioning make it the most future-proof choice.
- High-frequency CDC updates to the lake? Consider Hudi (MoR) or Iceberg with position-delete files.
- Greenfield, cloud-agnostic? Iceberg. It has the most momentum in the broader open-source ecosystem.
All three formats implement the same core ideas: ACID transactions, time travel, schema enforcement, and the ability to update/delete rows. The differences are in partition metadata handling, multi-engine support, update performance characteristics, and ecosystem integration. For most teams, Iceberg or Delta Lake is the right choice — and the decision between them is driven more by your existing tooling than by format capabilities.