

Apache Iceberg, the table format that ensures consistency and streamlines data partitioning in demanding analytic environments, is being adopted by two of the biggest data providers in the cloud, Snowflake and AWS. Customers that use big data cloud services from these vendors stand to benefit from the adoption.

Apache Iceberg emerged as an open source project in 2018 to address longstanding concerns in Apache Hive tables surrounding the correctness and consistency of the data. Hive was originally built as a distributed SQL store for Hadoop, but in many cases, companies continue to use Hive as a metastore, even though they have stopped using it as a data warehouse.

Engineers at Netflix and Apple developed Iceberg's table format to ensure that data stored in Parquet, ORC, and Avro formats isn't corrupted as it's accessed by multiple users and multiple frameworks, including Hive, Apache Spark, Dremio, Presto, Flink, and others.

The Java-based Iceberg eliminates the need for developers to build additional constructs in their applications to ensure data consistency in their transactions. Instead, the data just appears as a regular SQL table. Iceberg also delivered more fine-grained data partitioning and better schema evolution, in addition to atomic consistency. The open source project won a Datanami Editor's Choice Award last year.

Iceberg also delivers better data partitioning (Graphic courtesy Apache Iceberg)

At the re:Invent conference in late November, AWS announced a preview of Iceberg working in conjunction with Amazon Athena, its serverless Presto query service. The new offering, dubbed Amazon Athena ACID transactions, uses Iceberg under the covers to guarantee more reliable data being served from Athena.

"Athena ACID transactions enables multiple concurrent users to make reliable, row-level modifications to their Amazon S3 data from Athena's console, API, and ODBC and JDBC drivers," AWS says in its blog. "Built on the Apache Iceberg table format, Athena ACID transactions are compatible with other services and engines such as Amazon EMR and Apache Spark that support the Iceberg table format."

The new service simplifies life for big data users, AWS says. "Using Athena ACID transactions, you can now make business- and regulatory-driven updates to your data using familiar SQL syntax and without requiring a custom record locking solution," the cloud giant says. "Responding to a data erasure request is as simple as issuing a SQL DELETE operation. Making manual record corrections can be accomplished via a single UPDATE statement. And with time travel capability, you can recover data that was recently deleted using just a SELECT statement."

Not to be outdone, Snowflake has also added support for Iceberg. According to a January 21 blog post by James Malone, a senior product manager with Snowflake, support for the open Iceberg table format augments Snowflake's existing support for querying data that resides in external tables, which it added in 2019.

External tables benefit Snowflake users by allowing them to explicitly define the schema before the data is queried, as opposed to determining the data schema as it's being read from the object store, which is how Snowflake traditionally operates. Knowing the table layout, schema, and metadata ahead of time benefits users by offering faster performance (due to better filtering and partitioning), easier schema evolution, the ability to "time travel" across the table, and ACID compliance, Malone writes.

"Snowflake was designed from the ground up to offer this functionality, so customers can already get these benefits on Snowflake tables today," Malone continues. "Some customers, though, would prefer an open specification table format that is separable from the processing platform because their data may be in many places outside of Snowflake. Specifically, some customers have data outside of Snowflake because of hard operational constraints, such as regulatory requirements, or slowly changing technical limitations, such as use of tools that work only on files in a blob store. For these customers, projects such as Apache Iceberg can be especially helpful."

While Snowflake maintains that customers ultimately benefit by using its internal data format, it acknowledges that there are times when the flexibility of an externally defined table will be necessary.
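The row-level operations described above, a DELETE for an erasure request and an UPDATE for a record correction, are ordinary SQL statements run inside an ACID transaction. As a rough sketch of that pattern, using Python's built-in sqlite3 as a stand-in rather than Athena or Iceberg (the table and column names are invented for the example):

```python
import sqlite3

# Toy stand-in for a cloud table; the schema and names below are
# made up for this illustration and are not an Athena or Iceberg API.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", "us-east"), (2, "b@example.com", "eu-west")],
)
conn.commit()

# A data-erasure request: a single DELETE inside a transaction.
with conn:  # commits on success, rolls back on exception
    conn.execute("DELETE FROM customers WHERE id = ?", (2,))

# A manual record correction: a single UPDATE inside a transaction.
with conn:
    conn.execute("UPDATE customers SET region = ? WHERE id = ?", ("us-west", 1))

rows = conn.execute("SELECT id, region FROM customers ORDER BY id").fetchall()
print(rows)  # [(1, 'us-west')]
```

The point of the transactional wrapper is the same one AWS highlights: each modification either fully commits or fully rolls back, so no custom record-locking logic is needed in the application.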

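Iceberg's consistency guarantees rest on a simple idea: new data and metadata are written to immutable files, and a commit is just an atomic swap of a single pointer to the current table snapshot, so concurrent readers see either the old snapshot or the new one, never a half-written state. A minimal sketch of that general technique (an illustration only, not Iceberg's actual file layout or commit protocol; the file names are invented):

```python
import json
import os
import tempfile

def commit_snapshot(table_dir: str, snapshot: dict) -> None:
    """Write a new immutable snapshot file, then atomically repoint
    'current.json' at it. Mimics the metadata-pointer-swap idea behind
    table formats like Iceberg; names here are invented for the sketch."""
    os.makedirs(table_dir, exist_ok=True)
    # 1. Write the new snapshot to a fresh file (never modified in place).
    fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(snapshot, f)
    # 2. Atomic rename: readers of 'current.json' never observe a partial write.
    os.replace(tmp_path, os.path.join(table_dir, "current.json"))

def read_current(table_dir: str) -> dict:
    with open(os.path.join(table_dir, "current.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    commit_snapshot(d, {"version": 1, "files": ["part-0.parquet"]})
    commit_snapshot(d, {"version": 2, "files": ["part-0.parquet", "part-1.parquet"]})
    print(read_current(d)["version"])  # 2
```

Because old snapshot files are retained rather than overwritten, a design like this is also what makes "time travel" queries against earlier table states possible.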