Apache Gravitino 1.3.0 Release Notes

June 29, 2026 · 7 min read

PMC Member

We are glad to announce the release of Apache Gravitino 1.3.0! This release focuses on logical view management, hierarchical namespaces, AWS Glue catalog and engine connector support, built-in identity provider support, and Iceberg REST Catalog improvements.

It also includes important behavior changes for credential vending, Docker image layout, and Iceberg REST Catalog upgrades, plus broad improvements across authorization, connectors, clients, Web UI, observability, and deployment.

For the complete list of commits and pull requests included in this release, see the full changelog.

Highlights

AWS Glue Catalog and Engine Connector Support

Added AWS Glue Catalog support and Trino/Spark connector adapters, allowing Hive and Iceberg metadata in Glue to be governed and queried through Gravitino.

Unified View Management

Added a unified view definition model and view support across Hive, Iceberg, and Apache Paimon, covering APIs, persistence, Java client support, and Web UI management.

Core Cache and Authorization Consistency

Improved multi-node cache correctness with version-validated authorization caches, entity-change-log tracking, and global cache invalidation.
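
To picture what a version-validated cache means in general, here is a minimal sketch. It is not Gravitino's implementation: `VersionedCache`, `current_version`, `load`, and the role data are hypothetical names used only for illustration. Each cached entry remembers the version it was loaded at, and a read is served from the cache only while that version still matches the authoritative one.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class VersionedCache(Generic[V]):
    """Illustrative cache: an entry is trusted only while its version matches."""

    def __init__(self, current_version: Callable[[str], int], load: Callable[[str], V]):
        self._current_version = current_version  # hypothetical cheap version lookup
        self._load = load                        # hypothetical expensive full load
        self._entries: Dict[str, Tuple[int, V]] = {}

    def get(self, key: str) -> V:
        version = self._current_version(key)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == version:
            return cached[1]  # version unchanged, cached value is still valid
        value = self._load(key)  # stale or missing, reload and remember the version
        self._entries[key] = (version, value)
        return value


# Simulated shared store: key -> (version, privileges). Purely illustrative data.
store = {"role:analyst": (1, {"SELECT"})}
loads = []


def current_version(key: str) -> int:
    return store[key][0]


def load(key: str) -> set:
    loads.append(key)
    return set(store[key][1])


cache = VersionedCache(current_version, load)
first = cache.get("role:analyst")    # loaded from the store
second = cache.get("role:analyst")   # served from the cache
store["role:analyst"] = (2, set())   # another node revokes the privilege
third = cache.get("role:analyst")    # version mismatch forces a reload
```

The trade-off in this style of cache is one cheap version check per read in exchange for never serving an entry that another node has already changed.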

Enterprise-Grade Iceberg REST Catalog

Added federated Iceberg REST Catalog support, nested multi-level namespaces, vended-credential refresh, freshness-aware table loading, and asynchronous cleanup.

Upgrade Notes / Behavior Changes

Please review these changes before upgrading; they may require deployment configuration updates or may affect existing behavior.

Installation path moved to /opt/gravitino: The Gravitino Docker image home/install path changed from /root/gravitino to /opt/gravitino, and the published Docker images now match the current Dockerfile and Helm defaults. When upgrading a Helm deployment, update env.GRAVITINO_HOME and path references such as extraVolumeMounts, for example log mounts, from /root/gravitino to /opt/gravitino. (#11272, #11312, #11569, #11671)
Iceberg REST Catalog upgrades require a separate database backup: If the IRC service is backed by its own PostgreSQL database, back it up separately before upgrading. The Helm chart does not perform automatic schema migration; back up the database and apply the upgrade SQL scripts manually. (#11093, #11120)
Sensitive catalog properties are hidden from catalog load responses by default: Starting in 1.3.0, credentials such as jdbc-user, jdbc-password, cloud access keys, and similar sensitive catalog properties are excluded from GET /api/metalakes/{metalake}/catalogs/{catalog} responses. Clients and connectors should retrieve them through the credential vending API. For short-term compatibility during migration, gravitino.catalog.credential.backfillToProperties=true can restore the old behavior, but it exposes credentials in catalog properties and should be disabled after clients are upgraded. (#11264, #11554, #11669, #11692, #11741, #11745)
The web-v2 UI is now the default UI. (#11335)
Iceberg REST JDBC catalog now defaults to strict mode, so operations against non-existent namespaces return a 404 rather than silently succeeding. (#11285)
Iceberg table metadata cache is enabled by default, with an increased default capacity. (#11133)
The default Iceberg JDBC schema version is now v1. (#10851)
Iceberg REST config endpoint no longer includes the prefix, per the Iceberg REST spec. (#10640)
Hadoop upgraded from 2.10.2 to 3.3.6, and the legacy hadoop2 dependency line was removed. (#10788)
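
For the install path change, a small script can help audit a Helm values structure for leftover references to the old path. The sketch below works on an already-parsed values dictionary. The keys `env.GRAVITINO_HOME` and `extraVolumeMounts` come from the note above, but the exact shape of the values shown here (including the `mountPath` field and the `logs` mount name) is an assumption, so adapt it to your own values file.

```python
from typing import Any

OLD_HOME = "/root/gravitino"
NEW_HOME = "/opt/gravitino"


def migrate_paths(node: Any) -> Any:
    """Return a copy of a parsed values structure with the old install path replaced."""
    if isinstance(node, dict):
        return {key: migrate_paths(value) for key, value in node.items()}
    if isinstance(node, list):
        return [migrate_paths(item) for item in node]
    # Only rewrite the path itself or paths below it, not look-alike prefixes.
    if isinstance(node, str) and (node == OLD_HOME or node.startswith(OLD_HOME + "/")):
        return NEW_HOME + node[len(OLD_HOME):]
    return node


# Assumed example values; real charts may structure these keys differently.
values = {
    "env": {"GRAVITINO_HOME": "/root/gravitino"},
    "extraVolumeMounts": [{"name": "logs", "mountPath": "/root/gravitino/logs"}],
}
migrated = migrate_paths(values)
```

Comparing `values` with `migrated` shows which entries still pointed at `/root/gravitino` before the upgrade.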

New Features

Logical View Management: Gravitino now manages logical views as first-class, versioned entities across supported catalogs, with REST APIs, relational persistence, Java client support, Web UI management, and connector support for Hive, Iceberg, Paimon, Spark, and Flink.
Hierarchical Namespaces: Added multi-level nested namespaces in both the core REST server and the Iceberg REST Catalog, enabling deeper catalog structures and more flexible organization for complex business domains.
AWS Glue catalog and engine connectors: Added a new AWS Glue catalog with schema/table CRUD, native Iceberg table support through the Glue SDK and Iceberg SDK, integration tests, and Trino/Spark connector adapters.
Built-in identity provider and local authentication: Added a built-in IdP model, password hashing, user/group/relation storage, REST APIs, and Basic authentication for deployments that do not require an external IdP.
Authorization expansion: Added function authorization, group-aware ownership, group-inherited roles, scoped delegated privilege management with MANAGE_GRANTS, and stronger cache invalidation for authorization state.
Iceberg REST Catalog enhancements: Added asynchronous hard-deletion cleanup, vended-credential refresh for S3/GCS/OSS/ADLS, registerTable credential refresh, ETag-based freshness-aware table loading, federation handling improvements, and REST backend support.
Python Client Enhancements: Added authorization management, metadata-object statistics operations, and relational catalog support to the Python client.
Trino Connector Enhancements: Added CREATE TABLE AS SELECT, UDF adaptation, session-credential forwarding, Iceberg snapshot maintenance procedures, and multi-version integration test coverage.
Flink Connector Enhancements: Added view support for Iceberg and Paimon catalogs and support for Flink 1.19 and 1.20.
Operational Health and Audit Logs: Added health-check endpoints for Gravitino and IRC, plus a JSON formatter for audit logs.
New Hologres JDBC catalog: Added schema and table operations, a frontend, and integration tests for governing Alibaba Cloud Hologres.
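
ETag-based freshness checks generally follow the standard HTTP conditional-request pattern: the client sends the validator it last saw in `If-None-Match`, and the server answers `304 Not Modified` when nothing changed. The sketch below simulates both sides in memory to show the idea. It is not the Iceberg REST Catalog wire code; `etag_for`, `load_table`, `TableClient`, and the sample metadata are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_for(metadata: dict) -> str:
    """Derive a validator from the serialized metadata (an illustrative choice)."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def load_table(metadata: dict, if_none_match: Optional[str]) -> Tuple[int, Optional[dict], str]:
    """Simulated server handler: answer 304 when the client's copy is still fresh."""
    tag = etag_for(metadata)
    if if_none_match == tag:
        return 304, None, tag
    return 200, metadata, tag


class TableClient:
    """Simulated client that keeps the last ETag and body per table."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[str, dict]] = {}

    def get(self, name: str, server_metadata: dict) -> Tuple[dict, int]:
        cached = self._cache.get(name)
        status, body, tag = load_table(server_metadata, cached[0] if cached else None)
        if status == 304:
            return cached[1], status  # reuse the cached metadata
        self._cache[name] = (tag, body)
        return body, status


client = TableClient()
v1 = {"location": "s3://bucket/t", "current-snapshot-id": 1}
r1 = client.get("db.t", v1)   # first load: full response
r2 = client.get("db.t", v1)   # unchanged: 304, cached copy reused
v2 = {**v1, "current-snapshot-id": 2}
r3 = client.get("db.t", v2)   # table changed: full response again
```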

Selected Improvements

Core server: Cache consistency was strengthened with entity-change-log polling, CatalogManager cache invalidation, retention cleanup, fuller v2 audit operation mapping, and safer schema/table update validation.
Authorization: Performance and correctness improved through JCasbin cache work, per-request group-owner caching, batch owner lookup, and credential-vending support across Iceberg, Paimon, JDBC, and MySQL connectors.
Iceberg & Iceberg REST Catalog: Server-side Iceberg 1.11.0, backend HTTP timeout configs, federation refactoring, and client io-impl inference from table location.
Catalog & connectors: Glue complex types, Spark Glue support, Hive view support, Paimon view CRUD, Lance dependency upgrades, and reduced Hive Metastore catalog package size.
Web UI: The web-v2 UI became the default and added support for Glue, hierarchical schemas, and relational catalog views.
Table Maintenance Service: Refined the existing optimizer/table-maintenance workflow with documentation, build, dependency, module-structure, and test/CI cleanup.
Dependency & build: Hadoop 3.3.6, Log4j 2.25.4, PostgreSQL JDBC 42.7.11, H2 2.2.224, Hive runtime classpath cleanup, branch-1.3 cherry-pick automation, CI acceleration, JaCoCo PR reporting, Helm chart unit tests, and Helm OCI publishing.

Notable Bug Fixes

Catalogs and connectors: Fixed Glue catalog edge cases, Trino catalog rollback/drop behavior, Trino OAuth2 and remote IRC table handling, Spark PostgreSQL timestamp handling, Flink/Paimon schema and partition handling, Hive metastore hostname/ClassLoader/Kerberos failures, ClickHouse issues, JDBC datetime filtering, Delta table restart metadata, timestamp time-zone consistency, and mixed-case table-name behavior.
Iceberg REST Catalog: Fixed hierarchical namespace drop handling, unsupported hierarchical-schema returns, authorization skip-check errors, connection pool shutdowns, remote IRC create view/table failures, staged create failures with authorization or credential vending, and AWS credential loading for Trino remote IRC.
Authorization and authentication: Fixed multi-admin IdP initialization, stale role bindings after privilege revocation, slow listCatalogs under authorization, table creation with authorization enabled, view authorization/cache behavior, owner-based SELECT checks, first-attempt new-user authorization failures, long-running query reauthorization, OAuth2 token refresh errors, occasional owner-setting failures, and Tag Manager metadata-object lookup errors.
Audit, observability, and health: Fixed audit timestamp precision, internal cross-server audit attribution, audit file/list/client-IP/formatter gaps, Hadoop metrics scheduler cleanup, and /health.html alias handling.
Web UI: Fixed unsupported-view error popups, relational view listing against unsupported catalogs, table/view navigation 404s, copy SQL behavior, view SQL display, bucketed Iceberg table editing, service-admin metalake creation button loading, and invalid metalake enable/disable switches.
Core and Storage: Fixed JDBC strict namespace handling, orphaned schema cleanup, table cache config, fileset credential NPEs, S3 fileset macOS bundle/runtime handling, GVFS long-running write failures, and WebUIFilter NPEs.
Deployment and Runtime: Fixed Docker image build inputs, published Docker image install path, Iceberg REST Docker environment mappings, Helm startup command, and JDK 8 client behavior.
Lance: Fixed declared/materialized table reporting, Web UI column display, runtime dependency size, and purgeTable handling.

Acknowledgements

Thanks to everyone who contributed to the 1.3.0 work — code, reviews, tests, issue triage, design, and feedback.

A0R0P0I7T, a19920714liou, a638011, Abhijeetsng97, Abyss-lord, Aditi102005, ajw711, AlexGritA, AmitaWhite, anfebladi, arjnklc, babumahesh, bbiiaaoo, bharos, ChisomUma, chl-wxp, danhuawang, dennismdejong, diqiu50, FANNG1, flaming-archer, freesinger, gada121982, gauravrudragit, geniusjun, geyanggang, griffonbyte, hdygxsj, hobostay, Jalina2007, JandyTenedora, jarredhj0214, jerryshao, JoegenUSTC, kdyann, lasdf1234, laserninja, LauraXia123, lhjchn, Lucas61000, LuciferYang, LukasDEDD, markhoerth, mchades, mehakmeet, nikitanagar08, ningsh7, Octavi00, pandeysambhi, paultanay, pithecuse527, puchengy, pythaac, qqqttt123, raboof, rameshreddy-adutla, raushanprabhakar1, robertsilen, romanhorilyi, roryqi, Roshan1299, sachinnn99, sekikn, sgedward, shunki-fujita, sunyuhan1998, tanya0793, Tarantula471, Thakkar-Khushang, TimothyDing, tsungchih, Victory-ET, wangxiaojing, whua3, xxubai, YuF-9468, yunhwane, yuqi1129, yuw1, zhoukangcn

Apache, Apache Flink, Apache Hive, Apache Hudi, Apache Iceberg, Apache Ranger, Apache Spark, Apache Paimon and Apache Gravitino are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Apache Gravitino 1.2.1

May 12, 2026 · 3 min read

Qi Yu

PMC Member

We are glad to announce the release of Apache Gravitino 1.2.1! This is a patch release that focuses on stability, correctness, and performance improvements. It includes bug fixes across the core server, authorization, Iceberg REST, Trino connector, Hive, and various catalog implementations.

Improvements

Core & Server

Performance: Improve the performance of metadata object retrieval. #10459
Batch operations: Support batch owner retrieval for a set of metadata objects. #9518
Owner management: Clean zombie VIEW owner relations when deleting schema or catalog. #10458

Authorization

Performance: Optimize JCasbin policy lookup to improve authorization performance when roles have large numbers of grants. #10907

Iceberg REST Catalog

Spec compliance: Remove prefix from config endpoint per Iceberg REST spec. #10640
Schema version: Change default Iceberg JDBC schema version to v1. #10851
Error handling: Return JSON error body instead of HTML for pre-JAX-RS errors. #10667

Catalogs & Connectors

Hive: Shrink the package size of Hive Metastore 2 and Hive Metastore 3 catalogs. #10457
Paimon: Exclude unnecessary tomcat-embed-core transitive dependency from Paimon catalog. #10524
Lance: Clean up redundant code in GravitinoLanceTableOperations#createTable. #10496

Helm & Charts

Launch script: Use start-gravitino.sh instead of gravitino.sh to launch the service. #10557
Kubernetes: Add serviceAccountName to pod spec in deployment.yaml. #10573

Build

Package size: Remove testing jars from release package. #10331
Module structure: Rename updater module to updaters for consistency. #10452
Build tooling: Remove release task and centralize JDK 8 compatibility handling. #10262
GitHub Actions: Update actions/upload-artifact from v4 to v7. #10646

Bug Fixes

Core & Server

Fix batchSelect queries missing version-info JOIN and field aliases. #10444
Fix missing partition path in getPartition error messages. #10175
Fix rollback masking the original post-hook exception. #10217
Fix missing @Param("metalakeId") in GroupRoleRelMapper.softDeleteGroupRoleRelByMetalakeId. #10657
Avoid blocking dropCatalog on imported schemas. #10737
Default ifExists to true when deleting a table index. #10380
Close leaked EntityStore lifecycle to fix test flakiness. #10700
Harden retention count assertions in TestFunctionMetaService and TestFilesetMetaService. #10700
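
The rollback-masking fix above addresses a common error-handling pitfall: if cleanup fails while handling an earlier failure, the cleanup error can replace the one that actually matters. The sketch below shows one general way to keep the original error primary. It does not reflect Gravitino's code; `run_with_post_hook` and the `rollback_error` attribute are hypothetical names for this illustration.

```python
from typing import Any, Callable


def run_with_post_hook(
    operation: Callable[[], Any],
    post_hook: Callable[[Any], None],
    rollback: Callable[[Any], None],
) -> Any:
    """Run an operation and its post-hook; on hook failure, roll back but surface the hook error."""
    result = operation()
    try:
        post_hook(result)
    except Exception as hook_error:
        try:
            rollback(result)
        except Exception as rollback_error:
            # Keep the hook failure as the primary error and record the rollback
            # failure on it (a custom attribute used only in this sketch).
            hook_error.rollback_error = rollback_error
        raise hook_error
    return result


def failing_hook(_: Any) -> None:
    raise ValueError("post-hook failed")


def failing_rollback(_: Any) -> None:
    raise RuntimeError("rollback failed")


try:
    run_with_post_hook(lambda: "table", failing_hook, failing_rollback)
except ValueError as error:
    surfaced = error  # the caller sees the original post-hook failure
```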

Trino Connector

Fix TrinoException when using catalog name with metalake. #10717

Iceberg REST Catalog

Fix table scan planning failure due to incorrect SQL generation. #10841
Fix IcebergTableHookDispatcher failure with NoSuchTableException for staged creates. #10766

Hive

Fix Hive Metastore OOM by closing HiveClientFactory on pool shutdown. #10844
Fix concurrent keytab symlink creation race condition in Hive Kerberos authentication. #10741

ClickHouse

Fix ClickHouse alter-table bugs including autoIncrement issues. #10381

Web UI

Preserve hidden properties when editing a catalog in web-v2. #10837

MCP Server

Fix path encoding issues in MCP REST client for special characters. #10799
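
As a general illustration of this class of bug, names that contain spaces or slashes must be percent-encoded per path segment before they are placed in a REST URL. The sketch below uses Python's standard `urllib.parse.quote`. The helper `build_path` and the sample names are hypothetical, and which characters the actual fix covers is not stated in the note above.

```python
from urllib.parse import quote


def build_path(*segments: str) -> str:
    """Join path segments, percent-encoding each one (including any '/')."""
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


# Sample names chosen to contain a space and a slash.
path = build_path("api", "metalakes", "demo lake", "catalogs", "sales/eu")
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would turn a name like `sales/eu` into two path segments.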

Common

Lazy-initialize Version to fix TestVersion without jar task. #10762
Block H2 JDBC URL and driver in catalog datasource creation. #10801

CI & Infrastructure

Pin the localstack Docker image version to 4.14.0 to fix a CI problem. #10527
Pin docker/* GitHub Actions to SHA-based references (v4.0.0) to comply with ASF policy. #10504

Documentation

Clarify verified JDBC compatibility matrix for ClickHouse. #10308

Acknowledgements

Thanks to everyone who contributed to the 1.2.1 work — code, reviews, tests, issue triage, design, and feedback.

babumahesh, bharos, danhuawang, diqiu50, FANNG1, geyanggang, hdygxsj, jerryshao, laserninja, mchades, pandeysambhi, pythaac, roryqi, sachinnn99, yuw1, yuqi1129

Apache Gravitino 1.1.1

March 31, 2026 · 4 min read

Qi Yu

PMC Member

We are glad to announce the release of Apache Gravitino 1.1.1! This is a patch release that focuses on stability, correctness, and performance improvements. It includes bug fixes across the core server, authorization, Iceberg REST, Spark connector, OAuth, and various catalog implementations.

Improvements

Core & Server

Cache: Cache non-existent relational data to avoid repeated backend lookups on missing entities. #9799
Performance: Reduce the number of catalogInUse calls on the server hot path. #9474
Metadata API: Support batchGet for metadata objects to reduce round-trips. #9893
Managed entities: Include lakehouse-generic catalogs in managed entities for proper drop behavior. #9490
Error handling: Preserve the post-hook exception when rollback fails, so the original error is not swallowed. #10217
Partition API: Include partition path parameter in getPartition error messages for easier debugging. #10175
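
Caching non-existent entries is often called negative caching. The sketch below shows the general technique under simplifying assumptions: `NegativeCache`, `backend_lookup`, and the sample keys are hypothetical, and expiry is left out. A sentinel distinguishes "looked up and not found" from "never looked up", so repeated reads of a missing entity do not hit the backend again.

```python
from typing import Callable, Dict, Optional

_MISSING = object()  # sentinel meaning "the backend was asked and had nothing"


class NegativeCache:
    """Illustrative cache that remembers misses as well as hits."""

    def __init__(self, backend_lookup: Callable[[str], Optional[dict]]):
        self._lookup = backend_lookup
        self._entries: Dict[str, object] = {}

    def get(self, key: str) -> Optional[dict]:
        if key in self._entries:
            hit = self._entries[key]
            return None if hit is _MISSING else hit
        found = self._lookup(key)
        self._entries[key] = _MISSING if found is None else found
        return found

    def invalidate(self, key: str) -> None:
        """Must be called when the entity is created or changed."""
        self._entries.pop(key, None)


backend = {}
calls = []


def backend_lookup(key: str) -> Optional[dict]:
    calls.append(key)
    return backend.get(key)


cache = NegativeCache(backend_lookup)
miss_1 = cache.get("schema.missing_table")   # backend queried once
miss_2 = cache.get("schema.missing_table")   # answered from the cached miss
backend["schema.missing_table"] = {"name": "missing_table"}
cache.invalidate("schema.missing_table")     # creation must clear the cached miss
found = cache.get("schema.missing_table")
```

The invalidation step is the part that is easy to forget: without it, a newly created entity would keep looking absent until the cached miss expires.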

Authorization

Performance: Avoid authorization-plugin overhead when the plugin is not configured. #9170
Performance: Convert JCasbin internal map to a cache to speed up permission lookups. #9770
Batch authorization: Support preloading table metadata in batch metadata authorization. #9802

Iceberg REST Server

Performance: Improve the performance of loading tables by reducing redundant HMS calls. #9765
Cache: Optimize the catalog wrapper and entity cache expiry strategy. #9782
Rename: Support renaming a table across different namespaces in the Gravitino Iceberg catalog. #9571

Lance REST Server

Column operations: Support drop and rename column for Lance tables. #9113
Empty table: Refine the concept of createEmptyTable in Lance REST for clearer semantics. #9520
Statistics: Add maxStatisticsPerUpdate configuration for Lance partition storage. #9650
Helm: Add a complete Lance REST server Helm chart. #9403

Common

Version parsing: Enhance version parsing to support release candidate tags with validation. #9482
Code quality: Refactor to reduce duplicated code across modules. #9294
Logging: Update log4j2 configuration for Iceberg/Lance REST servers. #9547
Build: Add MCP-server changes handling in build workflow. #9921
Build: Remove release task and centralize JDK 8 compatibility handling. #10262

Bug Fixes

Core & Server

Fix loading table failure caused by incorrect SQL in the fetch-column-info query. #10034
Fix tag association problem that caused tags to be incorrectly linked. #9635
Fix credential issue for filesets with multiple locations. #9500
Fix equals and hashCode missing from Policy.java, causing incorrect policy comparison. #10009
Default ifExists to true when deleting a table index to prevent spurious errors. #10380

Authorization

Fix NoSuchEntityException caused by schema entity not being imported before authorization checks. #10055
Fix schema import to avoid setOwner failures when the schema had not been ingested. #9809
Fix PassThroughAuthorizer user verification logic that incorrectly rejected valid users. #9616

Iceberg

Fix wrong namespaces returned when listing tables or views in multi-level namespace configurations. #10397
Fix URL decoding of table names in Iceberg REST server request paths. #9936
Fix authorization decode issue for table names containing special characters. #9936
Fix migrate procedure by preserving the stageCreate flag. #9666

OAuth

Allow JWKS validators to operate without serverUri or tokenPath being mandatory. #9713

Catalogs

Fix a problem with altering a JDBC catalog column's default value. #9816
Fix UnsupportedOperationException when updating aliases for a model version created without aliases. #9727

Hive

Perform proper resource cleanup in HiveClientPool.close() to prevent connection leaks. #9581

Lance REST

Handle null mode in registerTableRequest to prevent NPE. #9512

Spark Connector

Fix No SLF4J providers warning/error in Spark connector 3.3. #6906

CI & Infrastructure

Pin all docker/* GitHub Actions to SHA-based references (v4.0.0) to comply with ASF policy. #10502
Fix Python CI pipeline failures due to runner image upgrade. #9919
Fix Docker container startup failures due to GitHub CI runner image upgrade. #9990
Fix MCP-server fastmcp version to avoid breaking CI changes from 3.0.x. #10035
Fix UV CI pipeline. HOTFIX
Fix JDK8 compatibility issues across modules. #10373

Documentation

Add documents about the Flink catalog name limitation. #9973
Update OAuth documentation to clarify correct version endpoints for Azure authentication. #9868
Fix the incorrect curl command for setting the owner in the migration guide. #10041
Add documentation for docker run command in the Hive section. #9876
Improve lakehouse-paimon-catalog documentation. #9957
Add a guide for Lance REST integration with Spark and Ray. #9622
Add REST catalog backend documentation for Iceberg REST Catalog (IRC). MINOR
Add warehouse documentation for the REST catalog backend in Iceberg. MINOR
Add missing DescribeTable endpoint to lance-rest-service.md. #9662

Acknowledgements

Thanks to everyone who contributed to the 1.1.1 work — code, reviews, tests, issue triage, design, and feedback.

@FANNG1, @agnes-xinyi-lu, @bharos, @chl-wxp, @danhuawang, @echonesis, @jerryshao, @joeyutong, @mchades, @pandeysambhi, @pythaac, @qqqttt123, @roryqi, @tedyu, @yuqi1129

Apache Gravitino 1.2.0

March 13, 2026 · 8 min read

Hui Yu

Committer

Apache Gravitino 1.2.0 has been released! This release introduces major new capabilities, including a new Table Maintenance Service (TMS), a new ClickHouse catalog, end-to-end UDF management, authorization for Iceberg view operations, a redesigned Web UI, and broad improvements across connectors, authorization, and clients.

Release Date: 2026-03-13
Previous Version: 1.1.0 (2025-12-16)

Highlights

Table Maintenance Service (TMS) foundational framework: Data platforms can now shift from reactive firefighting to proactive table health management. Gravitino analyzes your tables and automatically schedules the right maintenance operations at the right time.
ClickHouse catalog: Teams running real-time analytics on ClickHouse can now govern it alongside their lakehouse, with one metadata layer for both streaming and batch workloads.
Scan planning offload: Query engines like DuckDB and Spark can now offload scan planning to Gravitino's IRC server, reducing query latency and client-side complexity and making Gravitino a more capable catalog server for the growing Iceberg ecosystem.
Ecosystem reach: Multi-version Trino connector support (435–478), Flink user authentication, multi-cluster fileset support, and Web UI v2 broaden Gravitino's integration surface across the modern data stack.

New Features

Table Maintenance Service (TMS) #9546, #9652, #9653, #9654, #9983, #9984, #9985, #9986, #10096, #10097, #10098, #10140, #9541, #9543

Keeping a lakehouse table healthy requires ongoing maintenance work: compaction, data file rewrites, and snapshot expiration. Gravitino 1.2.0 introduces the Table Maintenance Service (TMS), which analyzes table statistics and automatically schedules the right maintenance operations at the right time. This allows data platforms to move from reactive firefighting to proactive table health management.
ClickHouse Catalog #9738, #9754, #9755, #9756, #9820, #9865

ClickHouse is widely adopted for large-scale real-time analytics. Gravitino 1.2.0 introduces a full-featured ClickHouse catalog with complete DDL support, including distributed and partitioned cluster modes.
User-Defined Function (UDF) Management #9525, #9527, #9528, #9529, #9530, #9531, #9532, #9561

Teams can now centrally register, update, and govern UDFs, and Spark can automatically discover and invoke these functions through Gravitino's catalog interface. Gravitino 1.2.0 introduces end-to-end UDF management: Java API, server-side REST interface, relational storage backend, Java client support, Spark FunctionCatalog integration, Python client, Web UI management for visual browsing/creation, and complete documentation with OpenAPI specifications.
Iceberg REST Catalog: Support scan planning cache #9048

Adding a cache for scan planning improves performance by caching the results of the scan planning phase, which determines which data files need to be read for a query. Since scan planning involves reading and parsing Iceberg metadata and manifest files, caching avoids repeating this expensive work for similar queries. As a result, it reduces metadata I/O, lowers query latency, and decreases load on object storage and the planning service.
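
The idea can be sketched with a small memoized planner. This is only an illustration under stated assumptions: the cache key used here (table name, snapshot ID, filter string), the function `plan_scan`, and the returned file names are hypothetical and do not describe how Gravitino keys or stores its scan planning cache.

```python
from functools import lru_cache
from typing import Tuple

plan_calls = 0  # counts how often the expensive planning step actually runs


@lru_cache(maxsize=256)
def plan_scan(table: str, snapshot_id: int, row_filter: str) -> Tuple[str, ...]:
    """Stand-in for expensive planning; returns the data files a query must read."""
    global plan_calls
    plan_calls += 1
    # A real planner would read and parse metadata and manifest files here.
    return (f"{table}/snap-{snapshot_id}/part-0.parquet",)


files_a = plan_scan("db.events", 41, "day = '2026-03-01'")  # planned
files_b = plan_scan("db.events", 41, "day = '2026-03-01'")  # served from cache
files_c = plan_scan("db.events", 42, "day = '2026-03-01'")  # new snapshot, planned again
```

Including the snapshot ID in the key means a new commit naturally misses the cache, so stale plans are not reused for changed tables.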
Authorization support for Iceberg REST catalog view operations #9744, #9745, #9746, #9747, #9915

Views are an important abstraction for data access control but have long been a governance blind spot. Gravitino 1.2.0 manages Iceberg views as first-class entities, introducing view management, generic storage, view-level permissions, and full authorization support for IRC view operations. This enables teams to govern views alongside tables and apply fine-grained access control policies to view access while maintaining consistent enforcement of underlying table permissions.
Generic Lakehouse Catalog: Delta Lake External Table Support #9647

Delta Lake is one of the most widely deployed open table formats. Gravitino now supports registering and managing external Delta tables through the generic lakehouse catalog, enabling unified governance for Delta alongside Iceberg, Hudi, and other formats without data migration. This allows organizations to manage Delta Lake workloads through an open catalog architecture, reducing dependence on proprietary catalog services such as Unity Catalog.
Trino Connector Multi-version Support #9718, #9719, #9894, #9952, #9961, #9964, #10091

Gravitino now supports Trino versions 435 through 478, with a connector architecture designed to maintain compatibility with future Trino releases.
Multi-cluster Fileset Support #9568, #9312

Production data platforms often span multiple storage clusters. Through GVFS multi-cluster filesets, Gravitino now allows a single fileset to reference storage locations across different clusters.
Flink Connector: User Authentication #9564

The Gravitino Flink connector now supports user authentication, enabling secure access to Gravitino-managed metadata from Flink jobs.
Web UI v2 Reconstruction #9758

The Gravitino Web UI now supports managing ClickHouse catalogs and UDFs, along with improved views for tags, policies, and task templates. The updated interface provides a more streamlined experience for navigating governance resources. Web v1 remains available during the transition period.

Improvements

Core & Server

Block non-cascading schema deletions when topics still exist (#9078)
Cache non-existent relational data to avoid repeated lookups (#9799)
Optimize in-use status checks for catalogs and metalakes (#9586)
Improve authorization performance by skipping overhead when plugin is empty (#9170)
Support renaming tables to a different schema in ManagedTableOperations (#9477)
Include generic lakehouse catalogs in managed entities for proper drop behavior (#9490)
Add JDBC storage backend for partition statistics (#9838)
Make gravitino-api and related dependencies compileOnly in catalog modules (#10195)
Optimize JDBC driver deregistration logic to avoid possible OOM (#10253)

Authorization

Convert JCasbin internal map to a cache for improved performance (#9770)
Support preloading table metadata in batch for authorization checks (#9802)
Rename model privilege names to follow operation name conventions (#9381)
Support overriding privileges for roles (#9269)
Fileset supports credential vending with correct access privileges (#9506)

Iceberg REST Catalog (IRC)

Upgrade Apache Iceberg to 1.10.1 (#9989)
Optimize IRC catalog wrapper and entity cache expiry strategy (#9782)
Improve IRC table load performance under high concurrency with authorization enabled (#9765)
IRC uses internal catalog fetcher instead of HTTP interface for better performance (#9825)
IRC expires the catalog wrapper cache in a timely manner (#9966)
Improve table load performance by tuning the cooperation between Iceberg service and Gravitino server (#9277)
Cross-Namespace Table Rename: Gravitino Iceberg catalog now supports renaming tables across different namespaces (#9517)

Catalogs & Connectors

Support Paimon REST backend (#9791)
Support hash distribution in Paimon catalog (#9731)
Refactor Hive and Hudi catalogs to use the shared HiveClient (#9459)
Add support to skip catalog in Trino connector (#9492)
Flink connector supports generic Hive tables (#9504)
Spark connector supports TableWritePrivilege for Spark 3.5+ authorization (#10181)
Upgrade Kyuubi Hive connector from 1.10 to 1.11 (#10040)

OAuth & Authentication

Support JWT tokens with multiple audiences in the aud claim (#9733)
Support regex for user principal mapping in OAuth (#9767)
Allow JWKS validators to work without serverUri or tokenPath (#9713)
Enhance version parsing to support release candidate version strings (#9482)
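
On the multiple-audience item: in the JWT specification (RFC 7519), the `aud` claim may be either a single string or an array of strings, so a validator has to accept both shapes. The sketch below shows only that check on already-decoded claims. `audience_matches` and the sample values are hypothetical, and signature verification is deliberately out of scope.

```python
from typing import Any, Mapping


def audience_matches(claims: Mapping[str, Any], expected: str) -> bool:
    """Return True if the expected audience appears in the token's aud claim."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected          # single-audience form
    if isinstance(aud, (list, tuple)):
        return expected in aud          # multi-audience form
    return False                        # missing or malformed claim


single = audience_matches({"aud": "gravitino"}, "gravitino")
multi = audience_matches({"aud": ["billing", "gravitino"]}, "gravitino")
wrong = audience_matches({"aud": ["billing"]}, "gravitino")
absent = audience_matches({}, "gravitino")
```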

Clients

Add configurable HTTP connection pool settings for Gravitino client (#9468)
Allow disabling client-server version check via environment variable (#9760)

Lance REST Service

Add Helm chart for deploying Lance REST server as a standalone service on Kubernetes (#9403)
Add dataset version tracking in Lance REST loadTable and createTable (#9792)
Add documentation for Lance REST server setup and Spark/Ray integration (#9169, #9622)
Refine createEmptyTable semantics in Lance REST (#9520)

Web UI

Add global banner to guide users to Web UI v2 (#9996)
Support ClickHouse catalog management in Web UI v2 (#9865)
Support associated roles view for tags, policies, and job templates (#9807)

Bug Fixes

Fix credential vending for filesets with multiple storage locations (#9500)
Fix IRC connection failure after idle timeout (#9383)
Fix tag association cache inconsistency (#9635)
Fix Iceberg migrate procedure failing with "table already exists" for JDBC catalogs (#9666)
Fix clearing column comments in MySQL table ALTER operations (#9694)
Fix UnsupportedOperationException when updating model version aliases (#9727)
Fix table loading failure due to incorrect SQL in fetch column info (#10034)
Fix NoSuchEntityException in authorization when schema entity is not imported (#10055)
Fix owner assignment failure due to unimported schema entity (#9809)
Fix Hive SerDe incompatibility between Gravitino Flink connector and native Flink client (#9508)
Fix JDBC catalog column default value alteration (#9816)
Fix Trino connector distribution to include JARs for all supported versions (#10139)
Fix TMS built-in rewrite adapter resolution by template name (#10311)
Fix JDBC catalog pool size properties not being loaded on catalog creation (#10284)
Fix SLF4J provider not found error in Spark connector 3.3 (#6906)
Fix PassThroughAuthorizer user verification logic blocking new user creation (#9616)
Fix IRC URL decoding for table names containing special characters (#9936)
Fix lazy authorization configuration check for IRC to allow server startup (#9247)
Fix maxStatisticsPerUpdate configuration for Lance partition storage (#9650)

Breaking Changes

Dropped Python 3.9 Support — Python 3.9 has reached end of life. The Gravitino Python client no longer supports this version. Users need to upgrade to Python 3.10 or higher. (#10011)

Acknowledgements

Thanks to everyone who contributed to the 1.2.0 work — code, reviews, tests, issue triage, design, and feedback. Below is a consolidated list of contributor GitHub IDs extracted from issue and PR activity.

Apache Gravitino - 2025 Summary

January 5, 2026 · 6 min read

Introduction

2025 was a landmark year for Apache Gravitino. The project not only graduated as a Top-Level Project (TLP) but also reached its first major stable release, version 1.0.0. Throughout the year, the community focused heavily on "Contextual Engineering" and "AI-native" metadata management, introducing groundbreaking features like the Model Context Protocol (MCP) server, the Lance REST service, and a metadata-driven action system. This article summarizes the milestones and achievements of Apache Gravitino in 2025.

Timeline

Apache Gravitino officially graduated as an Apache Top-Level Project on June 3, 2025, marking a significant maturity milestone.

In 2025, the community released several key versions, including the major 1.0.0 release and significant feature updates in 0.8.0-incubating, 0.9.0-incubating, and 1.1.0.

2025.01.24: Version 0.8.0-incubating released
- Focused on strengthening AI support with the introduction of the Model Catalog.
- Introduced credential vending for Filesets and new connectors for Flink (Iceberg/Paimon).
2025.05.07: Version 0.9.0-incubating released
- Enhanced data governance with a new Data Lineage interface (OpenLineage compliant).
- Added gcli script for better CLI experience and improved security with privilege refinements.
2025.09.24: Version 1.0.0 released
- The first stable major release, themed "From Metadata Management to Contextual Engineering."
- Introduced the Metadata-driven Action System (including Statistics, Policies, and Jobs).
- Launched the MCP (Model Context Protocol) Server, enabling AI Agents/LLMs to interact directly with metadata.
- Implemented unified Role-Based Access Control (RBAC) across catalogs.
2025.11.20: Version 1.0.1 released
- A stability release featuring smarter job templates and improved Python client support.
2025.12.19: Version 1.1.0 released
- Added the Lance REST service to support vector data for AI workloads.
- Introduced a Generic Lakehouse Catalog and support for Hive 3 and multi-cluster HDFS filesets.
- Hardened security for the Iceberg REST service.

Key Features & Improvements

In 2025, Gravitino evolved from a unified catalog to an active metadata control plane. Key technical achievements include:

AI & LLM Integration: The project positioned itself as an AI-native catalog by introducing the Model Catalog for managing ML models and the MCP Server to connect AI agents with data context. The addition of the Lance REST service in v1.1.0 further solidified support for vector datasets.
Metadata-Driven Actions: A new framework allowing users to define policies (e.g., TTL, compaction) and execute jobs based on metadata, moving beyond passive metadata storage.
Unified Governance & Security: Full implementation of RBAC, credential vending for secure data access (S3/GCS/ADLS), and a unified authentication flow for Iceberg REST services.
Ecosystem Expansion: Broadened support with new connectors (Generic Lakehouse, Hive 3, Flink, Paimon) and enhancements to the GVFS (Gravitino Virtual File System) for unified file management.

Community

The Apache Gravitino community saw explosive growth in 2025, evolving from an incubator project into a Top-Level Project (TLP) backed by a rapidly expanding global ecosystem.

Top-Level Graduation: On June 3, 2025, the project officially graduated to an Apache Top-Level Project, a major milestone marking its maturity in community health, vendor-neutral governance, and production readiness.
Community Growth (Year-over-Year):
- Engagement: GitHub stars increased by over 130%, ending the year above 2,600. Forks grew by approximately 150%, reflecting a surge in community-led integrations and local developments.
- Contributor Base: The active developer community expanded by nearly 100%. Recent major releases, such as version 1.1.0, featured contributions from 40+ unique developers representing a wide variety of global organizations.
- Development Velocity: Development pace accelerated significantly, with code commits reaching a lifetime total of over 3,300 commits.
- Post-Graduation Committer Growth: On July 7, 2025, Chenxi Pan was added as a Committer. On December 15, 2025, Junda Yang and Yangyang Zhong were added as Committers.
Global Presence: The project established itself as the standard for federated metadata through featured presentations at Community Over Code (NA & Asia) and QCon Shanghai, gathering critical production feedback from global data engineering teams to shape the future roadmap.

Industry Trends in Metadata Management (2026)

Breaking Lakehouse Silos: As organizations adopt multiple "open" table formats, the risk of "format lock-in" has replaced "vendor lock-in." The trend is toward Universal Lakehouse architectures that provide a single entry point for fragmented data silos.
The Multimodal AI Explosion: AI workloads are moving beyond tabular data to include massive volumes of unstructured assets (images, video, audio). Traditional data stacks are being replaced by AI-Native Multimodal Stacks that can process complex data types with the same governance as SQL tables.
Emergence of Data Agents: AI Agents are becoming the primary consumers of data. These agents require "Context Engineering"—a way to use metadata as an external brain to discover, understand, and act upon data autonomously.
Escalating AI Security Risks: The high-speed nature of AI interactions makes traditional static security (RBAC) obsolete. The industry is moving toward Identity-Centric Zero Trust and Fine-Grained ABAC to prevent data leakage and ensure model safety.

Future Work

1. Universal Lakehouse & Format Interoperability

To solve the data silo problem, Gravitino is expanding its reach to provide a unified management layer for the modern Lakehouse.

Multi-Format Support: We will provide first-class support for Apache Iceberg, Delta Lake, Hudi, and Paimon. By acting as a "Catalog of Catalogs," Gravitino allows users to manage multiple formats through a single interface, significantly reducing vendor lock-in and simplifying cross-format governance.

2. Multimodal Data Stack for the AI Era

Gravitino is evolving to empower a new generation of AI-native data stacks.

Ecosystem Integration: We will focus on deep integration with AI-centric engines like Daft, Ray, and Lance.
Empowering New Scenarios: By providing a unified metadata layer for these engines, Gravitino allows users to "reuse" existing data governance capabilities—like auditing and access control—for modern multimodal scenarios, giving the new AI data stack enterprise-grade maturity from day one.

3. Data Agent Orchestration (Metadata as the "Brain")

Gravitino will serve as the cognitive foundation for autonomous Data Agents.

MCP Server & Action System: Leveraging the Model Context Protocol (MCP) and our Metadata Action System, we are exploring scenario-based capabilities for Data Agents. This allows an AI agent to not only "see" the data but also "act" on it—such as performing a schema update or triggering a compaction job—using metadata as its reasoning context.

4. Advanced Security: KMS & ABAC

As security threats become more sophisticated in the AI era, Gravitino is implementing more granular and automated security controls.

ABAC (Attribute-Based Access Control): We will implement an ABAC engine to enable fine-grained permissions. This allows access decisions to be made based on dynamic tags (e.g., Sensitivity=High) and environmental context rather than just static roles.
KMS & Credential Management: To protect data-at-rest and in-transit, we are integrating with Key Management Services (KMS).
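
As a rough illustration of the ABAC idea above, the following sketch evaluates a request against resource tags and environmental context instead of static roles. Everything here is hypothetical: `Rule`, `is_allowed`, the `Sensitivity` tag, and the `network` context key are made-up names for illustration and do not describe a Gravitino API, since the ABAC engine is only planned work. The sketch assumes a default-deny model in which any matching deny rule wins.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical ABAC rule: all listed attributes must match."""
    resource_tags: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    effect: str = "deny"  # either "allow" or "deny"


def matches(required: dict, actual: dict) -> bool:
    # A rule matches when every required attribute has the same value.
    return all(actual.get(key) == value for key, value in required.items())


def is_allowed(rules: list, resource_tags: dict, context: dict) -> bool:
    # Default deny; a matching "deny" rule overrides any "allow".
    allowed = False
    for rule in rules:
        if matches(rule.resource_tags, resource_tags) and matches(rule.context, context):
            if rule.effect == "deny":
                return False
            allowed = True
    return allowed


rules = [
    Rule(resource_tags={"Sensitivity": "High"}, context={"network": "public"}, effect="deny"),
    Rule(resource_tags={"Sensitivity": "High"}, context={"network": "corp"}, effect="allow"),
    Rule(resource_tags={"Sensitivity": "Low"}, effect="allow"),
]

print(is_allowed(rules, {"Sensitivity": "High"}, {"network": "corp"}))
print(is_allowed(rules, {"Sensitivity": "High"}, {"network": "public"}))
```

The same table tagged `Sensitivity=High` is readable from one network and blocked from another, which a purely role-based check cannot express without creating extra roles.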

Apache Gravitino 1.1.0 - An AI-native metadata management platform

December 16, 2025 · 6 min read

Qi Yu

PMC Member

We are glad to announce the release of Apache Gravitino 1.1.0! This release builds upon the solid foundation laid by Apache Gravitino 1.0.0, introducing a range of new features, improvements, and bug fixes that enhance the platform's capabilities, performance, and security.

Highlights

Broader catalog support (initial Lance REST service, a reusable lakehouse-generic catalog, and Hive3) to simplify integration with diverse lakehouse deployments.
Stronger metadata-level authorization and security hardening for the Iceberg REST surface.
Multi-cluster fileset support and Python client improvements for real-world multi-region and migration workflows.
Stability, performance and observability work across the entity-store, caches, scan planning, connectors and CI — reducing operational friction and test flakiness.

New Features

Built for the Future of AI Data: Lance REST service. #8889

As AI and ML workflows become central to data platforms, efficient access to vector data is crucial. The new Lance REST service exposes Lance datasets through a managed HTTP interface. This allows remote clients—such as inference services or notebooks—to access vector data with the high performance of the Lance format, all while adhering to Apache Gravitino's centralized security and governance policies.

Generic lakehouse catalog. #8828

The lakehouse ecosystem is diverse and rapidly evolving, with new table formats and engines emerging frequently. To keep pace, we introduced a generic lakehouse catalog framework. This abstraction reduces the boilerplate code required to integrate new engines, standardizing how capabilities are negotiated and how namespaces are handled. This means faster support for new formats and a more consistent experience for developers and users alike.

Access control for Iceberg REST service. #4290

The Iceberg REST catalog is becoming the standard for open table access, but production use demands robust security. We have hardened the Iceberg REST service with comprehensive authentication and authorization checks. This ensures that data accessed via standard Iceberg clients is fully protected, making Apache Gravitino a secure choice for multi-tenant and public-facing data lake deployments.

Hive 3 catalog support. #5912

Many enterprises still rely on Hive 3 for their core data infrastructure, making migration a risky and complex endeavor. This feature allows users to register existing Hive 3 metastores directly as Apache Gravitino catalogs. By doing so, organizations can instantly bring their legacy data under Apache Gravitino's unified governance and management umbrella without moving data or disrupting existing workloads, paving the way for a smoother transition to modern lakehouse architectures.

Multiple HDFS clusters support. #9117, #9288

In large-scale production environments, data is often distributed across multiple HDFS clusters to ensure isolation and disaster recovery. Previously, Apache Gravitino was limited in how it handled these complex topologies. With this release, users can manage filesets across multiple HDFS clusters within a single Apache Gravitino instance. This capability simplifies cross-cluster data management, improves resource isolation, and provides greater flexibility for multi-tenant architectures.

Metadata authorization for IRC, statistics, tags, jobs, and policies. #4361, #8752, #8944, #8943

True governance requires securing every aspect of the metadata platform. We have expanded fine-grained authorization to cover auxiliary resources like tags, statistics, and background jobs. This enhancement closes previous security gaps, ensuring that all user interactions with the system—whether viewing statistics or managing tags—are strictly governed by least-privilege policies.

New Iceberg REST endpoints. #6336

To support the full range of capabilities expected by modern analytics tools, we have implemented additional endpoints from the Iceberg REST specification. This improves compatibility with the latest query engines and clients, ensuring that users can leverage advanced planning and catalog operations without running into compatibility issues.

Improvements

Core & Server

Entity store and Cache: Fixed several performance and logic issues to improve stability and speed. #8697, #8743, #8815, #8817, #8710, #9148, #7916, #8546
Metrics: Expose more metrics for server and catalogs to enhance observability. #8594
Authorization: Refined permission checks. #7942
Resource management: Improved resource release and closure mechanisms to prevent leaks. #8981, #9002, #8999
JDBC metric store: Support storing Iceberg metrics in JDBC. #8899
Job system enhancement: Support job alteration. #8638, #8814

Catalogs & Connectors

Iceberg catalog: Support metadata cache. #8314
Upgrade Iceberg to 1.10.0 to support scan planning. #9046
Improve dynamic config provider for better usability. #8970
Fileset catalog: Prevented filesystem instances from hanging for a long time. #9280
Trino connector: Support SQL UPDATE/DELETE/MERGE. #8241
Fix getTableStatistics in GravitinoMetadata. #9100

Clients

GVFS client: Improved stability and error handling. #8752, #8882, #8948, #8953
Fileset bundle JARs: Refactored for a more detailed delivery strategy. #9106
Python client: Added support for relational catalog. #5198

Developer Experience & Operations

Helm chart: Enhanced configuration options and stability. #8747, #8174
GitHub templates: Added templates to support AI coding. #9227
Tests: Refactoring and enhancement of test suites. #9223, #9107
Docker: Changed Apache Gravitino Docker base image. #8817
Code Style: Upgrade Google Java Format to support JDK 17. #8792

Frontend Updates

Added pagination for files list. #8987
Displayed the index type in UI. #6997
Upgraded dependabot affected versions. #9357
Fixed routing issue where path '/' may not route to 'metalakes'. #9354

Bug Fixes

Create topic encounters NoSuchTopicException when Kafka is deployed with 3 brokers on EKS. #4168
Apache Gravitino IRC server returns java.lang.NoSuchMethodError: void org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism. #8754
Several bugs in SQL provider. #8659, #9166
Unknown error when using fsspec through JNI. #8858

Still, there are many bug fixes that have not been listed due to limited space. Please refer to the full list of issues and pull requests merged since the 1.0.0 release for more details.

Acknowledgements

Thanks to everyone who contributed to the 1.1.0 work — code, reviews, tests, issue triage, design, and feedback. Below is a consolidated list of contributor GitHub IDs extracted from issue and PR activity.

Apache Gravitino 1.0.1 - Release Notes

November 14, 2025 · 3 min read

Minghuang Li

PMC Member

We are pleased to announce the release of Gravitino 1.0.1. This version introduces comprehensive support for job template alterations, along with significant improvements and bug fixes across the core engine, various catalogs, and clients.

Major Features & Improvements

Job and Job Template

Supports altering job templates. #8638, #8639, #8781, #8783, #8640, #8641, #8642
Supports placeholders for all job template fields. #8865
Supports running Spark jobs in the local environment. #7962
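
To make the placeholder support concrete, here is a minimal sketch of how placeholders in every field of a job template might be resolved at submission time. It assumes a `{{name}}` placeholder syntax and invents the field names (`executable`, `arguments`, `environments`) and the `render` helper for illustration; consult the job documentation for the syntax and fields Gravitino actually uses.

```python
import re

# Assumed placeholder syntax: {{name}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, conf: dict):
    """Recursively substitute placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def replace(match):
            key = match.group(1)
            if key not in conf:
                raise KeyError(f"no value supplied for placeholder '{key}'")
            return str(conf[key])
        return PLACEHOLDER.sub(replace, value)
    if isinstance(value, list):
        return [render(item, conf) for item in value]
    if isinstance(value, dict):
        return {key: render(item, conf) for key, item in value.items()}
    return value


# Hypothetical template: placeholders may appear in any field.
template = {
    "executable": "/opt/jobs/{{script}}",
    "arguments": ["--table", "{{table}}"],
    "environments": {"TARGET_MB": "{{target_mb}}"},
}

job = render(template, {"script": "compact.sh", "table": "sales.orders", "target_mb": 128})
print(job)
```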

Gravitino Core

Refactored tag operations by leveraging the entity store's relation operations. #7916
Made several optimizations to the Caffeine cache, including adjusting weight values, resolving a performance issue with reverseIndex, and prioritizing the eviction of tags and policies when the cache is full. #8697, #8743, #8815, #8871, #8937

Catalogs

Kafka: Fixed an issue where topic creation was asynchronous, ensuring the operation is now synchronous. #4168
Iceberg: Fixed a failure in starting the Iceberg REST server within a Docker environment. #8733
Doris, StarRocks, PostgreSQL: Fixed incorrect parsing of column default values and types for these data sources. #8277

Python Client

Added metadata objects to the Python client. #8627
Fixed an incorrect credential URL and a fileset test issue on GCS. #8935, #8969

Authorization

Authorization is supported for the testCatalogConnection operation. #7893

Web UI

Fixed an issue with reconfiguring submission parameters when creating a catalog. #8694
Added pagination support for the fileset file list. #8987

Bug Fixes

Fixed a Null Pointer Exception (NPE) in TableFormat.java when a user has no roles. #8202
Corrected exception handling in the setPolicy operation. #8661
Fixed missing policy operations in the OpenAPI entry point. #8706
Fixed a build failure in the gvfs-fuse module. #8830
Fixed an issue where the hard deletion of statistics would fail. #9038
Corrected index names for statistics and job names in the database upgrade script. #8979
Fixed an error in deletePolicyAndVersionMetasByLegacyTimeline. #9031
Fixed an issue where roles were not updated when a table was deleted. #8824

Credits

We would like to thank the following contributors for their valuable contributions to this release:

@dyrnq @yuqi1129 @LauraXia123 @jerryshao @danhuawang @playasim @keepConcentration @KayMas2808 @jerqi @mchades @HugoSalaDev @FANNG1 @diqiu50 @hdygxsj @tsungchih

Apache Gravitino 1.0.0 - From Metadata Management to Contextual Engineering

September 24, 2025 · 8 min read

Jerry Shao

PMC Member

Apache Gravitino was designed from day one to provide a unified framework for metadata management across heterogeneous sources, regions, and clouds—what we define as the metadata lake (or metalake). Throughout its evolution, Gravitino has extended support to multiple data modalities, including tabular metadata from Apache Hive, Apache Iceberg, MySQL, and PostgreSQL; unstructured assets from HDFS and S3; streaming and messaging metadata from Apache Kafka; and metadata for machine learning models. To further strengthen governance in Gravitino, we have also integrated advanced capabilities, including tagging, audit logging, and end-to-end lineage capture.

After all enterprise metadata has been centralized through Gravitino, it forms a data brain: a structured, queryable, and semantically enriched representation of data assets. This enables not only consistent metadata access but also knowledge grounding, contextual reasoning, tool use, and more. As we approach the 1.0 milestone, our focus shifts from pure metadata storage to metadata-driven contextual engineering, built on a foundation we call the Metadata-driven Action System, which provides the building blocks for contextual engineering.

The release of Apache Gravitino 1.0.0 marks a significant engineering step forward, with robust APIs, extensible connectors, enhanced governance primitives, and improved scalability and reliability in distributed environments. In the following sections, I will dive into the new features and architectural improvements introduced in Gravitino 1.0.0.

Metadata-driven action system

In version 1.0.0, we introduced three new components that enable us to build jobs to accomplish metadata-driven actions, such as table compaction, TTL data management, and PII identification. These three new components are: the statistics system, the policy system, and the job system.

Taking table compaction as an example:

Firstly, users can define the table compaction policy in Gravitino and associate this policy with the tables that need to be compacted.
Then, users can save the statistics of the table to Gravitino.
Also, users can define a job template for the compaction.
Lastly, users can use the statistics with the defined policy to generate the compaction parameters and use these parameters to trigger a compaction job based on the defined job templates.
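
The four steps above can be sketched in a few lines. This is only an illustration of the flow (policy plus statistics yields parameters, which fill a job template); the dictionary keys, the `plan_compaction` and `build_job_args` helpers, and the threshold logic are all hypothetical and are not Gravitino APIs.

```python
from typing import Optional


def plan_compaction(stats: dict, policy: dict) -> Optional[dict]:
    """Combine stored statistics with a policy to derive job parameters."""
    small_ratio = stats["small_file_count"] / max(stats["file_count"], 1)
    if small_ratio < policy["min_small_file_ratio"]:
        return None  # below the policy threshold, so no job is needed
    return {
        "table": stats["table"],
        "target_file_size_mb": policy["target_file_size_mb"],
    }


def build_job_args(template: list, params: dict) -> list:
    # Fill the job template with the generated parameters.
    return [arg.format(**params) for arg in template]


# Step 1: a compaction policy associated with the table (hypothetical fields).
policy = {"min_small_file_ratio": 0.5, "target_file_size_mb": 128}
# Step 2: statistics previously saved for the table.
stats = {"table": "sales.orders", "file_count": 1000, "small_file_count": 800}
# Step 3: a job template for the compaction job.
template = ["compact", "--table", "{table}", "--target-mb", "{target_file_size_mb}"]

# Step 4: generate parameters and use them to trigger the job.
params = plan_compaction(stats, policy)
if params is not None:
    print(build_job_args(template, params))
```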

Statistics system

The statistics system is a new component for storing and retrieving statistics. You can define and store table- and partition-level statistics in Gravitino, and also fetch them through Gravitino for different purposes.

For details on the design of this component, please see #7268. For instructions on using the statistics system, refer to the documentation here.

Policy system

The policy system enables you to define action rules in Gravitino, like compaction rules or TTL rules. The defined policy can be associated with the metadata, which means these rules will be enforced on the dedicated metadata. Users can leverage these enforced policies to decide how to trigger an action on the dedicated metadata.

Please refer to the policy system documentation to learn how to use it. For more information on the policy system's implementation details, please refer to #7139.

Job system

The job system is another feature that allows you to submit and run jobs through Gravitino. Users can register a job template, then trigger a job based on the specific job template. Gravitino will help submit the job to the dedicated job executor, such as Apache Airflow. Gravitino can manage the job lifecycle and save the job status in it. With the job system, users can run a self-defined job to accomplish a metadata-driven action.

In version 1.0.0, we have an initial version that supports running jobs as a local process. If you want to know more about the design details, you can follow issue #7154. User-facing documentation can also be found here.

The whole metadata-driven action system is still in an alpha phase for version 1.0.0. The community will continue to evolve the code and use Iceberg table maintenance as a reference implementation in the next version. Please stay tuned.

Agent-ready through the MCP server

MCP is a powerful protocol to bridge the gap between human languages and machine interfaces. With MCP, users can communicate with the LLM using natural language, and the LLM can understand the context and invoke the appropriate tools.

In version 1.0.0, the community officially delivered the MCP server for Gravitino. Users can launch it as a remote or local MCP server and connect to various MCP applications, such as Cursor and Claude Desktop. Additionally, we exposed all metadata-related interfaces as tools that MCP clients can call.

With the Gravitino MCP server, users can manage and govern metadata, as well as perform metadata-driven actions using natural language. Please follow issue #7483 for more details. Additionally, you can refer to the documentation for instructions on how to start the MCP server locally or in Docker.

Unified access control framework

Gravitino introduced the RBAC system in the previous version, but it only offered users the ability to grant privileges to roles and users, without enforcing access control when manipulating the secure objects. In 1.0.0, we completed this missing piece in Gravitino.

Currently, users can set access control policies through our RBAC system and enforce these controls when accessing secure objects. For details, you can refer to the umbrella issue #6762.
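
The difference between recording grants and enforcing them can be shown with a small sketch. The role names, the `has_privilege` and `alter_table` helpers, and the in-memory grant tables below are hypothetical and only illustrate the idea of checking a privilege before an operation runs; they are not how Gravitino stores or evaluates grants.

```python
# Grants recorded per role: (object type, object name) -> privileges.
# Privilege names are illustrative.
ROLE_GRANTS = {
    "analyst": {("table", "sales.orders"): {"SELECT_TABLE"}},
    "engineer": {("table", "sales.orders"): {"SELECT_TABLE", "MODIFY_TABLE"}},
}
USER_ROLES = {"alice": ["analyst"], "bob": ["engineer"]}


def has_privilege(user: str, obj: tuple, privilege: str) -> bool:
    # A user holds a privilege if any of their roles was granted it on the object.
    return any(
        privilege in ROLE_GRANTS.get(role, {}).get(obj, set())
        for role in USER_ROLES.get(user, [])
    )


def alter_table(user: str, table: str) -> str:
    # Enforcement: the check runs before the operation touches the object.
    if not has_privilege(user, ("table", table), "MODIFY_TABLE"):
        raise PermissionError(f"{user} lacks MODIFY_TABLE on {table}")
    return f"{table} altered by {user}"


print(alter_table("bob", "sales.orders"))
```

Without the check inside `alter_table`, the grant tables would exist but have no effect, which is the gap described above.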

Add support for multiple locations model management

Model management was introduced in Gravitino 0.9.0. Users have since requested support for multiple storage locations within a single model version, allowing them to select a model version with a preferred location.

In 1.0.0, the community added multiple locations for model management. This feature is similar to the fileset’s support for multiple locations. Users can check the documentation here for more information. For implementation details, please refer to issue #7363.

Support the latest Apache Iceberg and Paimon versions

In Gravitino 1.0.0, we have upgraded the supported Iceberg version to 1.9.0. With the new version, we will add more feature support in the next release. Additionally, we have upgraded the supported Paimon version to 1.2.0, introducing new features for Paimon support.

See issue #6719 for the Iceberg upgrade and issue #8163 for the Paimon upgrade.

Various core features

Core:

Add the cache system in the Gravitino entity store #7175.
Add Marquez integration as a lineage sink in Gravitino #7396.

Server:

Add Azure AD login support for OAuth authentication #7538.

Catalogs:

Support StarRocks catalog management in Gravitino #3302.

Clients:

Add custom configurations for clients #7816, #7817, #7670, #7456.

Spark connector:

Upgrade the supported Kyuubi version #7480.

UI:

Add web UI for listing files / directories under a fileset #7477.

Deployment:

Add Helm chart deployment for Iceberg REST catalog #7159.

Behavior changes

Compatible changes:

Rename the Hadoop catalog to fileset catalog #7184.
Allow event listeners to change the Iceberg create table request #6486.
Support returning aliases when listing model versions #7307.

Breaking changes:

Change the supported Java version to JDK 17 for the Gravitino server.
Remove the Python 3.8 support for the Gravitino Python client #7491.
Fix the unnecessary double encoding and decoding issue for the fileset get location and list files interfaces #8335. This change is incompatible with older versions of the Java and Python clients: using an old client with a new server may cause decoding issues in some scenarios.
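
To see why mismatched client and server versions can mis-decode paths, consider this small sketch of double percent-encoding. It assumes percent-encoding via Python's `urllib.parse` purely for illustration; the exact encoding scheme and which side applied it twice are not specified here, and the sample path is made up.

```python
from urllib.parse import quote, unquote

path = "/data/my file.txt"  # hypothetical fileset sub-path

encoded_once = quote(path, safe="/")           # what a fixed client would send
encoded_twice = quote(encoded_once, safe="/")  # what a double-encoding client would send

print(encoded_once)   # /data/my%20file.txt
print(encoded_twice)  # /data/my%2520file.txt

# A side that decodes only once recovers the path from a single encoding...
print(unquote(encoded_once) == path)   # True
# ...but is left with a still-encoded path if the other side encoded twice.
print(unquote(encoded_twice) == path)  # False
```

Paths without characters that need escaping are unaffected either way, which is consistent with the problem appearing only in some scenarios.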

Overall

There are still lots of features, improvements, and bug fixes that are not mentioned here. We thank the community for their continued support and valuable contributions.

Apache Gravitino 1.0.0 opens a new chapter from the data catalog to the smart catalog. We will continue to innovate and build, to add more Data and AI features. Please stay tuned!

Credits

This release acknowledges the hard work and dedication of all contributors who have helped make this release possible.

1161623489@qq.com, Aamir, Aaryan Kumar Sinha, Ajax, Akshat Tiwari, Akshat kumar gupta, Aman Chandra Kumar, AndreVale69, Ashwil-Colaco, BIN, Ben Coke, Bharath Krishna, Brijesh Thummar, Bryan Maloyer, Cyber Star, Danhua Wang, Daniel, Daniele Carpentiero, Dentalkart399, Drinkaiii, Edie, Eric Chang, FANNG, Gagan B Mishra, George T. C. Lai, Guilherme Santos, Hatim Kagalwala, Jackeyzhe, Jarvis, JeonDaehong, Jerry Shao, Jimmy Lee, Joonha, Joonseo Lee, Joseph C., Justin Mclean, KWON TAE HEON, Kang, KeeProMise, Khawaja Abdullah Ansar, Kwon Taeheon, Kyle Lin, KyleLin0927, Lord of Abyss, MaAng, Mathieu Baurin, Maxspace1024, Mikshakecere, Mini Yu, Minji Kim, Minji Ryu, Nithish Kumar S, Pacman, Peidian li, Praveen, Qian Xia, Qiang-Liu, Qiming Teng, Raj Gupta, Ratnesh Rastogi, Raveendra Pujari, Reuben George, RickyMa, Rory, Sambhavi Pandey, Sébastien Brochet, Shaofeng Shi, Spiritedswordsman, Sua Bae, Surya B, Tarun, Tian Lu, Tianhang, Timur, Viral Kachhadiya, Will Guo, XiaoZ, Xiaojian Sun, Xun, Yftach Zur, Yuhui, Yujiang Zhong, Yunchi Pang, Zhengke Zhou, _.mung, ankamde, arjun, danielyyang, dependabot[bot], fad, fanng, gavin.wang, guow34, jackeyzhe, kaghatim, keepConcentration, kerenpas, kitoha, lipeidian, liuxian, liuxian131, lsyulong, mchades, mingdaoy, predator4ann, qbhan, raveendra11, roryqi, senlizishi, slimtom95, taylor.fan, taylor12805, teo, tian bao, vishnu, yangyang zhong, youngseojeon, yuhui, yunchi, yuqi, zacsun, zhanghan, zhanghan18, 梁自强, 박용현, 배수아, 신동재, 이승주, 이준하

Apache Gravitino 0.9.1

July 21, 2025 · 2 min read

Rory Qi

PMC Member

Model Management

Support updating aliases for model versions #6814, #7158

Add file viewer support for Filesets #6860

Implement ListFilesEvent in FilesetEventDispatcher #7314

Support setOwner/getOwner event operations #7646

Trino Connector

Auto-load multiple metalakes in Trino connector #7288

JDBC Validation

Validate JDBC URLs during store initialization #7547

Bug Fixes

Core & Catalogs

Fix H2 backend file lock issues during deletion #7406

Prevent SQL session commit errors #7403

Correct OAuth token refresh in web UI #7426

Validate namespace string conversions #7516

Improve server force-kill shutdown logic #7513

Fix bypass key handling in Hive catalog #7416

Filter empty Hadoop storage locations #7190

Fix model catalog error messages #7346

Connectors

Spark Connector

Remove conflicting slf4j dependency #7287

Fix S3 credential test errors #7432

Trino Connector

Handle unsupported catalog providers #7322

Python Client

Fix storage handler mappings for S3/OSS/ABS #7225

Improve Java client error messages #7344

Filesets

Fix multi-location file paths #7371

Improvements

Core & Catalogs

Optimize column deletion logic #7415

Auto-register mappers via SPI #7529

Validate JDBC entity store URLs #7614

Fix catalog index existence checks #7660

CLI & Clients

Remove duplicate owner field in CLI #7639

URL-encode paths in Java client #7686

Testing

Refactor Hadoop catalog test stubbing #7280

Fix precondition message mismatches #7521

Documentation

Add Trino REST catalog example #7121

Iceberg IRC guides for StarRocks/Doris #7368

OpenAPI specs for Fileset/File #6860

Fix access control docs #7195

Update model privilege docs #7555

Typo fixes #7448, #7647

Remove incubating status markers #7492

Add 0.9.1 release notes #7485

Build & Infra

Fix Helm chart versioning #7129, #7134

Upgrade Kyuubi dependency #7480

Credits

FANNG1 Abyss-lord jerqi jerryshao slimtom95 flaming-archer yunchipang KyleLin0927 xiaozcy diqiu50 yuqi1129 ziqiangliang carl239 LauraXia123 guov100 senlizishi fivedragon5 justinmclean Jackeyzhe Spiritedswordsman su8y

Apache Gravitino Graduates as a Top-Level Project at The Apache Software Foundation

June 3, 2025 · 3 min read

Justin Mclean

PMC Member

We’re excited to share that Apache Gravitino is now a Top-Level Project (TLP) at the Apache Software Foundation (ASF)! This milestone marks a major step forward for the project and the community that’s grown around it since entering the Apache Incubator in June 2024.

Gravitino was created to solve a growing pain in today’s data ecosystem: managing metadata across an explosion of systems including data warehouses, data lakes, lakehouses, streaming platforms, and AI tools. With Gravitino, you get a high-performance, open-source metastore that brings all that metadata together into one unified platform. Think of it as the missing layer that makes your data and AI assets easier to manage, discover, and govern.

“Graduation is a major milestone for any Apache project,” said Justin Mclean, Chair of the Apache Incubator. “Apache Gravitino has demonstrated the maturity expected of a top-level project, including a diverse and engaged community and a deep understanding of the ASF’s governance principles.”

Built for Today’s Data Stack

Gravitino supports a wide range of systems out of the box, including Apache Iceberg, Apache Hive, Apache Kafka, MySQL, PostgreSQL, and more. Whether you're building a modern lakehouse architecture or trying to wrangle metadata across hybrid cloud and on-prem environments, Gravitino helps break down data silos and streamline metadata governance at scale.

“We’re thrilled to see Gravitino become an Apache Top-Level Project,” said Jerry Shao, Chair of Apache Gravitino. “Our community is deeply committed to building a scalable, extensible metadata system that helps enterprises unify their data ecosystem. Graduation is not the end but a new beginning and we’re just getting started.”

Growing Community, Real-World Adoption

In under a year, Gravitino has built a global community of contributors and gained real adoption from companies with serious data challenges.

“As organizations grapple with increasingly complex and distributed data and AI environments, Gravitino provides the unified metadata layer that’s been missing,” said Junping Du, CEO of Datastrato. “We’re proud to support the project and congratulate the community on this big step forward.”

"Gravitino is uniquely designed to bridge data and AI workloads. We're excited to deploy it across our multi-cloud AI clusters and contribute to many prioritized AI and agent-based use cases," said Jack Song, Director of Uber Data Platform. "Gravitino’s graduation marks its maturity entering the next level, backed by a thriving and engaged community."

“As the first open-source Iceberg REST Catalog, Apache Gravitino has been running in our production environment for quite some time,” said Ang Zhang, Director of Big Data Platform at Pinterest. “Gravitino’s graduation to an ASF Top-Level Project marks an important milestone—it reflects the project's maturity, the strength of its community, and its growing reliability for broader production use.”

Get Involved

Whether you’re looking to contribute, integrate Gravitino into your stack, or just learn more, now’s the perfect time to get involved:

- Website
- GitHub

Thanks to everyone in the community who made this possible. Let’s keep growing!
