Apache Gravitino 0.7.0 - strengthen the cloud support for Apache Gravitino™ (incubating)
Gravitino 0.7.0 is the second major release since the project entered the ASF. In this release, the community focused mainly on strengthening cloud support so that Gravitino works better in cloud environments.
This release blog will briefly introduce the new features related to cloud support, as well as other significant features and improvements. Please keep reading to learn more about what the community has worked on.
Cloud storage support for Gravitino
As more and more users run their data stacks in the cloud and use cloud object storage, cloud storage support has become an essential requirement. In this release, the community focused mainly on adding cloud storage support to Gravitino and making sure that Gravitino itself and its connectors and sources work smoothly with cloud storage.
In this release:
- The Gravitino Iceberg REST catalog server now supports several cloud storage services, including AWS S3, Google GCS, and Aliyun OSS. Users only need to configure the server to enable them.
- The Gravitino Fileset catalog now supports managing files (objects) stored in S3, GCS, and OSS. Gravitino provides both a server-side pluggable framework and client-side Java and Python GVFS (Gravitino Virtual File System) SDKs. Users can keep using their existing tools, together with the bundled packages that Gravitino provides, to access data in cloud storage. The pluggable framework also lets users implement their own storage support.
- Gravitino’s Hive, Paimon, and Iceberg catalogs also add and verify support for different cloud storage services in this release.
- Gravitino’s Spark and Trino connectors have also been verified to work with cloud storage.
Overall, with the 0.7.0 release, Gravitino generally supports working with different cloud storage services. See issue #4396 to learn more. We will continue to add support for more cloud storage services in upcoming releases, so please stay tuned.
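To make the idea behind the Fileset catalog and GVFS more concrete, here is a minimal sketch in plain Python of how a virtual fileset path could be mapped to a physical cloud storage location. It does not use the Gravitino client. The `gvfs://fileset/...` path layout, the `resolve` function, and the in-memory `locations` table are assumptions made for illustration only, not the SDK's API.

```python
from typing import Dict, Tuple

# Assumed virtual path layout: gvfs://fileset/<catalog>/<schema>/<fileset>/<sub_path>
GVFS_PREFIX = "gvfs://fileset/"

# Hypothetical lookup table: (catalog, schema, fileset) -> storage location
Locations = Dict[Tuple[str, str, str], str]


def resolve(virtual_path: str, locations: Locations) -> str:
    """Translate a virtual fileset path into a physical storage path."""
    if not virtual_path.startswith(GVFS_PREFIX):
        raise ValueError(f"not a virtual fileset path: {virtual_path}")

    # Split off catalog, schema, and fileset; keep the remainder intact.
    parts = virtual_path[len(GVFS_PREFIX):].split("/", 3)
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected catalog/schema/fileset in: {virtual_path}")

    catalog, schema, fileset = parts[:3]
    sub_path = parts[3] if len(parts) == 4 else ""

    # Raises KeyError if the fileset is unknown to the lookup table.
    base = locations[(catalog, schema, fileset)].rstrip("/")
    return f"{base}/{sub_path}" if sub_path else base


# Hypothetical filesets backed by different cloud storage services.
locations = {
    ("cat", "sch", "logs"): "s3://bucket/raw/logs/",
    ("cat", "sch", "images"): "gs://bucket/images",
}
print(resolve("gvfs://fileset/cat/sch/logs/2024/01/a.parquet", locations))
```

The point of the indirection is that client code refers only to the logical fileset name, so the backing storage can change from one service to another without touching the tools that read the data.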
Credential vending support in Gravitino
Besides cloud storage support, credential vending is also important for Gravitino, especially when working with cloud storage. The traditional approach of handing out AK/SK (access key and secret key) pairs is neither convenient nor safe. With credential vending, the Gravitino server obtains temporary tokens for users to authenticate with, which significantly simplifies client-side configuration and centralizes authentication.
In Gravitino 0.7.0, we introduce a framework to support credential vending and add S3 and GCS token support. We also integrated this framework into the Gravitino Iceberg REST catalog service, so users can smoothly access Iceberg tables on S3 and GCS with authentication.
This is just the first step for credential vending. We will add more integrations within Gravitino, such as fileset support and connector support, in the next release.
For details on credential vending, please see issue #4398 and the design document.
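The general pattern behind credential vending can be sketched in a few lines of plain Python. This is an illustration of the concept only, not Gravitino's implementation: `CredentialVendor`, `TemporaryCredential`, the scope-prefix check, and the 15-minute default lifetime are all hypothetical names and values chosen for the example.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporaryCredential:
    token: str         # opaque, randomly generated token
    scope: str         # storage prefix the token is valid for
    expires_at: float  # Unix timestamp in seconds

    def is_valid(self, path: str, now: float) -> bool:
        # Valid only before expiry and only for paths under the scope.
        return now < self.expires_at and path.startswith(self.scope)


class CredentialVendor:
    """Hypothetical server-side component holding the long-lived secret."""

    def __init__(self, long_lived_secret: str, ttl_seconds: int = 900):
        self._secret = long_lived_secret  # stays on the server
        self._ttl = ttl_seconds

    def vend(self, scope: str, now: Optional[float] = None) -> TemporaryCredential:
        # A real server would exchange the long-lived secret with the cloud
        # provider for a scoped token; here a random token stands in for it.
        issued_at = time.time() if now is None else now
        return TemporaryCredential(
            token=secrets.token_urlsafe(32),
            scope=scope,
            expires_at=issued_at + self._ttl,
        )


vendor = CredentialVendor("placeholder-long-lived-secret", ttl_seconds=60)
cred = vendor.vend("s3://bucket/warehouse/db/table/", now=1000.0)
print(cred.is_valid("s3://bucket/warehouse/db/table/data/f.parquet", now=1030.0))
```

In this sketch the client only ever sees a short-lived token limited to one table location, which is why client-side configuration no longer needs to carry long-lived keys.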
Unified access control improvements
In Gravitino 0.6.0, we introduced the alpha version of unified access control with Apache Ranger support, but the feature still needed a lot of improvement. In 0.7.0, we made many improvements and fixed a number of bugs so that access control works end to end. With this release, Gravitino unified access control works with Spark and Ranger to secure tables from end to end. To see what we fixed, please check issue #4615. You can also try our playground to experience the unified access control feature.
Centralized audit log support
Thanks to the community, Gravitino now supports a centralized audit log. With this feature enabled, users can get audit logs in one centralized place, regardless of whether they are accessing tables or filesets from various sources.
Gravitino’s audit log framework also supports plugging in different formatters and writers, so users can implement their own log formats and output destinations.
Please see issues #4887 and #4021 to learn more about Gravitino’s centralized audit log.
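The split between a pluggable formatter and a pluggable writer can be sketched as follows. This is a minimal illustration in plain Python under assumed names: `AuditEvent`, `Formatter`, `Writer`, `JsonFormatter`, `InMemoryWriter`, and `AuditLogger` are hypothetical and do not reflect Gravitino's actual interfaces.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AuditEvent:
    user: str
    operation: str
    resource: str
    timestamp: float


class Formatter(ABC):
    """Turns an audit event into a log line."""

    @abstractmethod
    def format(self, event: AuditEvent) -> str: ...


class Writer(ABC):
    """Sends a formatted log line to some destination."""

    @abstractmethod
    def write(self, line: str) -> None: ...


class JsonFormatter(Formatter):
    def format(self, event: AuditEvent) -> str:
        return json.dumps(asdict(event), sort_keys=True)


class InMemoryWriter(Writer):
    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, line: str) -> None:
        self.lines.append(line)


class AuditLogger:
    """Single entry point: every operation is formatted, then written."""

    def __init__(self, formatter: Formatter, writer: Writer) -> None:
        self._formatter = formatter
        self._writer = writer

    def log(self, event: AuditEvent) -> None:
        self._writer.write(self._formatter.format(event))


writer = InMemoryWriter()
logger = AuditLogger(JsonFormatter(), writer)
# Table and fileset operations end up in the same log.
logger.log(AuditEvent("alice", "load_table", "cat.db.orders", 1000.0))
logger.log(AuditEvent("bob", "read_fileset", "cat.sch.logs", 1001.0))
print("\n".join(writer.lines))
```

Keeping the two roles separate means a different log format or a different destination can each be swapped in by replacing one class, without changing the code that emits events.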
New data sources support
As a unified data catalog, Gravitino and its community always pursue the goal of supporting more data sources. In this version, Gravitino adds two new data sources: Apache Hudi and OceanBase. You can now use Gravitino to manipulate Hudi and OceanBase metadata in a unified manner.
Various core features
Apart from the features listed above, this version also brings many improvements to the core. Several important ones are listed here:
- Add PostgreSQL support for storage backend #4101. Gravitino already supports MySQL and H2 as its backend metadata storage. In 0.7.0, the community adds PostgreSQL support to broaden adoption.
- Unify the catalog and metalake drop behavior #5031. In previous versions, we did not enforce the behavior of the catalog and metalake drop operations. In this version, we redefine that behavior and make it much safer to use.
- Manage the column in Gravitino #4493. In 0.7.0, we introduce the column entity in Gravitino, which Gravitino manages with versioning. With this feature, Gravitino can now support tagging on columns, and in the future it can support column-level operations.
- Add event listener for Iceberg REST catalog server #5204 and support pre-event for event listener #5112.
Other notable enhancements
Gravitino core
- Support storing column metadata in Gravitino #4493.
- Support pre-event for Gravitino #5049.
- Unify drop metalake and catalog behavior #5031.
- Add credential vending support in Gravitino #4398.
- Support audit log in Gravitino #4887.
- Shrink the package size of Gravitino #4513.
Iceberg REST catalog server
- Add credential vending for Iceberg REST server #4993.
- Add event listener for Iceberg REST server #5204.
- Support pre-event for event listener #5112.
Catalog related
- Add OSS support for fileset catalog #5173.
- Add GCS support for fileset catalog #5074.
- Add S3 support for fileset catalog #3379.
- Add pluggable storage support for fileset catalog #5019.
- Add S3 support for Paimon catalog #4938.
- Add catalog support for Hudi #4306.
- Add catalog support for OceanBase #4848.
API and client
- Add S3 fileset support for Python GVFS client #5188.
- Add GCS fileset support for Python GVFS client #5139.
- Add OSS fileset support for Python GVFS client #5221.
- Support unified auditing of Fileset metadata and data operations #4021.
- Support OAuth2 in Python GVFS #3758.
UI
All resolved issues targeting the 0.7.0 release can be seen at https://github.com/apache/gravitino/issues?q=is%3Aissue+is%3Aclosed+label%3A0.7.0+.
Overall
Apache Gravitino 0.7.0 is the second ASF release, and it adds a number of new features. We would like to thank the Gravitino community for its continued support and valuable contributions. Thanks to feedback from our users, we are able to continue to innovate and build, so thank you to everyone reading this!
To explore the Gravitino 0.7.0 release, please check the documentation. Your feedback is invaluable to the community and the project.
Credits
This release acknowledges the hard work and dedication of all contributors who have helped make this release possible.
@FANNG1 @LauraXia123 @LindaSummer @LiuQhahah @Naresh-kumar-Thodupunoori @SeanAverS @caican00 @coolderli @diqiu50 @featherchen @hanwxx @jerqi @jerryshao @jingjia88 @justinmclean @koonchen @lsyulong @lw-yang @mchades @noidname01 @puchengy @shaofengshi @theoryxu @xiaozcy @xloya @xunliu @yangyuxia @yaoderek @yuanoOo @yuqi1129