Apache Gravitino 0.9.1

· 2 min read
Rory Qi
committer

Model Management

Support updating aliases for model versions #6814,#7158

Add file viewer support for Filesets #6860

Implement ListFilesEvent in FilesetEventDispatcher #7314

Support setOwner/getOwner event operations #7646

Trino Connector

Auto-load multiple metalakes in Trino connector #7288

JDBC Validation

Validate JDBC URLs during store initialization #7547

Bug Fixes

Core & Catalogs

Fix H2 backend file lock issues during deletion #7406

Prevent SQL session commit errors #7403

Correct OAuth token refresh in web UI #7426

Validate namespace string conversions #7516

Improve server force-kill shutdown logic #7513

Fix bypass key handling in Hive catalog #7416

Filter empty Hadoop storage locations #7190

Fix model catalog error messages #7346

Connectors

Spark Connector

Remove conflicting slf4j dependency #7287

Fix S3 credential test errors #7432

Trino Connector

Handle unsupported catalog providers #7322

Python Client

Fix storage handler mappings for S3/OSS/ABS #7225

Improve Java client error messages #7344

Filesets

Fix multi-location file paths #7371

Improvements

Core & Catalogs

Optimize column deletion logic #7415

Auto-register mappers via SPI #7529

Validate JDBC entity store URLs #7614

Fix catalog index existence checks #7660

CLI & Clients

Remove duplicate owner field in CLI #7639

URL-encode paths in Java client #7686

Testing

Refactor Hadoop catalog test stubbing #7280

Fix precondition message mismatches #7521

Documentation

Add Trino REST catalog example #7121

Iceberg IRC guides for StarRocks/Doris #7368

OpenAPI specs for Fileset/File #6860

Fix access control docs #7195

Update model privilege docs #7555

Typo fixes #7448, #7647

Remove incubating status markers #7492

Add 0.9.1 release notes #7485

Build & Infra

Fix Helm chart versioning #7129, #7134

Upgrade Kyuubi dependency #7480

Credits

FANNG1 Abyss-lord jerqi jerryshao slimtom95 flaming-archer yunchipang KyleLin0927 xiaozcy diqiu50 yuqi1129 ziqiangliang carl239 LauraXia123 guov100 senlizishi fivedragon5 justinmclean Jackeyzhe Spiritedswordsman su8y

Apache Gravitino 0.9.0 - Focus on AI, data governance, and security with multi-dimensional feature upgrade

· 4 min read
Rory Qi
committer

Gravitino 0.9.0 focuses on advancements in AI, data governance, and security. Many of its new features are already being used in production environments. The release has attracted strong interest from users at well-known companies, particularly for its AI and security capabilities.

In this version, the community optimized the user experience for fileset catalogs and model catalogs, making it easier for users to manage their unstructured AI data and model data.

The community added a new data lineage interface. Users can now implement a custom data lineage plugin to adapt to their own system.

For security, the community has corrected some privilege semantics and fixed authorization plugin corner cases to make the entire system more robust.

Model Catalog

Before 0.9.0, the contents of the model catalog were immutable, which limited flexibility. In the new version, users can alter models and model versions and add tags #6626 #6222.

Fileset Catalog

Gravitino now supports multiple named storage locations within a single fileset and placeholder-based path generation.

With multiple location support, users can reference data across different file systems (HDFS, S3, GCS, local, etc.) through a unified fileset interface, each with a unique location name.

The placeholder feature allows dynamic storage path generation using the {{placeholder}} syntax, automatically replacing placeholders with corresponding fileset properties.

These enhancements significantly improve flexibility for multi-cloud environments and complex data organization patterns while maintaining a clean abstraction layer for data asset management #6681.
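As a rough illustration of placeholder-based path generation, the sketch below replaces `{{name}}` tokens in a storage location template with values from a property map. The function name `resolve_location`, the property keys, and the error behavior are assumptions made for this example; it is not Gravitino's actual implementation.

```python
import re

# Matches {{name}} tokens. The exact grammar Gravitino accepts is an assumption here.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def resolve_location(template: str, properties: dict[str, str]) -> str:
    """Replace each {{name}} in the template with properties[name]."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in properties:
            raise KeyError(f"no fileset property for placeholder '{key}'")
        return properties[key]

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical template and properties.
path = resolve_location(
    "s3://bucket/{{project}}/{{env}}/data",
    {"project": "recsys", "env": "prod"},
)
print(path)  # s3://bucket/recsys/prod/data
```

Failing loudly on an unknown placeholder is a design choice of this sketch; it avoids silently creating paths that still contain literal braces.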

GVFS (Gravitino Virtual File System)

GVFS has been enhanced to support accessing multiple locations within filesets. Users can now select which location to use through configuration properties, environment variables, or fileset default settings.
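The three selection sources above could be combined roughly as follows. This is only a sketch: the configuration key `current_location_name`, the environment variable `CURRENT_LOCATION_NAME`, and the precedence order (configuration, then environment, then fileset default) are assumptions for illustration, so check the GVFS documentation for the real names and order.

```python
import os
from typing import Mapping, Optional


def pick_location_name(
    config: Mapping[str, str],
    fileset_default: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the location name to use, trying each source in turn."""
    # Hypothetical key and variable names; the order below is also an assumption.
    candidates = (
        config.get("current_location_name"),
        env.get("CURRENT_LOCATION_NAME"),
        fileset_default,
    )
    for name in candidates:
        if name:
            return name
    raise ValueError("no storage location selected and the fileset has no default")


# A fileset with two named locations (made-up values).
locations = {"hdfs_main": "hdfs://nn:8020/data", "s3_backup": "s3://bucket/data"}
name = pick_location_name(
    {}, fileset_default="hdfs_main", env={"CURRENT_LOCATION_NAME": "s3_backup"}
)
print(name, "->", locations[name])
```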

GVFS has also been refactored with a pluggable architecture allowing custom operations and hooks. This enables users to extend functionality through operations_class and hook_class configuration options for more flexible integration with their specific infrastructure #6938.
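A common way to implement options such as hook_class is to load a class from a dotted path at startup. The sketch below shows that general technique with Python's `importlib`. The `NoopHook` class, its `pre_open` method, and the `build_hook` helper are hypothetical and do not reflect the real GVFS hook interface.

```python
import importlib


class NoopHook:
    """Default hook: leaves every path untouched (hypothetical interface)."""

    def pre_open(self, path: str) -> str:
        return path


def load_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got '{dotted_path}'")
    return getattr(importlib.import_module(module_name), class_name)


def build_hook(options: dict):
    """Instantiate the class named by hook_class, or fall back to NoopHook."""
    dotted_path = options.get("hook_class")
    return load_class(dotted_path)() if dotted_path else NoopHook()


hook = build_hook({})  # no hook_class configured, so the default is used
print(type(hook).__name__, hook.pre_open("fileset/catalog/schema/demo/a.txt"))
```

With this pattern, a deployment would set hook_class to the dotted path of its own class, as long as that module is importable by the client process.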

Security

The new version has added privileges for the data model and corrected some privilege semantics. It has also fixed some bugs with the Ranger path-based plugin #6620 #6575 #6821 #6864. All of the user-related, group-related, and role-related events are now supported for the event system #2969.

Data Lineage

The community added a data lineage interface that follows the OpenLineage API specification. Users can implement their custom data lineage plugin to adapt to their system #6617.
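To give a feel for what a custom lineage plugin might receive, the sketch below builds a minimal OpenLineage-style run event and hands it to a toy sink. The `LineageSink` interface and the `InMemorySink` class are hypothetical, and the event carries only a subset of the fields in the OpenLineage specification (for example, `producer` and `schemaURL` are omitted).

```python
import uuid
from datetime import datetime, timezone
from typing import Protocol


class LineageSink(Protocol):
    """Hypothetical plugin interface: receives one lineage event at a time."""

    def sink(self, event: dict) -> None: ...


class InMemorySink:
    """Toy plugin that keeps events in a list instead of forwarding them."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def sink(self, event: dict) -> None:
        self.events.append(event)


def run_event(job: str, inputs: list[str], outputs: list[str]) -> dict:
    """Build a minimal OpenLineage-style run event (subset of the spec's fields)."""

    def dataset(name: str) -> dict:
        return {"namespace": "example", "name": name}

    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example", "name": job},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
    }


sink: LineageSink = InMemorySink()
sink.sink(run_event("daily_orders", ["raw.orders"], ["mart.orders_daily"]))
print(len(sink.events), sink.events[0]["job"]["name"])
```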

Core

The community also focused on performance, which was improved by narrowing lock scopes and batching reads from storage #6744 #6560 #2870.

CLI

One more change is worth mentioning: users no longer need to rely on the alias command to use the CLI. Instead, the community provides a convenient script at ./bin/gcli.sh so that users can invoke the CLI client directly #5383.

Connector

Both the Flink connector and the Spark connector added JDBC support #6233 #6164.

Chart

Gravitino can now be deployed on Kubernetes with a fully customizable configuration #6594.

Overall

Gravitino 0.9.0 focuses on advancements in AI, data governance, and security. We thank the Gravitino community for their continued support and valuable contributions. We can continue to innovate and build thanks to all our users' feedback. Thank you for taking the time to read this! To dive deeper into the Gravitino 0.9.0 release, explore the full documentation. Your feedback is greatly valued and helps shape the future of the Gravitino project and community.

Credits

JavedAbdullah AndreVale69 Brijeshthummar02 cool9850311 liuchunhao danhuawang unknowntpo FANNG1 tsungchih jerryshao justinmclean zhoukangcn Abyss-lord amazingLyche yuqi1129 Pranaykarvi puchengy LauraXia123 tengqm rud9192 antony0016 frankvicky TEOTEO520 TungYuChiang sunxiaojian xunliu LuciferYang diqiu50 zhengkezhou1 caican00 granewang yunchipang jerqi mchades rickyma Xander-run flaming-archer waukin lsyulong luoshipeng FourFriends this-user vitamin43 hdygxsj liangyouze

Apache, Apache Flink, Apache Hive, Apache Hudi, Apache Iceberg, Apache Ranger, Apache Spark, Apache Paimon and Apache Gravitino are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Apache Gravitino 0.8.0 - strengthen the AI support for Apache Gravitino™ (incubating)

· 6 min read
Xiaojing Fang
committer

Apache Gravitino 0.8.0 is the third major release after entering the ASF. In this release, the community provides several exciting features like model catalog, Fuse for Fileset, credential vending for Fileset, Flink Iceberg and Paimon connector, Spark Paimon connector, and security enforcement.

This release blog will briefly introduce the new significant features and improvements. Please keep reading to learn more about what the community has worked on.

Model Catalog

Besides table and messaging metadata, Gravitino supports model metadata management in version 0.8. Gravitino allows a model to have multiple versions, and users can choose the best version. 0.8 provides basic functionality, and more features will be provided in the future, such as tagging models and better integration with machine learning workflows, to help users better manage models and extract more value from data and models.

  • Support model versioning metadata #4783.
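The idea of one model carrying several versions, each reachable by number or by alias, can be pictured with a small in-memory sketch. The class and method names here (`Model`, `ModelVersion`, `link_version`, `get`) and the zero-based numbering are assumptions for illustration, not the Gravitino client API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    uri: str
    aliases: set[str] = field(default_factory=set)


@dataclass
class Model:
    name: str
    versions: list[ModelVersion] = field(default_factory=list)

    def link_version(self, uri: str, aliases: tuple[str, ...] = ()) -> ModelVersion:
        """Append a new version; numbering from 0 is an arbitrary choice here."""
        for alias in aliases:
            if any(alias in v.aliases for v in self.versions):
                raise ValueError(f"alias '{alias}' is already in use")
        mv = ModelVersion(len(self.versions), uri, set(aliases))
        self.versions.append(mv)
        return mv

    def get(self, ref) -> ModelVersion:
        """Look up a version by number (int) or by alias (str)."""
        for v in self.versions:
            if v.version == ref or ref in v.aliases:
                return v
        raise KeyError(ref)


# Made-up model name and URIs.
model = Model("churn_predictor")
model.link_version("s3://models/churn/v0", aliases=("baseline",))
model.link_version("s3://models/churn/v1", aliases=("prod",))
print(model.get("prod").uri)  # s3://models/churn/v1
```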

Credential vending

Credential vending is a fundamental function in the cloud. In version 0.7, credential vending was supported for the Iceberg REST server. In version 0.8, we offer support for the Gravitino server and integrate it with Fileset. With credential vending, Fileset can be used more securely and conveniently. The Gravitino server centrally manages the security key and issues a temporary token that is valid only for the Fileset the request needs to access. This is more secure and removes the need for users to supply credentials such as access keys and secret keys (AK/SK).

In addition to the support for GCS and S3, version 0.8 also has built-in support for OSS and ADLS credential vending, and can support other storage in a pluggable manner.

  • Support credential vending for fileset client #5677.
  • Support credential vending for Gravitino #4398.
  • Support Aliyun OSS credential provider #5625.
  • Support ADLS credential provider #5624.
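The flow described in this section, where the server keeps the long-lived key and hands out a short-lived token scoped to one fileset, can be sketched as below. All names (`CredentialVendor`, `TemporaryCredential`, `vend`, `allows`) are hypothetical, and a random string stands in for the token that a real provider would obtain from the cloud service.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporaryCredential:
    token: str
    path_prefix: str   # the only storage prefix this token may touch
    expires_at: float  # Unix timestamp

    def allows(self, path: str, now: float) -> bool:
        return now < self.expires_at and path.startswith(self.path_prefix)


class CredentialVendor:
    """Stands in for the server side, which alone holds the long-lived key."""

    def __init__(self, long_lived_secret: str, ttl_seconds: int = 900):
        self._secret = long_lived_secret  # never handed to clients
        self._ttl = ttl_seconds

    def vend(self, fileset_location: str, now: float) -> TemporaryCredential:
        # A real provider would exchange the long-lived key for a scoped,
        # expiring token from the storage service; a random token stands in here.
        return TemporaryCredential(
            secrets.token_urlsafe(16), fileset_location, now + self._ttl
        )


vendor = CredentialVendor("long-lived-key")
now = time.time()
cred = vendor.vend("s3://bucket/fileset_a/", now)
print(cred.allows("s3://bucket/fileset_a/part-0.parquet", now))         # in scope
print(cred.allows("s3://bucket/fileset_b/part-0.parquet", now))         # other fileset
print(cred.allows("s3://bucket/fileset_a/part-0.parquet", now + 3600))  # after expiry
```

The trailing slash on the prefix matters in this sketch: without it, a token for `fileset_a` would also match a sibling path such as `fileset_ab`.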

Fuse for Fileset

With the widespread use of Fileset in AI scenarios, improving usability and lowering the cost of adoption has become a major concern. In AI scenarios, users tend to access remote data as if it were on a local disk. Fuse for Fileset is designed for this: it lets users access data managed by Fileset as if it were on a local disk. Currently, basic alpha functionality is provided, which allows access to S3 data managed by Fileset. Subsequent versions will add metadata caching and support for more storage systems. Fuse for Fileset is developed in Rust for performance reasons, and everyone is welcome to join the development.

  • Implement GVFS fuse to access Gravitino fileset in the POSIX Protocol #5504.

Lakehouse Federation

Gravitino provides a variety of catalogs, such as Apache Hive, Apache Iceberg, Apache Hudi, and Apache Paimon. To connect them better to the surrounding ecosystem, this iteration provides a Flink Paimon connector, a Flink Iceberg connector, and a Spark Paimon connector for accessing data in Paimon and Iceberg. More connectors will be supported in the future.

  • Support Iceberg catalog in Flink connector #3515.
  • Support Paimon catalog in Flink connector #5194 #5193 #5192.
  • Support Paimon catalog in Spark connector #5722.

Security enforcement

For a metadata management system, security is of the utmost importance. In this iteration, we manage security policies through chain authorization and support the push-down of SQL security policies and path-based security policies. Additionally, the privilege policies of Iceberg and Paimon tables can be pushed down to Ranger. Gravitino's security policies provide a solid foundation for your business development.

  • Chain authorization across multiple underlying data sources #5774.
  • SQL based authorization plugin #5530.
  • Add path-based authorization securable object and user-group mapping interface #5966.
  • Use chain authorization to support Hive and HDFS authorization #5956.
  • Ranger Authorization HDFS Plugin #5731.
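Chain authorization can be pictured as one plugin that forwards each privilege change to several underlying plugins, for example one that emits a SQL grant and one that grants access to the matching storage path. The sketch below shows that forwarding pattern only. The interface, class names, and the output strings are hypothetical and are not Gravitino's plugin API or Ranger's policy format.

```python
from abc import ABC, abstractmethod


class AuthorizationPlugin(ABC):
    """Hypothetical plugin interface: push one grant down to a backend."""

    @abstractmethod
    def grant(self, role: str, privilege: str, securable: str) -> list[str]: ...


class SqlPlugin(AuthorizationPlugin):
    """Turns a grant into a SQL-style statement."""

    def grant(self, role, privilege, securable):
        return [f"GRANT {privilege} ON {securable} TO ROLE {role}"]


class PathPlugin(AuthorizationPlugin):
    """Turns a grant into a path-based rule using a table-to-location map."""

    def __init__(self, table_locations: dict[str, str]):
        self._locations = table_locations

    def grant(self, role, privilege, securable):
        access = "read" if privilege == "SELECT" else "read,write"
        return [f"allow {role} {access} on {self._locations[securable]}"]


class ChainPlugin(AuthorizationPlugin):
    """Forwards every grant to each underlying plugin, in order."""

    def __init__(self, plugins):
        self._plugins = list(plugins)

    def grant(self, role, privilege, securable):
        actions = []
        for plugin in self._plugins:
            actions.extend(plugin.grant(role, privilege, securable))
        return actions


# Made-up table name and location.
chain = ChainPlugin(
    [SqlPlugin(), PathPlugin({"sales.orders": "hdfs://nn/warehouse/sales.db/orders"})]
)
for action in chain.grant("analyst", "SELECT", "sales.orders"):
    print(action)
```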

Other notable enhancements

Iceberg REST server

  • Generate credential according to the data path and metadata path #5648.
  • Integrate audit log framework for Iceberg REST server #5556.
  • Add schema and view event for Iceberg REST server #5438 #5437. Add HTTP header to Iceberg event #5518.

Core

  • Optimize the tree lock when dropping and loading tables and schemas #6044.
  • Support ADLS storage for Gravitino Iceberg catalog and Spark connector #5954.
  • Support pre-event for Gravitino server #5317.

Gravitino Client

  • Add CLI interface for Gravitino #4943.
  • Support Python client for table operations #5198.

BUG FIX

Version 0.8 has fixed a large number of bugs, especially in terms of security and fileset usage. Some are listed below.

  • Can't load filesystem 'gs' when using Spark to access Gravitino GCS bundles #5609.
  • Invalid token issue in GVFS during long-running Spark jobs #5596.
  • Trino, Hive catalog: COMMENT COLUMN with ' ' or NULL throws ArrayIndexOutOfBoundsException #5533.
  • Correct the behavior when creating an Iceberg table with none distribution #6196.
  • Unable to create fileset with MinIO #6156.
  • When granting privileges to a role, duplicate privilege names with different conditions shouldn't be allowed #6116.
  • The owner of the catalog is incorrect when using Basic Auth with an empty password #5968.
  • Granting a metalake-level privilege doesn't take effect #5892.

Overall

Apache Gravitino 0.8.0 is the third ASF release. This version adds a bunch of new features. We thank the Gravitino community for their continued support and valuable contributions. Thanks to our users' feedback, we can continue to innovate and build, so thanks to all those reading this!

To further explore the Gravitino 0.8.0 release, please check the documentation. Your feedback is invaluable to the community and the project.

Credits

This release acknowledges the hard work and dedication of all contributors who have helped make this release possible.

@Abyss-lord @Aireed @FANNG1 @LauraXia123 @LindaSummer @LiuQhahah @SophieTech88 @TungYuChiang @caican00 @chenyuan99 @cool9850311 @danhuawang @deeshantk @diqiu50 @featherchen @frankvicky @fsalhi2 @hdygxsj @hienduyph @jerqi @jerryshao @justinmclean @liangyouze @liuchunhao @luoshipeng @mchades @orenccl @pithecuse527 @rud9192 @sunxiaojian @theoryxu @waukin @xloya @xunliu @yuqi1129
