gravitino.catalog.fileset_catalog.FilesetCatalog¶
- class gravitino.catalog.fileset_catalog.FilesetCatalog(namespace: Namespace, name: str | None = None, catalog_type: Type = Type.UNSUPPORTED, provider: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None, audit: AuditDTO | None = None, rest_client: HTTPClient | None = None)¶
- Bases: BaseSchemaCatalog - A fileset catalog is a catalog implementation that supports fileset-like metadata operations, for example listing, creating, updating, and deleting schemas and filesets. A fileset catalog belongs to a metalake. - __init__(namespace: Namespace, name: str | None = None, catalog_type: Type = Type.UNSUPPORTED, provider: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None, audit: AuditDTO | None = None, rest_client: HTTPClient | None = None)¶
 - Methods
 - __init__(namespace[, name, catalog_type, ...])
 - alter_fileset(ident, *changes): Update a fileset metadata in the catalog.
 - alter_schema(schema_name, *changes): Alter the schema with specified identifier by applying the changes.
 - as_fileset_catalog()
 - as_schemas(): Return the SupportsSchemas if the catalog supports schema operations.
 - as_table_catalog()
 - as_topic_catalog()
 - audit_info()
 - builder([name, catalog_type, provider, ...])
 - check_fileset_name_identifier(ident)
 - check_fileset_namespace(namespace)
 - comment(): The comment of the catalog.
 - create_fileset(ident, comment, fileset_type, ...): Create a fileset metadata in the catalog.
 - create_schema([schema_name, comment, properties]): Create a new schema with specified identifier, comment and metadata.
 - drop_fileset(ident): Drop a fileset from the catalog.
 - drop_schema(schema_name, cascade): Drop the schema with specified identifier.
 - format_file_location_request_path(namespace, ...)
 - format_fileset_request_path(namespace)
 - format_schema_request_path(ns)
 - get_file_location(ident, sub_path): Get the actual location of a file or directory based on the storage location of Fileset and the sub path.
 - list_filesets(namespace): List the filesets in a schema namespace from the catalog.
 - list_schemas(): List all the schemas under the given catalog namespace.
 - load_fileset(ident): Load fileset metadata by NameIdentifier from the catalog.
 - load_schema(schema_name): Load the schema with specified identifier.
 - name()
 - properties(): The properties of the catalog.
 - provider()
 - schema_exists(schema_name): Check if a schema exists.
 - to_fileset_update_request(change)
 - to_schema_update_request(change)
 - type()
 - validate()
 - Attributes
 - PROPERTY_PACKAGE: A reserved property to specify the package location of the catalog.
 - rest_client
 - PROPERTY_PACKAGE = 'package'¶
- A reserved property to specify the package location of the catalog. The “package” value is the path to the folder where all catalog-related dependencies are located. Gravitino loads the dependencies under “package” to create the catalog. - The “package” property is not needed for a built-in catalog, because Gravitino uses “provider” to find the proper location and load the dependencies. The property is needed only when the folder is in a different location. 
 - class Type(value)¶
- Bases: - Enum- The type of the catalog. - FILESET = 'fileset'¶
- Catalog Type for Fileset System (including HDFS, S3, etc.), like path/to/file 
 - MESSAGING = 'messaging'¶
- Catalog Type for Message Queue, like kafka://topic 
 - RELATIONAL = 'relational'¶
- Catalog Type for Relational Data Structure, like db.table, catalog.db.table. 
 - UNSUPPORTED = 'unsupported'¶
- Catalog Type for test only. 
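Because each member's value is a plain string, a type name read from configuration can be mapped back to a member by value. The sketch below uses a stand-in enum, CatalogType, that only mirrors the four values documented above; it is not the library's class, and both the helper parse_catalog_type and its fallback to UNSUPPORTED are assumptions made for illustration.

```python
from enum import Enum


class CatalogType(Enum):
    """Stand-in mirroring the documented values; not the gravitino class itself."""

    RELATIONAL = "relational"
    FILESET = "fileset"
    MESSAGING = "messaging"
    UNSUPPORTED = "unsupported"


def parse_catalog_type(raw: str) -> CatalogType:
    """Look a member up by value; unknown names fall back to UNSUPPORTED (an assumption)."""
    try:
        return CatalogType(raw.strip().lower())
    except ValueError:
        return CatalogType.UNSUPPORTED
```

Looking members up by value with `CatalogType(...)` is standard `enum` behavior, so the same pattern should apply to any enum whose values are strings.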
 
 - alter_fileset(ident: NameIdentifier, *changes) Fileset¶
- Update fileset metadata in the catalog. - Args:
- ident: A fileset identifier, which should be “schema.fileset” format. changes: The changes to apply to the fileset. 
- Raises:
- IllegalArgumentException: If the changes are invalid. NoSuchFilesetException: If the fileset does not exist. 
- Returns:
- The updated fileset metadata. 
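The fileset methods on this page expect an identifier in “schema.fileset” form. As a minimal sketch, a dotted string coming from user input could be checked for that two-part shape before an identifier object is built from it. The helper name split_fileset_ident is hypothetical and is not part of the library, which takes NameIdentifier objects rather than raw strings.

```python
from typing import Tuple


def split_fileset_ident(text: str) -> Tuple[str, str]:
    """Split "schema.fileset" into (schema, fileset); reject any other shape."""
    parts = text.split(".")
    # Exactly two non-empty parts are required: the schema name and the fileset name.
    if len(parts) != 2 or not all(part.strip() for part in parts):
        raise ValueError(f"expected 'schema.fileset', got {text!r}")
    return parts[0], parts[1]
```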
 
 - alter_schema(schema_name: str, *changes: SchemaChange) Schema¶
- Alter the schema with specified identifier by applying the changes. - Args:
- schema_name: The name of the schema. changes: The metadata changes to apply. 
- Raises:
- NoSuchSchemaException if the schema with specified identifier does not exist. 
- Returns:
- The altered Schema. 
 
 - as_fileset_catalog()¶
- Raises:
- UnsupportedOperationException if the catalog does not support fileset operations. 
- Returns:
- The FilesetCatalog if the catalog supports fileset operations. 
 
 - as_schemas()¶
- Return the SupportsSchemas if the catalog supports schema operations. - Raises:
- UnsupportedOperationException if the catalog does not support schema operations. 
- Returns:
- The SupportsSchemas if the catalog supports schema operations. 
 
 - as_table_catalog() TableCatalog¶
- Raises:
- UnsupportedOperationException if the catalog does not support table operations. 
- Returns:
- The TableCatalog if the catalog supports table operations. 
 
 - as_topic_catalog() TopicCatalog¶
- Returns:
- The TopicCatalog if the catalog supports topic operations. 
- Raises:
- UnsupportedOperationException if the catalog does not support topic operations. 
 
 - comment() str¶
- The comment of the catalog. Note that this method returns None if the comment is not set for this catalog. - Returns:
- The comment of the catalog. 
 
 - create_fileset(ident: NameIdentifier, comment: str, fileset_type: Type, storage_location: str, properties: Dict[str, str]) Fileset¶
- Create fileset metadata in the catalog. - If the fileset type is “MANAGED”, storage_location can be None, and Gravitino manages the storage location based on the location of the schema. - If the fileset type is “EXTERNAL”, storage_location must be set. - Args:
- ident: A fileset identifier, which should be “schema.fileset” format. comment: The comment of the fileset. fileset_type: The type of the fileset. storage_location: The storage location of the fileset. properties: The properties of the fileset. 
- Raises:
- NoSuchSchemaException: If the schema does not exist. FilesetAlreadyExistsException: If the fileset already exists. 
- Returns:
- The created fileset metadata. 
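The storage-location rule for create_fileset can be checked on the client side before a request is sent. The following is a minimal sketch of that check only. The helper check_storage_location is hypothetical, and it takes the fileset type as a plain string for simplicity, whereas the method itself takes a Type value.

```python
from typing import Optional


def check_storage_location(fileset_type: str, storage_location: Optional[str]) -> None:
    """Raise ValueError if an EXTERNAL fileset is missing its storage location."""
    kind = fileset_type.upper()
    if kind not in ("MANAGED", "EXTERNAL"):
        raise ValueError(f"unknown fileset type: {fileset_type!r}")
    # MANAGED filesets may omit the location; EXTERNAL filesets may not.
    if kind == "EXTERNAL" and not storage_location:
        raise ValueError("an EXTERNAL fileset requires a storage location")
```

A call such as `check_storage_location("MANAGED", None)` returns silently, while `check_storage_location("EXTERNAL", None)` raises.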
 
 - create_schema(schema_name: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None) Schema¶
- Create a new schema with specified identifier, comment and metadata. - Args:
- schema_name: The name of the schema. comment: The comment of the schema. properties: The properties of the schema. 
- Raises:
- NoSuchCatalogException if the catalog with specified namespace does not exist. SchemaAlreadyExistsException if the schema with specified identifier already exists. 
- Returns:
- The created Schema. 
 
 - drop_fileset(ident: NameIdentifier) bool¶
- Drop a fileset from the catalog. - If the fileset type is managed, the underlying files are deleted; otherwise, only the metadata is dropped. - Args:
- ident: A fileset identifier, which should be “schema.fileset” format. 
- Returns:
- True if the fileset is dropped, False if the fileset did not exist. 
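Since drop_fileset reports a missing fileset through its return value rather than an exception, a cleanup routine can attempt every drop and keep only the identifiers that were actually removed. The sketch below relies only on that documented boolean. InMemoryFilesets is a hypothetical stand-in used to make the example runnable without a server, and it uses plain strings where the real method takes NameIdentifier objects.

```python
class InMemoryFilesets:
    """Hypothetical stand-in that mimics only the documented return value of drop_fileset."""

    def __init__(self, names):
        self._names = set(names)

    def drop_fileset(self, ident) -> bool:
        # True when something was removed, False when the fileset did not exist.
        if ident in self._names:
            self._names.remove(ident)
            return True
        return False


def drop_all(catalog, idents):
    """Drop each identifier and return the ones that were actually removed."""
    return [ident for ident in idents if catalog.drop_fileset(ident)]
```

Calling drop_all a second time with the same identifiers returns an empty list, which makes the routine safe to retry.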
 
 - drop_schema(schema_name: str, cascade: bool) bool¶
- Drop the schema with specified identifier. - Args:
- schema_name: The name of the schema. cascade: Whether to drop all the tables under the schema. 
- Raises:
- NonEmptySchemaException if the schema is not empty and cascade is false. 
- Returns:
- True if the schema is dropped successfully, False otherwise. 
 
 - get_file_location(ident: NameIdentifier, sub_path: str) str¶
- Get the actual location of a file or directory based on the storage location of Fileset and the sub path. - Args:
- ident: A fileset identifier, which should be “schema.fileset” format. sub_path: The sub path of the file or directory. 
- Returns:
- The actual location of the file or directory. 
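To illustrate the idea of combining a fileset's storage location with a sub path, here is a simplified sketch. It is an assumption about the general shape of the result, not the logic Gravitino applies; the helper join_location and the example locations are hypothetical.

```python
def join_location(storage_location: str, sub_path: str) -> str:
    """Join a base location and a sub path with exactly one slash between them."""
    base = storage_location.rstrip("/")
    rest = sub_path.lstrip("/")
    # An empty sub path refers to the fileset's own storage location.
    return f"{base}/{rest}" if rest else base
```

The sketch only normalizes slashes at the join point; it does not validate URI schemes or handle a base that consists solely of a scheme and slashes.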
 
 - list_filesets(namespace: Namespace) List[NameIdentifier]¶
- List the filesets in a schema namespace from the catalog. - Args:
- namespace: A schema namespace. This namespace should have 1 level, which is the schema name. 
- Raises:
- NoSuchSchemaException If the schema does not exist. 
- Returns:
- A list of NameIdentifier of filesets under the given namespace. 
 
 - list_schemas() List[str]¶
- List all the schemas under the given catalog namespace. - Raises:
- NoSuchCatalogException if the catalog with specified namespace does not exist. 
- Returns:
- A list of schema names under the given catalog namespace. 
 
 - load_fileset(ident: NameIdentifier) Fileset¶
- Load fileset metadata by NameIdentifier from the catalog. - Args:
- ident: A fileset identifier, which should be “schema.fileset” format. 
- Raises:
- NoSuchFilesetException If the fileset does not exist. 
- Returns:
- The fileset metadata. 
 
 - load_schema(schema_name: str) Schema¶
- Load the schema with specified identifier. - Args:
- schema_name: The name of the schema. 
- Raises:
- NoSuchSchemaException if the schema with specified identifier does not exist. 
- Returns:
- The Schema with specified identifier. 
 
 - name() str¶
- Returns:
- The name of the catalog. 
 
 - properties() Dict[str, str]¶
- The properties of the catalog. Note that this method returns None if the properties are not set. - Returns:
- The properties of the catalog. 
 
 - provider() str¶
- Returns:
- The provider of the catalog. 
 
 - schema_exists(schema_name: str) bool¶
- Check if a schema exists. - If an entity such as a table or view exists, its parent namespaces must also exist. For example, if table a.b.t exists, this method invoked as schema_exists(a.b) must return True. - Args:
- schema_name: The name of the schema. 
- Returns:
- True if the schema exists, False otherwise.
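
A common use of schema_exists is to create a schema only when it is missing. The sketch below shows that pattern against any object exposing the documented schema_exists and create_schema methods. The helper ensure_schema is hypothetical, and FakeSchemaCatalog is an in-memory stand-in included only so the example runs without a Gravitino server.

```python
class FakeSchemaCatalog:
    """Hypothetical in-memory stand-in exposing the two documented methods used below."""

    def __init__(self):
        self.schemas = {}

    def schema_exists(self, schema_name):
        return schema_name in self.schemas

    def create_schema(self, schema_name=None, comment=None, properties=None):
        self.schemas[schema_name] = (comment, dict(properties or {}))
        return schema_name


def ensure_schema(catalog, schema_name, comment=None, properties=None):
    """Create the schema only when schema_exists reports it missing.

    Returns True if a schema was created, False if it was already there.
    """
    if catalog.schema_exists(schema_name):
        return False
    catalog.create_schema(schema_name, comment, properties or {})
    return True
```

The check and the creation are two separate calls, so another client could create the schema in between; code using this pattern against a real catalog should still be prepared for the SchemaAlreadyExistsException that create_schema is documented to raise.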