# Spark connector Paimon catalog
The Apache Gravitino Spark connector offers the capability to read and write Paimon tables, with the metadata managed by the Gravitino server.
## Preparation
- Set `spark.sql.gravitino.enablePaimonSupport` to `true` in Spark configuration.
- Download the Paimon Spark runtime jar to the Spark classpath.
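For instance, the two preparation steps might be combined in a `spark-sql` launch like the sketch below. The jar paths, Gravitino server address, and metalake name are placeholders; adjust them to your deployment and match the jar versions to your Spark and Paimon versions.

```shell
# Illustrative spark-sql launch: jar paths, server URI, and metalake name
# are placeholders for your environment.
./bin/spark-sql -v \
  --jars /path/to/gravitino-spark-connector-runtime.jar,/path/to/paimon-spark-runtime.jar \
  --conf spark.plugins=org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin \
  --conf spark.sql.gravitino.uri=http://localhost:8090 \
  --conf spark.sql.gravitino.metalake=test_metalake \
  --conf spark.sql.gravitino.enablePaimonSupport=true
```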
## Capabilities
Supports DDL and DML operations:

- `CREATE NAMESPACE`
- `DROP NAMESPACE`
- `LIST NAMESPACE`
- `LOAD NAMESPACE`
  - It cannot return any user-specified configs now, as the spark-connector only supports FilesystemCatalog for now.
- `CREATE TABLE`
  - Doesn't support distribution and sort orders.
- `DROP TABLE`
- `ALTER TABLE`
- `LIST TABLE`
- `DESCRIBE TABLE`
- `SELECT`
- `INSERT INTO & OVERWRITE`
- Schema evolution
- `PARTITION MANAGEMENT`, such as `LIST PARTITIONS`, `ALTER TABLE ... DROP PARTITION ...`
:::info
Only Paimon FilesystemCatalog on HDFS is supported now.
:::
Not supported operations:

- `ALTER NAMESPACE`
  - Paimon does not support altering namespaces.
- Row-level operations, such as `MERGE INTO`, `DELETE`, `UPDATE`, `TRUNCATE`
- Metadata tables, such as `{paimon_catalog}.{paimon_database}.{paimon_table}$snapshots`
- Other Paimon extension SQLs, such as `Tag`
- Call statements
- View
- Time travel
- Hive and JDBC backends, and object storage for FilesystemCatalog
## SQL example

```sql
-- Suppose paimon_catalog is the Paimon catalog name managed by Gravitino
USE paimon_catalog;

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

CREATE TABLE IF NOT EXISTS employee (
  id bigint,
  name string,
  department string,
  hire_date timestamp
) PARTITIONED BY (name);

SHOW TABLES;
DESC TABLE EXTENDED employee;

INSERT INTO employee
VALUES
(1, 'Alice', 'Engineering', TIMESTAMP '2021-01-01 09:00:00'),
(2, 'Bob', 'Marketing', TIMESTAMP '2021-02-01 10:30:00'),
(3, 'Charlie', 'Sales', TIMESTAMP '2021-03-01 08:45:00');

SELECT * FROM employee WHERE name = 'Alice';

SHOW PARTITIONS employee;
ALTER TABLE employee DROP PARTITION (`name`='Alice');
```
## Catalog properties
The Gravitino Spark connector transforms the following property names, defined in catalog properties, into Spark Paimon connector configuration.
| Gravitino catalog property name | Spark Paimon connector configuration | Description               | Since Version    |
|---------------------------------|--------------------------------------|---------------------------|------------------|
| `catalog-backend`               | `metastore`                          | Catalog backend type      | 0.8.0-incubating |
| `uri`                           | `uri`                                | Catalog backend URI       | 0.8.0-incubating |
| `warehouse`                     | `warehouse`                          | Catalog backend warehouse | 0.8.0-incubating |
Gravitino catalog property names with the prefix `spark.bypass.` are passed to the Spark Paimon connector with the prefix stripped. For example, use `spark.bypass.client-pool-size` to pass `client-pool-size` to the Spark Paimon connector.
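The mapping described above can be sketched as a small function. This is not the connector's actual code, just an illustration of the two rules: known property names are renamed per the table, and `spark.bypass.`-prefixed keys are passed through with the prefix stripped.

```python
# Sketch of the property transformation (illustrative, not the connector code).
PROPERTY_MAP = {"catalog-backend": "metastore", "uri": "uri", "warehouse": "warehouse"}
BYPASS_PREFIX = "spark.bypass."

def to_paimon_config(gravitino_props: dict) -> dict:
    """Map Gravitino catalog properties to Spark Paimon connector configuration."""
    config = {}
    for key, value in gravitino_props.items():
        if key in PROPERTY_MAP:
            # Renamed per the table above, e.g. catalog-backend -> metastore.
            config[PROPERTY_MAP[key]] = value
        elif key.startswith(BYPASS_PREFIX):
            # Passed through with the spark.bypass. prefix stripped.
            config[key[len(BYPASS_PREFIX):]] = value
    return config

props = {
    "catalog-backend": "filesystem",
    "warehouse": "hdfs://ns/user/paimon/warehouse",
    "spark.bypass.client-pool-size": "4",
}
print(to_paimon_config(props))
```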