Version: 0.7.0-incubating

Manage table partition using Apache Gravitino

Introduction

Although many catalogs inherently manage partitions automatically, there are scenarios where manual partition management is necessary. Usage scenarios like managing the TTL (Time-To-Live) of partition data, gathering statistics on partition metadata, and optimizing queries through partition pruning. For these reasons, Apache Gravitino provides capabilities of partition management.

Requirements and limitations

Partition management is based on the partitioned table, so please ensure that you are operating on a partitioned table.

The following table shows the partition operations supported across various catalogs in Gravitino:

Operation	Hive catalog	Iceberg catalog	Jdbc-Mysql catalog	Jdbc-PostgreSQL catalog	Jdbc-Doris catalog
Add Partition	✔	✘	✘	✘	✔
Get Partition by Name	✔	✘	✘	✘	✔
List Partition Names	✔	✘	✘	✘	✔
List Partitions	✔	✘	✘	✘	✔
Drop Partition	✔	✘	✘	✘	✔

WELCOME FEEDBACK

If you need additional partition management support for a specific catalog, please feel free to create an issue on the Gravitino repository.

Partition operations

Add partition

You must match the partition types you want to add with the table's partitioning types; Gravitino currently supports adding the following partition types:

Partition Type	Description
identity	An identity partition represents a result of identity partitioning.
range	A range partition represents a result of range partitioning.
list	A list partition represents a result of list partitioning.

For JSON examples:

identity
range
list

{
  "type": "identity",
  "name": "dt=2008-08-08/country=us",
  "fieldNames": [
    [
      "dt"
    ],
    [
      "country"
    ]
  ],
  "values": [
    {
      "type": "literal",
      "dataType": "date",
      "value": "2008-08-08"
    },
    {
      "type": "literal",
      "dataType": "string",
      "value": "us"
    }
  ]
}

note

The values of the field values must be the same ordering as the values of fieldNames.

When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values.

{
  "type": "range",
  "name": "p20200321",
  "upper": {
    "type": "literal",
    "dataType": "date",
    "value": "2020-03-21"
  },
  "lower": {
    "type": "literal",
    "dataType": "null",
    "value": "null"
  }
}

{
  "type": "list",
  "name": "p202204_California",
  "lists": [
    [
      {
        "type": "literal",
        "dataType": "date",
        "value": "2022-04-01"
      },
      {
        "type": "literal",
        "dataType": "string",
        "value": "Los Angeles"
      }
    ],
    [
      {
        "type": "literal",
        "dataType": "date",
        "value": "2022-04-01"
      },
      {
        "type": "literal",
        "dataType": "string",
        "value": "San Francisco"
      }
    ]
  ]
}

note

Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning.

For Java examples:

Identity
Range
List

Partition partition =
        Partitions.identity(
            "dt=2008-08-08/country=us",
            new String[][] {{"dt"}, {"country"}},
            new Literal[] {
              Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")
            },
            Maps.newHashMap());

note

The values are in the same order as the field names.

When adding an identity partition to a partitioned Hive table, the specified partition name is ignored. This is because Hive generates the partition name based on field names and values.

Partition partition =
        Partitions.range(
            "p20200321",
            Literals.dateLiteral(LocalDate.parse("2020-03-21")),
            Literals.NULL,
            Maps.newHashMap());

Partition partition =
        Partitions.list(
            "p202204_California",
            new Literal[][] {
              {
                Literals.dateLiteral(LocalDate.parse("2022-04-01")),
                Literals.stringLiteral("Los Angeles")
              },
              {
                Literals.dateLiteral(LocalDate.parse("2022-04-01")),
                Literals.stringLiteral("San Francisco")
              }
            },
            Maps.newHashMap());

note

Each list in the lists must have the same length. The values in each list must correspond to the field definitions in the list partitioning.

You can add a partition to a partitioned table by sending a POST request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of adding an identity partition to a Hive partitioned table:

Shell
Java

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "partitions": [
    {
      "type": "identity",
      "fieldNames": [
        [
          "dt"
        ],
        [
          "country"
        ]
      ],
      "values": [
        {
          "type": "literal",
          "dataType": "date",
          "value": "2008-08-08"
        },
        {
          "type": "literal",
          "dataType": "string",
          "value": "us"
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition addedPartition = 
    gravitinoClient
        .loadCatalog("catalog")
        .asTableCatalog()
        .loadTable(NameIdentifier.of("schema", "table"))
        .supportPartitions()
        .addPartition(
            Partitions.identity(
              new String[][] {{"dt"}, {"country"}},
              new Literal[] {
              Literals.dateLiteral(LocalDate.parse("2008-08-08")), Literals.stringLiteral("us")},
              Maps.newHashMap()));

Get a partition by name

You can get a partition by its name via sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions/{partition_name} endpoint or by using the Gravitino Java client. The following is an example of getting a partition by its name:

Shell
Java

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions/p20200321

tip

If the partition name contains special characters, you should use URL encoding. For example, if the partition name is dt=2008-08-08/country=us you should use dt%3D2008-08-08%2Fcountry%3Dus in the URL.

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition Partition = 
    gravitinoClient
        .loadCatalog("catalog")
        .asTableCatalog()
        .loadTable(NameIdentifier.of("schema", "table"))
        .supportPartitions()
        .getPartition("partition_name");

List partition names under a partitioned table

You can list all partition names under a partitioned table by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of listing all the partition names under a partitioned table:

Shell
Java

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
String[] partitionNames = 
    gravitinoClient
        .loadCatalog("catalog")
        .asTableCatalog()
        .loadTable(NameIdentifier.of("schema", "table"))
        .supportPartitions()
        .listPartitionNames();

List partitions under a partitioned table

If you want to get more detailed information about the partitions under a partitioned table, you can list all partitions under a partitioned table by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions endpoint or by using the Gravitino Java client. The following is an example of listing all the partitions under a partitioned table:

Shell
Java

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions?details=true

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition[] partitions =
        gravitinoClient
            .loadCatalog("catalog")
            .asTableCatalog()
            .loadTable(NameIdentifier.of("schema", "table"))
            .supportPartitions()
            .listPartitions();

Drop a partition by name

You can drop a partition by its name via sending a DELETE request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables/{partitioned_table_name}/partitions/{partition_name} endpoint or by using the Gravitino Java client. The following is an example of dropping a partition by its name:

Shell
Java

curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables/table/partitions/p20200321

tip

GravitinoClient gravitinoClient = GravitinoClient
    .builder("http://127.0.0.1:8090")
    .withMetalake("metalake")
    .build();

// Assume that you have a partitioned table named "metalake.catalog.schema.table".
Partition Partition = 
    gravitinoClient
        .loadCatalog("catalog")
        .asTableCatalog()
        .loadTable(NameIdentifier.of("schema", "table"))
        .supportPartitions()
        .dropPartition("partition_name");

Introduction​

Requirements and limitations​

Partition operations​

Add partition​

Get a partition by name​

List partition names under a partitioned table​

List partitions under a partitioned table​

Drop a partition by name​

Introduction

Requirements and limitations

Partition operations

Add partition

Get a partition by name

List partition names under a partitioned table

List partitions under a partitioned table

Drop a partition by name