Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,6 @@
import org.apache.iceberg.IsolationLevel;
import org.apache.iceberg.MetricsConfig;
import org.apache.iceberg.MetricsModes.None;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.RowDelta;
import org.apache.iceberg.RowLevelOperationMode;
Expand Down Expand Up @@ -904,9 +903,12 @@ public void renameColumn(ConnectorSession session, ConnectorTableHandle tableHan
Table icebergTable = getIcebergTable(session, icebergTableHandle.getSchemaTableName());
Transaction transaction = icebergTable.newTransaction();
transaction.updateSchema().renameColumn(columnHandle.getName(), target).commit();
if (icebergTable.spec().fields().stream().map(PartitionField::sourceId).anyMatch(sourceId -> sourceId == columnHandle.getId())) {
transaction.updateSpec().renameField(columnHandle.getName(), target).commit();
}
icebergTable.spec().fields().stream()
.filter(field -> field.sourceId() == columnHandle.getId())
.forEach(field -> {
String transform = field.transform().toString();
transaction.updateSpec().renameField(field.name(), getPartitionColumnName(target, transform)).commit();
});
transaction.commitTransaction();
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -213,11 +213,21 @@ protected static String getPartitionColumnName(String columnName, String transfo
return columnName + "_bucket";
}

matcher = ICEBERG_BUCKET_PATTERN.matcher(transform);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingLiuPing @hantangwangd Is there any reason why we need two matchers for each transform? I see one is for matching with "()", the other is matching "[]". The names are very confusing because the COLUMN_xxx is also in Iceberg:

    private static final Pattern COLUMN_BUCKET_PATTERN = Pattern.compile("bucket\\((\\d+)\\)");
    private static final Pattern COLUMN_TRUNCATE_PATTERN = Pattern.compile("truncate\\((\\d+)\\)");
    private static final Pattern ICEBERG_BUCKET_PATTERN = Pattern.compile("bucket\\[(\\d+)]");
    private static final Pattern ICEBERG_TRUNCATE_PATTERN = Pattern.compile("truncate\\[(\\d+)]");

The ICEBERG_ flavor vars were used in PartitionFields.java:

private static String toPartitionField(PartitionSpec spec, PartitionField field)
    {
        String name = spec.schema().findColumnName(field.sourceId());
        String transform = field.transform().toString();

        switch (transform) {
            case "identity":
                return name;
            case "year":
            case "month":
            case "day":
            case "hour":
                return format("%s(%s)", transform, name);
        }

        Matcher matcher = ICEBERG_BUCKET_PATTERN.matcher(transform);
        if (matcher.matches()) {
            return format("bucket(%s, %s)", name, matcher.group(1));
        }

        matcher = ICEBERG_TRUNCATE_PATTERN.matcher(transform);
        if (matcher.matches()) {
            return format("truncate(%s, %s)", name, matcher.group(1));
        }

        throw new UnsupportedOperationException("Unsupported partition transform: " + field);
    ```
This function converts the "[]" from Iceberg repo's PartitionField to "()", and then will be used as one of the properties when creating ConnectorTableMetadata. 
public ConnectorTableMetadata(SchemaTableName table, List<ColumnMetadata> columns, Map<String, Object> properties, Optional<String> comment)
{
    this(table, columns, properties, comment, emptyList(), Collections.emptyMap());
}
However, when we need to extract the partition column name from ColumnMetadata `column` `getPartitionColumnName(column.getName(), transform)` , we still need to match "[]"? Was it because the ColumnMetadata, when created, didn't convert "[]" to "()"?
Note that the `columns` parameter, which is  List<ColumnMetadata>, was part of the created ConnectorTableMetadata. Such ConnectorTableMetadata object contains a table property "bucket(c,n)", but the ColumnMetadata for that column c in `columns` member was still of "bucket[]"? Is my understanding correct?

public class ConnectorTableMetadata
{
private final SchemaTableName table;
private final Optional comment;
private final List columns; // With "bucket[]"?
private final Map<String, Object> properties; // with "bucket()"
private final TableConstraintsHolder tableConstraintsHolder;

I didn't dig deeper, and your confirmation would be helpful. If this is indeed the case, we may want to unify the representation instead of having two matchers for each transform.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, when we need to extract the partition column name from ColumnMetadata column getPartitionColumnName(column.getName(), transform) , we still need to match "[]"? Was it because the ColumnMetadata, when created, didn't convert "[]" to "()"?

Thanks for bring this for discussing @yingsu00. As I see, the partitioning definition maintained in the properties of ColumnMetadata (which currently appears to exist only when executing add column statements) always be in the form of (), which is consistent with our syntax for add columns:

alter table test_table add column col_new int with (partitioning = 'bucket(4)');

The newly added logic in getPartitionColumnName is used to build new target partition field name when executing alter column name. In this scenario, we know the column's target name and it's related Iceberg transforms, however, the output of transform.toString() is in the form of [] (e.g. bucket[4]). Therefore, we need to parse and identify this form as well.

In summary, one regex is for parsing Presto SQL syntax, and the other is for parsing the toString() output of Iceberg partition transforms. But I agree that it seems a little confuse to mix them all in the method getPartitionColumnName, this could make later readers wonder if we're also maintaining [] form in some other places. Do you think it would be more reasonable to split them into two separate methods with clear responsibilities? Or is there a better way you'd suggest? @yingsu00 @PingLiuPing

Copy link
Contributor Author

@PingLiuPing PingLiuPing Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yingsu00 for your comments.

Thanks @hantangwangd for your explanation. You are correct. ICEBERG_ regex is for parsing the transform string from PartitionField.transform().toString(). And the output string is in the format of bucket[4] or truncate[5].

(which currently appears to exist only when executing add column statements) always be in the form of ()

Yes, this is the reason that I just add ICEBERG_ regex in this method getPartitionColumnName.

I think it is ok to add those two matchers in this method getPartitionColumnName. And from the method name and parameters we can tell that this method's purpose is building a special format of column name based on input. Here we just make it handle an extra format of transform string.

Anyway, happy to separate this method and add a new single purpose method if required.

if (matcher.matches()) {
return columnName + "_bucket";
}

matcher = COLUMN_TRUNCATE_PATTERN.matcher(transform);
if (matcher.matches()) {
return columnName + "_trunc";
}

matcher = ICEBERG_TRUNCATE_PATTERN.matcher(transform);
if (matcher.matches()) {
return columnName + "_trunc";
}

throw new UnsupportedOperationException("Unknown partition transform: " + transform);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -297,7 +297,7 @@ public void testDeleteWithPartitionSpecEvolution()
}

@Test
public void testRenamePartitionColumn()
public void testRenameIdentityPartitionColumn()
{
assertQuerySucceeds("create table test_partitioned_table(a int, b varchar) with (partitioning = ARRAY['a'])");
assertQuerySucceeds("insert into test_partitioned_table values(1, '1001'), (2, '1002')");
Expand All @@ -314,6 +314,100 @@ public void testRenamePartitionColumn()
assertQuerySucceeds("DROP TABLE test_partitioned_table");
}

@DataProvider(name = "transforms")
public String[][] transforms()
{
return new String[][] {
{"a int", "a"},
{"a int", "bucket(a, 3)"},
{"a int", "bucket(a, 3)', 'a"},
{"a int", "truncate(a, 2)"},
{"a int", "truncate(a, 2)', 'a', 'bucket(a, 3)"}
};
}

@DataProvider(name = "dateTimeTransforms")
public String[][] dateTimeTransforms()
{
return new String[][] {
{"a timestamp", "year(a)"},
{"a timestamp", "month(a)"},
{"a timestamp", "day(a)"},
{"a timestamp", "hour(a)"},
{"a timestamp", "a', 'month(a)"}
};
}

@Test(dataProvider = "transforms")
public void testRenamePartitionColumn(String[] transform)
{
assertQuerySucceeds("DROP TABLE IF EXISTS test_partitioned_table");
assertQuerySucceeds(format("create table test_partitioned_table(%s) with (partitioning = ARRAY['%s'])", transform[0], transform[1]));
assertQuerySucceeds("insert into test_partitioned_table values(1), (2)");
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 1").getOnlyValue(), 1L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 2").getOnlyValue(), 1L);

assertQuerySucceeds("alter table test_partitioned_table rename column a to d");
assertQuerySucceeds("insert into test_partitioned_table values(1), (2), (3)");
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where d = 1").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where d = 2").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where d = 3").getOnlyValue(), 1L);
assertQueryFails("select a from test_partitioned_table", "line 1:8: Column 'a' cannot be resolved");

assertQuerySucceeds("alter table test_partitioned_table rename column d to e");
assertQuerySucceeds("insert into test_partitioned_table values (3)");
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where e = 1").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where e = 2").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where e = 3").getOnlyValue(), 2L);

assertQuerySucceeds("alter table test_partitioned_table rename column e to a");
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 1").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 2").getOnlyValue(), 2L);
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 3").getOnlyValue(), 2L);
assertQuerySucceeds("insert into test_partitioned_table values (3)");
assertEquals(getQueryRunner().execute("SELECT count(*) FROM test_partitioned_table where a = 3").getOnlyValue(), 3L);
assertQueryFails("select d from test_partitioned_table", "line 1:8: Column 'd' cannot be resolved");
assertQueryFails("select e from test_partitioned_table", "line 1:8: Column 'e' cannot be resolved");
assertQuerySucceeds("DROP TABLE test_partitioned_table");
}

@Test(dataProvider = "dateTimeTransforms")
public void testRenameDatetimePartitionColumn(String[] transform)
{
Session session = Session.builder(getSession())
.setSystemProperty(LEGACY_TIMESTAMP, "false")
.build();
assertQuerySucceeds("DROP TABLE IF EXISTS test_partitioned_table");
assertQuerySucceeds(format("create table test_partitioned_table(%s) with (partitioning = ARRAY['%s'])", transform[0], transform[1]));
assertQuerySucceeds("insert into test_partitioned_table values(localtimestamp), (localtimestamp)");
assertEquals(getQueryRunner().execute(
session,
"SELECT count(*) FROM test_partitioned_table where a <= localtimestamp").getOnlyValue(), 2L);

assertQuerySucceeds("alter table test_partitioned_table rename column a to d");
assertQuerySucceeds("insert into test_partitioned_table values(localtimestamp), (localtimestamp), (localtimestamp)");
assertEquals(getQueryRunner().execute(
session,
"SELECT count(*) FROM test_partitioned_table where d <= localtimestamp").getOnlyValue(), 5L);
assertQueryFails("select a from test_partitioned_table", "line 1:8: Column 'a' cannot be resolved");

assertQuerySucceeds("alter table test_partitioned_table rename column d to e");
assertQuerySucceeds("insert into test_partitioned_table values (localtimestamp)");
assertEquals(getQueryRunner().execute(
session,
"SELECT count(*) FROM test_partitioned_table where e < localtimestamp").getOnlyValue(), 6L);

assertQuerySucceeds("alter table test_partitioned_table rename column e to a");
assertEquals(getQueryRunner().execute(
session,
"SELECT count(*) FROM test_partitioned_table where a < localtimestamp").getOnlyValue(), 6L);
assertQuerySucceeds("insert into test_partitioned_table values (localtimestamp)");
assertEquals(getQueryRunner().execute(session, "SELECT count(*) FROM test_partitioned_table").getOnlyValue(), 7L);
assertQueryFails("select d from test_partitioned_table", "line 1:8: Column 'd' cannot be resolved");
assertQueryFails("select e from test_partitioned_table", "line 1:8: Column 'e' cannot be resolved");
assertQuerySucceeds("DROP TABLE test_partitioned_table");
}

@Test
public void testAddPartitionColumn()
{
Expand Down
Loading