@zvonand
Collaborator

@zvonand zvonand commented Oct 9, 2025

Also includes #1053 (ClickHouse#87733)

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Deduce Iceberg metadata from Glue's Location

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

@zvonand zvonand force-pushed the zvonand-fix-glue-again branch from bbd2471 to b1229b7 Compare October 9, 2025 08:48
@github-actions

github-actions bot commented Oct 9, 2025

Workflow [PR], commit [0402d8d]

@zvonand zvonand force-pushed the zvonand-fix-glue-again branch 5 times, most recently from 88224fa to 53eb8cc Compare October 14, 2025 10:17
@zvonand zvonand changed the title [WiP] Investigate bug with Glue Glue: Deduce Iceberg table metadata location if metadata_location not specified Oct 14, 2025
@zvonand zvonand force-pushed the zvonand-fix-glue-again branch from 499df74 to 52ffc6c Compare October 14, 2025 13:43
Collaborator

@arthurpassos arthurpassos left a comment

Overall it looks good, just a couple of minor adjustments.

Btw, is it possible to add a test? If it's too hard / we are running against the clock, then it is ok.

// Construct path to version-hint.text
String version_hint_path = table_location + "metadata/version-hint.text";

DB::ASTStorage * storage = table_engine_definition->as<DB::ASTStorage>();
Collaborator

This code seems to be duplicated with GlueCatalog::classifyTimestampTZ. Instinctively I would ask you to consider extracting it into a function, but I am under the impression this PR is not going upstream, right? Extracting it into a function would make rebasing harder, so it is ok.

Collaborator

Btw, why is it not going into upstream?

Collaborator Author

Yes, I'd keep it as is for now. Later I will bring this PR to upstream, and then a small refactor will be done.

Collaborator Author

> Btw, why is it not going into upstream?

Because we want it in the next 25.8 antalya release ASAP

Comment on lines +544 to +557
String version_hint_object_path = version_hint_path;
if (version_hint_object_path.starts_with("s3://"))
{
    version_hint_object_path = version_hint_object_path.substr(5);
    // Remove bucket from path
    std::size_t pos = version_hint_object_path.find('/');
    if (pos != std::string::npos)
        version_hint_object_path = version_hint_object_path.substr(pos + 1);
}
Collaborator

Consider using S3::URI; I think it offers the functionality you need. All you have to do is instantiate it with the path.

Collaborator Author

Here again, I am keeping the code similar to upstream. This is a good note; I will keep it in mind when making a PR to upstream.
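For comparison with the `S3::URI` suggestion, here is a standalone sketch of what the manual stripping in the snippet (lines +544 to +557) computes: it turns `s3://bucket/key` into `key`. This is not ClickHouse's `S3::URI` API and does not reproduce it; `objectKeyFromS3Path` is a hypothetical name used only for illustration, and edge cases (no `/` after the bucket, non-`s3://` paths) follow the snippet's behavior.

```cpp
#include <string>

// Hypothetical helper (not part of the PR): mirrors the snippet's logic of
// stripping the "s3://" scheme and the bucket name to get the object key.
std::string objectKeyFromS3Path(const std::string & path)
{
    const std::string scheme = "s3://";

    // Paths without the s3:// scheme are returned unchanged, as in the snippet.
    if (path.rfind(scheme, 0) != 0)
        return path;

    std::string rest = path.substr(scheme.size()); // "bucket/key..."
    std::size_t pos = rest.find('/');

    // With no '/' after the bucket, the snippet keeps just the bucket name.
    if (pos == std::string::npos)
        return rest;

    return rest.substr(pos + 1); // everything after "bucket/"
}
```

A URI parser such as `S3::URI` would additionally handle other URL styles (for example, virtual-hosted or path-style HTTPS endpoints), which this string-based sketch does not attempt.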


return table_location + "metadata/v" + version_str + "-metadata.json";
}
catch (...)
Collaborator

@arthurpassos arthurpassos Oct 15, 2025

Instead of nested try-catch blocks, consider creating two auxiliary functions that return a std::optional.

Collaborator Author

Here we are trying to read objects that may not exist at all, so we still need try/catch.

Collaborator

I know, but instead of having nested try-catch blocks, you can have two different functions that return std::optional.

Wouldn't something like the below work?

std::optional<std::string> resolveApproach1(...)
{
    try
    {
        // stuff that may throw
        return resolved_path;
    }
    catch (...)
    {
        return std::nullopt;
    }
}

std::optional<std::string> resolveApproach2(...)
{
    try
    {
        // stuff that may throw
        return resolved_path;
    }
    catch (...)
    {
        return std::nullopt;
    }
}

// Returns std::nullopt when neither approach resolves a path
std::optional<String> GlueCatalog::resolveMetadataPathFromTableLocation(const String & table_location, const TableMetadata & table_metadata) const
{
    ....
    if (const auto path = resolveApproach1())
    {
        return path;
    }
    if (const auto path = resolveApproach2())
    {
        return path;
    }

    return std::nullopt;
}
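A compilable, minimal sketch of the pattern suggested above: each resolution strategy lives in its own function returning std::optional, and only the function that reads objects which may be missing keeps a try/catch. Everything here is hypothetical and illustrative, not the PR's code: object storage is simulated with a std::map whose `at()` throws for a missing key, the first strategy assumes the Glue `metadata_location` table parameter arrives in a plain string map, and the second mirrors the `metadata/version-hint.text` and `metadata/v<version>-metadata.json` paths from the diff snippets.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for object storage.
using ObjectStore = std::map<std::string, std::string>;
using TableParams = std::map<std::string, std::string>;

// Hypothetical reader: throws std::out_of_range if the object does not exist,
// standing in for a failed object-storage read.
std::string readObject(const ObjectStore & store, const std::string & key)
{
    return store.at(key);
}

// Strategy 1: use an explicit metadata_location parameter, if present.
std::optional<std::string> resolveFromMetadataLocation(const TableParams & params)
{
    auto it = params.find("metadata_location");
    if (it == params.end() || it->second.empty())
        return std::nullopt;
    return it->second;
}

// Strategy 2: deduce the path from the table location via version-hint.text.
// The read may throw, so the try/catch stays here and turns failure into nullopt.
std::optional<std::string> resolveFromVersionHint(const ObjectStore & store, const std::string & table_location)
{
    try
    {
        std::string version_str = readObject(store, table_location + "metadata/version-hint.text");
        if (version_str.empty())
            return std::nullopt;
        return table_location + "metadata/v" + version_str + "-metadata.json";
    }
    catch (...)
    {
        return std::nullopt;
    }
}

// Top-level resolver: no try/catch, just try the strategies in order.
std::optional<std::string> resolveMetadataPath(const ObjectStore & store, const std::string & table_location, const TableParams & params)
{
    if (auto path = resolveFromMetadataLocation(params))
        return path;
    if (auto path = resolveFromVersionHint(store, table_location))
        return path;
    return std::nullopt;
}
```

With this split, a missing version-hint.text simply falls through to std::nullopt. The try/catch that the author points out is still required stays inside the one function that reads storage, and the top-level function reads as a flat sequence of attempts.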

Collaborator Author

Yeah, maybe

But I actually do not think it should be done here, since it is a style and code-beauty issue (though I agree that what you suggest looks better). I will probably refactor some of this code when submitting it upstream (e.g. as suggested in your other comments), so I do not see a good reason to spend time on it now.

WDYT?

Collaborator

Sure

@zvonand
Collaborator Author

zvonand commented Oct 15, 2025

> Btw, is it possible to add a test?

Not really. I don't know whether a good Glue emulator exists, and the catalog that the existing tests run against is not AWS Glue. So this can only be tested against real AWS.

@zvonand zvonand force-pushed the zvonand-fix-glue-again branch from 03ae62a to 0402d8d Compare October 15, 2025 14:41
@Enmk Enmk merged commit 693da52 into antalya-25.8 Oct 16, 2025
308 of 368 checks passed
@zvonand zvonand added the port-antalya PRs to be ported to all new Antalya releases label Dec 11, 2025

Labels

antalya antalya-25.8 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet


5 participants