@simoneves (Contributor) commented Nov 3, 2025

There are additional Hive config options that can be used in the Worker, but the Coordinator throws an error if they are also present in its config.

For now, we just add some Parquet read parameters at their default values, so this is simply a convenience for users who wish to tweak them for a specific machine.

IMPORTANT!
You will need to use --overwrite-config when first running with this PR; otherwise it will reuse the existing tree, and a hive.properties file left in the old location will cause a startup error via the Docker mappings. Be sure to copy any hand-edited files aside first, of course, or they will be lost.

- ./config/generated/java/etc_common:/opt/presto-server/etc
- ./config/generated/java/etc_coordinator/config_java.properties:/opt/presto-server/etc/config.properties
- ./config/generated/java/etc_coordinator/node.properties:/opt/presto-server/etc/node.properties
- ./config/generated/java/etc_coordinator/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
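A safe way to follow the backup advice above is to copy the generated tree aside before regenerating it. The launcher command is commented out and purely a placeholder — substitute whatever command you normally use to start the stack; only the --overwrite-config flag comes from this PR:

```shell
# Preserve any hand-edited generated configs before the tree is rebuilt.
mkdir -p config-backup
[ -d ./config/generated ] && cp -r ./config/generated config-backup/

# Then regenerate the config tree on first run with this PR:
# ./<your-launcher> --overwrite-config   # launcher name is a placeholder
```

After the new tree is generated, diff your backup against the fresh files and re-apply any local tweaks.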
Contributor

Should the updates be in docker-compose.common.yml?

Contributor Author

They can't be, because the configs are now per-variant.

Comment on lines 24 to 25
parquet.reader.chunk-read-limit=0
parquet.reader.pass-read-limit=0
Contributor

These configurations do not appear in the documentation. Can you please add comments that describe what these parameters do and why they are needed?

@simoneves (Contributor Author) commented Nov 4, 2025

They aren't strictly needed, other than to prove that the Coordinator and Worker configs can differ; however, @devavret apparently tweaks them on his local laptop, so he asked for them to be exposed.

The values are not documented in Velox itself, but they appear to be passed through to the cuDF Chunked Parquet Reader, whose documentation is here:

https://docs.rapids.ai/api/libcudf/stable/classcudf_1_1io_1_1chunked__parquet__reader#a49f5549b53257828d50f5fa65114e07a

The values in that API are in bytes, but the config parser appears to be smart enough to convert (say) 16M into 16 * 1024 * 1024.
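For illustration, the size-suffix normalization described above can be sketched like this. This is a hypothetical stand-in, not the actual Presto/Velox parser, whose accepted suffixes and semantics may differ:

```python
import re

# Binary (1024-based) multipliers, matching the 16M -> 16 * 1024 * 1024
# behaviour observed in the thread.
_SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(value: str) -> int:
    """Convert a size-suffixed config value like '16M' to a byte count."""
    m = re.fullmatch(r"(\d+)\s*([KMGT]?)B?", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"not a recognizable size: {value!r}")
    number, suffix = m.groups()
    return int(number) * _SUFFIXES[suffix.upper()]

print(parse_size("16M"))  # 16777216
print(parse_size("0"))    # 0
```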

I have added comments to the template file based on the parameter descriptions in that documentation.
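For reference, a commented template entry along these lines would match that documentation. The comment wording here is paraphrased from the cuDF chunked_parquet_reader API docs, not copied from the actual template file in this PR:

```properties
# Approximate upper bound, in bytes, on the size of each chunk returned by
# the cuDF chunked Parquet reader (0 = no limit). Size suffixes like 16M
# are accepted and converted to bytes.
parquet.reader.chunk-read-limit=0

# Approximate upper bound, in bytes, on temporary device memory used while
# decompressing and decoding a pass of the file (0 = no limit).
parquet.reader.pass-read-limit=0
```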
