forked from prestodb/presto
[ENG-31454] Batch Requests to Reduce Metastore Calls #2
Open: vamsikarnika wants to merge 113 commits into onehouseinc:master from vamsikarnika:improve_metastore_calls
…iterOperator (prestodb#25846)

Summary: Pull Request resolved: prestodb#25846

Pass the Operator Context's Runtime Stats down into the `TableWriterOperator`'s Page Sink. Specifically, this diff makes the following changes:

a) `TableWriterOperator` passes its `RuntimeStats` into the Page Sink it creates via `PageSinkManager.createPageSink`.
b) When `PageSinkManager.createPageSink` is provided `RuntimeStats`, it passes them into the `Session.toConnectorSession` call, which creates a `FullConnectorSession` instance.
c) When `Session.toConnectorSession` is provided `RuntimeStats`, it passes them into the `FullConnectorSession` instance it constructs.
d) Add a `Builder` to `FullConnectorSession`, which allows providing a `RuntimeStats` instance to `FullConnectorSession` at construction time. `FullConnectorSession.getRuntimeStats()` now returns the `RuntimeStats` set at construction time. If no `RuntimeStats` were provided at construction time, `FullConnectorSession.getRuntimeStats()` defaults to returning the `Session` object's `RuntimeStats`, which preserves backwards compatibility.

All changes preserve forward compatibility.

## Context

Without this change, the `FullConnectorSession`'s `RuntimeStats` points to the `Session`'s `RuntimeStats`. All metrics added worker-side to the `Session`'s `RuntimeStats` within an operator are discarded. That is, all runtime metrics added to the connector session's `RuntimeStats` while executing `TableWriterOperator` were discarded entirely. In particular, at Meta, the stats from our internal filesystem implementation were missing.

Passing the Operator Context's `RuntimeStats` instance down into the connector session is the simplest way to fix this. Additionally, since the previous `RuntimeStats` for the `TableWriterOperator`'s `FullConnectorSession` were always discarded, we can be confident that replacing them with the `OperatorContext`'s `RuntimeStats` will not break anyone else's code.
Differential Revision: D80675849
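As a rough illustration of item (d), the sketch below models the construction-time override and its fallback. `RuntimeStats`, `Session`, and `FullConnectorSession` here are hypothetical, heavily simplified stand-ins (a metric is just a named counter), not Presto's actual classes or signatures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the RuntimeStats fallback; not Presto's real classes.
final class RuntimeStatsFallbackSketch {
    // Stand-in for RuntimeStats: named counters that accumulate values.
    static final class RuntimeStats {
        private final Map<String, Long> metrics = new ConcurrentHashMap<>();

        void addMetricValue(String name, long value) {
            metrics.merge(name, value, Long::sum);
        }

        long get(String name) {
            return metrics.getOrDefault(name, 0L);
        }
    }

    // Stand-in for Session: owns query-level stats.
    static final class Session {
        private final RuntimeStats runtimeStats = new RuntimeStats();

        RuntimeStats getRuntimeStats() {
            return runtimeStats;
        }
    }

    static final class FullConnectorSession {
        private final Session session;
        private final RuntimeStats runtimeStats; // null when none was supplied to the builder

        private FullConnectorSession(Session session, RuntimeStats runtimeStats) {
            this.session = session;
            this.runtimeStats = runtimeStats;
        }

        // Stats supplied at construction time win; otherwise fall back to the Session's stats.
        RuntimeStats getRuntimeStats() {
            return runtimeStats != null ? runtimeStats : session.getRuntimeStats();
        }

        static Builder builder(Session session) {
            return new Builder(session);
        }

        static final class Builder {
            private final Session session;
            private RuntimeStats runtimeStats;

            private Builder(Session session) {
                this.session = session;
            }

            Builder setRuntimeStats(RuntimeStats runtimeStats) {
                this.runtimeStats = runtimeStats;
                return this;
            }

            FullConnectorSession build() {
                return new FullConnectorSession(session, runtimeStats);
            }
        }
    }
}
```

In this sketch, callers that never set stats keep the old behavior, while an operator that passes its own stats sees every metric the connector records.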
There is an existing HiveClientConfig property, hive.orc.use-column-names, to access ORC files by column names, but no corresponding session property. This commit moves the existing HiveClientConfig property to HiveCommonClientConfig and introduces a session property in HiveCommonSessionProperties. It also implements the corresponding changes in DwrfAggregatedPageSourceFactory, OrcAggregatedPageSourceFactory, OrcSelectivePageSourceFactory, and OrcBatchPageSourceFactory; constructors in those classes no longer take a boolean useOrcColumnNames. Tests that use them have been updated, and the Hive connector documentation has been changed. An integration test has been added to TestHiveDistributedQueries.java, and a helper function was created in HiveTestUtils to replace a function in TestHiveIntegrationSmokeTest. Remove superfluous constructors that take hiveClientConfig as a parameter from DwrfAggregatedPageSourceFactory.java and OrcAggregatedPageSourceFactory.java, and change the explicit calls in HiveTestUtils.java. Closes-Issue: prestodb#24134 Add an additional test with different column names to TestHiveDistributedQueries.java.
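A minimal sketch of the resulting precedence, assuming a factory now resolves the flag per query from the session instead of receiving it in its constructor. The session property name `orc_use_column_names` and the map-based session are hypothetical placeholders; the real name is defined in HiveCommonSessionProperties.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a session property that overrides a server-level config default.
final class OrcColumnNamesSketch {
    // Stand-in for the hive.orc.use-column-names value held by HiveCommonClientConfig.
    static final class HiveCommonClientConfig {
        private boolean useOrcColumnNames;

        boolean isUseOrcColumnNames() {
            return useOrcColumnNames;
        }

        HiveCommonClientConfig setUseOrcColumnNames(boolean value) {
            this.useOrcColumnNames = value;
            return this;
        }
    }

    // Hypothetical session property name, for illustration only.
    static final String ORC_USE_COLUMN_NAMES = "orc_use_column_names";

    // Use the session value when set; otherwise fall back to the config default.
    static boolean isUseOrcColumnNames(Map<String, String> sessionProperties, HiveCommonClientConfig config) {
        return Optional.ofNullable(sessionProperties.get(ORC_USE_COLUMN_NAMES))
                .map(Boolean::parseBoolean)
                .orElse(config.isUseOrcColumnNames());
    }
}
```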
The test framework client now receives statement execution results with the `clearTransactionId` and `startTransactionId` flags embedded.
Velox provides a function to install the Arrow library, so we can reuse it instead of copying the same code here. The EXTRA_ARROW_OPTIONS variable allows custom Arrow build options to be passed, which is used to specify that Arrow Flight should be built.
Reuse the existing Velox VarcharType to implement the Char(n) type in the protocol. Add a SystemConfig "char-n-type-enabled" to guard this feature. Note that this makes the Char(n) type carry the behavior of VarcharType, which differs from the Char(n) type in Presto today, where values have a fixed number of characters. Users who need today's behavior could call rpad().
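To make the rpad() workaround concrete, here is a small Java helper that mirrors the documented semantics of Presto's `rpad(string, size, padstring)`: pad on the right to `size` characters, or truncate when the input is longer. `RpadSketch` is hypothetical, and it counts UTF-16 `char`s rather than code points, which is a simplification.

```java
// Hypothetical helper mirroring rpad(string, size, padstring) semantics; counts UTF-16 chars.
final class RpadSketch {
    static String rpad(String value, int size, String padString) {
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        if (padString.isEmpty()) {
            throw new IllegalArgumentException("padstring must not be empty");
        }
        if (value.length() >= size) {
            return value.substring(0, size); // truncate longer input
        }
        StringBuilder result = new StringBuilder(value);
        while (result.length() < size) {
            result.append(padString); // repeat the pad string as needed
        }
        return result.substring(0, size); // trim any partial overshoot
    }
}
```

For example, a varchar-backed Char(3) value `'ab'` stays `'ab'`, while `rpad('ab', 3, ' ')` yields the fixed-width `'ab '`.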
…#25902)

## Description
This PR updates the GitHub Action to publish Maven artifacts with the Central Publishing method. Since the Maven repository doesn't allow executable JARs (with a shell script) to be published, we create a GitHub release and publish those JARs there.

Needs a fix in the release branch: prestodb#25900

Sample release for executable JARs: https://github.com/unix280/presto/releases

## Motivation and Context

## Impact
Release 0.294

## Test Plan
Tested the GitHub release in my repo: https://github.com/unix280/presto/actions/runs/17272968441
Tested the Maven publishing in a local environment.

## Contributor checklist
- [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
```
== NO RELEASE NOTE ==
```
@pdabre12 has been voted in as a module committer for the Presto sidecar module. I also fixed a bug where project committers could not approve some C++ code. Per our contributing guide, project committers must be able to approve all code (although C++ module committers are preferred for approving and merging C++ code).
…restodb#25687) Summary: Similar to the C++ worker, this adds the endpoint for the Java worker. We won't be using the worker load, as going forward we will focus on the C++ worker only. Differential Revision: D79471792
WriteMapping support for the decimal type is already present for writing values but is missing from the query builder. This PR adds the write function to the query builder's buildSql function.
…bc write mappings. These types are missing from the new write mapping interface. If implemented, this will add them back.
Added the `iceberg.engine.hive.lock-enabled` configuration property to enable or disable table locks when Iceberg accesses a Hive table. It can be overridden with the table property `engine.hive.lock-enabled`.
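The precedence described above can be sketched as a single lookup: the table property, when present, overrides the catalog-level value. `HiveLockPrecedenceSketch` and its map-based table properties are hypothetical; only the two property names come from the text above.

```java
import java.util.Map;

// Hypothetical sketch: a table-level engine.hive.lock-enabled overrides the catalog default
// configured by iceberg.engine.hive.lock-enabled.
final class HiveLockPrecedenceSketch {
    static final String TABLE_PROPERTY = "engine.hive.lock-enabled";

    static boolean isHiveLockEnabled(boolean catalogDefault, Map<String, String> tableProperties) {
        String override = tableProperties.get(TABLE_PROPERTY);
        return override == null ? catalogDefault : Boolean.parseBoolean(override);
    }
}
```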
The map function does not sort a JSON object by its keys, even though the json_parse function sorts the same input. If implemented, this will sort JSON objects by key. Resolves prestodb#24207
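A minimal sketch of key canonicalization, assuming a parsed JSON value is modeled as nested `Map`/`List`/scalar objects. It sorts object keys recursively with a `TreeMap`; treating Java's natural `String` ordering as equivalent to json_parse's key order is an assumption, and `JsonKeySortSketch` is not Presto code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: recursively sort JSON object keys; objects are Maps, arrays are Lists.
final class JsonKeySortSketch {
    @SuppressWarnings("unchecked")
    static Object sortKeys(Object value) {
        if (value instanceof Map) {
            Map<String, Object> sorted = new TreeMap<>(); // natural String order (assumption)
            for (Map.Entry<String, Object> entry : ((Map<String, Object>) value).entrySet()) {
                sorted.put(entry.getKey(), sortKeys(entry.getValue()));
            }
            return sorted;
        }
        if (value instanceof List) {
            // Array element order is preserved; only nested objects are rewritten.
            return ((List<Object>) value).stream()
                    .map(JsonKeySortSketch::sortKeys)
                    .collect(Collectors.toList());
        }
        return value; // scalars are returned unchanged
    }
}
```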
Summary:
- Add abstract class BuiltInSpecialFunctionNamespaceManager
- Add BuiltInNativeFunctionNamespaceManager
- Refactor BuiltInPluginFunctionNamespaceManager to extend the abstract class
- Deduplicate sidecar function registry logic by moving some of it from the presto-native-sidecar-plugin module to the presto-main-base module
- Add function name conflict logic to FunctionAndTypeManager that overrides SQL built-in functions but does not override Java built-in functions
- Add retry logic to fetch the function registry from the worker, retrying every minute

Note: `show functions` will show both built-in functions in the same namespace. This is similar to the existing behavior when the native sidecar namespace is enabled with the default presto.default prefix. The `show functions` logic is not addressed in this change; unit tests for `show functions` could be added as well.

Tests: Added unit tests that enable the flag for this feature and verify that it properly overrides the SQL function implementation.

## Release Notes
Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.
```
== NO RELEASE NOTE ==
```
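The conflict rule above can be sketched with a name-keyed registry. This is a simplification: real resolution in FunctionAndTypeManager works on full signatures, and `FunctionConflictSketch` with its `Implementation` enum is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a native function replaces a SQL-implemented built-in with the same
// name but never replaces a Java built-in. Keyed by name only, for simplicity.
final class FunctionConflictSketch {
    enum Implementation { JAVA, SQL, NATIVE }

    private final Map<String, Implementation> registry = new HashMap<>();

    void registerBuiltIn(String name, Implementation implementation) {
        registry.put(name, implementation);
    }

    // Returns true if the native function was registered.
    boolean registerNative(String name) {
        if (registry.get(name) == Implementation.JAVA) {
            return false; // Java built-ins take precedence
        }
        registry.put(name, Implementation.NATIVE); // new name, or overrides a SQL built-in
        return true;
    }

    Implementation resolve(String name) {
        return registry.get(name);
    }
}
```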
Summary:
Fixes to support the GCC 14 build:
- Replace `{}` with an explicit empty container inside optionals to avoid the following error:
error: converting to 'std::in_place_t' from initializer list would use explicit constructor
`{}` leads to copy-list-initialization, which is not allowed because in_place_t's constructor is marked explicit.
- Add `#include <chrono>` in `Duration.h`, as GCC 14 requires it
- Correct the include directory path for proxygen
- Ignore `template-id-cdtor` errors, as GCC 14 fails the build for constructors declared with a template-id
Rollback Plan:
```
== NO RELEASE NOTE ==
```
Differential Revision: D80784416
Pulled By: pratikpugalia
presto-main was split into presto-main and presto-main-base. Update the paths in the CODEOWNERS file to reflect the change.
…stodb#25750) Summary: I added a threshold for logging memory pool allocations in facebookincubator/velox#14437. This change adds the corresponding session property to configure the threshold. Differential Revision: D80066283
Co-authored-by: Christian Zentgraf <[email protected]>
## Description
This PR is the fix from branch release-0.294 (prestodb#25900), to fix Maven release issues.

## Motivation and Context
Merge the fix from the release branch into the master branch.

## Impact
Newer releases

## Test Plan
Tested with release 0.294

## Contributor checklist
- [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
```
== NO RELEASE NOTE ==
```
…5357 Fix prestodb#25357. Added a type mapping table from Delta Lake to PrestoDB types. Co-Authored-By: Steve Burnett <[email protected]> Co-Authored-By: Jalpreet Singh Nanda <[email protected]>
Summary: Adds output row stats for Sapphire-Velox-related sink operators. Properly closes the write file on broadcast write. Reviewed By: singcha Differential Revision: D81271224
Tracking disabled test fixed in prestodb#25511
This commit introduces mutual TLS authentication for the Arrow Flight connector, including necessary configuration options for both the client and server. It also includes fixes to the CI pipeline and C++ tests to ensure the new mTLS functionality is properly validated. co-authored-by: Ajas Mangal <[email protected]> co-authored-by: Elbin Pallimalil <[email protected]> co-authored-by: Thanzeel Hassan <[email protected]>
…source node (prestodb#26031) Summary: Sapphire-Velox might send multiple task sources with the same source node. The task manager doesn't expect this and sends the splits of each task source directly to the Velox task. Since Sapphire-Velox sends all splits at once for each Velox task, every such task source has no-more-splits set. This hits the recently added no-more-splits check in the Velox task's add-split API. This PR fixes the issue by merging the splits from multiple task sources that share the same source node ID. Reviewed By: zacw7, tanjialiang Differential Revision: D82367224
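The merge step can be sketched as grouping task sources by plan node ID. The actual fix lives in the C++ task manager; this Java version is illustrative only, `TaskSource` is a hypothetical stand-in with splits modeled as strings, and combining the no-more-splits flags with a logical OR is an assumption (it gives the same result when, as described, every duplicate has the flag set).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse task sources sharing a plan node ID into one source, so
// no-more-splits is signaled once per node instead of once per duplicate.
final class TaskSourceMergeSketch {
    static final class TaskSource {
        final String planNodeId;
        final List<String> splits;
        final boolean noMoreSplits;

        TaskSource(String planNodeId, List<String> splits, boolean noMoreSplits) {
            this.planNodeId = planNodeId;
            this.splits = splits;
            this.noMoreSplits = noMoreSplits;
        }
    }

    static List<TaskSource> mergeBySourceNode(List<TaskSource> sources) {
        Map<String, List<String>> splitsByNode = new LinkedHashMap<>(); // keeps first-seen order
        Map<String, Boolean> noMoreByNode = new LinkedHashMap<>();
        for (TaskSource source : sources) {
            splitsByNode.computeIfAbsent(source.planNodeId, id -> new ArrayList<>()).addAll(source.splits);
            // Assumption: the merged source is complete if any duplicate reported no-more-splits.
            noMoreByNode.merge(source.planNodeId, source.noMoreSplits, Boolean::logicalOr);
        }
        List<TaskSource> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : splitsByNode.entrySet()) {
            merged.add(new TaskSource(entry.getKey(), entry.getValue(), noMoreByNode.get(entry.getKey())));
        }
        return merged;
    }
}
```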
…stoTask Previously, prestoTask->createFinishTimeMs was set after the lock scope, potentially not reflecting the actual task creation finish time. Now, the assignment is moved inside the lock, right after the task is created and assigned, to more accurately capture when the task creation completes.
## Description
1. Enables features for the prestissimo image by default by adding the flags below when building the image:
   ```
   -DPRESTO_ENABLE_REMOTE_FUNCTIONS=ON
   -DPRESTO_ENABLE_JWT=ON
   -DPRESTO_STATS_REPORTER_TYPE=PROMETHEUS
   -DPRESTO_MEMORY_CHECKER_TYPE=LINUX_MEMORY_CHECKER
   -DPRESTO_ENABLE_SPATIAL=ON
   ```
2. Uses a cache mount on the ccache directory to accelerate local builds
3. Adds ARM_BUILD_TARGET for ARM builds
4. Fixes an error in the CentOS dependency image when building Arrow

## Motivation and Context
By default the image is built with
```
-DPRESTO_ENABLE_TESTING=OFF
-DPRESTO_ENABLE_PARQUET=ON
-DPRESTO_ENABLE_S3=ON
```
Enable more features so that users can try them without rebuilding the image.

## Impact
Release

## Test Plan
Build and test

## Contributor checklist
- [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.
```
== NO RELEASE NOTE ==
```
…traction
This commit introduces new overloaded functions:
1. array_sort() that accepts an array and a lambda expression to extract sort keys, then sorts the array in ascending order based on those keys.
2. array_sort_desc() that accepts an array and a lambda expression to extract sort keys, then sorts the array in descending order based on those keys.
For example:
array_sort(ARRAY['hello', 'hi', 'world'], x -> length(x))
-- Returns: ['hi', 'hello', 'world']
array_sort(ARRAY[row('apples', 23), row('bananas', 12)], x -> x[2])
-- Returns: [row('bananas', 12), row('apples', 23)]
array_sort_desc(ARRAY['hello', 'hi', 'world'], x -> length(x))
-- Returns: ['hello', 'world', 'hi']
array_sort_desc(ARRAY[row('apples', 23), row('bananas', 12)], x -> x[2])
-- Returns: [row('apples', 23), row('bananas', 12)]
The implementation leverages the same code generation approach to optimize key extraction based on element and key types.
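The semantics of the examples can be modeled in plain Java with a key-extracting comparator. This sketch only reproduces the observable ordering, not the bytecode generation the commit describes; `ArraySortByKeySketch` is hypothetical. Because `List.sort` is stable, elements with equal keys (`'hello'` and `'world'`, both length 5) keep their input order, which matches the expected results shown above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of sort-by-extracted-key semantics; List.sort is a stable sort.
final class ArraySortByKeySketch {
    static <T, K extends Comparable<? super K>> List<T> arraySort(List<T> array, Function<T, K> keyExtractor) {
        List<T> result = new ArrayList<>(array); // leave the input untouched
        result.sort(Comparator.comparing(keyExtractor));
        return result;
    }

    static <T, K extends Comparable<? super K>> List<T> arraySortDesc(List<T> array, Function<T, K> keyExtractor) {
        List<T> result = new ArrayList<>(array);
        result.sort(Comparator.comparing(keyExtractor, Comparator.reverseOrder()));
        return result;
    }
}
```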
Upgrade org.jdbi:jdbi3-core:3.4.0 to org.jdbi:jdbi3-core:3.49.5 and org.jdbi:jdbi3-sqlobject:3.4.0 to org.jdbi:jdbi3-sqlobject:3.49.5. This upgrade fixes the following vulnerabilities: CVE-2024-1597, CVE-2023-32697, CVE-2023-2976, CVE-2022-41946, CVE-2022-41853, CVE-2022-31197, CVE-2022-26520, CVE-2022-23221, CVE-2022-21724, CVE-2021-42392, CVE-2020-8908, CVE-2020-13692, CVE-2018-10237.
Upgrade org.glassfish.jaxb:jaxb-runtime:2.3.1 to 4.0.5. Addresses CVE-2020-15250.
Cherry-pick of https://github.com/trinodb/trino/pull/7465 Co-authored-by: Praveen Krishna <[email protected]>
Changes adapted from trino/PR#11336, 12951, 14175

Original commits: d4c73389bbdb6b48c24a0969b259286b05a99ade 565700985baff0c4b29fdb1e3e26139a29318b9e ec8b9fd2b2cc9c8bc78c0ca1317dc34fcf2c48c7 98fc1ee8b29fca86f2a1b3abe4989524940333a6 1aea489884346822c812b1a242acc286e3e1248e 8bd17171a8469b9351e2fd7d9f2f49f4af9ea209
Author: kasiafi

Modifications were made to adapt to Presto, including:
- Change CatalogName to ConnectorId
- Change Symbol to VariableReferenceExpression
- TableFunctionNode extends InternalPlanNode instead of PlanNode
- Add applyTableFunction to all implementations of Metadata
- Add an empty ConnectorTableLayoutHandle to TableHandle in MetadataManager::applyTableFunction
- Remove PlannerContext and replace it with Metadata

Co-authored-by: kasiafi <[email protected]>
Co-authored-by: Pratik Joseph Dabre <[email protected]>
Co-authored-by: Xin Zhang <[email protected]>
Partial cherry pick of - trinodb/trino@693cfb6 trinodb/trino@0031497 trinodb/trino@ceb2d1b trinodb/trino@fe5335b trinodb/trino@050d089 Co-authored-by: Stephen Yugel <[email protected]> Co-authored-by: David Phillips <[email protected]> Co-authored-by: s2lomon <[email protected]> Co-authored-by: Mateusz Gajewski <[email protected]>
Partial cherry-pick but contains the following commits trinodb/trino@e8a8b5ab trinodb/trino@7b98764a Co-authored-by: Stephen Yugel <[email protected]> Co-authored-by: Szymon Homa <[email protected]> Co-authored-by: Mateusz Gajewski <[email protected]>
Partial cherry-pick of the following commits - trinodb/trino@3879f455 trinodb/trino@cd3da24c trinodb/trino@dcb6f0bf trinodb/trino@7cdd1336 trinodb/trino@8b8b0bec trinodb/trino@15e53ffd Co-authored-by: Stephen Yugel <[email protected]> Co-authored-by: lukasz-walkiewicz <[email protected]> Co-authored-by: Nik Hodgkinson <[email protected]>
Co-authored-by: Stephen Yugel <[email protected]> Co-authored-by: lukasz-walkiewicz <[email protected]>
PrestoTask can be created from different endpoints:
- getTaskStatus
- getTaskInfo
- receiving a task update, etc.

A PrestoTask created in getTaskStatus cannot create the Velox plan and start; it has to wait until it receives a taskUpdate. This change makes taskCreationTime represent the time between receiving the first taskUpdate and task creation.
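A minimal sketch of the timing rule, written in Java although the real change is in the C++ worker. It assumes a task object may already exist before its first update arrives, records the first update's timestamp only once, and sets the creation-finish time under the same lock (echoing the earlier createFinishTimeMs change). `TaskCreationTimingSketch` and its methods are hypothetical.

```java
// Hypothetical sketch: measure task creation from the first task update, not from whichever
// endpoint (status, info, update) happened to instantiate the task object.
final class TaskCreationTimingSketch {
    private long firstUpdateReceivedMs = -1;
    private long createFinishTimeMs = -1;

    synchronized void onTaskUpdate(long nowMs) {
        if (firstUpdateReceivedMs < 0) {
            firstUpdateReceivedMs = nowMs; // later updates do not move the start point
        }
    }

    // Called after the plan is created, while still holding the lock.
    synchronized void onTaskCreated(long nowMs) {
        createFinishTimeMs = nowMs;
    }

    // Returns -1 until both timestamps are known.
    synchronized long taskCreationTimeMs() {
        if (firstUpdateReceivedMs < 0 || createFinishTimeMs < 0) {
            return -1;
        }
        return createFinishTimeMs - firstUpdateReceivedMs;
    }
}
```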
Description
Motivation and Context
Improves performance by reducing the number of calls to the metastore.
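The description does not say how the calls are batched. As a generic illustration only, the sketch below shows one common pattern: group per-partition lookups into fixed-size batches so that N lookups cost roughly N / batchSize round trips. `fetchInBatches` and the `batchCall` function are hypothetical, not this PR's code or any metastore client API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of request batching: one call per batch of partition names
// instead of one call per partition.
final class MetastoreBatchingSketch {
    static <V> Map<String, V> fetchInBatches(
            List<String> partitionNames,
            int batchSize,
            Function<List<String>, Map<String, V>> batchCall) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, V> results = new HashMap<>();
        for (int start = 0; start < partitionNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, partitionNames.size());
            // One round trip covers up to batchSize names.
            results.putAll(batchCall.apply(partitionNames.subList(start, end)));
        }
        return results;
    }
}
```

With five partitions and a batch size of 2, this issues three calls instead of five; the batch size trades fewer round trips against larger individual requests.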
Release Notes