@bdice bdice commented Apr 20, 2025

Updates from Velox main, now including the first upstream PR with OrderBy.

Changes:
Fix deprecated use of 0/NULL in velox/tpch/gen/dbgen/bm_utils.cpp + 1 ([#12612](https://github.com/facebookincubator/velox/pull/12612))
feat: Add parse_duration Presto function ([#12500](https://github.com/facebookincubator/velox/pull/12500))
Allow exact buffer allocation ([#12594](https://github.com/facebookincubator/velox/pull/12594))
Update SS index source to project key columns from lookup result ([#12618](https://github.com/facebookincubator/velox/pull/12618))
fix: Subfield prunings in ArrayWithOffsets and SlidingWindowMap ([#12587](https://github.com/facebookincubator/velox/pull/12587))
misc(fuzzer): Enable window fuzzer test with FB-only functions ([#12515](https://github.com/facebookincubator/velox/pull/12515))
misc(test_utils): Add file and directory operations to HiveConnectorTestBase ([#12415](https://github.com/facebookincubator/velox/pull/12415))
build: Add support for `find_package(Arrow REQUIRED)` ([#12599](https://github.com/facebookincubator/velox/pull/12599))
misc: Improve OutputBufferManager initialization ([#11350](https://github.com/facebookincubator/velox/pull/11350))
feat: Extract hive connector ID from task meta ([#12602](https://github.com/facebookincubator/velox/pull/12602))
Add ip_prefix_collapse function [7/n] ([#11778](https://github.com/facebookincubator/velox/pull/11778))
refactor: Cleanup old arbitration time config ([#12626](https://github.com/facebookincubator/velox/pull/12626))
fix(dwio): Buffer corruption when new capacity is less than old size ([#12601](https://github.com/facebookincubator/velox/pull/12601))
fix: Enable presto verification for RowNumberFuzzer CI ([#12436](https://github.com/facebookincubator/velox/pull/12436))
fix(fuzzer): Fix ApproxDistinctResultVerifier for approx_set ([#12635](https://github.com/facebookincubator/velox/pull/12635))
feat: Add Spark get_struct_field function ([#12166](https://github.com/facebookincubator/velox/pull/12166))
fix(json): Unescape unicode when casting to json ([#12627](https://github.com/facebookincubator/velox/pull/12627))
misc: Throw unsupported type error for unregistered type. ([#12629](https://github.com/facebookincubator/velox/pull/12629))
fix: The new LLVM library for time zones is slow with DST in the distant future ([#12636](https://github.com/facebookincubator/velox/pull/12636))
feat: BingTile construction and property functions ([#12580](https://github.com/facebookincubator/velox/pull/12580))
feat: Add support for Unknown Value in map_entries ([#12622](https://github.com/facebookincubator/velox/pull/12622))
build(ci): Refactor benchmark jobs into a single workflow ([#12605](https://github.com/facebookincubator/velox/pull/12605))
feat: Add filter pushdown to sequence storage table handle ([#12643](https://github.com/facebookincubator/velox/pull/12643))
fix(re2): Add support for RE2 2023-06-01 ([#12631](https://github.com/facebookincubator/velox/pull/12631))
docs(fuzzer): Move Expression Fuzzer document to its own page ([#12641](https://github.com/facebookincubator/velox/pull/12641))
docs(fuzzer): Add document about custom input generator for expression fuzzer ([#12642](https://github.com/facebookincubator/velox/pull/12642))
fix: The new LLVM library for time zones is slow with transitions prior to 2037 ([#12646](https://github.com/facebookincubator/velox/pull/12646))
misc: Add runtime stats to track and report the index lookup wait time ([#12653](https://github.com/facebookincubator/velox/pull/12653))
misc: Include runtime stats in planNodeStats and log in dpp and ss load tests ([#12666](https://github.com/facebookincubator/velox/pull/12666))
misc: Support tracking blocked waiting time for sequential task execution ([#12663](https://github.com/facebookincubator/velox/pull/12663))
Remove unused exception parameter from velox/common/base/Pointers.h ([#12670](https://github.com/facebookincubator/velox/pull/12670))
misc: Allow to set plan stats output fields ([#12677](https://github.com/facebookincubator/velox/pull/12677))
misc: Improve metrics event logging ([#12676](https://github.com/facebookincubator/velox/pull/12676))
refactor: Clean up some error messages and file structure ([#12657](https://github.com/facebookincubator/velox/pull/12657))
fix:velox map_top_n_keys - values are unnecessarily orderable ([#12655](https://github.com/facebookincubator/velox/pull/12655))
Remove unused private from velox/dwio/common/DirectBufferedInput.h +12 ([#12687](https://github.com/facebookincubator/velox/pull/12687))
Remove unused exception parameter from velox/common/memory/SharedArbitrator.cpp ([#12688](https://github.com/facebookincubator/velox/pull/12688))
misc: Throw unsupported error for reduce function limitation ([#12650](https://github.com/facebookincubator/velox/pull/12650))
Add Parse Presto Data Size in Velox ([#12423](https://github.com/facebookincubator/velox/pull/12423))
fix(json): Compute capacity correctly when casting to complex jsons ([#12686](https://github.com/facebookincubator/velox/pull/12686))
feat: Support prefetch in index lookup join ([#12611](https://github.com/facebookincubator/velox/pull/12611))
misc(fuzzer): Log input test cases in expression fuzzer ([#12689](https://github.com/facebookincubator/velox/pull/12689))
std::vector constructor requires type to be complete ([#12640](https://github.com/facebookincubator/velox/pull/12640))
fix: Use memory pool to track raw_vector in selective column readers ([#12685](https://github.com/facebookincubator/velox/pull/12685))
misc: Remove duplicate index lookup runtime stats ([#12696](https://github.com/facebookincubator/velox/pull/12696))
feat: Allow user to customize the get preferred size for memory efficiency ([#12695](https://github.com/facebookincubator/velox/pull/12695))
fix(fuzzer): Wrap like SQL conversion ([#12610](https://github.com/facebookincubator/velox/pull/12610))
fix: The new LLVM library for time zones is slow with transitions after 2037 ([#12690](https://github.com/facebookincubator/velox/pull/12690))
fix: Fix the memory pool test build ([#12709](https://github.com/facebookincubator/velox/pull/12709))
Back out "feat: Combine low selectivity vectors generated by the hash join filter" ([#12711](https://github.com/facebookincubator/velox/pull/12711))
misc: Throw generic spill error in validateSpillBytesSize ([#12665](https://github.com/facebookincubator/velox/pull/12665))
feat: Make kRequestDataSizesMaxWait in ExchangeClient configurable through session props ([#12682](https://github.com/facebookincubator/velox/pull/12682))
misc(dwio): Fix error message in DWIO_ENSURE_NE/DWIO_ENSURE_EQ ([#12648](https://github.com/facebookincubator/velox/pull/12648))
feat(json): Expand support for non alphanumeric characters after dot in json path ([#12698](https://github.com/facebookincubator/velox/pull/12698))
refactor: Move json_extract_scalar back to simple function ([#12608](https://github.com/facebookincubator/velox/pull/12608))
feat(fuzzer): Add comman eval execution logging ([#12706](https://github.com/facebookincubator/velox/pull/12706))
fix: MergeJoin fails when adding nulls for a miss to an input column that is constant ([#12713](https://github.com/facebookincubator/velox/pull/12713))
fix: Duplicate operator<< of chrono types in date library ([#12723](https://github.com/facebookincubator/velox/pull/12723))
fix(function): Remove unused input types in Spark central moments functions ([#12645](https://github.com/facebookincubator/velox/pull/12645))
fix(build): Arrow bundled dependency build may fail on macOS ([#12658](https://github.com/facebookincubator/velox/pull/12658))
docs: Update build metrics report ([#12668](https://github.com/facebookincubator/velox/pull/12668))
Remove unused exception parameter from velox/common/caching/tests/AsyncDataCacheTest.cpp
feat(re2): Upgrade bundled RE2 to 2024-07-02 ([#12678](https://github.com/facebookincubator/velox/pull/12678))
feat: Register Spark array_min/max functions with orderable types ([#12576](https://github.com/facebookincubator/velox/pull/12576))
misc(memory) - Enable preferred memory size override for MemoryPool ([#12712](https://github.com/facebookincubator/velox/pull/12712))
feat: Add encodedVectorCopy ([#12588](https://github.com/facebookincubator/velox/pull/12588))
feat: Implement parent/child BingTile functions ([#12708](https://github.com/facebookincubator/velox/pull/12708))
fix: Fix the async direct coalesce load memory leak in case of early task failure in pyspark ([#12729](https://github.com/facebookincubator/velox/pull/12729))
docs(website): Update VeloxCon banner on website ([#12742](https://github.com/facebookincubator/velox/pull/12742))
Round when converting from json double to integral types ([#12741](https://github.com/facebookincubator/velox/pull/12741))
refactor: velox/functions/prestosql/aggregates/BitwiseAggregates.cpp: suppress llvm-19-exposed -Wmissing-template-arg-list-after-template-kw warning ([#12744](https://github.com/facebookincubator/velox/pull/12744))
fix: Improve global arbitration to have fairer abort ([#12624](https://github.com/facebookincubator/velox/pull/12624))
refactor: velox/exec/tests/utils/PlanBuilder.cpp: fix llvm-19-exposed -Wunused-but-set-variable warnings ([#12743](https://github.com/facebookincubator/velox/pull/12743))
misc: Remove unused gflags dependency ([#12720](https://github.com/facebookincubator/velox/pull/12720))
feat: Add Geometry Presto type ([#12274](https://github.com/facebookincubator/velox/pull/12274))
fix: Add Parquet reserved keywords ([#12625](https://github.com/facebookincubator/velox/pull/12625))
feat: Implement split_to_multimap ([#12727](https://github.com/facebookincubator/velox/pull/12727))
build(ci): Enable CCache for Manylinux builds ([#12710](https://github.com/facebookincubator/velox/pull/12710))
misc: Remove deprecated OutputBufferManager initialization functions ([#12660](https://github.com/facebookincubator/velox/pull/12660))
refactor: Make tests composable ([#12758](https://github.com/facebookincubator/velox/pull/12758))
fix: Add sanity checks to PrestoBatchVectorSerializer::serialize ([#12754](https://github.com/facebookincubator/velox/pull/12754))
fix: Ensure proper lifetime of stats in DirectCoalescedLoad ([#12756](https://github.com/facebookincubator/velox/pull/12756))
feat: Custom fuzzer input generator for phone number inputs ([#12724](https://github.com/facebookincubator/velox/pull/12724))
feat(s3): Add log location configuration ([#12534](https://github.com/facebookincubator/velox/pull/12534))
misc: Add fine-grained stats for index connector stats tracking ([#12759](https://github.com/facebookincubator/velox/pull/12759))
feat: Refactor TaskTraceReader ([#12669](https://github.com/facebookincubator/velox/pull/12669))
Remove unused private from velox/common/base/tests/PointersTest.cpp +6
misc: Avoid data copy for duplicate result and simplify the implementation ([#12765](https://github.com/facebookincubator/velox/pull/12765))
misc: Fix a comment for kUnsupportedInputUncatchable error code ([#12768](https://github.com/facebookincubator/velox/pull/12768))
fix: Fix partition filters with timestamp value ([#12368](https://github.com/facebookincubator/velox/pull/12368))
feat: Add a method to specify a custom `InsertTableHandle` in TableWriterBuilder ([#12259](https://github.com/facebookincubator/velox/pull/12259))
fix: Fix NaN values in Spark collect_set aggregate function ([#12335](https://github.com/facebookincubator/velox/pull/12335))
build: GCC13 compilation failures in Experimental CI ([#12517](https://github.com/facebookincubator/velox/pull/12517))
feat: Impl split_to_multimap Null Args ([#12779](https://github.com/facebookincubator/velox/pull/12779))
fix: Fix flaky innerJoinWithTableScan ([#12787](https://github.com/facebookincubator/velox/pull/12787))
fix: Expand the range of supported years ([#12694](https://github.com/facebookincubator/velox/pull/12694))
fix: Move only one region to prefetch if the regions overlap ([#12776](https://github.com/facebookincubator/velox/pull/12776))
fix: Untangle some strange code in `PlanBuilder::finalAggregation()` ([#12466](https://github.com/facebookincubator/velox/pull/12466))
fix(parquet): SkippedStrides runtime stats ([#12777](https://github.com/facebookincubator/velox/pull/12777))
fix(fuzzer): Make less ambiguous function not execution by adding to unary operator map ([#12786](https://github.com/facebookincubator/velox/pull/12786))
feat: Add file format setting for LocalRunnerTestBase ([#12783](https://github.com/facebookincubator/velox/pull/12783))
feat: Add Merge TDigest for Velox ([#12615](https://github.com/facebookincubator/velox/pull/12615))
feat: Add API DataSource::cancel() to allow implementation run cleanup logic while TableScan::close() is called ([#12701](https://github.com/facebookincubator/velox/pull/12701))
feat: Optimize deserialize UnsafeRows to RowVector ([#11936](https://github.com/facebookincubator/velox/pull/11936))
refactor: Rename Geospatial* to BingTile* ([#12778](https://github.com/facebookincubator/velox/pull/12778))
feat: Full MAP_CONCAT support in Prism connector ([#12649](https://github.com/facebookincubator/velox/pull/12649))
fix: Change sanity check in TDigest::mergeDeserialized() ([#12785](https://github.com/facebookincubator/velox/pull/12785))
fix: PrestoSerde should be using chain length during serialization ([#12784](https://github.com/facebookincubator/velox/pull/12784))
fix(build): Update deprecated declarations in adapters Dockerfile ([#12789](https://github.com/facebookincubator/velox/pull/12789))
feat: Custom input generator for fuzzing fbCanonicalizePhoneNumber ([#12769](https://github.com/facebookincubator/velox/pull/12769))
fix(function): Fix Spark `json_object_keys` function to return NULL for invalid json ([#12679](https://github.com/facebookincubator/velox/pull/12679))
fix: Deserializing nulls in Row type in empty Presto Page causes UBSan error ([#12798](https://github.com/facebookincubator/velox/pull/12798))
build: Improve breeze support for RTC ([#12792](https://github.com/facebookincubator/velox/pull/12792))
feat: Add processedStrides and processedSplits runtime statistics ([#12647](https://github.com/facebookincubator/velox/pull/12647))
fix: Fix the bug related to sorting of boolean in array_sort_desc ([#12770](https://github.com/facebookincubator/velox/pull/12770))
misc: Decrease row count for presto serializer large buffer test ([#12809](https://github.com/facebookincubator/velox/pull/12809))
refactor(parquet): Use velox parquet reader in StatisticsTest ([#12192](https://github.com/facebookincubator/velox/pull/12192))
feat: Add ip_prefix_subnets function for ipprefix ([#12801](https://github.com/facebookincubator/velox/pull/12801))
refactor: Remove unused bufferEnd_ field from parquet decoders ([#12736](https://github.com/facebookincubator/velox/pull/12736))
refactor: Add logs to arbitration participant reclaim result ([#12797](https://github.com/facebookincubator/velox/pull/12797))
fix(json): Special case certain unicode characters to escape when casting from json ([#12806](https://github.com/facebookincubator/velox/pull/12806))
test: Replace GTEST_SKIP with DISABLED for cases that are expected or we don't have intention to fix ([#12818](https://github.com/facebookincubator/velox/pull/12818))
fix(tpch): Concurrency issues in dbgen ([#12819](https://github.com/facebookincubator/velox/pull/12819))
fix: Prevent null-deref in complex vector during estimateFlatSize and prepareForReuse ([#12767](https://github.com/facebookincubator/velox/pull/12767))
feat(parser): Support INTERVAL DAYS ([#12826](https://github.com/facebookincubator/velox/pull/12826))
feat(python): Generate multiple files from TPC-H dbgen ([#12760](https://github.com/facebookincubator/velox/pull/12760))
feat: Add is_private_ip to check if ipaddress is private ([#12807](https://github.com/facebookincubator/velox/pull/12807))
fix: Fully qualify which operator we are using ([#12824](https://github.com/facebookincubator/velox/pull/12824))
feat: Implement map_top_n_values function ([#12822](https://github.com/facebookincubator/velox/pull/12822))
refactor(parquet): Remove arrow parquet reader in PageIndexTest ([#12802](https://github.com/facebookincubator/velox/pull/12802))
refactor(iceberg): Simplify positional delete read ([#12590](https://github.com/facebookincubator/velox/pull/12590))
fix: Fix incorrect result from casting double to decimal ([#12600](https://github.com/facebookincubator/velox/pull/12600))
feat(UUID): Add between function ([#12634](https://github.com/facebookincubator/velox/pull/12634))
build: Clang build fix ([#12791](https://github.com/facebookincubator/velox/pull/12791))
build: Add missing include for uint64_t type ([#12790](https://github.com/facebookincubator/velox/pull/12790))
feat: Add memory order to breeze atomics ([#12793](https://github.com/facebookincubator/velox/pull/12793))
feat(orc): Adding StripeFooter Wrapper for accessing orc and dwrf StripeFooter ([#12737](https://github.com/facebookincubator/velox/pull/12737))
feat(orc): Adding ORC functionality into StripeStream Info ([#12738](https://github.com/facebookincubator/velox/pull/12738))
feat(orc): Adding missing support for reading orc files ([#12739](https://github.com/facebookincubator/velox/pull/12739))
refactor(parquet): Use velox parquet reader in FileDeserializeTest ([#12799](https://github.com/facebookincubator/velox/pull/12799))
fix(json): Fix casting of NaN's in json ([#12825](https://github.com/facebookincubator/velox/pull/12825))
feat: Add decimal support in histogram aggregate ([#12811](https://github.com/facebookincubator/velox/pull/12811))
feat(parser): Support INTERVAL YEAR/MONTH ([#12828](https://github.com/facebookincubator/velox/pull/12828))
docs(blog): Add Velox Primer Part 2 ([#12831](https://github.com/facebookincubator/velox/pull/12831))
fix(fuzzer): Update ConcatTypedExpr SQL generation to use ROW instead of CONCAT ([#12832](https://github.com/facebookincubator/velox/pull/12832))
misc: Add 1 to Type::hashKind() to reduce collisions ([#12834](https://github.com/facebookincubator/velox/pull/12834))
feat: Expose memory pool alloc tracking to external API ([#12757](https://github.com/facebookincubator/velox/pull/12757))
feat: Implement BingTile(quadkey) function ([#12821](https://github.com/facebookincubator/velox/pull/12821))
fix(json): Improve json extract performance for large jsons ([#12796](https://github.com/facebookincubator/velox/pull/12796))
fix(function): Support Spark legacy behavior for central moments functions ([#12566](https://github.com/facebookincubator/velox/pull/12566))
fix(json): Ensure json_array_get always canonicalizes the output ([#12814](https://github.com/facebookincubator/velox/pull/12814))
fix: Print CMAKE_CXX_FLAGS at the final stage in CMakeLists.txt ([#12728](https://github.com/facebookincubator/velox/pull/12728))
build: Remove redundant `CACHE` for `FetchContent` ([#12751](https://github.com/facebookincubator/velox/pull/12751))
feat: Add function prefix flags in TraceReplayRunner ([#12838](https://github.com/facebookincubator/velox/pull/12838))
feat: Implement array_max_by and array_min_by in Velox ([#12652](https://github.com/facebookincubator/velox/pull/12652))
feat(s3): Add config to support registering custom AWSCredentialsProvider ([#12774](https://github.com/facebookincubator/velox/pull/12774))
refactor: Allow specializing projectNode description ([#12833](https://github.com/facebookincubator/velox/pull/12833))
fix(fuzzer): Wrap is null call to remove query ambiguity ([#12820](https://github.com/facebookincubator/velox/pull/12820))
feat: Index join lookup constant filter condition support for ss connector ([#12839](https://github.com/facebookincubator/velox/pull/12839))
misc: Add stats to help scale writer scheduling delay debug at worker side ([#12847](https://github.com/facebookincubator/velox/pull/12847))
fix: Re-use output across probe rows for NestedLoopJoin ([#12519](https://github.com/facebookincubator/velox/pull/12519))
refactor: Cleanup toConstantSql ([#12855](https://github.com/facebookincubator/velox/pull/12855))
refactor: Harden is-null check in toConstantSql ([#12856](https://github.com/facebookincubator/velox/pull/12856))
refactor: Capitalize types consistently in toTypeSql ([#12858](https://github.com/facebookincubator/velox/pull/12858))
Add unknown to map union sum ([#12850](https://github.com/facebookincubator/velox/pull/12850))
fix: Implicit conversions to narrow types in wave ([#12844](https://github.com/facebookincubator/velox/pull/12844))
feat(orc): Adding reader tests for orc files ([#12740](https://github.com/facebookincubator/velox/pull/12740))
refactor: Do not use toString() API in toConstantSql ([#12865](https://github.com/facebookincubator/velox/pull/12865))
refactor: Do not require shared_ptr in toConstantSql ([#12866](https://github.com/facebookincubator/velox/pull/12866))
feat(python): Add hash join API ([#12835](https://github.com/facebookincubator/velox/pull/12835))
feat(python): Add utility script to generate TPC-H data ([#12829](https://github.com/facebookincubator/velox/pull/12829))
refactor: Remove unused SelectiveColumnReaderFactory ([#12842](https://github.com/facebookincubator/velox/pull/12842))
docs: Add AWSCredentialsProvider ([#12857](https://github.com/facebookincubator/velox/pull/12857))
fix: Fix potential hanging issue in case of a right/full join and empty build table ([#12862](https://github.com/facebookincubator/velox/pull/12862))
fix: DeduplicatedMapColumnReader needs to deduplicate on lengths in addition to offsets ([#12860](https://github.com/facebookincubator/velox/pull/12860))
refactor: Reduce map lookups ([#12868](https://github.com/facebookincubator/velox/pull/12868))
Add UDF: inverse_gamma_cdf ([#12867](https://github.com/facebookincubator/velox/pull/12867))
feat: Makes sequence storage connector configurable ([#12873](https://github.com/facebookincubator/velox/pull/12873))
feat(python): TPC-H query runner ([#12880](https://github.com/facebookincubator/velox/pull/12880))
fix(fuzzer): Add SQL generation support for type DATE ([#12888](https://github.com/facebookincubator/velox/pull/12888))
test: PrestoSerializerTest.basicLarge timeout ([#12883](https://github.com/facebookincubator/velox/pull/12883))
fix(UUID): Incorrect results when cast between UUID and VARBINARY ([#12633](https://github.com/facebookincubator/velox/pull/12633))
build: Remove usage of cstdint in CUDA code when using RTC ([#12823](https://github.com/facebookincubator/velox/pull/12823))
feat: Implement bing_tile_quadkey function ([#12863](https://github.com/facebookincubator/velox/pull/12863))
feat: Add PlanNodeVisitor interface ([#12885](https://github.com/facebookincubator/velox/pull/12885))
feat: Add Spark date_trunc function ([#11340](https://github.com/facebookincubator/velox/pull/11340))
Pass key_indices and value indices to RowReaderOptions ([#12900](https://github.com/facebookincubator/velox/pull/12900))
test: Add IntermediateTypeTransforms for supporting intermediate types in PrestoQueryRunner ([#12746](https://github.com/facebookincubator/velox/pull/12746))
fix: Attribute cardinality change in FilterProject to Filter, not Project node ([#12895](https://github.com/facebookincubator/velox/pull/12895))
feat: Implement murmur3_x64_128 hash function in velox ([#12889](https://github.com/facebookincubator/velox/pull/12889))
Fix CQS signal. Id] 73702785 -- readability-redundant-string-init in fbcode/velox/expression/benchmarks ([#12905](https://github.com/facebookincubator/velox/pull/12905))
Fix CQS signal. Id] 72858836 -- readability-redundant-string-init in fbcode/velox/duckdb/conversion ([#12906](https://github.com/facebookincubator/velox/pull/12906))
fix: Impl ends_with Null Args ([#12898](https://github.com/facebookincubator/velox/pull/12898))
fix: Impl split_to_map Null Args ([#12897](https://github.com/facebookincubator/velox/pull/12897))
refactor: Update Base64 as non-throwing API ([#11149](https://github.com/facebookincubator/velox/pull/11149))
misc: Add index lookup operator async blocked time metrics ([#12901](https://github.com/facebookincubator/velox/pull/12901))
feat(python): Add file schema inspection API ([#12884](https://github.com/facebookincubator/velox/pull/12884))
feat: Streamline toConstantSql for BIGINT and VARCHAR ([#12904](https://github.com/facebookincubator/velox/pull/12904))
feat: Estimate input bytes for Filter node using cardinality change ([#12907](https://github.com/facebookincubator/velox/pull/12907))
feat: Report input/output batches for Filter node ([#12909](https://github.com/facebookincubator/velox/pull/12909))
feat: Add TopNRowNumberFuzzer runs to GitHub workflows ([#12662](https://github.com/facebookincubator/velox/pull/12662))
fix: Account for sorted array structs memory in aggregation unspill ([#12899](https://github.com/facebookincubator/velox/pull/12899))
Fix inverseGammaCdf function ([#12914](https://github.com/facebookincubator/velox/pull/12914))
refactor(fuzzer): Rename ToSQLUtil to PrestoSql ([#12890](https://github.com/facebookincubator/velox/pull/12890))
fix: Deadlock/Stuckness during exchange ([#12908](https://github.com/facebookincubator/velox/pull/12908))
fix(function): Fix Spark json_object_keys function to return NULL for invalid json ([#12877](https://github.com/facebookincubator/velox/pull/12877))
misc: Add task batch processing time under sequential execution mode ([#12912](https://github.com/facebookincubator/velox/pull/12912))
update skiplist issue
Fix CQS signal. Id] 8047553 -- readability-redundant-string-init in fbcode/velox/core ([#12929](https://github.com/facebookincubator/velox/pull/12929))
Fix CQS signal. Id] 81507216 -- readability-redundant-string-init in fbcode/velox/parse ([#12930](https://github.com/facebookincubator/velox/pull/12930))
fix: Fix the crash cause by NestedLoopJoinBuild sum overflow ([#12923](https://github.com/facebookincubator/velox/pull/12923))
feat(aggregate): Support type DECIMAL for function map_union_sum ([#12921](https://github.com/facebookincubator/velox/pull/12921))
fix: ExpressionFuzzer never selects function signatures that return a type variable ([#12933](https://github.com/facebookincubator/velox/pull/12933))
Fix CQS signal. Id] 83559169 -- readability-redundant-string-init in fbcode/velox/functions/lib ([#12940](https://github.com/facebookincubator/velox/pull/12940))
refactor: Update ReferenceQueryRunners to use PlanNodeVisitor to generate SQL ([#12887](https://github.com/facebookincubator/velox/pull/12887))
build(cmake): Remove Profiler from core Velox library ([#12913](https://github.com/facebookincubator/velox/pull/12913))
feat: Wave Hash Join ([#12936](https://github.com/facebookincubator/velox/pull/12936))
build(pyvelox)!: Build System Modernization and Package Architecture Improvements ([#12747](https://github.com/facebookincubator/velox/pull/12747))
fix(util): Update default case for SQL type generation ([#12947](https://github.com/facebookincubator/velox/pull/12947))
feat: Add percentage based min reclaim for shared arbitrator ([#12945](https://github.com/facebookincubator/velox/pull/12945))
fix(velox): Resolve runtime error when using ARRAY<UNKNOWN> as checksum argument ([#12946](https://github.com/facebookincubator/velox/pull/12946))
fix: Offsets buffer not enough when the repeated column is empty list among all rows ([#12845](https://github.com/facebookincubator/velox/pull/12845))
fix: Add missing return in checksum function ([#12956](https://github.com/facebookincubator/velox/pull/12956))
Re-sync with internal repository ([#12961](https://github.com/facebookincubator/velox/pull/12961))
fix: Fix Expr to allow peeling on inputs of deterministic functions in non-deterministic expression ([#12960](https://github.com/facebookincubator/velox/pull/12960))
fix(tableScan): Convert remainingFilterSubfields to member variable ([#12948](https://github.com/facebookincubator/velox/pull/12948))
misc: Add TypeUtil to CMakeLists ([#12966](https://github.com/facebookincubator/velox/pull/12966))
fix(util): Generate SQL for function subscript
test(util): Create unit tests for function toTypeSql of PrestoSql ([#12920](https://github.com/facebookincubator/velox/pull/12920))
feat: Create FileNameGenerator to abstract naming strategy from HiveDataSink ([#12925](https://github.com/facebookincubator/velox/pull/12925))
build(pyvelox): Moving some code from velox/python to python/ ([#12967](https://github.com/facebookincubator/velox/pull/12967))
test: Add VELOX_DCHECK asserts for BingTile validity ([#12962](https://github.com/facebookincubator/velox/pull/12962))
feat: Create a StringVectorBuffer class for managing a Flatvector buffer that can grow dynamically ([#12944](https://github.com/facebookincubator/velox/pull/12944))
Fix deprecated unittest asserts
fix: Back out "[velox][PR] feat: Optimize deserialize UnsafeRows to RowVector" ([#12978](https://github.com/facebookincubator/velox/pull/12978))
refactor: Refactor/Cleanup include headers file ([#12977](https://github.com/facebookincubator/velox/pull/12977))
feat(python): Plan serialization/deserialization ([#12931](https://github.com/facebookincubator/velox/pull/12931))
misc(fuzzer): Add custom data generation for JSON path ([#12312](https://github.com/facebookincubator/velox/pull/12312))
feat(fuzzer): Add input generator for json_extract in expression fuzzer ([#12367](https://github.com/facebookincubator/velox/pull/12367))
add SequenceStorageIndexSourceTest test case for partitions
Fix CQS signal. Id] 93054880 -- readability-redundant-string-init in fbcode/velox/expression/fuzzer ([#12997](https://github.com/facebookincubator/velox/pull/12997))
build(duckdb): Ensure DDB has cmake version compatibility set to 3.5 ([#12987](https://github.com/facebookincubator/velox/pull/12987))
docs: Update substr usage in Velox developer doc vector section ([#12979](https://github.com/facebookincubator/velox/pull/12979))
build(docs): Fix documentation job ([#13001](https://github.com/facebookincubator/velox/pull/13001))
feat(tpch): Add flag velox_tpch_text_pool_size_mb for text pool size ([#12499](https://github.com/facebookincubator/velox/pull/12499))
refactor: Handle in-output errors for string functions (Part 1) ([#11854](https://github.com/facebookincubator/velox/pull/11854))
Remove public from VELOX_DEFINE_CLASS_NAME ([#13000](https://github.com/facebookincubator/velox/pull/13000))
feat: Add unified compression API and lz4_frame/lz4_raw/lz4_hadoop codec ([#7589](https://github.com/facebookincubator/velox/pull/7589))
misc: Add task split count metrics ([#12943](https://github.com/facebookincubator/velox/pull/12943))
Fix CQS signal. Id] 96803229 -- readability-redundant-string-init in fbcode/velox/functions/prestosql/window ([#13009](https://github.com/facebookincubator/velox/pull/13009))
Fix CQS signal. Id] 96009368 -- readability-redundant-string-init in fbcode/velox/expression
feat: Add Aggregate::addRawClusteredInput and streaming_aggregation_eager_flush ([#12975](https://github.com/facebookincubator/velox/pull/12975))
fix: Optimize RowType::hashKind ([#12999](https://github.com/facebookincubator/velox/pull/12999))
Fix crashes caused by ClassificationAggregation ([#12949](https://github.com/facebookincubator/velox/pull/12949))
fix(expression): Remove unused compare function in ITypedExpr ([#12952](https://github.com/facebookincubator/velox/pull/12952))
feat: Create an identity PartitionFunction to allow bucket values to be computed as part of the query ([#12974](https://github.com/facebookincubator/velox/pull/12974))
feat: Add inverse_binomial_cdf ([#12983](https://github.com/facebookincubator/velox/pull/12983))
feat: Add inverse_poisson_cdf ([#12984](https://github.com/facebookincubator/velox/pull/12984))
misc: Add clarifying comments to ExchangeClient c-tor ([#12942](https://github.com/facebookincubator/velox/pull/12942))
docs: Format `array_union` function in array.rst ([#12991](https://github.com/facebookincubator/velox/pull/12991))
feat: Support date type for Spark from_json function ([#12848](https://github.com/facebookincubator/velox/pull/12848))
build: Fix usage of declareNamed ([#12870](https://github.com/facebookincubator/velox/pull/12870))
build: Add support for Arm Neoverse V2 CPU ([#13006](https://github.com/facebookincubator/velox/pull/13006))
feat: Use CUDA 12.8 atomics in breeze when available ([#12932](https://github.com/facebookincubator/velox/pull/12932))
feat: Add -num_random_ints flag to CudaTest.addRandom ([#12875](https://github.com/facebookincubator/velox/pull/12875))
feat: Add Spark array_prepend function ([#12730](https://github.com/facebookincubator/velox/pull/12730))
fix(parquet): Fix INT64 timestamp precision conversion ([#12953](https://github.com/facebookincubator/velox/pull/12953))
refactor: Merge join code cleanup ([#13015](https://github.com/facebookincubator/velox/pull/13015))
fix: Add batched deserialization for CompactRow in Exchange ([#13017](https://github.com/facebookincubator/velox/pull/13017))
refactor: Clean up RowSerializer.h and add RowSerializer.cpp ([#13018](https://github.com/facebookincubator/velox/pull/13018))
Fix CQS signal. Id] 99599938 -- readability-redundant-string-init in fbcode/velox/exec/fuzzer ([#13027](https://github.com/facebookincubator/velox/pull/13027))
Fix CQS signal. Id] 98192128 -- readability-redundant-string-init in fbcode/velox/dwio/parquet/reader ([#13026](https://github.com/facebookincubator/velox/pull/13026))
ss index source test
feat(python): Add index lookup join ([#13014](https://github.com/facebookincubator/velox/pull/13014))
docs(s3fs): Add metrics ([#13012](https://github.com/facebookincubator/velox/pull/13012))
build(pyvelox): Build pyvelox wheels for 3.10 through 3.13 ([#13008](https://github.com/facebookincubator/velox/pull/13008))
fix: Limit scan threads to member mask ([#12869](https://github.com/facebookincubator/velox/pull/12869))
misc: Fix a typo in CMakeLists.txt ([#12675](https://github.com/facebookincubator/velox/pull/12675))
feat(cudf): Add cuDF based OrderBy operator ([#12735](https://github.com/facebookincubator/velox/pull/12735))
fix: Fix merge source promise setting out of lock ([#13021](https://github.com/facebookincubator/velox/pull/13021))
fix: Fix merge join issues ([#13022](https://github.com/facebookincubator/velox/pull/13022))
build: Remove usage of C++ placement new operator in device code ([#12874](https://github.com/facebookincubator/velox/pull/12874))
misc: Refactor table scan and add batch size metric for monitoring ([#13020](https://github.com/facebookincubator/velox/pull/13020))
refactor(fuzzer): Unlink JoinFuzzer from GTest ([#12399](https://github.com/facebookincubator/velox/pull/12399))
docs(pyvelox): Adding pyvelox sphinx documentation ([#13010](https://github.com/facebookincubator/velox/pull/13010))
Fix nulls reading in byte column with fast decoder
refactor: Refactor ArrayRemoveNullFunction and register it for Spark array_compact function ([#12700](https://github.com/facebookincubator/velox/pull/12700))
misc: Avoid acquiring lock in QueryCtx::checkUnderArbitration if unnecessary ([#13013](https://github.com/facebookincubator/velox/pull/13013))
Put Prestissimo OOM top users at the top ([#12951](https://github.com/facebookincubator/velox/pull/12951))
Fix CQS signal. Id] 13636428 -- readability-redundant-string-init in fbcode/velox/common/base
fix: Support small value type in dictionary encoding ([#13038](https://github.com/facebookincubator/velox/pull/13038))
fix: Fix initialization of Spark date_trunc function ([#12922](https://github.com/facebookincubator/velox/pull/12922))
feat: Add tdigest to fuzzer. Unblock value_at_quantile ([#12964](https://github.com/facebookincubator/velox/pull/12964))
Fix CQS signal. Id] 16452762 -- readability-redundant-string-init in fbcode/velox/common/fuzzer
refactor: Generalize batched deserialization in RowSerializer ([#13032](https://github.com/facebookincubator/velox/pull/13032))
refactor(tpch): Handle in-output errors for string functions (Part 2) ([#12064](https://github.com/facebookincubator/velox/pull/12064))
Fix `-Wambiguous-reversed-operator` in velox/dwio/dwrf/common/Common.h + 1
fix: Inverse_poisson_cdf doesn't handle case where results exceed bounds of int32_t ([#13048](https://github.com/facebookincubator/velox/pull/13048))
misc: Exclude inverse_poisson_cdf from the ExpressionFuzzer ([#13049](https://github.com/facebookincubator/velox/pull/13049))
Remove unnecessary TDigest.h sum check ([#13050](https://github.com/facebookincubator/velox/pull/13050))
feat: Add mathematical operators for IntervalYearMonth type ([#11612](https://github.com/facebookincubator/velox/pull/11612))
Populate fileSize in sst writer (1/2)
fix: MergeJoin is buggy with RIGHT OUTER JOINs where NULLs are present in the keys ([#13039](https://github.com/facebookincubator/velox/pull/13039))
feat(parquet): Add config for writer page size and batch size ([#12755](https://github.com/facebookincubator/velox/pull/12755))
fix(fuzzer): Solved the StringWriter overflows StringView len issue ([#13040](https://github.com/facebookincubator/velox/pull/13040))
Fix `-Wambiguous-reversed-operator` in velox/core/Expressions.h + 1
test: Improve BingTile(quadkey) tests ([#12864](https://github.com/facebookincubator/velox/pull/12864))

assignUser and others added 30 commits March 19, 2025 19:18
Summary:
Integrate new shared data and overall refactor.

Pull Request resolved: facebookincubator#12668

Reviewed By: kKPulla

Differential Revision: D71413018

Pulled By: kagamiori

fbshipit-source-id: 9627bb500876a607d4cde6b4d9928f1a4d2236c6
…ncDataCacheTest.cpp

Summary:
`-Wunused-exception-parameter` has identified an unused exception parameter. This diff removes it.

This:
```
try {
    ...
} catch (exception& e) {
    // no use of e
}
```
should instead be written as
```
} catch (exception&) {
```

If the code compiles, this is safe to land.

Reviewed By: meyering

Differential Revision: D71503173

fbshipit-source-id: 37beb04da69370b72cde0a227a63d5863de497d4
Summary:
We need Abseil 20240116 or later for RE2 2024-07-02. Because RE2 uses `ABSL_LOG(DFATAL)` and it needs this commit:
abseil/abseil-cpp@70172ad

Fixes facebookincubator#12674

Pull Request resolved: facebookincubator#12678

Reviewed By: bikramSingh91

Differential Revision: D71531109

Pulled By: xiaoxmeng

fbshipit-source-id: e9a9a93df802c5752d1f8aaf21ef6eafa0b1365a
…acebookincubator#12712)

Summary:
Pull Request resolved: facebookincubator#12712

Enable overrides for getPreferredSize behavior on a per memory pool basis

Reviewed By: xiaoxmeng

Differential Revision: D71427371

fbshipit-source-id: ce4e1c9682de7d00af97ca7eb9a6323c4c8ffd02
Summary:
Pull Request resolved: facebookincubator#12588

Implement `encodedVectorCopy`, a generic vector copy utility that preserves
encodings for memory saving purpose.

## Encoding Preservation

There are 2 main use cases for this new function.  One is to merge multiple
encoded vectors (`source`s) into one large encoded vector (`target`); the other
is to update specific rows (`source`) in a large vector (`target`), while
keeping the encodings.  Both use cases require us to keep the encoding of
`target`, so that is the behavior of this function.

There are some exceptions to this rule:

- We merge multiple adjacent layers of dictionary and constant wrappers into
  one.
- When the value type size in the dictionary is no larger than the index type,
  we flatten the vector to save memory.
- When `target` is constant, we convert it to dictionary to allow different
  values in `source`.
- When `target` is flat ROW, MAP, or ARRAY, and `source` is constant or
  dictionary encoded, the result will be dictionary encoded, to avoid flattening
  the child vectors.  Once the target becomes a dictionary, it can stay that way
  and we can keep adding new content to it while keeping the encoding.  This is
  a typical use case for encoding-preserving merging.

## Inner Vector Compaction

Besides encoding, we also pay special attention to avoid holding on to memory
that is no longer needed.  This is especially important for the merging use
case: as the `target` gets updated, the majority of rows in its inner vectors
will be dereferenced and no longer used.  There are 2 cases where we need to
take care of this.

The first is dictionary encoding: some rows in the alphabet (base/value)
vector are no longer referenced by the indices, so we should recycle them.
When we translate the copy ranges on dictionary indices to copy ranges on the
alphabet, we overwrite the unused alphabet rows with the new alphabet rows
from source.  This way we efficiently reuse the memory in the alphabet, with
neither reallocation nor memory leaks.
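
The slot-reuse idea can be sketched in a few lines of standalone C++. This is only an illustration under simplifying assumptions, not the `encodedVectorCopy` implementation: `DictSketch` and `setRow` are hypothetical names, the dictionary is modeled with plain `std::vector`s, and the scan for a free slot is linear per update.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded vector: each row stores an
// index into a shared alphabet of values.
struct DictSketch {
  std::vector<int32_t> indices;
  std::vector<std::string> alphabet;
};

// Overwrites `row` with `value`. If some alphabet slot is not referenced by
// any other row, that slot is reused; otherwise the alphabet grows by one.
void setRow(DictSketch& dict, size_t row, const std::string& value) {
  assert(row < dict.indices.size());
  // Mark the slots still referenced by rows other than the one overwritten.
  std::vector<bool> used(dict.alphabet.size(), false);
  for (size_t i = 0; i < dict.indices.size(); ++i) {
    if (i != row) {
      used[static_cast<size_t>(dict.indices[i])] = true;
    }
  }
  for (size_t slot = 0; slot < used.size(); ++slot) {
    if (!used[slot]) {
      dict.alphabet[slot] = value; // Recycle the unreferenced slot.
      dict.indices[row] = static_cast<int32_t>(slot);
      return;
    }
  }
  dict.alphabet.push_back(value); // No free slot: append a new one.
  dict.indices[row] = static_cast<int32_t>(dict.alphabet.size() - 1);
}
```

With indices `{0, 1, 0}` and alphabet `{"a", "b"}`, overwriting row 1 reuses slot 1 and the alphabet stays at two entries; overwriting row 0 afterwards has no free slot, so the alphabet grows to three.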

The second case is `ARRAY`/`MAP`: the elements/keys/values vector can have
rows that are no longer referenced from the parent.  This is a little harder to
solve than the dictionary case, since the nested rows need to be contiguous
for one parent row (offset/size pair), which means we cannot move them around
easily.  Our approach is to allow some unused nested rows but keep track of
their percentage; once it exceeds a certain threshold (50% by default,
configurable), we make a new copy of the nested vector and copy only the used
rows over.  This allows us to reuse the nested rows to a certain degree while
keeping some bound on the memory usage.
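
A minimal sketch of the threshold-based compaction follows, assuming a simplified array layout. `ArraySketch`, `unusedFraction`, and `compactIfNeeded` are hypothetical names for illustration, parent ranges are assumed not to overlap, and the real utility operates on Velox vectors rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ARRAY vector: each parent row is an
// (offset, size) range into a shared elements buffer.
struct ArraySketch {
  std::vector<int32_t> offsets;
  std::vector<int32_t> sizes;
  std::vector<int64_t> elements;
};

// Fraction of element rows that no parent row references.
// Assumes the parent ranges do not overlap.
double unusedFraction(const ArraySketch& array) {
  if (array.elements.empty()) {
    return 0.0;
  }
  size_t used = 0;
  for (int32_t size : array.sizes) {
    used += static_cast<size_t>(size);
  }
  return 1.0 - static_cast<double>(used) / array.elements.size();
}

// Copies only the referenced element rows into a fresh buffer when the unused
// fraction exceeds `threshold`. Returns true if a compaction happened.
bool compactIfNeeded(ArraySketch& array, double threshold = 0.5) {
  assert(array.offsets.size() == array.sizes.size());
  if (unusedFraction(array) <= threshold) {
    return false;
  }
  std::vector<int64_t> compacted;
  for (size_t i = 0; i < array.offsets.size(); ++i) {
    const auto newOffset = static_cast<int32_t>(compacted.size());
    const auto begin = array.elements.begin() + array.offsets[i];
    compacted.insert(compacted.end(), begin, begin + array.sizes[i]);
    array.offsets[i] = newOffset; // Each parent range stays contiguous.
  }
  array.elements = std::move(compacted);
  return true;
}
```

Below the threshold the unused rows are simply tolerated, which trades some memory for not having to copy the nested vector on every update.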

Reviewed By: mbasmanova

Differential Revision: D70867237

fbshipit-source-id: 0cddd37fd7188d89ea541fb89324aa9a10745415
)

Summary:
Pull Request resolved: facebookincubator#12708

These functions allow users to find the parent or children tiles of a BingTile.

Reviewed By: bikramSingh91

Differential Revision: D71413543

fbshipit-source-id: bcfa7f5a175362d22ff3542d1b3e0b922eb1ff18
…task failure in pyspark (facebookincubator#12729)

Summary:
Pull Request resolved: facebookincubator#12729

A race between async direct coalesce load and early task failure can cause a memory leak:
T1. file reader triggers a stripe load, which does a coalesce prefetch
T2. coalesce prefetch kicks off and passes the cancellation check
T3. coalesce load allocates memory and reads from storage
T4. before table scan does the on-demand load, the task fails, or the prefetched data is skipped by filtering and the task finishes.
T5. task destruction frees the memory pool and hits the memory leak check failure.
T6. if we disable the memory leak check failure (with memory leak metric reporting in Meta production), then
the buffer free will throw with a bad memory pool pointer.

Verified the fix with a unit test that reproduces the race condition. This was exposed by a pyspark use case.

Reviewed By: tanjialiang, oerling

Differential Revision: D71529929

fbshipit-source-id: de1c545c48dc23602a79238b878020ce2f29c4f4
…2742)

Summary:
VeloxCon 2025 registration banner

Pull Request resolved: facebookincubator#12742

Reviewed By: xiaoxmeng

Differential Revision: D71580661

Pulled By: pedroerp

fbshipit-source-id: 7815d4ad835cf4b79c7ed81ce0383b9b0050bb36
…ubator#12741)

Summary:
Pull Request resolved: facebookincubator#12741

Currently, when given a query like the one below, Velox does not round but rounds down, which gives different results from Java.

```
SELECT
    "json_extract"(metrics, '$.eventTimings'),
    CAST("json_extract"(metrics, '$.eventTimings') AS ARRAY(ROW(inputdelay BIGINT))),
    ANY_MATCH(
        CAST(
            "json_extract"(metrics, '$.eventTimings') AS ARRAY(ROW(inputdelay BIGINT))
        ),
        (x) -> (x.inputDelay >= 300)
    )
FROM (
    VALUES
        '{ "eventTimings": [ { "inputDelay": 299.5999999996274} ] }'
) AS t(metrics)
```

Java returns 300 and velox 299. This PR fixes this.
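
The difference for this input can be reproduced with plain C++ conversions. The sketch below is not the Velox cast code; `castTruncating` and `castRounding` are hypothetical helpers, and `std::llround` is used only to show rounding to nearest for this value (its handling of exact halves may differ from Java's).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converting a double with static_cast drops the fractional part.
int64_t castTruncating(double value) {
  return static_cast<int64_t>(value);
}

// Rounding to the nearest integer first gives the Java-like answer here.
int64_t castRounding(double value) {
  assert(std::isfinite(value));
  return std::llround(value);
}
```

For `299.5999999996274`, the truncating helper yields 299 and the rounding helper yields 300, so the predicate `x.inputDelay >= 300` flips between the two.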

Reviewed By: kevinwilfong

Differential Revision: D71570565

fbshipit-source-id: e7493578760e8797c7268cd9864954e0db3e3bb7
… suppress llvm-19-exposed -Wmissing-template-arg-list-after-template-kw warning (facebookincubator#12744)

Summary:
Pull Request resolved: facebookincubator#12744

This avoids the following errors:

  velox/functions/prestosql/aggregates/BitwiseAggregates.cpp:91:47: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]

Reviewed By: pedroerp, dtolnay

Differential Revision: D71579788

fbshipit-source-id: 4d9107f4086aed060ceae64bd1348cc37da402dd
…or#12624)

Summary:
Pull Request resolved: facebookincubator#12624

When a small participant requests a large amount of capacity, we should take the requested capacity into consideration, making that participant more likely to be killed by global arbitration. This makes global arbitration victim selection fairer.

Reviewed By: xiaoxmeng

Differential Revision: D71076051

fbshipit-source-id: 153cf40ecc06829dee20b4006305943f3fbb84af
… -Wunused-but-set-variable warnings (facebookincubator#12743)

Summary:
Pull Request resolved: facebookincubator#12743

This avoids the following errors:

  velox/exec/tests/utils/PlanBuilder.cpp:2117:14: error: variable 'constant' set but not used [-Werror,-Wunused-but-set-variable]
  velox/exec/tests/utils/PlanBuilder.cpp:2113:14: error: variable 'field' set but not used [-Werror,-Wunused-but-set-variable]

Reviewed By: pedroerp, dtolnay

Differential Revision: D71579752

fbshipit-source-id: 8ab46b889c8193984dd4d30cc9eb92c0982381ed
Summary: Pull Request resolved: facebookincubator#12720

Reviewed By: pedroerp

Differential Revision: D71557515

Pulled By: xiaoxmeng

fbshipit-source-id: a2faf0c8e76313d118719520378bb7c5a15695cd
Summary:
Summary

This PR is a split of facebookincubator#12053 to introduce Geometry Presto type in Velox. It focuses on the type declaration without serde code. As suggested, it uses Varbinary as the underlying type. Cast operators can be added later if needed, as Presto Java handles the conversion by functions `ST_GeometryFromText`(`ST_GeomFromBinary`), and `ST_AsText` (`ST_AsBinary`).

Pull Request resolved: facebookincubator#12274

Reviewed By: amitkdutta

Differential Revision: D71557550

Pulled By: xiaoxmeng

fbshipit-source-id: f824a6dcba61df0fee180edc740db124a4a48bc5
Summary:
Summary
This PR adds Parquet reserved keywords to handle Parquet files generated by the Presto Java workers.

Fixes: facebookincubator#12620.

References
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java#L154
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java#L78

Pull Request resolved: facebookincubator#12625

Reviewed By: pedroerp

Differential Revision: D71557476

Pulled By: xiaoxmeng

fbshipit-source-id: babeffad086049e9c52cfb16c93a8c2aed1cf029
Summary:
Pull Request resolved: facebookincubator#12727

Implement split_to_multimap in velox

Reviewed By: zacw7, yuandagits

Differential Revision: D71442997

fbshipit-source-id: cd60442d161ad96c70e2c67e859c48aab8dcf165
Summary:
Fixes facebookincubator#12526

Pull Request resolved: facebookincubator#12710

Reviewed By: zation99

Differential Revision: D71600558

Pulled By: xiaoxmeng

fbshipit-source-id: 37dadf9eced59dcb7fe34105c6e28f82c9e1431c
…acebookincubator#12660)

Summary:
Prestissimo code using these APIs was advanced in prestodb/presto#24730

Pull Request resolved: facebookincubator#12660

Reviewed By: Yuhta

Differential Revision: D71341214

Pulled By: kagamiori

fbshipit-source-id: 278c1cbc797a75a030e5611f3ab8dfcc290eb020
Summary:
Inherits the gtest test virtually to allow combining test base classes. Allows specifying an external path for LocalRunnerTestBase.

Pull Request resolved: facebookincubator#12758

Reviewed By: Yuhta

Differential Revision: D71648474

Pulled By: oerling

fbshipit-source-id: d17e318fcb6dfa7a2cd459f1a3b5e54ed5cbd6fc
…ebookincubator#12754)

Summary:
Pull Request resolved: facebookincubator#12754

Some users were incorrectly using PrestoBatchVectorSerializer::serialize, which led to
difficult-to-trace bugs.  Adding some simple sanity checks to guard against these.

Reviewed By: xiaoxmeng

Differential Revision: D71637356

fbshipit-source-id: 5a1f545a3b5f516e02ac52b9a0d08561769d4ef1
…incubator#12756)

Summary:
Pull Request resolved: facebookincubator#12756

There is a crash in HiveDataSource during the merging of stats because the stats may still be used by the executor in DirectBufferedInput. This means the lifetime of the stats needs to exceed that of the executor inside DirectBufferedInput. The simple fix here is to pass a shared_ptr of the stats to DirectBufferedInput so that DirectCoalescedLoad owns a reference to the stats.

DwioCoalescedLoadBase was updated to also cache the stats shared_ptr since it can go on the same code path.
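
The ownership pattern can be illustrated without any Velox types. In this sketch, `IoStatsSketch` and `makeLoadTask` are hypothetical names; the point is only that a deferred task capturing a `std::shared_ptr` keeps the stats alive after the original owner lets go.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the IO statistics object.
struct IoStatsSketch {
  int64_t bytesRead = 0;
};

// Builds a deferred load task that holds its own reference to the stats, so
// the stats outlive whoever created the task.
std::function<void()> makeLoadTask(
    std::shared_ptr<IoStatsSketch> stats,
    int64_t bytes) {
  assert(stats != nullptr);
  return [stats = std::move(stats), bytes] { stats->bytesRead += bytes; };
}
```

If the creator resets its own pointer before the task runs, the task still updates a live object; the stats are destroyed only once the task itself is gone.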

Reviewed By: Yuhta

Differential Revision: D71641158

fbshipit-source-id: cd114c020c88eac3b8c9921b9df41c55f4bdaa28
…incubator#12724)

Summary:
Pull Request resolved: facebookincubator#12724

Adding custom input generator for phone numbers

Reviewed By: duxiao1212

Differential Revision: D71375244

fbshipit-source-id: 5d85de7671c3e592e6fe6b98573640b7ad712351
Summary:
The S3 filesystem uses the default logging configuration which creates the file in the current directory where an executable is called from. Most deployments have a custom location to write logs.
This change allows the user to specify a path that is used as the new log prefix which results in an absolute path being used for the log file creation. The default AWS log file prefix is used.

Pull Request resolved: facebookincubator#12534

Reviewed By: zation99

Differential Revision: D71669762

Pulled By: xiaoxmeng

fbshipit-source-id: 727623cf071b10b548f87c48100cad36bfe4aa0b
…bookincubator#12759)

Summary:
Pull Request resolved: facebookincubator#12759

Add runtime stats to track the wall/CPU times spent in different stages during index lookup processing and integrate
with Velox operator stats.
Add metrics to monitor the index lookup and table scan batch processing time
distribution to ease monitoring of an online service.

Reviewed By: wenqiwooo

Differential Revision: D71599185

fbshipit-source-id: 7da229b3ad0726bf6ec7a2f5d0a0dc820f0c5f1d
Summary:
Refactor `TaskTraceReader` and consolidate task metadata parsing.
We need to extract the node name, plan, configs, etc. from it.

Pull Request resolved: facebookincubator#12669

Reviewed By: gggrace14

Differential Revision: D71680632

Pulled By: xiaoxmeng

fbshipit-source-id: 0cbe030e12d767931ad39ef2b9df2996c34d11e4
Summary:
`-Wunused-private-field` has identified an unused private field. This diff removes it.

If the code compiles, this is safe to land.

Reviewed By: meyering

Differential Revision: D71512007

fbshipit-source-id: 4db078641e7de1e132f32a3dfaa231026aab83e8
…tion (facebookincubator#12765)

Summary:
Pull Request resolved: facebookincubator#12765

We don't need to copy data for duplicate results for inputs with the same key. Extend row serde to support serializing string views.
This optimizes CPU/memory usage and simplifies the implementation.
Also cover key dedup in the index join test.

Reviewed By: wenqiwooo

Differential Revision: D71670857

fbshipit-source-id: d3257ef201fe072e8fba0f2931cd3a6101b8210a
…bookincubator#12768)

Summary:
Pull Request resolved: facebookincubator#12768

Timestamp beyond 2037 is supported now. Fixing a comment.

Reviewed By: kgpai

Differential Revision: D71689375

fbshipit-source-id: 84eee938c543ad0f7dc497decfd46ff4408c7110
)

Summary:
When using partition filters with timestamp value, use configuration
 to control whether to interpret it as local time or UTC.

Follow-up for facebookincubator#11957

Pull Request resolved: facebookincubator#12368

Reviewed By: kKPulla

Differential Revision: D71047729

Pulled By: kagamiori

fbshipit-source-id: 471011591f5c8ff70507d49e425b9749a06b02d4
anandamideShakyan and others added 11 commits April 17, 2025 00:20
…facebookincubator#12064)

Summary:
# PR Description:

Detect and address input/output errors that can result in undefined behavior. Ensure proper error handling for I/O functions that may fail and leave variables uninitialized. Neglecting to check the status of these functions before using their outputs (e.g., memory buffers, file descriptors, etc.) can lead to undefined program behavior. This update enforces checks on commonly used I/O functions to validate their return values and prevent improper usage.

This is the second PR of the several changes for this refactoring.

Pull Request resolved: facebookincubator#12064

Reviewed By: pedroerp

Differential Revision: D73005009

Pulled By: kgpai

fbshipit-source-id: 5cce140545fb9f682b036ce86875bd2c1b1b5a06
…h + 1

Summary:
`-Wambiguous-reversed-operator` warns about ambiguous reversed operators, e.g. `a < b` and `b > a` are both valid. Such operators are disallowed in C++20. This codemod fixes the warnings.
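
The commit message does not show the offending code. One common trigger, assumed here for illustration, is a member `operator==` that is not `const`-qualified: in C++20 the compiler also considers the reversed call `b == a`, and the two candidates then differ only in the constness of their operands. The sketch below shows the usual shape of the fix on a hypothetical `TypeKindSketch` type.

```cpp
#include <cassert>

// Hypothetical type used only to illustrate the fix.
struct TypeKindSketch {
  int id;

  // Const-qualified on purpose. Without the trailing `const`, `a == b` and
  // the C++20 reversed candidate `b == a` can tie during overload resolution,
  // which is what clang reports as -Wambiguous-reversed-operator.
  constexpr bool operator==(const TypeKindSketch& other) const {
    return id == other.id;
  }
};

// Compile-time check that the comparison resolves cleanly.
static_assert(
    TypeKindSketch{1} == TypeKindSketch{1},
    "equal ids must compare equal");
```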

#buildsonlynotests - If this diff compiles, it works.

 - If you approve of this diff, please use the "Accept & Ship" button :-)

Reviewed By: dmm-fb

Differential Revision: D73131836

fbshipit-source-id: 726d37acc1afd6ffd0c68da95f16eb181ca5b48d
…nds of int32_t (facebookincubator#13048)

Summary:
Pull Request resolved: facebookincubator#13048

inverse_poisson_cdf doesn't handle the case where results exceed the bounds of int32_t, so
it throws an exception (at least with sanitizers enabled) when converting the result to an
int32_t.

Presto Java handles this by bounding the result to Integer.MAX_VALUE. The change applies
the same restriction to Velox's implementation.
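
A minimal sketch of such a bound, assuming the quantile is computed as a non-negative `double` before conversion. `clampToInt32Max` is a hypothetical helper, not the function Velox uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Converts a non-negative quantile to int32_t, capping it at the largest
// representable value instead of overflowing during the conversion.
int32_t clampToInt32Max(double result) {
  assert(result >= 0); // Also rejects NaN, for which the comparison is false.
  constexpr double kMax = std::numeric_limits<int32_t>::max();
  if (result >= kMax) {
    return std::numeric_limits<int32_t>::max();
  }
  return static_cast<int32_t>(result);
}
```

The comparison happens in `double`, so the out-of-range value is never converted; converting it directly is the undefined behavior that the sanitizer reports.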

Reviewed By: skyelves

Differential Revision: D73143995

fbshipit-source-id: 9bb2224ae37ccae8e5bf4c8a6f3872882b51469c
…incubator#13049)

Summary:
Pull Request resolved: facebookincubator#13049

inverse_poisson_cdf has a known issue (coming from the Boost library it invokes) where it can
end up in a long running or possibly infinite loop with very large inputs.

facebookincubator#13047

Exclude it from fuzzing until this is addressed.

Reviewed By: kunigami

Differential Revision: D73144904

fbshipit-source-id: 9b6a5032cb07213bf64bf1f8281c3c35a92b1e45
Summary:
Pull Request resolved: facebookincubator#13050

Remove unnecessary TDigest.h sum check. It is too strict and not required.
Issue: https://www.internalfb.com/tasks/?t=221381362

Reviewed By: spershin

Differential Revision: D73147059

fbshipit-source-id: eca749b9528ac2116e9defebb19eea5ab33544ad
…incubator#11612)

Summary:
Add support for mathematical functions `plus`, `minus`, `multiply`, and `divide`
with `IntervalYearMonth` type. The function signatures added match that of [Presto](https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/type/IntervalYearMonthOperators.java),
accounting for the Presto function signature changes in prestodb/presto#24089.

Pull Request resolved: facebookincubator#11612

Reviewed By: pedroerp

Differential Revision: D72997442

Pulled By: kKPulla

fbshipit-source-id: 5fcf40639ed0af63e6ec198994aeaef49ba367cb
Summary:
[FileSize](https://www.internalfb.com/code/fbsource/[e45d6835b38c]/fbcode/velox/connectors/hive/HiveDataSink.cpp?lines=669) is reported by `IoStatistics` in Velox.  [IoStatistics](https://www.internalfb.com/code/fbsource/fbcode/velox/dwio/common/FileSink.cpp?lines=81) is a member of `FileSink`. The `rawBytesWritten` metric is increased per write using FileSink.

The DWRF writer and the Parquet writer use the [fileSink](https://www.internalfb.com/code/fbsource/[e45d6835b38c467d939c6176cec555e5b53f4a3e]/fbcode/velox/dwio/dwrf/writer/Writer.cpp?lines=157) to write out files, so the IO statistics are populated correctly. The SSTable writer uses [ZippyDBSstFileWriter](https://www.internalfb.com/code/fbsource/[e45d6835b38c467d939c6176cec555e5b53f4a3e]/fbcode/fb_velox/sst/writer/SSTWriter.cpp?lines=41), which builds another writer using the file name, so the write does not go through FileSink.

In this diff, I added a new method to get IoStatistics from FileSink. The next diff will use IoStatistics to populate file size.

Reviewed By: harsharastogi

Differential Revision: D73061995

fbshipit-source-id: cd1399f2cef1e68fb25cf38331db2a3256c86896
…t in the keys (facebookincubator#13039)

Summary:
Pull Request resolved: facebookincubator#13039

While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.

They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for
INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer while trying to write out these misses. We need to make
sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.
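
Both classes of bug can be illustrated with a small standalone sketch. It is not the MergeJoin operator code: `RightRowSketch`, `OutputRowSketch`, and `emitNullKeyMisses` are hypothetical, rows with NULL keys are assumed to be adjacent on the right side, and the output is a plain `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical right-side input row: a nullable join key plus a payload.
struct RightRowSketch {
  std::optional<int> key;
  int payload;
};

// Hypothetical output row of a RIGHT OUTER JOIN: the left side is NULL on a
// miss.
struct OutputRowSketch {
  std::optional<int> leftPayload;
  int rightPayload;
};

// Emits a miss for every consecutive NULL-key right row starting at `start`,
// and stops early once `output` holds `outputBatchSize` rows. Returns the
// index of the first right row that was not processed.
size_t emitNullKeyMisses(
    const std::vector<RightRowSketch>& right,
    size_t start,
    size_t outputBatchSize,
    std::vector<OutputRowSketch>& output) {
  assert(start <= right.size());
  size_t i = start;
  while (i < right.size() && !right[i].key.has_value()) {
    if (output.size() >= outputBatchSize) {
      break; // Batch is full: resume from `i` on the next call.
    }
    // A NULL key can never match, so the row is output with a NULL left side.
    output.push_back({std::nullopt, right[i].payload});
    ++i;
  }
  return i;
}
```

The caller keeps the returned index and calls again with a fresh output batch, so no miss is dropped and no write goes past the batch limit.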

This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).

Reviewed By: xiaoxmeng

Differential Revision: D73077550

fbshipit-source-id: 82d914d38835b51f52676cfa2317fdce164ee0fc
…okincubator#12755)

Summary:
In this issue, I found that some options in the `WriterOptions` are hardcoded but there are setters for them so it does not make sense that they are not configurable. **In this PR, I mainly made the page size and the batch size configurable.** The page size is configurable in Presto [here](https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/ParquetFileWriterConfig.java#L49-L59), so we could make it configurable in velox as well.

However, I think it is not enough to make only the page size configurable, because the batch size also affects the page size: the Velox Parquet writer writes data in batches, and each batch takes at most one page. So if the batch size is too big, the actual page size could be far larger than the page size set by the user, and the user might also want to set the batch size.

Fixes: facebookincubator#12734

Pull Request resolved: facebookincubator#12755

Reviewed By: pedroerp

Differential Revision: D72576367

Pulled By: kgpai

fbshipit-source-id: e8e3fd3a660c89a47c84e16845d9ea8109076470
@bdice bdice requested review from a team, devavret and karthikeyann as code owners April 20, 2025 19:33
copy-pr-bot bot commented Apr 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bdice commented Apr 20, 2025

/ok to test 8ee066e

@rapidsai rapidsai deleted a comment from copy-pr-bot bot Apr 20, 2025
bdice commented Apr 20, 2025

/ok to test da0be93

bdice commented Apr 20, 2025

/ok to test 903ccdf

bdice commented Apr 21, 2025

/ok to test 7ed1fc5

bdice commented Apr 21, 2025

/ok to test 2926e7e

@bdice bdice merged commit 5ec5b0b into rapidsai:velox-cudf Apr 21, 2025
2 checks passed
VinithKrishnan pushed a commit to VinithKrishnan/velox-rapidsai that referenced this pull request Jun 29, 2025
…ger-overflow (facebookincubator#13831)

Summary:
Pull Request resolved: facebookincubator#13831

This avoids the following errors:

```
fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41: runtime error: negation of -9223372036854775808 cannot be represented in type 'long'; cast to an unsigned type to negate this value to itself
    #0 0x000000346ce5 in std::abs(long) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56
    #1 0x000000345879 in std::shared_ptr<facebook::velox::BiasVector<facebook::velox::test::EvalTypeHelper<long>::Type>> facebook::velox::test::VectorMaker::biasVector<long>(std::vector<std::optional<long>, std::allocator<std::optional<long>>> const&) fbcode/velox/vector/tests/utils/VectorMaker-inl.h:58
    #2 0x000000344d34 in facebook::velox::test::BiasVectorErrorTest::errorTest(std::vector<std::optional<long>, std::allocator<std::optional<long>>>) fbcode/velox/vector/tests/BiasVectorTest.cpp:39
    #3 0x00000033ec99 in facebook::velox::test::BiasVectorErrorTest_checkRangeTooLargeError_Test::TestBody() fbcode/velox/vector/tests/BiasVectorTest.cpp:44
    #4 0x7fe0a2342c46 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) fbsource/src/gtest.cc:2727
    #5 0x7fe0a234275d in testing::Test::Run() fbsource/src/gtest.cc:2744
    #6 0x7fe0a2345fb3 in testing::TestInfo::Run() fbsource/src/gtest.cc:2890
    #7 0x7fe0a234c8eb in testing::TestSuite::Run() fbsource/src/gtest.cc:3068
    #8 0x7fe0a237b52b in testing::internal::UnitTestImpl::RunAllTests() fbsource/src/gtest.cc:6059
    #9 0x7fe0a237a0a2 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) fbsource/src/gtest.cc:2727
    #10 0x7fe0a23797f5 in testing::UnitTest::Run() fbsource/src/gtest.cc:5599
    #11 0x7fe0a2239800 in RUN_ALL_TESTS() fbsource/gtest/gtest.h:2334
    #12 0x7fe0a223952c in main fbcode/common/gtest/LightMain.cpp:20
    #13 0x7fe09ec2c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7fe09ec2c717 in __libc_start_main@GLIBC_2.2.5 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #15 0x00000033d8b0 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

UndefinedBehaviorSanitizer: signed-integer-overflow fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41
```
Avoid overflow by using the expression (static_cast<uint64_t>(1) + ~static_cast<uint64_t>(min)) to calculate the absolute value of min without using std::abs
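
The expression can be wrapped in a small helper to show why it is safe. `magnitude` is a hypothetical name; the negative branch uses the same expression as the commit message, which computes the two's-complement negation entirely in unsigned arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns |value| as an unsigned integer. Unlike std::abs, this is well
// defined for the minimum int64_t, whose magnitude does not fit in int64_t.
uint64_t magnitude(int64_t value) {
  if (value >= 0) {
    return static_cast<uint64_t>(value);
  }
  // Flipping all bits and adding one negates the value; unsigned arithmetic
  // wraps instead of overflowing.
  return static_cast<uint64_t>(1) + ~static_cast<uint64_t>(value);
}
```

For the minimum `int64_t` the result is 9223372036854775808, which is representable as `uint64_t` but not as `int64_t`.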

Reviewed By: dmm-fb, peterenescu

Differential Revision: D76901449

fbshipit-source-id: 7eb3bd0f83e42f44cdf34ea1759f3aa9e1042dae
copy-pr-bot bot pushed a commit that referenced this pull request Sep 10, 2025