Skip to content

Conversation

@ovr
Copy link
Contributor

@ovr ovr commented Aug 4, 2022

Which issue does this PR close?

This PR introduces support for binary bitwise shift operators like >> and <<.

Refs #1619

Are there any user-facing changes?

It's new functionality, and there are no breaking changes.

Thanks!

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sql SQL Planner labels Aug 4, 2022
@ovr ovr force-pushed the binary-bitwise-shifts-pr branch 2 times, most recently from a5a06ff to 6449570 Compare August 4, 2022 17:29
@ovr
Copy link
Contributor Author

ovr commented Aug 4, 2022

Right now, DF uses Generic Dialect for sqlparser, but it's allowed only for PostgreSqlDialect

Token::ShiftLeft if dialect_of!(self is PostgreSqlDialect) => {
    Some(BinaryOperator::PGBitwiseShiftLeft)
}
Token::ShiftRight if dialect_of!(self is PostgreSqlDialect) => {
    Some(BinaryOperator::PGBitwiseShiftRight)
}

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

Thanks

@alamb
Copy link
Contributor

alamb commented Aug 4, 2022

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

What about allowing BinaryOperator::PGBitwiseShiftRight in GenericDialect ? I think the GenericDialect in sqlparser is sort of a "anything goes" dialect so it should be fine to extend it to allow these operators.

I can make another sql-parser release soon if you need one as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the PR and I think the basic direction is very nice 👍 thanks @ovr

}
}

pub(crate) fn bitwise_shift_right(left: ArrayRef, right: ArrayRef) -> Result<ArrayRef> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@liukun4515
Copy link
Contributor

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

What about allowing BinaryOperator::PGBitwiseShiftRight in GenericDialect ? I think the GenericDialect in sqlparser is sort of a "anything goes" dialect so it should be fine to extend it to allow these operators.

I can make another sql-parser release soon if you need one as well

LGTM, if we can merge the sql-rs first

@ovr
Copy link
Contributor Author

ovr commented Aug 5, 2022

Looks likes I forget about overflowing here. I think I can use overflowing_shl to handle it correctly to the Error https://doc.rust-lang.org/std/primitive.i64.html#method.overflowing_shl

❯ select 2 << 1024;
thread 'main' panicked at 'attempt to shift left with overflow', datafusion/physical-expr/src/expressions/binary.rs:524:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 4e1dfde to 36aff67 Compare August 5, 2022 14:46
@ovr
Copy link
Contributor Author

ovr commented Aug 5, 2022

Panic on overflow was fixed in 474ed5a

Without error to be similar with PostgreSQL

image

@alamb
Copy link
Contributor

alamb commented Aug 6, 2022

https://crates.io/crates/sqlparser/0.20.0 is released, FYI

@ovr
Copy link
Contributor Author

ovr commented Aug 8, 2022

Awaiting #3072

@alamb
Copy link
Contributor

alamb commented Aug 8, 2022

Awaiting #3072

Is merged 🎉

@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 474ed5a to 636be02 Compare August 11, 2022 20:56
/// The binary_bitwise_array_op macro only evaluates for integer types
/// like int64, int32.
/// It is used to do bitwise operation on an array with a scalar.
macro_rules! binary_bitwise_array_scalar {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed binary_bitwise_array_scalar to pass function as expr to do casting, because wrapping_shr and wrapping_shl requires u32. I tried to do it with let right: $TYPE = scalar.try_into().unwrap(); but I found that it will show a error because it's not possible to cast all numeric types to u32.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense -- thank you

@ovr ovr marked this pull request as ready for review August 11, 2022 20:59
@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 636be02 to 17fb689 Compare August 11, 2022 21:06
@ovr ovr marked this pull request as draft August 11, 2022 21:14
@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 17fb689 to 92fa53c Compare August 11, 2022 21:17
@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2022

Codecov Report

Merging #3037 (810f4b0) into master (48f9b7a) will decrease coverage by 0.03%.
The diff coverage is 63.15%.

@@            Coverage Diff             @@
##           master    #3037      +/-   ##
==========================================
- Coverage   85.95%   85.91%   -0.04%     
==========================================
  Files         291      291              
  Lines       52382    52475      +93     
==========================================
+ Hits        45025    45084      +59     
- Misses       7357     7391      +34     
Impacted Files Coverage Δ
...on/physical-expr/src/expressions/binary/kernels.rs 52.17% <44.44%> (-10.99%) ⬇️
datafusion/physical-expr/src/expressions/binary.rs 97.71% <93.10%> (-0.17%) ⬇️
datafusion/core/tests/sql/expr.rs 99.84% <100.00%> (+<0.01%) ⬆️
datafusion/expr/src/binary_rule.rs 84.61% <100.00%> (ø)
datafusion/expr/src/operator.rs 96.00% <100.00%> (+0.16%) ⬆️
datafusion/sql/src/planner.rs 81.96% <100.00%> (+0.01%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs 86.93% <0.00%> (-0.51%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.60% <0.00%> (-0.18%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is looking pretty good 👍

Please mark it as "ready for review" when it is ready.

Thanks !

@ovr ovr marked this pull request as ready for review August 12, 2022 12:32
@ovr
Copy link
Contributor Author

ovr commented Aug 12, 2022

Marked PR as ready for review ✅

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @ovr - thank you for the contribution!

I had one tiny testing suggestion, but otherwise I think this PR is ready to go

let modules = Arc::new(Int32Array::from(vec![Some(100)])) as ArrayRef;
let result = bitwise_shift_left(input.clone(), modules.clone())?;

let expected = Int32Array::from(vec![Some(32)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@liukun4515 liukun4515 Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2<<100 =>> 2<<(100%bit_width(i32)) =>> 2 << 4?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am surprised that this is rotational shifting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am surprised that this is rotational shifting

me too...

#[test]
fn bitwise_shift_array_test() -> Result<()> {
let input = Arc::new(Int32Array::from(vec![Some(2), None, Some(10)])) as ArrayRef;
let modules =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a test for when the modules is Null (you cover NULL for the input already)

#[test]
fn bitwise_shift_scalar_test() -> Result<()> {
let input = Arc::new(Int32Array::from(vec![Some(2), None, Some(4)])) as ArrayRef;
let module = ScalarValue::from(10i32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, here a test for null handling might be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing unit test here, I did a simple SQL test which test the whole pipeline, and I found that DF doesn't handle nulls for bitwise operators.

Fixed in:

810f4b0

/// The binary_bitwise_array_op macro only evaluates for integer types
/// like int64, int32.
/// It is used to do bitwise operation on an array with a scalar.
macro_rules! binary_bitwise_array_scalar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense -- thank you

@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 92fa53c to 624e8b3 Compare August 14, 2022 22:01
use arrow::datatypes::DataType::*;

if !is_numeric(left_type) || !is_numeric(right_type) {
if !both_numeric_or_null_and_numeric(left_type, right_type) {
Copy link
Contributor Author

@ovr ovr Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

To be similar with default SQL behaviour for null.

@andygrove
Copy link
Member

Can we update the user guide to show this new functionality? This can be a follow-on issue.

@ovr
Copy link
Contributor Author

ovr commented Aug 15, 2022

@andygrove

Can we update the user guide to show this new functionality? This can be a follow-on issue.

https://github.com/apache/arrow-datafusion/blob/master/docs/source/user-guide/sql/sql_status.md#supported-sql

I didn't find any place where I should update the documentation, as I can see there is a small comment about binary expressions, but it doesn't include all binary expressions.

Many mathematical unary and binary expressions such as +, /, sqrt, tan, >=.

I updated the description on this PR to reference #1619

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @ovr

@alamb alamb merged commit ee55d89 into apache:master Aug 15, 2022
@ursabot
Copy link

ursabot commented Aug 15, 2022

Benchmark runs are scheduled for baseline = 5ddad47 and contender = ee55d89. ee55d89 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ovr ovr deleted the binary-bitwise-shifts-pr branch August 16, 2022 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants