
Conversation

@monkeyboy123
Contributor

@monkeyboy123 monkeyboy123 commented Feb 26, 2022

What changes were proposed in this pull request?

Fix a canonicalization NPE; this may be another case of SPARK-35742, similar to what SPARK-23731 describes.

Why are the changes needed?

It is a bug: running the following SQL in YARN client mode generates a DynamicPruningExpression:

drop table if exists test_a_pt;
create table  test_a_pt(col1 int, col2 int,pt string) USING parquet PARTITIONED BY (pt);
insert into table test_a_pt values(1,2,'20220101'),(3,4,'20220101'),(1,2,'20220101'),(3,4,'20220101'),(1,2,'20220101'),(3,4,'20220101');

drop table if exists test_b;
create table test_b as select 1 as `搜索demo_uv` ,2 as `搜索demo_gmv`, 'gogo' as scenes, '2021-03-04' as date1;

drop table if exists dest;
create table dest as 
SELECT  a.pt,
        a.scenes
FROM    (
            SELECT   '20220101' as pt
                     ,'comeon' AS scenes
            FROM    test_b where scenes='gogo' and exists(array(date1),x-> x =='2021-03-04')
            UNION ALL
            SELECT  pt as pt
                     ,'comeon' AS scenes
            FROM    (
                        SELECT  pt,COUNT( distinct col2) AS buy_tab_uv
                        FROM    test_a_pt
                        where pt='20220101'
                        GROUP BY pt 
                    ) a
        ) a
JOIN    (
            SELECT  pt ,COUNT(distinct col2) AS buy_tab_uv
                    FROM  test_a_pt
                    where pt='20220101'
                    GROUP BY pt 
        ) b
ON      a.pt = b.pt
;

Note that the exists higher-order function extends CodegenFallback; as the stack trace below shows, the predicate goes through the interpreted path (InterpretedPredicate and SubExprEvaluationRuntime), which is where addExprTree runs on the executor.

The root cause is the addExprTree function in EquivalentExpressions:


def addExprTree(
    expr: Expression,
    addFunc: Expression => Boolean = addExpr): Unit = {
  val skip = expr.isInstanceOf[LeafExpression] ||
    // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
    // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
    // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
    // can cause error like NPE.
    (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)

  if (!skip && !addFunc(expr)) {
    childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
    commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
  }
}

Since DPP adds expressions like DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId)), the NPE appears once the executor builds the predicate for them.

So we should check all children instead of only the root expression:
(expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
If a PlanExpression such as InSubqueryExec is found anywhere in the tree, addExprTree skips it and the NPE disappears.
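
For illustration, a minimal sketch of the proposed skip check (only this condition changes; the rest of addExprTree stays as above):

// Skip the whole tree when any descendant is a PlanExpression (e.g. the InSubqueryExec
// inside a DynamicPruningExpression) and we are running on an executor (TaskContext.get != null).
val skip = expr.isInstanceOf[LeafExpression] ||
  expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
  (expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)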

Does this PR introduce any user-facing change?

Yes. Before this PR, an NPE is thrown, like this:

Caused by: java.lang.NullPointerException
 at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
 at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)
 at org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:635)
 at org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:162)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:57)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:51)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doCanonicalize(BroadcastExchangeExec.scala:89)
 at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doCanonicalize(BroadcastExchangeExec.scala:72)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:57)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:51)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.SubqueryBroadcastExec.doCanonicalize(SubqueryBroadcastExec.scala:66)
 at org.apache.spark.sql.execution.SubqueryBroadcastExec.doCanonicalize(SubqueryBroadcastExec.scala:41)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized$lzycompute(subquery.scala:165)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized(subquery.scala:162)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized(subquery.scala:113)
 at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$canonicalized$1(Expression.scala:229)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:229)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:228)
 at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$canonicalized$1(Expression.scala:229)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:229)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:228)
 at org.apache.spark.sql.catalyst.expressions.Expression.semanticHash(Expression.scala:248)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.hashCode(EquivalentExpressions.scala:41)
 at scala.runtime.Statics.anyHash(Statics.java:122)
 at scala.collection.mutable.HashTable$HashUtils.elemHashCode(HashTable.scala:416)
 at scala.collection.mutable.HashTable$HashUtils.elemHashCode$(HashTable.scala:416)
 at scala.collection.mutable.HashMap.elemHashCode(HashMap.scala:44)
 at scala.collection.mutable.HashTable.findEntry(HashTable.scala:136)
 at scala.collection.mutable.HashTable.findEntry$(HashTable.scala:135)
 at scala.collection.mutable.HashMap.findEntry(HashMap.scala:44)
 at scala.collection.mutable.HashMap.get(HashMap.scala:74)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:55)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$addExprTree$default$2$1(EquivalentExpressions.scala:143)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$addExprTree$default$2$1$adapted(EquivalentExpressions.scala:143)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:152)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1(SubExprEvaluationRuntime.scala:89)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1$adapted(SubExprEvaluationRuntime.scala:89)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.proxyExpressions(SubExprEvaluationRuntime.scala:89)
 at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.<init>(predicates.scala:53)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.createInterpretedObject(predicates.scala:92)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.createInterpretedObject(predicates.scala:85)
 at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:885)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:106)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:131)
 ... 3 more (state=,code=0)

After this PR, the query runs successfully.

How was this patch tested?

New UT

@github-actions github-actions bot added the SQL label Feb 26, 2022
@LuciferYang
Contributor

Does this issue only exist in Spark 3.1? If yes, please add [3.1] to the pr title. If not, please send pr to the master branch first.

@monkeyboy123 monkeyboy123 changed the title [SPARK-38333][SQL] DPP cause DataSourceScanExec java.lang.NullPointer… [SPARK-38333][SQL] [3.1]DPP cause DataSourceScanExec java.lang.NullPointer… Feb 27, 2022
@monkeyboy123
Contributor Author

Does this issue only exist in Spark 3.1? If yes, please add [3.1] to the pr title. If not, please send pr to the master branch first.

Yes, it only exists in Spark 3.1. Updated.

@LuciferYang
Contributor

Yes, it only exists in Spark 3.1. Updated.

Could you explain why master/3.2 does not have this issue?

@monkeyboy123
Contributor Author

monkeyboy123 commented Feb 27, 2022

Could you explain why master/3.2 does not have this issue?

It was fixed by SPARK-35798 together with the DPP-related code. On master/3.2 this SQL is translated into a normal join instead of using DPP.

@LuciferYang
Contributor

LuciferYang commented Feb 27, 2022

For a bug-fix PR, we need to add at least one UT. The new UT should fail before this PR and pass after it, which also helps ensure that future changes do not break this fix.

@LuciferYang
Contributor

LuciferYang commented Feb 27, 2022

On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?

@monkeyboy123
Contributor Author

monkeyboy123 commented Feb 27, 2022

On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?

After backporting SPARK-35798, a new NullPointerException is thrown:

case class FileSourceScanExec(
    @transient relation: HadoopFsRelation,
    output: Seq[Attribute],
    requiredSchema: StructType,
    partitionFilters: Seq[Expression],
    optionalBucketSet: Option[BitSet],
    optionalNumCoalescedBuckets: Option[Int],
    dataFilters: Seq[Expression],
    tableIdentifier: Option[TableIdentifier],
    disableBucketedScan: Boolean = false)
  extends DataSourceScanExec {

  // Note that some vals referring the file-based relation are lazy intentionally
  // so that this plan can be canonicalized on executor side too. See SPARK-23731.
  override lazy val supportsColumnar: Boolean = {
    relation.fileFormat.supportBatch(relation.sparkSession, schema)
  }

because relation is null when the plan is canonicalized on the executor (it is a @transient field)
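
For reference, a minimal standalone Scala sketch (not Spark code; the names are illustrative) of why a @transient constructor field comes back as null once the plan is serialized to an executor:

import java.io._

// A @transient field is dropped during Java serialization, so it deserializes as null,
// just like `relation` in FileSourceScanExec once the plan reaches an executor.
case class Scan(@transient relation: String, output: Seq[String]) {
  def supportsColumnar: Boolean = relation.nonEmpty  // NPE when relation is null
}

object TransientDemo {
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bytes)
    oos.writeObject(obj)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)).readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val scan = roundTrip(Scan("parquet", Seq("col1")))
    println(scan.relation)    // prints null
    // scan.supportsColumnar  // would throw NullPointerException
  }
}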

@monkeyboy123
Contributor Author


For a bug fix pr, we need to add at least one UT. The new UT should fail before this pr and passed after this pr, which also helps to ensure that the change of other pr in the future will not keep this fix.

It is hard to add a unit test, as this only happens at runtime.

@AmplabJenkins

Can one of the admins verify this patch?

@LuciferYang
Contributor

cc @cloud-fan

@weixiuli
Contributor

It is hard to add a unit test, as this only happens at runtime.

Good catch, we may add a small end-to-end test?

@LuciferYang
Contributor

Good catch, we may add a small end-to-end test?

Agree with @weixiuli +1

@monkeyboy123
Contributor Author

Good catch, we may add a small end-to-end test?

I will add a test later

@AngersZhuuuu
Contributor

On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?

After backporting SPARK-35798, a new NullPointerException is thrown:

case class FileSourceScanExec(
    @transient relation: HadoopFsRelation,
    output: Seq[Attribute],
    requiredSchema: StructType,
    partitionFilters: Seq[Expression],
    optionalBucketSet: Option[BitSet],
    optionalNumCoalescedBuckets: Option[Int],
    dataFilters: Seq[Expression],
    tableIdentifier: Option[TableIdentifier],
    disableBucketedScan: Boolean = false)
  extends DataSourceScanExec {

  // Note that some vals referring the file-based relation are lazy intentionally
  // so that this plan can be canonicalized on executor side too. See SPARK-23731.
  override lazy val supportsColumnar: Boolean = {
    relation.fileFormat.supportBatch(relation.sparkSession, schema)
  }

because relation is null when the plan is canonicalized on the executor (it is a @transient field)

I think we need to find which PR actually fixes this issue. Is backporting it to Spark 3.1 the best way, or is the code path different between 3.1 and master?

@LuciferYang
Contributor

So we should add a new test case first; then we can use it to find which patches need to be backported more quickly.

@monkeyboy123
Contributor Author

monkeyboy123 commented Feb 28, 2022

I think we need to find which PR actually fixes this issue. Is backporting it to Spark 3.1 the best way, or is the code path different between 3.1 and master?

It was fixed by #32947 together with the DPP-related code. On master/3.2 this SQL is translated into a normal join instead of using DPP.
Also, the error looks more like SPARK-29239: on master/3.2 the skip check in addExprTree should hit the same problem, but the demo SQL I pasted does not trigger it.

// `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
// can cause error like NPE.
// Current check (root expression only):
(expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)
// Proposed check (whole expression tree):
(expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
Contributor

I think we should fix it at the master branch as well, as the code does not match the comment.

We can also add a UT in SubexpressionEliminationSuite, which tests addExprTree directly.
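
A rough sketch of what such a test could look like; the helpers used here to simulate an executor-side TaskContext (TaskContext.empty()/setTaskContext) and the getExprState accessor are assumptions and may differ between branches:

test("PlanExpression nested in another expression is skipped by addExprTree on executors") {
  try {
    // Pretend we are on the executor side, where TaskContext.get != null.
    TaskContext.setTaskContext(TaskContext.empty())

    val equivalence = new EquivalentExpressions
    // A PlanExpression (Exists) wrapped in a non-leaf parent, similar to DPP's
    // DynamicPruningExpression(InSubqueryExec(...)).
    val expr = DynamicPruningExpression(Exists(LocalRelation()))
    equivalence.addExprTree(expr)

    // The whole tree should be skipped, so nothing is recorded for it.
    assert(equivalence.getExprState(expr).isEmpty)
  } finally {
    TaskContext.unset()
  }
}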

Contributor Author

OK, I will open another PR to fix it on the master branch and add a UT.

Contributor Author

Unit test added.

Contributor Author

@cloud-fan How should this PR title be named? Maybe this is a potential problem.

Contributor

Do we have a master branch PR now?

Contributor

@monkeyboy123 we can replace expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined with the TreeNode.exists API now
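
That would look roughly like this (assuming the TreeNode.exists API is available on the target branch):

(expr.exists(_.isInstanceOf[PlanExpression[_]]) && TaskContext.get != null)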

Contributor Author

@monkeyboy123 monkeyboy123 Mar 30, 2022

Do we have a master branch PR now?

Sorry for the late reply, I will open a PR right now.

Contributor Author

@monkeyboy123 we can replace expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined with the TreeNode.exists API now

OK

Contributor Author

@monkeyboy123 monkeyboy123 Mar 30, 2022

Do we have a master branch PR now?

Done, opened a new PR: SPARK-38333.

@monkeyboy123 monkeyboy123 force-pushed the dpp-canonicalization-NPE branch 3 times, most recently from 862edb4 to 9b3a11c Compare March 1, 2022 01:16
@monkeyboy123 monkeyboy123 force-pushed the dpp-canonicalization-NPE branch from 9b3a11c to 708a168 Compare March 1, 2022 04:05
@AngersZhuuuu
Contributor

It was fixed by #32947 together with the DPP-related code. On master/3.2 this SQL is translated into a normal join instead of using DPP. Also, the error looks more like SPARK-29239: on master/3.2 the skip check in addExprTree should hit the same problem, but the demo SQL I pasted does not trigger it.

So I think we should also find out after which PR this query can no longer be translated to DPP, and backport that to branch-3.1. Then we can fix this PR's issue on master/3.2/3.1. WDYT? @LuciferYang @cloud-fan

@LuciferYang
Contributor

So I think we should also find out after which PR this query can no longer be translated to DPP, and backport that to branch-3.1. Then we can fix this PR's issue on master/3.2/3.1. WDYT? @LuciferYang @cloud-fan

+1 agree with you

cloud-fan pushed a commit that referenced this pull request Mar 31, 2022
…function in Executor

### What changes were proposed in this pull request?

It is master branch pr [SPARK-38333](#35662)

### Why are the changes needed?

Bug fix, it is potential issue.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

UT

Closes #36012 from monkeyboy123/spark-38333.

Authored-by: Dereck Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Mar 31, 2022
…function in Executor

### What changes were proposed in this pull request?

It is master branch pr [SPARK-38333](#35662)

### Why are the changes needed?

Bug fix, it is potential issue.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

UT

Closes #36012 from monkeyboy123/spark-38333.

Authored-by: Dereck Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a40acd4)
Signed-off-by: Wenchen Fan <[email protected]>
@monkeyboy123
Contributor Author

Thanks for the review, everyone.

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…function in Executor

It is master branch pr [SPARK-38333](apache#35662)

Bug fix, it is potential issue.

No

UT

Closes apache#36012 from monkeyboy123/spark-38333.

Authored-by: Dereck Li <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a40acd4)
Signed-off-by: Wenchen Fan <[email protected]>