[SPARK-38333][SQL] [3.1]DPP cause DataSourceScanExec java.lang.NullPointer… #35662

monkeyboy123 · 2022-02-26T00:22:14Z

What changes were proposed in this pull request?

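Fix an NPE raised while canonicalizing a plan on the executor. This may be related to SPARK-35742, and it is the kind of executor-side canonicalization problem that SPARK-23731 describes.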

Why are the changes needed?

It is a bug. Running the following SQL in YARN client mode generates a DynamicPruningExpression:

drop table if exists test_a_pt;
create table  test_a_pt(col1 int, col2 int,pt string) USING parquet PARTITIONED BY (pt);
insert into table test_a_pt values(1,2,'20220101'),(3,4,'20220101'),(1,2,'20220101'),(3,4,'20220101'),(1,2,'20220101'),(3,4,'20220101');

drop table if exists test_b;
create table test_b as select 1 as `搜索demo_uv` ,2 as `搜索demo_gmv`, 'gogo' as scenes, '2021-03-04' as date1;

drop table if exists dest;
create table dest as 
SELECT  a.pt,
        a.scenes
FROM    (
            SELECT   '20220101' as pt
                     ,'comeon' AS scenes
            FROM    test_b where scenes='gogo' and exists(array(date1),x-> x =='2021-03-04')
            UNION ALL
            SELECT  pt as pt
                     ,'comeon' AS scenes
            FROM    (
                        SELECT  pt,COUNT( distinct col2) AS buy_tab_uv
                        FROM    test_a_pt
                        where pt='20220101'
                        GROUP BY pt 
                    ) a
        ) a
JOIN    (
            SELECT  pt ,COUNT(distinct col2) AS buy_tab_uv
                    FROM  test_a_pt
                    where pt='20220101'
                    GROUP BY pt 
        ) b
ON      a.pt = b.pt
;

BTW, the exists function used in the query extends CodegenFallback.

The root cause is the addExprTree function in EquivalentExpressions:


def addExprTree(
    expr: Expression,
    addFunc: Expression => Boolean = addExpr): Unit = {
  val skip = expr.isInstanceOf[LeafExpression] ||
    // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
    // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
    // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
    // can cause error like NPE.
    (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)

  if (!skip && !addFunc(expr)) {
    childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
    commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
  }
}

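DPP produces expressions such as DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId)). The PlanExpression (InSubqueryExec) is nested below the root, so the isInstanceOf check above does not skip it. When the executor builds the filter predicate, addExprTree hashes the enclosing expression; its semanticHash canonicalizes the wrapped query plan on the executor, and the NPE appears.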
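The stack trace further down shows the failure path: EquivalentExpressions.addExpr looks the expression up in a HashMap, which calls Expr.hashCode, then semanticHash, then canonicalized, which finally reaches the scan node. So merely registering the expression is enough to trigger the NPE. The following is a minimal sketch of that effect only; PlanNode, Key and lookup are hypothetical toy names, not Spark classes, and a String field that is null stands in for driver-only state.

```scala
import scala.collection.mutable

object HashLookupSketch {
  // Hypothetical stand-in for a physical plan node. `driverOnlyState` plays the role of
  // state (like a @transient relation) that is null once the node reaches an executor.
  final class PlanNode(driverOnlyState: String) {
    // Computed lazily on first use, similar in spirit to QueryPlan.canonicalized.
    lazy val canonicalized: String = driverOnlyState.trim // NPE if driverOnlyState is null
  }

  // Hypothetical stand-in for EquivalentExpressions.Expr: equality and hashing are
  // defined on the canonical form (compare semanticEquals / semanticHash).
  final class Key(val plan: PlanNode) {
    override def hashCode(): Int = plan.canonicalized.hashCode
    override def equals(other: Any): Boolean = other match {
      case k: Key => k.plan.canonicalized == plan.canonicalized
      case _      => false
    }
  }

  // Mimics the lookup done by addExpr: the map must hash the key before anything else.
  def lookup(map: mutable.HashMap[Key, Int], plan: PlanNode): String =
    try map.get(new Key(plan)).fold("absent")(_ => "present")
    catch { case _: NullPointerException => "NPE" }

  def main(args: Array[String]): Unit = {
    val map = mutable.HashMap.empty[Key, Int]
    map(new Key(new PlanNode(" scan "))) = 1
    println(lookup(map, new PlanNode("scan"))) // present
    println(lookup(map, new PlanNode(null)))   // NPE
  }
}
```

The point of the sketch is that no reuse of a common sub-expression is needed: the hash computation alone forces canonicalization.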

So we should search all children instead:
(expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
If a PlanExpression such as InSubqueryExec is found anywhere in the tree, addExprTree skips the expression, and the NPE disappears.
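To illustrate why searching the whole tree matters, here is a minimal sketch with hypothetical toy classes (Expr, Attr, SubqueryLike, PruningWrapper; none of these are Spark APIs). SubqueryLike plays the role of a PlanExpression such as InSubqueryExec, PruningWrapper plays the role of DynamicPruningExpression, and the onExecutor flag stands in for TaskContext.get != null. The root-only check misses the nested plan expression; the whole-tree check catches it.

```scala
object SkipCheckSketch {
  // Hypothetical, simplified stand-ins for Catalyst expression nodes (not Spark classes).
  sealed trait Expr {
    def children: Seq[Expr]

    // Pre-order search of this node and all descendants, in the spirit of TreeNode.find.
    def find(p: Expr => Boolean): Option[Expr] =
      if (p(this)) Some(this)
      else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }

    // Same answer as find(p).isDefined, without building an Option.
    def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
  }
  final case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
  // Plays the role of a PlanExpression such as InSubqueryExec.
  final case class SubqueryLike(value: Expr) extends Expr { val children: Seq[Expr] = Seq(value) }
  // Plays the role of DynamicPruningExpression, which is not itself a PlanExpression.
  final case class PruningWrapper(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

  private def isPlanExpr(e: Expr): Boolean = e.isInstanceOf[SubqueryLike]

  // `onExecutor` stands in for `TaskContext.get != null`.
  // Original check: only the root node is inspected.
  def skipRootOnly(e: Expr, onExecutor: Boolean): Boolean = isPlanExpr(e) && onExecutor

  // Proposed check: the whole tree is searched.
  def skipWholeTree(e: Expr, onExecutor: Boolean): Boolean =
    e.find(isPlanExpr).isDefined && onExecutor

  def main(args: Array[String]): Unit = {
    val dpp = PruningWrapper(SubqueryLike(Attr("pt")))
    println(skipRootOnly(dpp, onExecutor = true))  // false: the wrapper would be hashed
    println(skipWholeTree(dpp, onExecutor = true)) // true: the wrapper is skipped
    println(dpp.exists(isPlanExpr) == dpp.find(isPlanExpr).isDefined) // true
  }
}
```

In this toy model, exists and find(...).isDefined answer the same question, which matches the two spellings of the check discussed later in the review.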

Does this PR introduce any user-facing change?

Yes. Before this PR, an NPE is thrown like this:

Caused by: java.lang.NullPointerException
 at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
 at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)
 at org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:635)
 at org.apache.spark.sql.execution.FileSourceScanExec.doCanonicalize(DataSourceScanExec.scala:162)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:57)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:51)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$1(QueryPlan.scala:387)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:387)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doCanonicalize(BroadcastExchangeExec.scala:89)
 at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doCanonicalize(BroadcastExchangeExec.scala:72)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:57)
 at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doCanonicalize(Exchange.scala:51)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.SubqueryBroadcastExec.doCanonicalize(SubqueryBroadcastExec.scala:66)
 at org.apache.spark.sql.execution.SubqueryBroadcastExec.doCanonicalize(SubqueryBroadcastExec.scala:41)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
 at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized$lzycompute(subquery.scala:165)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized(subquery.scala:162)
 at org.apache.spark.sql.execution.InSubqueryExec.canonicalized(subquery.scala:113)
 at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$canonicalized$1(Expression.scala:229)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:229)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:228)
 at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$canonicalized$1(Expression.scala:229)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at scala.collection.TraversableLike.map(TraversableLike.scala:238)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
 at scala.collection.immutable.List.map(List.scala:298)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:229)
 at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:228)
 at org.apache.spark.sql.catalyst.expressions.Expression.semanticHash(Expression.scala:248)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions$Expr.hashCode(EquivalentExpressions.scala:41)
 at scala.runtime.Statics.anyHash(Statics.java:122)
 at scala.collection.mutable.HashTable$HashUtils.elemHashCode(HashTable.scala:416)
 at scala.collection.mutable.HashTable$HashUtils.elemHashCode$(HashTable.scala:416)
 at scala.collection.mutable.HashMap.elemHashCode(HashMap.scala:44)
 at scala.collection.mutable.HashTable.findEntry(HashTable.scala:136)
 at scala.collection.mutable.HashTable.findEntry$(HashTable.scala:135)
 at scala.collection.mutable.HashMap.findEntry(HashMap.scala:44)
 at scala.collection.mutable.HashMap.get(HashMap.scala:74)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExpr(EquivalentExpressions.scala:55)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$addExprTree$default$2$1(EquivalentExpressions.scala:143)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.$anonfun$addExprTree$default$2$1$adapted(EquivalentExpressions.scala:143)
 at org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.addExprTree(EquivalentExpressions.scala:152)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1(SubExprEvaluationRuntime.scala:89)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.$anonfun$proxyExpressions$1$adapted(SubExprEvaluationRuntime.scala:89)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at org.apache.spark.sql.catalyst.expressions.SubExprEvaluationRuntime.proxyExpressions(SubExprEvaluationRuntime.scala:89)
 at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.<init>(predicates.scala:53)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.createInterpretedObject(predicates.scala:92)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.createInterpretedObject(predicates.scala:85)
 at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
 at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
 at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)
 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:885)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:106)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:131)
 ... 3 more (state=,code=0)

After this PR, the query runs without error.

How was this patch tested?

New UT

LuciferYang · 2022-02-26T13:35:23Z

Does this issue only exist in Spark 3.1? If yes, please add [3.1] to the PR title. If not, please send a PR to the master branch first.

monkeyboy123 · 2022-02-27T01:02:06Z

> Does this issue only exist in Spark 3.1? If yes, please add [3.1] to the PR title. If not, please send a PR to the master branch first.

Yes, it only exists in Spark 3.1. Updated.

LuciferYang · 2022-02-27T02:34:15Z

> Yes, it only exists in Spark 3.1. Updated.

Could you explain why master/3.2 does not have this issue?

monkeyboy123 · 2022-02-27T05:33:59Z

> Could you explain why master/3.2 does not have this issue?

It was fixed by SPARK-35798 and the DPP-related code: in Spark master/3.2, this SQL is planned as a normal join instead of using DPP.

LuciferYang · 2022-02-27T05:58:58Z

For a bug-fix PR, we need to add at least one UT. The new UT should fail before this PR and pass after it, which also helps ensure that future changes in other PRs do not break this fix.

LuciferYang · 2022-02-27T06:05:23Z

On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?

monkeyboy123 · 2022-02-27T11:00:50Z

> On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?

After backporting SPARK-35798, a new NullPointerException is thrown:

case class FileSourceScanExec(
    @transient relation: HadoopFsRelation,
    output: Seq[Attribute],
    requiredSchema: StructType,
    partitionFilters: Seq[Expression],
    optionalBucketSet: Option[BitSet],
    optionalNumCoalescedBuckets: Option[Int],
    dataFilters: Seq[Expression],
    tableIdentifier: Option[TableIdentifier],
    disableBucketedScan: Boolean = false)
  extends DataSourceScanExec {

  // Note that some vals referring the file-based relation are lazy intentionally
  // so that this plan can be canonicalized on executor side too. See SPARK-23731.
  override lazy val supportsColumnar: Boolean = {
    relation.fileFormat.supportBatch(relation.sparkSession, schema)
  }

because relation is null (it is @transient, so it is not available when the plan is canonicalized on the executor side).
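The SPARK-23731 note in the snippet relies on standard JVM behaviour: a @transient field is not serialized, so it is null in the copy an executor deserializes, and anything that dereferences it there fails. A minimal sketch of that behaviour, using hypothetical Relation/ScanNode classes in place of HadoopFsRelation/FileSourceScanExec, and plain Java serialization as an approximation of shipping a plan to an executor:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for HadoopFsRelation; deliberately not Serializable.
final class Relation(val format: String)

// Hypothetical stand-in for FileSourceScanExec: `relation` is @transient, so Java
// serialization skips it and it is null in the deserialized copy.
final case class ScanNode(@transient relation: Relation, table: String) {
  // Lazy, so constructing or deserializing the node does not touch `relation`.
  lazy val supportsColumnar: Boolean = relation.format == "parquet"
}

object TransientRelationSketch {
  // Serialize and deserialize, roughly what happens when a plan reaches an executor.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def describe(scan: ScanNode): String =
    try s"supportsColumnar=${scan.supportsColumnar}"
    catch { case _: NullPointerException => "NPE" }

  def main(args: Array[String]): Unit = {
    val driverSide = ScanNode(new Relation("parquet"), "test_a_pt")
    val executorSide = roundTrip(driverSide) // lazy val not yet computed, so not serialized
    println(executorSide.table)              // test_a_pt
    println(executorSide.relation == null)   // true
    println(describe(driverSide))            // supportsColumnar=true
    println(describe(executorSide))          // NPE
  }
}
```

Making the val lazy only postpones the failure; once something on the executor forces it (as canonicalization does here), the null relation is dereferenced.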

monkeyboy123 · 2022-02-27T11:45:12Z

EquivalentExpressions

> For a bug-fix PR, we need to add at least one UT. The new UT should fail before this PR and pass after it, which also helps ensure that future changes in other PRs do not break this fix.

It is hard to add a unit test, as it only happens at runtime.

AmplabJenkins · 2022-02-27T14:17:37Z

Can one of the admins verify this patch?

LuciferYang · 2022-02-27T15:54:46Z

cc @cloud-fan

weixiuli · 2022-02-28T01:42:07Z

> It is hard to add a unit test, as it only happens at runtime.

Good catch. Could we add a small end-to-end test?

LuciferYang · 2022-02-28T02:54:44Z

> Good catch. Could we add a small end-to-end test?

Agree with @weixiuli +1

monkeyboy123 · 2022-02-28T05:06:44Z

> Good catch. Could we add a small end-to-end test?

I will add a test later

AngersZhuuuu · 2022-02-28T07:08:40Z

> On the other hand, if we backport SPARK-35798 to branch-3.1, can this issue be solved?
>
> After backporting SPARK-35798, a new NullPointerException is thrown: [...] because relation is null.

I think we need to find the PR that fixed this issue properly and then backport it to Spark 3.1. Is that the best way? Or is the code path different between 3.1 and master?

LuciferYang · 2022-02-28T07:33:50Z

So we should add a new test case first; then we can use it to find more quickly which patches need to be backported.

monkeyboy123 · 2022-02-28T12:19:58Z

> I think we need to find the PR that fixed this issue properly and then backport it to Spark 3.1. Is that the best way? Or is the code path different between 3.1 and master?

It was fixed by #32947 and the DPP-related code: in Spark master/3.2, this SQL is planned as a normal join instead of using DPP.
Also, the error is more like SPARK-29239. In Spark master/3.2, the skip check in addExprTree should have the same problem, but the demo SQL I pasted does not trigger it.

cloud-fan · 2022-02-28T13:58:33Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

      // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
      // can cause error like NPE.
-      (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)
+      (expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)


I think we should fix it in the master branch as well, since the code does not match the comment.

We can also add a UT in SubexpressionEliminationSuite, which tests addExprTree directly.

OK, I will open another PR to fix it in the master branch and add a UT.

Unit test added.

@cloud-fan How should this PR be titled? On master this may only be a potential problem.

Do we have a master branch PR now?

@monkeyboy123 we can replace expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined with the TreeNode.exists API now.

> Do we have a master branch PR now?

Sorry for the late reply, I will open a PR right now.

> @monkeyboy123 we can replace expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined with the TreeNode.exists API now.

OK.

> Do we have a master branch PR now?

Done, new PR: SPARK-38333 (#36012).

AngersZhuuuu · 2022-03-01T06:22:11Z

> It was fixed by #32947 and the DPP-related code: in Spark master/3.2, this SQL is planned as a normal join instead of using DPP. Also, the error is more like SPARK-29239. In Spark master/3.2, the skip check in addExprTree should have the same problem, but the demo SQL I pasted does not trigger it.

So I think we should also find out which PR stops this query from being planned with DPP, and backport it to branch-3.1. Then we can fix the current PR's issue in master, 3.2 and 3.1. WDYT? @LuciferYang @cloud-fan

LuciferYang · 2022-03-01T07:07:09Z

> So I think we should also find out which PR stops this query from being planned with DPP, and backport it to branch-3.1. Then we can fix the current PR's issue in master, 3.2 and 3.1. WDYT? @LuciferYang @cloud-fan

+1 agree with you

…function in Executor ### What changes were proposed in this pull request? It is master branch pr [SPARK-38333](#35662) ### Why are the changes needed? Bug fix, it is potential issue. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #36012 from monkeyboy123/spark-38333. Authored-by: Dereck Li Signed-off-by: Wenchen Fan

…function in Executor ### What changes were proposed in this pull request? It is master branch pr [SPARK-38333](#35662) ### Why are the changes needed? Bug fix, it is potential issue. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #36012 from monkeyboy123/spark-38333. Authored-by: Dereck Li Signed-off-by: Wenchen Fan (cherry picked from commit a40acd4) Signed-off-by: Wenchen Fan

…function in Executor It is master branch pr [SPARK-38333](#35662) Bug fix, it is potential issue. No UT Closes #36012 from monkeyboy123/spark-38333. Authored-by: Dereck Li Signed-off-by: Wenchen Fan (cherry picked from commit a40acd4) Signed-off-by: Wenchen Fan

monkeyboy123 · 2022-03-31T23:39:48Z

Thanks to all of you for the review.

[SPARK-38333][SQL] DPP cause DataSourceScanExec java.lang.NullPointerException

b5cbd73

github-actions bot added the SQL label Feb 26, 2022

monkeyboy123 changed the title ~~[SPARK-38333][SQL] DPP cause DataSourceScanExec java.lang.NullPointer…~~ [SPARK-38333][SQL] [3.1]DPP cause DataSourceScanExec java.lang.NullPointer… Feb 27, 2022

monkeyboy123 closed this Feb 27, 2022

monkeyboy123 reopened this Feb 27, 2022

cloud-fan reviewed Feb 28, 2022

add UT

801ee50

monkeyboy123 force-pushed the dpp-canonicalization-NPE branch 3 times, most recently from 862edb4 to 9b3a11c on March 1, 2022 01:16

fix code style

708a168

monkeyboy123 force-pushed the dpp-canonicalization-NPE branch from 9b3a11c to 708a168 on March 1, 2022 04:05

monkeyboy123 mentioned this pull request Mar 30, 2022

[SPARK-38333][SQL] PlanExpression expression should skip addExprTree function in Executor #36012

Closed

monkeyboy123 closed this Mar 31, 2022
