-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK][SPARK-10842]Eliminate creating duplicate stage while generate job dag #8923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #43051 has finished for PR 8923 at commit
|
|
jenkins retest this please |
|
Test build #43084 has finished for PR 8923 at commit
|
|
jenkins retest this please |
|
Test build #43709 has finished for PR 8923 at commit
|
|
Test build #43714 has finished for PR 8923 at commit
|
|
jenkins retest this please |
|
Test build #43772 has finished for PR 8923 at commit
|
|
This looks good, but can you add a test case? Mostly I'd like to understand the dependency structure that leads to this (perhaps there are other corner cases related to it we should look at). I was expecting to see one stage as a dependency for multiple later stages, or something like that, but I don't see that even in your "After" image. A test case will also help prevent future regressions |
|
I prefer to fix this by changing the result type of |
|
@markhamstra @squito We get finalRDD: R7 shufflesdeps, the first is R6 for R6, not added into eh... @markhamstra, change the return type from Stack to Set sounds like good, but the |
|
@suyanNone Yes, do check uniqueness, and if that looks fine, then there isn't really a reason to use a Stack instead of Set for |
|
I see, so this comes from a "diamond" dependency. I'm not seeing that in your "after" image though -- am I just missing it with too many criss-crossing lines? I'm wondering if there is another bug w/ either the listener events or the visualization. I'm actually wondering if there is any point in building up the |
|
@squito I guess the only issue with that is whether we will ever have need in the future to get the ancestor shuffle dependencies without registering them. I doubt it, so I think I'm in favor of simplifying and inlining the code. @kayousterhout ? |
|
I spent a bit of time reproducing this. It is actually rather tricky. (a) the shared dependency has to be a shared test("shared shuffle stages with diamond dependencies", ActiveTag) {
val rddA = new MyRDD(sc, 3, Nil).setName("rddA")
val diamondInput = new ShuffleDependency(rddA, new HashPartitioner(2))
val rddTop = new MyRDD(sc, 2, List(diamondInput)).setName("rddTop")
val diamondTop = new ShuffleDependency(rddTop, new HashPartitioner(10))
val rddBottom = new MyRDD(sc, 2, List(diamondInput)).setName("rddBottom")
val diamondBottom = new ShuffleDependency(rddBottom, new HashPartitioner(10))
val rddC = new MyRDD(sc, 2, List(diamondTop, diamondBottom)).setName("rddC")
val lastDep = new ShuffleDependency(rddC, new HashPartitioner(3))
val rddD = new MyRDD(sc, 3, List(lastDep)).setName("rddD")
submit(rddD, Array(0, 1, 2))
} |
|
Test build #45850 has finished for PR 8923 at commit
|
|
Test build #45861 has started for PR 8923 at commit |
|
i will retrigger the build after jenkins maintenance is done. |
|
jenkins, test this please |
|
Test build #45866 has finished for PR 8923 at commit
|
|
Test build #45867 has finished for PR 8923 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please return an immutable Set.
|
Test build #45975 has finished for PR 8923 at commit
|
|
Test build #47042 has finished for PR 8923 at commit
|
|
@suyanNone still needs the test case |
|
Revert set to Stack, and add test case. |
|
Test build #56984 has finished for PR 8923 at commit
|
|
already merge into #12655, mark this closed |
When we traverse RDD, to generate Stage DAG, Spark will skip to judge the stage was added into shuffleIdToStage or not in some condition: get shuffleDep from getAncestorShuffleDependency
Before:


After:

