Commit fdd3bac
[SPARK-22148][SPARK-15815][SCHEDULER] Acquire new executors to avoid hang because of blacklisting
## What changes were proposed in this pull request?
Every time a task is unschedulable because of the condition where no. of task failures < no. of executors available, we currently abort the taskSet - failing the job. This change tries to acquire new executors so that we can complete the job successfully. We try to acquire a new executor only when we can kill an existing idle executor. We fallback to the older implementation where we abort the job if we cannot find an idle executor.
## How was this patch tested?
I performed some manual tests to check and validate the behavior.
```scala
val rdd = sc.parallelize(Seq(1 to 10), 3)
import org.apache.spark.TaskContext
val mapped = rdd.mapPartitionsWithIndex ( (index, iterator) => { if (index == 2) { Thread.sleep(30 * 1000); val attemptNum = TaskContext.get.attemptNumber; if (attemptNum < 3) throw new Exception("Fail for blacklisting")}; iterator.toList.map (x => x + " -> " + index).iterator } )
mapped.collect
```
Closes #22288 from dhruve/bug/SPARK-22148.
Lead-authored-by: Dhruve Ashar <[email protected]>
Co-authored-by: Dhruve Ashar <[email protected]>
Co-authored-by: Tom Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>1 parent 3ed91c9 commit fdd3bac
7 files changed
Lines changed: 318 additions & 36 deletions
File tree
- core/src
- main/scala/org/apache/spark
- internal/config
- scheduler
- test/scala/org/apache/spark/scheduler
- docs
Lines changed: 8 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
622 | 622 | | |
623 | 623 | | |
624 | 624 | | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
625 | 633 | | |
626 | 634 | | |
627 | 635 | | |
| |||
Lines changed: 20 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
146 | 146 | | |
147 | 147 | | |
148 | 148 | | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
149 | 161 | | |
150 | 162 | | |
151 | | - | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
| 163 | + | |
| 164 | + | |
161 | 165 | | |
162 | 166 | | |
163 | 167 | | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
164 | 174 | | |
165 | 175 | | |
166 | 176 | | |
| |||
Lines changed: 69 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | | - | |
| 38 | + | |
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
| |||
117 | 117 | | |
118 | 118 | | |
119 | 119 | | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
120 | 125 | | |
121 | 126 | | |
122 | 127 | | |
| |||
415 | 420 | | |
416 | 421 | | |
417 | 422 | | |
| 423 | + | |
418 | 424 | | |
419 | | - | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
420 | 468 | | |
| 469 | + | |
421 | 470 | | |
422 | 471 | | |
423 | 472 | | |
| |||
453 | 502 | | |
454 | 503 | | |
455 | 504 | | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
456 | 522 | | |
457 | 523 | | |
458 | 524 | | |
| |||
590 | 656 | | |
591 | 657 | | |
592 | 658 | | |
| 659 | + | |
593 | 660 | | |
594 | 661 | | |
595 | 662 | | |
| |||
Lines changed: 23 additions & 18 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
623 | 623 | | |
624 | 624 | | |
625 | 625 | | |
626 | | - | |
627 | | - | |
| 626 | + | |
| 627 | + | |
628 | 628 | | |
629 | 629 | | |
630 | 630 | | |
| |||
635 | 635 | | |
636 | 636 | | |
637 | 637 | | |
638 | | - | |
639 | | - | |
640 | | - | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
641 | 641 | | |
642 | 642 | | |
643 | 643 | | |
| |||
658 | 658 | | |
659 | 659 | | |
660 | 660 | | |
661 | | - | |
| 661 | + | |
662 | 662 | | |
663 | 663 | | |
664 | 664 | | |
665 | | - | |
| 665 | + | |
666 | 666 | | |
667 | 667 | | |
668 | 668 | | |
| |||
679 | 679 | | |
680 | 680 | | |
681 | 681 | | |
682 | | - | |
683 | | - | |
684 | | - | |
685 | | - | |
686 | | - | |
687 | | - | |
688 | | - | |
689 | | - | |
690 | | - | |
691 | | - | |
692 | | - | |
693 | 682 | | |
| 683 | + | |
| 684 | + | |
694 | 685 | | |
695 | 686 | | |
696 | 687 | | |
697 | 688 | | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
698 | 703 | | |
699 | 704 | | |
700 | 705 | | |
| |||
Lines changed: 4 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
99 | | - | |
100 | | - | |
| 99 | + | |
| 100 | + | |
101 | 101 | | |
102 | 102 | | |
103 | 103 | | |
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
107 | | - | |
| 107 | + | |
| 108 | + | |
108 | 109 | | |
109 | 110 | | |
110 | 111 | | |
| |||
0 commit comments