Skip to content

Conversation

@gaborgsomogyi
Copy link
Contributor

@gaborgsomogyi gaborgsomogyi commented Jul 8, 2019

What changes were proposed in this pull request?

There is the following code in TransportClientFactory#createClient

    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];

which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:

  @Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }

Result:

Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...

In this PR I've adapted the test code not to have this flakyness.

How was this patch tested?

Additional (not committed test) + existing unit tests in a loop.

@SparkQA
Copy link

SparkQA commented Jul 8, 2019

Test build #107343 has finished for PR 25075 at commit 1cf4635.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


Assert.assertEquals(0, failed.get());
Assert.assertEquals(clients.size(), maxConnections);
Assert.assertTrue(clients.size() <= maxConnections);
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jul 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the investigation and this PR, @gaborgsomogyi .
Could you update the PR description if we have a previous Jenkins test failure for this, please?


Assert.assertEquals(0, failed.get());
Assert.assertEquals(clients.size(), maxConnections);
Assert.assertTrue(clients.size() <= maxConnections);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there's no way to ask it to not return a cached instance?
This seems to be testing that they're all different, but, that doesn't seem to be the current behavior, nor is it necessarily desirable. So, OK.

@dongjoon-hyun
Copy link
Member

I also hit this issue before. But, when I searched Jenkins failure logs today, I can not find one because this happens seldomly. The way of proof in this PR description looks enough for now.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @gaborgsomogyi and @srowen .
Merged to master/2.4/2.3.

dongjoon-hyun pushed a commit that referenced this pull request Jul 8, 2019
There is the following code in [TransportClientFactory#createClient](https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L150)
```
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
```
which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:
```java
  Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }
```

Result:
```
Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...
```

In this PR I've adapted the test code not to have this flakyness.

Additional (not committed test) + existing unit tests in a loop.

Closes #25075 from gaborgsomogyi/SPARK-28261.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e11a558)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jul 8, 2019
There is the following code in [TransportClientFactory#createClient](https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L150)
```
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
```
which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:
```java
  Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }
```

Result:
```
Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...
```

In this PR I've adapted the test code not to have this flakyness.

Additional (not committed test) + existing unit tests in a loop.

Closes #25075 from gaborgsomogyi/SPARK-28261.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e11a558)
Signed-off-by: Dongjoon Hyun <[email protected]>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
There is the following code in [TransportClientFactory#createClient](https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L150)
```
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
```
which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:
```java
  Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }
```

Result:
```
Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...
```

In this PR I've adapted the test code not to have this flakyness.

Additional (not committed test) + existing unit tests in a loop.

Closes apache#25075 from gaborgsomogyi/SPARK-28261.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e11a558)
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
There is the following code in [TransportClientFactory#createClient](https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L150)
```
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
```
which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:
```java
  Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }
```

Result:
```
Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...
```

In this PR I've adapted the test code not to have this flakyness.

Additional (not committed test) + existing unit tests in a loop.

Closes apache#25075 from gaborgsomogyi/SPARK-28261.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e11a558)
Signed-off-by: Dongjoon Hyun <[email protected]>
XUJiahua pushed a commit to XUJiahua/spark that referenced this pull request Apr 9, 2020
There is the following code in [TransportClientFactory#createClient](https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L150)
```
    int clientIndex = rand.nextInt(numConnectionsPerPeer);
    TransportClient cachedClient = clientPool.clients[clientIndex];
```
which choose a client from its pool randomly. If we are unlucky we might not get the max number of connections out, but less than that.

To prove that I've tried out the following test:
```java
  Test
  public void testRandom() {
    Random rand = new Random();
    Set<Integer> clients = Collections.synchronizedSet(new HashSet<>());
    long iterCounter = 0;
    while (true) {
      iterCounter++;
      int maxConnections = 4;
      clients.clear();
      for (int i = 0; i < maxConnections * 10; i++) {
        int clientIndex = rand.nextInt(maxConnections);
        clients.add(clientIndex);
      }
      if (clients.size() != maxConnections) {
        System.err.println("Unexpected clients size (iterCounter=" + iterCounter + "): " + clients.size() + ", maxConnections: " + maxConnections);
      }
      if (iterCounter % 100000 == 0) {
        System.out.println("IterCounter: " + iterCounter);
      }
    }
  }
```

Result:
```
Unexpected clients size (iterCounter=22388): 3, maxConnections: 4
Unexpected clients size (iterCounter=36244): 3, maxConnections: 4
Unexpected clients size (iterCounter=85798): 3, maxConnections: 4
IterCounter: 100000
Unexpected clients size (iterCounter=97108): 3, maxConnections: 4
Unexpected clients size (iterCounter=119121): 3, maxConnections: 4
Unexpected clients size (iterCounter=129948): 3, maxConnections: 4
Unexpected clients size (iterCounter=173736): 3, maxConnections: 4
Unexpected clients size (iterCounter=178138): 3, maxConnections: 4
Unexpected clients size (iterCounter=195108): 3, maxConnections: 4
IterCounter: 200000
Unexpected clients size (iterCounter=209006): 3, maxConnections: 4
Unexpected clients size (iterCounter=217105): 3, maxConnections: 4
Unexpected clients size (iterCounter=222456): 3, maxConnections: 4
Unexpected clients size (iterCounter=226899): 3, maxConnections: 4
Unexpected clients size (iterCounter=229101): 3, maxConnections: 4
Unexpected clients size (iterCounter=253549): 3, maxConnections: 4
Unexpected clients size (iterCounter=277550): 3, maxConnections: 4
Unexpected clients size (iterCounter=289637): 3, maxConnections: 4
...
```

In this PR I've adapted the test code not to have this flakyness.

Additional (not committed test) + existing unit tests in a loop.

Closes apache#25075 from gaborgsomogyi/SPARK-28261.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e11a558)

Change-Id: I2a3b4cbf6e56a8cac992dba0a3bf3a2afaf8ee59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants