Skip to content

Conversation

@mobasherul-ibm
Copy link

No description provided.

@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch 5 times, most recently from 4550847 to 891ddc3 Compare May 16, 2025 12:54
@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch 2 times, most recently from 159d9b5 to 71eb82d Compare May 19, 2025 04:51
fail();
} catch (ExecutionException e) {
assertThat(e.getCause().getCause().getCause(), instanceOf(ReconnectInProgressException.class));
assertThat(e.getCause().getCause().getCause().getCause(), instanceOf(ReconnectInProgressException.class));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional exception nesting ... this is a behavior change and should be reconsidered.

@Override
public void putFailure(K key, V value, StoreAccessException e) {
if (e.getCause() instanceof ReconnectInProgressException) {
if (e.getCause().getCause() instanceof ReconnectInProgressException || e.getCause() instanceof ReconnectInProgressException) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional level of exception testing bothers me. Can it be avoided?

Copy link
Member

@chrisdennis chrisdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the exception handling here seem like they might be amplifying and obfuscating these changes. If we can manage to preserve much of the original exception throwing/logic I feel we'll have a smaller more trustable and reviewable changeset.

});

getSucceededFuture.get(20000, TimeUnit.MILLISECONDS);
assertThat(getSucceededFuture::isDone, eventually().is(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This squelches any exceptions throw by the underlying task.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't get that exception will be thrown for some time since reconnection is happening in background. I changed this since it can be slow at times for reconnect to complete. Eventually will still throw the timeout exception as it waits for 1 min for the future to succeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of us is misunderstanding the other... isDone just means complete, either exceptionally or normally. If it completed exceptionally we probably want to know what the exception is.

@mobasherul-ibm
Copy link
Author

The changes to the exception handling here seem like they might be amplifying and obfuscating these changes. If we can manage to preserve much of the original exception throwing/logic I feel we'll have a smaller more trustable and reviewable changeset.

Earlier flow was store -> reconnectingProxy -> commonProxy -> actualentity where exception flow was -
(StoreAccessException <- ReconnectInProgressException) when connection breaks Or for other scenarios
(StoreAccessException<-ServerStoreProxyException<-SomeOtherException) because commonProxy wraps ConnectionClosedException with ServerStoreProxyException and that is passed to reconnectingProxy which just throws ReconnectInProgressException.

Since I removed reconnectingProxy and introduced reconnectable entity flow has changed to -
store -> commonProxy -> reconnectableentity -> actual entity where exception flow has changed to -
(StoreAccessException <- ServerStoreProxyException <- ReconnectInProgressException) when connection breaks as commonProxy is now wrapping the ReconnectInProgressException with ServerStoreProxyException and for other exception it will be -
(StoreAccessException <-. ServerStoreProxyException <- SomeOtherException).

@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch 3 times, most recently from cba165e to 168f777 Compare May 21, 2025 14:29
@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch 2 times, most recently from 22fde53 to 3066706 Compare May 21, 2025 18:56
@cljohnso
Copy link

cljohnso commented May 28, 2025

@mobasherul-ibm please do not resolve comments you did not create -- it makes re-review considerably more difficult. If you want to keep track of items you've addressed, reply to the comment with a +1 or a :white_check_mark, even a simple done.

@cljohnso
Copy link

Still have a test failure in org.ehcache.clustered.IterationFailureBehaviorTest#testIteratorReconnect

@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch 5 times, most recently from 662e6bb to d7a7a21 Compare June 5, 2025 18:08
@mobasherul-ibm mobasherul-ibm requested a review from cljohnso June 6, 2025 06:18
@mobasherul-ibm mobasherul-ibm force-pushed the state_repo_reconnect branch from d7a7a21 to 6ca13df Compare June 6, 2025 07:02
}

@Test
@Ignore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being ignored with expectation for a future repair? If so, an issue number should be mentioned.

}

@Test
@Ignore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being ignored with expectation for a future repair? If so, an issue number should be mentioned.

systemProperty 'kitInstallationPath', "$unzipKit.destinationDir/${project(':clustered:ehcache-clustered').archivesBaseName}-$project.version-kit"
// Uncomment to include client logging in console output
// testLogging.showStandardStreams = true
testLogging.showStandardStreams = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Permanent change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove once approved

= CacheManagerBuilder.newCacheManagerBuilder()
.with(ClusteringServiceConfigurationBuilder.cluster(CLUSTER.get().getConnectionURI().resolve("/iterator-cm"))
.autoCreate(server -> server.defaultServerResource("primary-server-resource")));
.autoCreateOnReconnect(server -> server.defaultServerResource("primary-server-resource")));
Copy link
Member

@chrisdennis chrisdennis Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the logic of why this should need to be autoCreateOnReconnect... what I don't get is how we were getting away with it before. What changed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

earlier entity were not reconnectable so there were no connection retries but now that changed thats why I changed it to ensure all connections are closed at the end of this test.

Copy link

@cljohnso cljohnso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs conversations with Chris and possibly some minor tweaks.

cache2.put(0L, "secondcache");
});

assertThat(() -> putFuture1.get(5000, TimeUnit.MILLISECONDS),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to need an eventually to have a stable test?

threw(Matchers.<Throwable>both(instanceOf(ExecutionException.class))
.and(hasCause(hasCause(hasCause(hasCause(instanceOf(ReconnectInProgressException.class))))))));

assertThat(() -> putFuture2.get(5000, TimeUnit.MILLISECONDS),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to need an eventually to have a stable test?

cache2.put(0L, "secondValue");
expireLease();

assertThat(() -> cache1.get(0L),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible the reconnect succeeds before this statement runs?


cacheManager.removeCache("clustered-cache1");

try {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better with the assertThat(..., threw(...)) pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants