-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-18456. NullPointerException in ObjectListingIterator. #4909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18456. NullPointerException in ObjectListingIterator. #4909
Conversation
The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create() where a strong reference to the new value is kept in a local variable *and referred to later* so that the JVM will not GC it. Change-Id: I29929965c31c8e6b2ab9a491fbadb40871f10c3d
|
s3a testing in progress |
mehakmeet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, really like the descriptive comments to make it clear.
In terms of testing I was wondering if we could use Runtime.getRuntime().gc() to actually run between calling threadMap.getForCurrentThread(); without storing the ref and asserting that the count of entries created is 2.
Something like this maybe? I'm not sure of the implications thisRuntime.getRuntime().gc() would have on rest of the tests tho, should be fine I think.
threadMap.getForCurrentThread();
Runtime.getRuntime().gc();
threadMap.getForCurrentThread();
Assertions.assertThat(threadMap.getEntriesCreatedCount()).isEqualTo(2);
Assertions.assertThat(threadMap.getReferenceLostCount()).isEqualTo(1);
| .describedAs("current thread map value on second set") | ||
| .isEqualTo("hello"); | ||
|
|
||
| // it is forbidden to explictly set to null via the set() call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo "explictly"
| .isNull(); | ||
|
|
||
| // second attempt returns itself | ||
| Assertions.assertThat(threadMap.setForCurrentThread("hello")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
little doubt here: what happens if we set it to "hello2" this time, does the set return "hello" or "hello2"?
Can you add the next assert to be of a different value than "hello", just to confirm if set actually returns the previous set value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added it at the end, to show the dynamic value is returned on the overwrite...easiest place to add
| } | ||
|
|
||
| /** | ||
| /**y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo
| // note if there was any change in the reference. | ||
| // as this forces strongRef to be kept in scope | ||
| if (strongRef != resolvedStrongRef) { | ||
| LOG.debug("Created instance for key {}: {} overwritten by {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the case, shouldn't we raise an exception? Are we not returning the wrong value then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's just addressing a race condition...the reason we do the lookup is to ensure that all threads share the same instance. the exact choice of which one is not considered relevant
| /** | ||
| * Set the new value for the current thread. | ||
| * @param newVal new reference to set for the active thread. | ||
| * @return any old value, possibly null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "any old value", or should this be "previously set value"?
| computation from any live thread." | ||
| */ | ||
|
|
||
| final V strongRef = requireNonNull(factory.apply(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Message for a factory returning null instance.
|
problem with the gc stuff is what it took to trigger it in the earlier test for that...you need to make a lot of attempts and put memory load on the system before that gc() call does anything. it's more of a request than a command, and the JVM is free to ignore it |
|
test failure is already fixed in #4900 |
Change-Id: I5aa94d2c4dc6640ab08e5d531ae9b650575d0cc7
Change-Id: I4b434ffc495ce0a4f5c1dea0047983fa6348b642
|
merged with trunk to fix the failure. one test failure against s3 london, anyone else seen this? race condition maybe? |
Tried to create a real GC, but System.gc() wouldn't do it, and it is too brittle to safely use in a test case anyway Change-Id: I9f3eadaa75125bb481f193ae29752483a434bf8a
|
🎊 +1 overall
This message was automatically generated. |
Change-Id: Ibc7e2f2134c8bf43412cdbf9821649ed53ef81d5
|
🎊 +1 overall
This message was automatically generated. |
mukund-thakur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1.
This problem surfaced in impala integration tests IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build after the change HADOOP-17461. Add thread-level IOStatistics Context The actual GC race condition came with HADOOP-18091. S3A auditing leaks memory through ThreadLocal references The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create() where a strong reference to the new value is kept in a local variable *and referred to later* so that the JVM will not GC it. Along with the fix, extra assertions ensure that if the problem is not fixed, applications will fail faster/more meaningfully. Contributed by Steve Loughran.
dannycjones
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already merged but...
+1 (non-binding), lgtm. I had one question but no concern.
| // resolve that reference, handling the situation where somehow it was removed from the map | ||
| // between the put() and the get() | ||
| resolvedStrongRef = resolve(retrievedWeakRef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a strong reference on L207, why do we expect to lose it?
I could see it happening in the old create(K key) implementation, but less so here.
…4909) This problem surfaced in impala integration tests IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build after the change HADOOP-17461. Add thread-level IOStatistics Context The actual GC race condition came with HADOOP-18091. S3A auditing leaks memory through ThreadLocal references The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create() where a strong reference to the new value is kept in a local variable *and referred to later* so that the JVM will not GC it. Along with the fix, extra assertions ensure that if the problem is not fixed, applications will fail faster/more meaningfully. Contributed by Steve Loughran.
The fix for HADOOP-18456/IMPALA-11592 is, if our hypothesis is correct,
in WeakReferenceMap.create() where a strong reference to the new value
is kept in a local variable and referred to later so that the JVM will not GC it.
Description of PR
WeakReferenceMap.create() is resistant and resilient to GC taking place during
its creation process.
Local variables were renamed to show when refs are strong vs. weak.
How was this patch tested?
There's a new test, but otherwise code review.
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?