@clintropolis
Member

Description

This fixes a null pointer exception that occurs in numerous places when polling for segments while having an empty metadata database after the changes of #7653. I'm not certain if this is the correct fix since previously the snapshot would remain null until there were segments available during a poll, but it resolves the issue for me at least, and doesn't appear to have any ill side-effects.

2019-07-18T20:48:10,597 WARN [qtp265052195-81] org.eclipse.jetty.server.HttpChannel - /druid/coordinator/v1/metadata/segments
java.lang.NullPointerException
    at org.apache.druid.server.http.MetadataResource.getAllUsedSegmentsWithOvershadowedStatus(MetadataResource.java:174) ~[classes/:?]
    at org.apache.druid.server.http.MetadataResource.getAllUsedSegments(MetadataResource.java:142) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_192]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_192]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_192]
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) ~[jersey-server-1.19.3.jar:1.19.3]
2019-07-18T21:26:15,173 ERROR [Coordinator-Exec--0] org.apache.druid.server.coordinator.DruidCoordinator - Caught exception, ignoring so that schedule keeps going.: {class=org.apache.druid.server.coordinator.DruidCoordinator, exceptionType=class java.lang.NullPointerException, exceptionMessage=null}
java.lang.NullPointerException
	at org.apache.druid.server.coordinator.DruidCoordinatorRuntimeParams$Builder.withSnapshotOfDataSourcesWithAllUsedSegments(DruidCoordinatorRuntimeParams.java:354) ~[classes/:?]
	at org.apache.druid.server.coordinator.DruidCoordinator$CoordinatorRunnable.run(DruidCoordinator.java:657) [classes/:?]
	at org.apache.druid.server.coordinator.DruidCoordinator$2.call(DruidCoordinator.java:559) [classes/:?]

etc.
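
For readers following along, the failure mode described above can be sketched in a few lines. This is a simplified stand-in rather than Druid's actual code: `SnapshotHolder`, `pollOld`, `pollFixed`, and `usedSegmentCount` are hypothetical names, and a plain `List<String>` stands in for `DataSourcesSnapshot`. It assumes, per the description, that the snapshot was previously assigned only when a poll found segments.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the segment poll; all names here are hypothetical.
class SnapshotHolder {
  // Stays null until a poll assigns it, mirroring the snapshot field described above.
  private volatile List<String> snapshot;

  // Pre-fix behavior as described: an empty poll result leaves the snapshot untouched.
  void pollOld(List<String> segments) {
    if (segments.isEmpty()) {
      return;
    }
    snapshot = Collections.unmodifiableList(segments);
  }

  // Behavior after the fix: an empty poll result still produces an (empty) snapshot.
  void pollFixed(List<String> segments) {
    snapshot = Collections.unmodifiableList(segments);
  }

  // A reader that dereferences the snapshot, like the callers in the stack traces.
  int usedSegmentCount() {
    return snapshot.size();
  }
}
```

With an empty metadata database, `pollOld` never assigns the field, so any reader that dereferences it throws a NullPointerException. `pollFixed` always assigns it, so readers see an empty snapshot instead.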


This PR has:

  • been self-reviewed.
  • been tested in a localhost Druid cluster.

ImmutableMap<String, String> dataSourceProperties = createDefaultDataSourceProperties();
if (segments == null || segments.isEmpty()) {
  log.info("No segments found in the database!");
  dataSourcesSnapshot = DataSourcesSnapshot.fromUsedSegments(Collections.emptyList(), dataSourceProperties);
Member

I don't know how segments == null is possible here, but if dataSourcesSnapshot is already non-null, this assignment should probably be skipped. The cause may be intermittent database unavailability or failure. This condition should be logged.

Member Author

Yeah, I was not able to observe segments being null in my experiments, which mostly consisted of killing and restarting the MySQL instance I was testing against. It doesn't look like it should be able to be null. The javadoc for the Query.list method

     * Executes the select
     * <p/>
     * Will eagerly load all results
     *
     * @throws org.skife.jdbi.v2.exceptions.UnableToCreateStatementException
     *                            if there is an error creating the statement
     * @throws org.skife.jdbi.v2.exceptions.UnableToExecuteStatementException
     *                            if there is an error executing the statement
     * @throws org.skife.jdbi.v2.exceptions.ResultSetException if there is an error dealing with the result set

makes it look like it's either going to make a list or throw an exception.

If we are confident it shouldn't happen, then I think it should just be removed. If we're still unsure, it might make more sense to handle segments == null separately: log it, and probably don't update the snapshot even if it's still null, because it's an unexpected condition that doesn't necessarily mean the same thing as a truly empty segments table. Thoughts?

Member Author

I guess it's harmless to add an if (segments == null) block that does a log.wtf and returns early. I'll go ahead and make that change just in case.
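
A minimal sketch of that "log and return early" variant, under stated assumptions: `EarlyReturnPoll` and its methods are hypothetical names, `System.err` stands in for the `log.wtf` call, and a `List<String>` stands in for the snapshot type. It is not the code that was merged.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the "log and return early" variant; not the merged code.
class EarlyReturnPoll {
  private volatile List<String> snapshot;

  // Returns true if this poll replaced the snapshot.
  boolean poll(List<String> segments) {
    if (segments == null) {
      // Unexpected: treat as "unknown" rather than "empty table" and keep the old snapshot.
      System.err.println("Unexpected null segment list; keeping previous snapshot");
      return false;
    }
    // An empty list is a legitimate result and still replaces the snapshot.
    snapshot = Collections.unmodifiableList(segments);
    return true;
  }

  List<String> getSnapshot() {
    return snapshot;
  }
}
```

The design choice here is that a null result never overwrites a previously good snapshot. The cost is that the snapshot stays null if the very first poll hits the unexpected case.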


if (segments == null || segments.isEmpty()) {
log.info("No segments found in the database!");
if (segments == null) {
Contributor

This code is pointless, since .list() above cannot return null and inReadOnlyTransaction will not convert a nonnull return to null. Might as well remove it, or include a Preconditions.checkNotNull or something (throw an exception on the unanticipated null rather than returning early and leaving the snapshot null).

Member Author

A Preconditions check makes sense to me: it controls the error messaging here instead of allowing the NPE to happen downstream. Will update.

Member Author

Er, actually this is just in the poll, so I guess the NPE could still happen downstream. Regardless, that approach seems cleaner.
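
The fail-fast alternative discussed in this thread might look roughly like the sketch below. It is an illustration only: `FailFastPoll` is a hypothetical class, `List<String>` stands in for the snapshot type, and `Objects.requireNonNull` from the JDK is swapped in for Guava's `Preconditions.checkNotNull` so the example has no external dependencies.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the fail-fast variant; Objects.requireNonNull stands in
// for Guava's Preconditions.checkNotNull so the example needs only the JDK.
class FailFastPoll {
  private volatile List<String> snapshot;

  void poll(List<String> segments) {
    // Throws NullPointerException with a controlled message at the poll site.
    Objects.requireNonNull(segments, "segments list from the metadata store was null");
    // Reached only for non-null results, including an empty list.
    snapshot = Collections.unmodifiableList(segments);
  }

  List<String> getSnapshot() {
    return snapshot;
  }
}
```

As noted in the comment above, the check only fires inside the poll. If the first poll throws, the snapshot is still null and readers can still hit an NPE, but the error raised at the poll site carries a descriptive message.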

// effect of a segment mark call reflected in MetadataResource API calls.

ImmutableMap<String, String> dataSourceProperties = createDefaultDataSourceProperties();
if (segments.isEmpty()) {
Contributor

This part is the fix, right? (Continuing on even if segments is empty?)

It looks good to me.

@gianm gianm merged commit f24e2f1 into apache:master Jul 21, 2019
@clintropolis
Member Author

Thanks for the review!

@clintropolis clintropolis deleted the sql-metadata-snapshot-npe-fix branch July 21, 2019 10:07
@clintropolis clintropolis added this to the 0.16.0 milestone Aug 8, 2019