-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-25839 Bulk Import fails with java.io.IOException: Type mismatch in value from map #3232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
offermannu
wants to merge
2
commits into
apache:master
Choose a base branch
from
offermannu:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+2
−4
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrivateCellUtil.writeFlatKeywrites the key value in a different format fromKeyValueUtil.write. This breaks compatibility, can we just keep writing in the same format as before?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever
CellWritableComparable.write()writes must be compatible withCellWritableComparable.readFieldswhich is currently not the case (see this exception in HBASE-25839. The phrase "keyLength=0" inside the exception message comes actually from the statement
out.writeIn(0)in line 139).My proposal aligns the "write" with the existing
CellWritableComparable.readFields. Theoretically one can adjustCellWritableComparable.readFieldsso that it becomes compatible with the currentwritemethod but this looks more complicated to me.AFAICS the key is only used during the mapper sorting phase (see stacktrace). The reducer doesn't care about the key at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are basically reverting changes from commit 0a24178 from 4 years ago (HBASE-18649). Can you get rid of that exception if you just remove the
out.writeInt(0)there? I'm not sure, though, on other potential side effects. Maybe we would also need to changeImport.readFieldsto stay aligned with HBASE-18649 (I guess we could useCellBuilderto construct aCelldirectly).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, but it doesn't work unfortunately. If you look at the
readFieldsmethod you'll see that it callsKeyValue.create(DataInput in)which in turn callsKeyValue.create(int length, DataInput in). The length is read as integer from the DataInput which kind of matches whatwritehas written before in line 138 - so far so good (I'm refering to original version btw).Honestly I'm a little bit confused at this point and don't know whether
lengthexpects the key length or the length of the whole key-value pair. You'll see thatKeyValueallocates a byte array of lengthlength(line 2266) and so I guess that it must be the length of key+value.Now keep in mind that the first 4 bytes are consumed from DataInput (the length) and
KeyValue.createcalls:in.readFully(bytes); return new KeyValue(bytes, 0, length);- thusbytesstarts with four '0' bytes coming fromImport.write line 139: out.writeInt(0);!The KeyValue constructor in turn calls
KeyValueUtil.checkKeyValueBytes. And here I discovered that the serialization can never match the deserialization becauseKeyValueUtil.checkKeyValueBytesexpects the key len at the beginning of thebytesbuffer (note thatoffset==0) but this is zero as mentioned before. The check finally results into the exception:Error: java.lang.IllegalArgumentException: Invalid key length in KeyValue. keyLength=0, KeyValueBytesHex=\x00\x00\x00\x00\x00Lc30f5d93[...], offset=0, length=97To sum it up:
CellWritableComparable.writewrites:[ [4 bytes key length], [4 bytes '0'], [...] ]
CellWritableComparable.readFieldsexpects:[ [4bytes key-value length], [4 bytes key length], [...] ]
Honestly, I wonder how that ever worked ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so how about the other option I mentioned in my last comment:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the implementation enough to evaluate the alternatives. As far as I can tell
CellWritableComparable.readFieldsuses kind of standard deserialization methods whileCellWritableComparable.writeimplements its own serialization methods.But before we find out what is the best way we should ask ourselfs if this feature is used at all. Since the code is broken for at least 4 years it seems to me that nobody uses the large import feature.
I ran into this problem because we had to import 2,5 billion records into a new HBase/Hadoop cluster. The direct import failed because the regionserver was flooded with write requests and we got lots of
RegionTooBusyExceptions. So we tried the bulk import and ran into this bugs which I tried to fix. Unfortunately it didn't work either because YARN produced too many cache files for local sorting and the disks ran full - but that's another story.Finally we found a way to get the direct import working again which means I don't have a testing environment anymore.
IMHO if the feature is not used at all and there is no test available, it would be smartest to delete the feature.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch in HBASE-18649 changed this part and made to write only the key part of the cell (rowkey + cf + q + ts + type) instead of the entire KV. So it writes key length
out.writeInt(PrivateCellUtil.estimatedSerializedSizeOfKey(kv));
And then value len as 0
and then key part.
But this is not compatible with counter part read where it expects a prefix int part which says the overall length of this KV (Ya what we wrote is still a KV , or call it Cell, where value is empty).
So I agree to the fact that this never worked.
We can not get rid of out.writeInt(0).
On the other hand we have to write this entire KV length what we are going to write. This is key len + value len(0) + 4 + 4 (we write these extra 4 bytes twice for these 2 lengths too)
int keyLen = CellUtil.estimatedSerializedSizeOfKey(kv);
int valueLen = 0; // We avoid writing value here. So just serialize as if an empty value.
out.writeInt(keyLen+ valueLen + KeyValue.KEYVALUE_INFRASTRUCTURE_SIZE);
out.writeInt(keyLen);
out.writeInt(valueLen);
CellUtil.writeFlatKey(kv, out);
Not sure whether we already have some util method for this. Else worth adding?
Above will work @offermannu ?