
Update LargeFileUploadTask.java#1531

Merged
Ndiritu merged 9 commits into microsoftgraph:fix/lfu from ffcdf:patch-1
Mar 19, 2024

Conversation

Contributor

@ffcdf ffcdf commented Mar 1, 2024

Suggested modification to the chunkInputStream method:

From the second pass through this method onwards, the line "int lengthAssert = stream.read(buffer, begin, length);" throws java.lang.IndexOutOfBoundsException, because it tries to write the read bytes to a non-existent position in the buffer.

The suggested change implies changing InputStream to FileInputStream throughout the LargeFileUploadTask class.

Fixes # IndexOutOfBoundsException in method chunkInputStream

Changes proposed in this pull request

  • Change InputStream to FileInputStream.
  • Change
        byte[] buffer = new byte[length];
        int lengthAssert = stream.read(buffer, begin, length);
    to
        ByteBuffer byteBuffer = ByteBuffer.allocate(length);
        int lengthAssert = stream.getChannel().read(byteBuffer);
        byteBuffer.flip();
        byte[] buffer = byteBuffer.array();
        byteBuffer.clear();
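For illustration, a minimal, self-contained sketch of the proposed channel-based chunk read (the class and helper names are hypothetical, not the actual LargeFileUploadTask code; this variant sizes the returned array to the bytes actually read rather than returning byteBuffer.array() directly):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class ChunkSketch {
    // Hypothetical helper mirroring the proposed change: read one chunk via the
    // stream's FileChannel, which tracks the file position itself, so no
    // running "begin" offset into the destination array is needed.
    static byte[] readChunk(FileInputStream stream, int length) throws IOException {
        ByteBuffer byteBuffer = ByteBuffer.allocate(length);
        stream.getChannel().read(byteBuffer); // advances the channel's position
        byteBuffer.flip();
        byte[] buffer = new byte[byteBuffer.remaining()];
        byteBuffer.get(buffer);
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lfu", ".bin");
        tmp.deleteOnExit();
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(data);
        }
        try (FileInputStream in = new FileInputStream(tmp)) {
            byte[] first = readChunk(in, 4);  // bytes 0..3
            byte[] second = readChunk(in, 4); // bytes 4..7, no offset arithmetic
            if (first[3] != 3 || second.length != 4 || second[0] != 4) {
                throw new AssertionError("unexpected chunk contents");
            }
        }
        System.out.println("both chunks read without IndexOutOfBoundsException");
    }
}
```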

@ffcdf ffcdf requested a review from a team as a code owner March 1, 2024 23:35
Contributor Author

ffcdf commented Mar 1, 2024

@microsoft-github-policy-service agree

@ffcdf ffcdf closed this Mar 4, 2024
According to the documentation:

public int read(byte[] b, int off, int len) throws IOException

The first byte read is stored into element b[off], the next one into b[off+1], and so on.

However, from the second pass through the chunkInputStream method onwards, off exceeds the last index of b, and therefore b[off] does not exist.
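This failure mode can be reproduced in isolation. The snippet below (a standalone demo, not the SDK code) allocates a buffer sized to one chunk and then passes a growing offset to read, exactly as described:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadOffsetDemo {
    public static void main(String[] args) throws IOException {
        // 20 bytes of input, consumed in 5-byte chunks.
        InputStream stream = new ByteArrayInputStream(new byte[20]);
        byte[] buffer = new byte[5];   // buffer sized to a single chunk
        stream.read(buffer, 0, 5);     // first pass: off = 0, works fine
        try {
            // Second pass in the original code: the offset has advanced to 5,
            // but buffer.length is still 5, so buffer[5] does not exist.
            stream.read(buffer, 5, 5);
            throw new AssertionError("expected IndexOutOfBoundsException");
        } catch (IndexOutOfBoundsException expected) {
            System.out.println("IndexOutOfBoundsException on the second chunk, as reported");
        }
    }
}
```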
@ffcdf ffcdf reopened this Mar 4, 2024
Contributor

Ndiritu commented Mar 4, 2024

@ffcdf thanks for the changes.

Please update the Changelog as well. You can add a section for 3.1.7.

Contributor

Ndiritu commented Mar 4, 2024

Since there seems to be a follow-up issue, I think we can do without the version bump to prevent releasing something broken?
cc: @ramsessanchez @baywet

Member

baywet commented Mar 4, 2024

@Ndiritu we've had multiple complaints about this task, and attempted multiple fixes. (We still have to review this one, which attempts to fix a regression: microsoft/kiota-java#1109.)

I think we're probably missing parts of the equation here. If you could take time to run integration tests for that task (without this PR) against Exchange and ODSP (I seem to remember the behaviour was slightly different between the two services), I think it'd go a long way to:

  • ensuring we're not missing something obvious, and that everything has indeed been fixed.
  • improving customers' trust in this task's implementation.

A couple of important things to consider during this integration test:

  • the file should be about 10 MB to ensure multiple upload requests are sent
  • we should re-download and checksum the file to ensure no logical corruption happens because of the control flow.
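The checksum part of such an integration test could be sketched as below. This is an illustrative stand-in only: the actual upload/re-download through the Graph SDK is replaced by a plain file copy, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

public class ChecksumCheck {
    // SHA-256 of a file's contents, used to compare the original file with the
    // re-downloaded copy after a chunked upload.
    static byte[] sha256(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return digest.digest(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws Exception {
        // ~10 MB of random data so that multiple upload requests would be sent.
        Path original = Files.createTempFile("upload", ".bin");
        byte[] payload = new byte[10 * 1024 * 1024];
        new Random(42).nextBytes(payload);
        Files.write(original, payload);

        // Stand-in for "upload then re-download": copy the file byte-for-byte.
        Path downloaded = Files.createTempFile("download", ".bin");
        Files.write(downloaded, Files.readAllBytes(original));

        if (!Arrays.equals(sha256(original), sha256(downloaded))) {
            throw new AssertionError("logical corruption detected");
        }
        System.out.println("checksums match: no logical corruption");
        Files.deleteIfExists(original);
        Files.deleteIfExists(downloaded);
    }
}
```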

Contributor

Ndiritu commented Mar 4, 2024

Sure @baywet. Makes sense to test this extensively before anything is merged/released.

There's a follow-up issue I didn't highlight clearly that was mentioned here in the resolved conversation

Member

baywet commented Mar 4, 2024

This is the reason why I suggested integration testing against both services. It seemed like we still have issues with ODSP, but maybe I misread that comment?

Member

baywet commented Mar 4, 2024

Also, on a somewhat related matter, we should plan a patch release of graph core and update it all the way (with kiota dependencies) in the service libraries, to ensure nobody is on outdated dependencies moving forward and reporting things we've already fixed. As you mentioned, it's probably part of the confusion around that task (and batching) at the moment.

Contributor

Ndiritu commented Mar 4, 2024

This is the reason why I suggested integration testing against both services. It seemed like we still have issues with ODSP, but maybe I misread that comment?

No we're on the same page now.

@calebkiage
Contributor

I have a question about the upload task. Are all the chunk lengths guaranteed to exactly match what the reader has available for every chunk? The reason I ask is that (according to the docs), the read contract reads bytes on a best effort basis. So, for any invocation, the read bytes can be less than the length expected.

@ffcdf ffcdf mentioned this pull request Mar 5, 2024
Contributor Author

ffcdf commented Mar 5, 2024

@ffcdf thanks for the changes.

Please update the Changelog as well. You can add a section for 3.1.7.

Updated in pull request #1536.

Contributor

Ndiritu commented Mar 5, 2024

I have a question about the upload task. Are all the chunk lengths guaranteed to exactly match what the reader has available for every chunk? The reason I ask is that (according to the docs), the read contract reads bytes on a best effort basis. So, for any invocation, the read bytes can be less than the length expected.

There's no guarantee that we'll read the expected chunk length from the stream (from my understanding of how read works). We currently fail if what we read doesn't match the length we expected to get. Maybe we should retry reading using read with offsets until we get the expected chunk length or reach the end of the file.

Then set the relevant content-range headers based on this and rely on the API response to determine the next byte position to start from. We currently pre-calculate the ranges before the upload actually begins which might be another bug.
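The retry idea described above could look like the following (a hypothetical readFully helper, not SDK code): keep calling read with an advancing offset into the same buffer until the chunk is full or the end of the stream is reached, then size the final chunk to what was actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFully {
    // Retry read until the requested chunk length is filled or EOF is hit,
    // since InputStream.read may return fewer bytes than asked for.
    static byte[] readFully(InputStream stream, int length) throws IOException {
        byte[] buffer = new byte[length];
        int filled = 0;
        while (filled < length) {
            int read = stream.read(buffer, filled, length - filled);
            if (read == -1) {
                break; // end of stream: return the short final chunk
            }
            filled += read;
        }
        return filled == length ? buffer : Arrays.copyOf(buffer, filled);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5, 6, 7});
        byte[] first = readFully(in, 5);  // full 5-byte chunk
        byte[] last = readFully(in, 5);   // only 2 bytes remain before EOF
        if (first.length != 5 || last.length != 2 || last[0] != 6) {
            throw new AssertionError("unexpected chunk sizes");
        }
        System.out.println("full chunk then short final chunk, no exception");
    }
}
```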

@calebkiage
Contributor

@Ndiritu, thanks for the info. I actually asked because of the assert. I wondered if it would fail for a valid scenario. Maybe the integration tests can help us pick up situations where the read fails but shouldn't.

ffcdf added 4 commits March 6, 2024 14:37
Changed chunkInputStream method in LargeFileUploadTask to resolve IndexOutOfBoundsException when uploading large files
@Ndiritu Ndiritu changed the base branch from dev to fix/lfu March 19, 2024 12:34
Contributor

Ndiritu commented Mar 19, 2024

@ffcdf thank you very much for your contribution! I'll merge this to a separate branch and build on your changes with a few additional fixes.


ihudedi commented Apr 4, 2024

Hi @ffcdf @baywet
Are you going to change the InputStream to FileInputStream?
Does that mean I will no longer be able to upload a large file that doesn't come from the local file system?
I have a stream to upload that doesn't always come from the local file system.
Thanks,
Itay

Member

baywet commented Apr 16, 2024

Deferring to @Ndiritu, who's back and in charge of tackling this issue end to end.

Contributor

Ndiritu commented Apr 18, 2024

Hi @ffcdf @baywet Are you going to change the InputStream to FileInputStream? Does that mean I will no longer be able to upload a large file that doesn't come from the local file system? I have a stream to upload that doesn't always come from the local file system. Thanks, Itay

Hi @ihudedi, we have kept this generic as an InputStream to cover a broad set of scenarios.



Development

Successfully merging this pull request may close these issues.

LargeFileUpload always produces IndexOutOfBoundsException if the input stream is longer than the chunk size

5 participants