Conversation

@mbauman mbauman commented Oct 14, 2015

This removes the LinearFast special cases from non-scalar indexing. Previously, we were manually hoisting the div/rem sub2ind calculation along the indexed strides, but LLVM seems to be just as capable of performing this optimization in the cases I have tested. Even better, though, this creates a clean separation between the array indexing fallbacks:

  • Scalar fallbacks use ind2sub and sub2ind to compute the required number of indices that the custom type must implement (as determined by Base.linearindexing).
  • Non-scalar fallbacks simply "unwrap" the elements from AbstractArrays (and Colon) and use scalar indexing with the indices that were provided.
  • (CartesianIndices are also expanded to individual integers, but that is a smaller detail.)
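The "unwrap" idea in the second bullet can be sketched as follows. This is an illustrative, hand-rolled example (the name `gather` is hypothetical, not the Base fallback): indices that are themselves arrays are walked element by element, so the custom array type only ever has to implement scalar indexing.

```julia
# Hedged sketch of the non-scalar fallback idea: an AbstractArray of indices
# is "unwrapped" one element at a time, so only scalar getindex is required
# of the underlying array type. `gather` is an illustrative name.
function gather(A::AbstractArray, I::AbstractVector{Int})
    out = similar(A, length(I))
    for (k, i) in enumerate(I)
        out[k] = A[i]          # each element falls back to scalar indexing
    end
    return out
end

A = collect(1:10)
gather(A, [2, 4, 6])  # == [2, 4, 6]
```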

In all cases that I've tried, I've been unable to measure a performance difference. Indeed, the LLVM IR looks identical in my spot-checks, too.

mbauman commented Oct 15, 2015

> LLVM seems to be just as capable of performing this optimization

This result holds for both LLVM 3.3 and LLVM 3.7. The CI failures were all unrelated.

mbauman added a commit that referenced this pull request Oct 16, 2015
Detangle linear indexing from non-scalar indexing
@mbauman mbauman merged commit 29e8d33 into master Oct 16, 2015
@mbauman mbauman deleted the mb/nolinear branch October 16, 2015 17:35
timholy commented Nov 6, 2015

Given MAX_TUPLETYPE_LEN, it's nice to get rid of one of the args.

But your comment in the code & above seems to imply that LLVM is figuring out the div/rem optimization. I don't think it is; this is the "forward direction" (sub2ind), which doesn't involve div/rem (it's just multiplication and addition). It's ind2sub where that's an issue.
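The distinction above can be made concrete with a hand-rolled sketch of the two conversion directions (`my_sub2ind` and `my_ind2sub` are illustrative names, not the Base functions): the forward direction (subscripts to linear index) uses only multiplication and addition, while the reverse direction is where div/rem appears.

```julia
# Forward direction: column-major subscripts -> linear index.
# Only multiplication and addition are involved.
function my_sub2ind(dims::Tuple, subs...)
    ind = subs[end] - 1
    for d in length(dims)-1:-1:1
        ind = ind * dims[d] + (subs[d] - 1)
    end
    return ind + 1
end

# Reverse direction: linear index -> subscripts.
# This is the direction that needs div/rem.
function my_ind2sub(dims::Tuple, ind::Int)
    subs = Int[]
    ind -= 1
    for d in dims
        push!(subs, rem(ind, d) + 1)
        ind = div(ind, d)
    end
    return (subs...,)
end

my_sub2ind((2, 3), 2, 3)  # == 6
my_ind2sub((2, 3), 6)     # == (2, 3)
```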

mbauman added a commit to mbauman/julia that referenced this pull request Jan 11, 2016
Back in JuliaLang#13614 I changed the signature of _unsafe_getindex!, but I failed to update the BitArray methods... so they were just using the generic definitions. This is a very simple fix to restore the indexing performance for BitArrays.