Skip to content

Conversation

@chriselrod
Copy link
Collaborator

Depends on ThreadingUtilities.jl, which will be registered in a few days.

@chriselrod chriselrod changed the title Fast threaded matmul. WIP: Fast threaded matmul. Jan 18, 2021
@chriselrod
Copy link
Collaborator Author

Tests pass locally (with ThreadingUtilities.jl installed).

@chriselrod chriselrod changed the title WIP: Fast threaded matmul. Fast threaded matmul. Jan 18, 2021
@DilumAluthge DilumAluthge marked this pull request as draft January 18, 2021 22:40
@DilumAluthge DilumAluthge added the enhancement New feature or request label Jan 18, 2021
@codecov
Copy link

codecov bot commented Jan 19, 2021

Codecov Report

Merging #41 (900e3c3) into master (c1fd4ff) will decrease coverage by 87.76%.
The diff coverage is 11.51%.

Impacted file tree graph

@@             Coverage Diff              @@
##            master      #41       +/-   ##
============================================
- Coverage   100.00%   12.23%   -87.77%     
============================================
  Files            9       12        +3     
  Lines          121      719      +598     
============================================
- Hits           121       88       -33     
- Misses           0      631      +631     
Impacted Files Coverage Δ
src/Octavian.jl 100.00% <ø> (ø)
src/block_sizes.jl 0.00% <0.00%> (-100.00%) ⬇️
src/types.jl 0.00% <0.00%> (-100.00%) ⬇️
src/memory_buffer.jl 3.44% <3.44%> (-96.56%) ⬇️
src/funcptrs.jl 4.65% <4.65%> (ø)
src/macrokernels.jl 6.28% <6.28%> (-93.72%) ⬇️
src/matmul.jl 6.88% <6.88%> (-93.12%) ⬇️
src/integerdivision.jl 22.22% <22.22%> (ø)
src/staticfloats.jl 33.33% <33.33%> (ø)
src/init.jl 70.00% <60.00%> (-30.00%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1fd4ff...900e3c3. Read the comment docs.

@DilumAluthge
Copy link
Member

@chriselrod Any idea what's causing these test failures?

@DilumAluthge DilumAluthge changed the title Fast threaded matmul. Fast threaded matmul Jan 19, 2021
@DilumAluthge DilumAluthge marked this pull request as ready for review January 19, 2021 03:15
@DilumAluthge
Copy link
Member

Let's mark the CodeCov checks as not required for now. We can increase the code coverage in future PRs.

LGTM to merge once CI is green. It's just the x86 that's left to fix now?

@DilumAluthge
Copy link
Member

I unmarked the codecov statuses as required.

BTW you have the access to change those settings if you want. You should see it under settings, branches, branch protection settings.

@DilumAluthge
Copy link
Member

Good to merge?

@chriselrod
Copy link
Collaborator Author

chriselrod commented Jan 20, 2021

It'd be nice if we could query some information about how big the stack is.

64-bit is stack allocating A, but this would cause stack overflows with 32-bit, so I'm using a global cache there.
I should also perhaps make the size of the stack allocated A closer to R₁Default*FIRST__CACHE_SIZE than to FIRST__CACHE_SIZE, but because R₁ is an argument to some of these that benchmark/tilesearch.jl, for example, uses (there's a copy of that file in StrideArrays.jl, I'll add it to Octavian in a separate PR).

Fortran and LLVM both let you stack allocate dynamically sized arrays, but Julia does not, meaning I need to pick some fixed upper bound.
While LLVM lets you, llvmcall also does not. That's because the lifetime of the stack allocated object equals the scope of the function allocating it. llvmcall works by creating a function and inlining it, but it's that original inlined function whose scope determines the lifetime. In otherwords, calling alloca via llvmcall doesn't work, because the memory is instantly freed.

So for the time being, it just allocates FIRST__CACHE_SIZE data on the stack. This turned out to be too much for 32-bit Julia, but I haven't seen it cause a problem on 64-bit Julia.

@chriselrod chriselrod merged commit 4749ad7 into master Jan 20, 2021
@chriselrod chriselrod deleted the fastmatmul branch January 20, 2021 09:36
@DilumAluthge DilumAluthge mentioned this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants