
[RFC] Update ICA interface and implementation#122

Closed
baggepinnen wants to merge 8 commits into JuliaStats:master from baggepinnen:master

Conversation

@baggepinnen baggepinnen commented Jul 7, 2020

This PR is aimed at starting a discussion around the interface and implementation of ICA.

The current implementation uses symbols for dispatch, which means that one cannot add new algorithms without modifying the source. It also uses an awkward function-returning-a-function indirection to dispatch on the derivative type.

This PR:

  • Changes the algorithm dispatch to use an AbstractICAAlg type.
  • Dispatches directly on the ICADeriv type and gets rid of the icaderiv middleman.
  • Refactors the loop out into self-contained functions. This is arguably a little uglier than the previous approach, but it has a huge benefit: one can easily modify these new functions to exploit SIMD for 5x+ greater performance.
    Some performance benchmarks on the computationally heavy part of the algorithm:
130.192 μs (1 allocation: 32 bytes) # Baseline
56.411 μs (1 allocation: 32 bytes)  # @avx
52.914 μs (1 allocation: 32 bytes)  # SLEEFPirates master
21.719 μs (1 allocation: 32 bytes)  # SLEEFPirates.tanh_fast

A factor of 6x is too big to ignore for many scenarios, but it does require a dependency on LoopVectorization. Consider the two functions below, which in this PR do the heavy lifting:

function update_UE!(f::Tanh{T}, U::AbstractMatrix{T}, E1::AbstractVector{T}) where T
    n,k = size(U)
    a = f.a
    @inbounds for j = 1:k
        _s = zero(T) # reset the accumulator for each column
        for i = 1:n
            t = tanh(a * U[i,j])
            U[i,j] = t
            _s += a * (1 - t^2)
        end
        E1[j] = _s / n # mean derivative for column j
    end
end
function update_UE!(f::Tanh{T}, U::AbstractMatrix{T}, E1::AbstractVector{T}) where T
    n,k = size(U)
    a = f.a
    @inbounds for j = 1:k
        _s = zero(T) # reset the accumulator for each column
        @avx for i = 1:n
            t = SLEEFPirates.tanh_fast(a * U[i,j])
            U[i,j] = t
            _s += a * (1 - t^2)
        end
        E1[j] = _s / n # mean derivative for column j
    end
end

The second version, using LoopVectorization and SLEEFPirates, is 6x faster on my machine. A user wanting this speedup can simply implement a new ICADeriv type, Tanhfast, and overload update_UE!, without MultivariateStats taking a dependency on LoopVectorization.jl.

This is obviously a breaking change, but I hope people will find it for the better.
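The extension pattern described above could be sketched as follows. `ICADeriv`, `Tanhfast`, and `update_UE!` are the names used in this PR; `ICADeriv` is declared locally here so the sketch is self-contained, and `Base.tanh` stands in for the `@avx`/`SLEEFPirates.tanh_fast` kernel so the example carries no extra dependencies.

```julia
# Sketch only: in real use, ICADeriv and update_UE! would be the ones
# exported by MultivariateStats, and the loop body would use @avx with
# SLEEFPirates.tanh_fast as shown in the PR.
abstract type ICADeriv{T} end

# Hypothetical user-defined nonlinearity, dispatched on by update_UE!.
struct Tanhfast{T} <: ICADeriv{T}
    a::T
end

function update_UE!(f::Tanhfast{T}, U::AbstractMatrix{T}, E1::AbstractVector{T}) where T
    n, k = size(U)
    a = f.a
    @inbounds for j = 1:k
        _s = zero(T)              # per-column accumulator
        for i = 1:n
            t = tanh(a * U[i, j]) # stand-in for SLEEFPirates.tanh_fast
            U[i, j] = t
            _s += a * (1 - t^2)
        end
        E1[j] = _s / n            # mean derivative for column j
    end
    E1
end
```

Because the kernel is selected by dispatch on the ICADeriv subtype, the fast implementation lives entirely in user code; MultivariateStats itself never needs the LoopVectorization.jl dependency.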

wildart added a commit to wildart/MultivariateStats.jl that referenced this pull request Jan 20, 2022

wildart commented Jan 20, 2022

Merged in #174.

@wildart wildart closed this Jan 20, 2022
wildart added a commit that referenced this pull request Jan 23, 2022
* Added docstrings, and updated `ICAGDeriv` as in #122
* added ICA docs
* updated ICA test