Skip to content

Use Fast-Mod algorithm for native EEHashTableBase #65778

@EgorBo

Description

@EgorBo

When I profile this simple virtual generic callsite:

public class Prog
{
    static void InvokeVGM(Prog p)
    {
        while (true)
            p.VGM<string>();
    }

    virtual void VGM<T>() {}
}

in VTune I see div as the most expensive part (if I read it correctly):
image
It turns out to be a EEHashTable lookup:
image

And It does make sense since div is a very expensive operation, on some CPU its latency can be bigger than 100 cycles. On my Coffee Lake it's 26/6 while e.g. mul is 3/1 (Latency/rec.trput)

The managed Dictionary was optimized with "Fast Mod" algorithm where we pre-calculate a special multiplier every time a new bucket is added and then can avoid expensive idiv operation by using mul see dotnet/coreclr#27299 and #406

Quick demo:

public static uint Test(uint x)
{
    uint result = 0;

    ulong cachedMul = GetFastModMultiplier(y);
    for (int i = 0; i < x; i++)
    {
        // result = i % x;
        result = FastMod(i, x, cachedMul);
    }
    return result;
}

public static ulong GetFastModMultiplier(uint divisor) =>
    ulong.MaxValue / divisor + 1;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint FastMod(uint value, uint divisor, ulong multiplier) => 
    (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);

NOTE: the algorithm adds an overhead on dictionary expansion
NOTE2: CompareKeys is not inlined

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions