-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Closed
Labels
Milestone
Description
When I profile this simple virtual generic callsite:
public class Prog
{
static void InvokeVGM(Prog p)
{
while (true)
p.VGM<string>();
}
virtual void VGM<T>() {}
}in VTune I see div as the most expensive part (if I read it correctly):

It turns out to be a EEHashTable lookup:

And It does make sense since div is a very expensive operation, on some CPU its latency can be bigger than 100 cycles. On my Coffee Lake it's 26/6 while e.g. mul is 3/1 (Latency/rec.trput)
The managed Dictionary was optimized with "Fast Mod" algorithm where we pre-calculate a special multiplier every time a new bucket is added and then can avoid expensive idiv operation by using mul see dotnet/coreclr#27299 and #406
Quick demo:
public static uint Test(uint x)
{
uint result = 0;
ulong cachedMul = GetFastModMultiplier(y);
for (int i = 0; i < x; i++)
{
// result = i % x;
result = FastMod(i, x, cachedMul);
}
return result;
}
public static ulong GetFastModMultiplier(uint divisor) =>
ulong.MaxValue / divisor + 1;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint FastMod(uint value, uint divisor, ulong multiplier) =>
(uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);NOTE: the algorithm adds an overhead on dictionary expansion
NOTE2: CompareKeys is not inlined
yahorsi, nietras, SupinePandora43 and PaulusParssinencontrolflow and SupinePandora43warappa