Description
While profiling various workloads, I noticed that getNodeByQuery was quite slow, taking about 4% of the CPU time of a given command. After a deep dive, I realized that we are basically checking three 128 KiB (16384 * 8 bytes) blocks of memory to determine who owns the slot being accessed, whether that slot is migrating, and whether that slot is being imported. That amount of data far exceeds L1 cache sizes and will also exceed most L2 caches, so we are likely experiencing a large number of cache misses because of these lookups. If you remove all three of those bottlenecks, you get about a 5% overall performance improvement.
My thought is that we can use the slot bits on the clusterNode object to determine if we own the slot. We assume that most of the time the client is going to send requests to the right node. We can also keep a smaller structure of all the nodes we are importing from and migrating to, so we don't have to check the map on every lookup.
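To make the idea concrete, here is a minimal sketch of the fast path, not a proposed patch. It assumes the node's slot bits are a 16384-bit bitmap (2 KiB), and that slots currently being migrated or imported are tracked in a small list. All names here (`sketchNode`, `sketchState`, `in_transit`, `MAX_SLOTS_IN_TRANSIT`, `sketchCanServeFast`) are hypothetical and not taken from the codebase, and tracking slots rather than nodes in the small structure is my own simplification.

```c
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384
#define MAX_SLOTS_IN_TRANSIT 16 /* hypothetical cap, for the sketch only */

/* Hypothetical, trimmed-down stand-in for a cluster node: only the
 * per-node ownership bitmap, one bit per slot (16384 / 8 = 2048 bytes). */
typedef struct sketchNode {
    unsigned char slots[CLUSTER_SLOTS / 8];
} sketchNode;

/* Hypothetical stand-in for the cluster state: our own node plus a small
 * list of slots that are currently migrating or importing. */
typedef struct sketchState {
    sketchNode *myself;
    uint16_t in_transit[MAX_SLOTS_IN_TRANSIT];
    size_t num_in_transit;
} sketchState;

/* Mark a slot as owned by node n. */
void sketchSetSlot(sketchNode *n, unsigned slot) {
    n->slots[slot >> 3] |= (unsigned char)(1u << (slot & 7));
}

/* Return 1 if node n owns the slot according to its bitmap. */
int sketchOwnsSlot(const sketchNode *n, unsigned slot) {
    return (n->slots[slot >> 3] >> (slot & 7)) & 1;
}

/* Linear scan of the small in-transit list; expected to be empty or
 * very short in the common case. */
int sketchSlotInTransit(const sketchState *s, unsigned slot) {
    for (size_t i = 0; i < s->num_in_transit; i++) {
        if (s->in_transit[i] == slot) return 1;
    }
    return 0;
}

/* Return 1 if the request can be served locally without touching the
 * three large per-slot arrays. Return 0 to fall back to the existing
 * full lookup (wrong node, or slot is migrating/importing). */
int sketchCanServeFast(const sketchState *s, unsigned slot) {
    if (!sketchOwnsSlot(s->myself, slot)) return 0;
    if (s->num_in_transit == 0) return 1;
    return !sketchSlotInTransit(s, slot);
}
```

In the common case (the client hit the right node and nothing is in transit), this touches one byte of a 2 KiB bitmap plus a counter, instead of three entries spread across three 128 KiB arrays. Anything unusual still goes through the existing slow path, so redirect behavior would be unchanged.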
Potential followup of #631.