Avoid copying cppy::ptrs in safe_richcompare #239
MatthieuDartiailh merged 2 commits into nucleic:main
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##             main     #239   +/-  ##
=======================================
  Coverage   97.67%   97.67%
=======================================
  Files          24       24
  Lines        1074     1074
  Branches      162      162
=======================================
  Hits         1049     1049
  Misses         12       12
  Partials       13       13
---
I see a 6% speedup from this single change... Edit: Appears I didn't benchmark it enough; it seems to only be a slight increase.
---
C++ dark magic. Can you add a comment saying that by passing by reference (I assume that is what is happening) we avoid extra incref/decref pairs?
---
I think it was just a typo, because I don't see it done anywhere else... but I can add a comment if you think it's necessary. I found several places where hidden xincref/xdecrefs can be cleanly removed by avoiding the use of cppy::ptr (planning to do some more PRs eventually).
---
I think every time an observer/topic is added it is doing a lot of hidden stuff too... https://godbolt.org/z/ejazxzPzh Edit: I think using emplace_back instead of push_back should help there.
---
Free functions defined in header files need to be marked `inline`, otherwise including the header from multiple translation units produces multiple-definition (ODR) errors at link time.
---
Can you show an example of the benchmark you are using?
---
I'm still using this table, which intentionally exercises adding/removing observers (due to the bad expression). Anything not doing a lot of observer updates will not see as much of a difference.

```python
import random
from time import time
import cProfile, pstats, io

from atom.api import Atom, Str, Int, Instance, List
from enaml.widgets.api import (
    Window, Container, Label, Field, PushButton, VGroup, HGroup, CheckBox
)
from enaml.core.api import Looper, Conditional

MAX_COUNT = 30


class Rack(Atom):
    name = Str()


class Box(Atom):
    name = Str()
    weight = Int()
    rack = Instance(Rack)


enamldef Main(Window):
    attr boxes = [
        Box(name=f"{i}", weight=random.randint(5, 100))
        for i in range(500)
    ]
    attr racks = [
        Rack(name=f"{i}") for i in "ABCDEFGHIJKHIJKLMNOPQRSTUVWXYZ"
    ]
    # Boxes per rack
    attr groups = {}  # dict[Rack, list[Box]]
    activated :: assign_boxes()
    func assign_boxes():
        pr = cProfile.Profile()
        pr.enable()
        t0 = time()
        groups = {}
        remaining = boxes[:]
        while remaining:
            box = remaining.pop()
            rack = random.choice(racks)
            if rack not in groups:
                group = groups[rack] = []
            else:
                group = groups[rack]
            if len(group) >= MAX_COUNT:
                continue
            group.append(box)
            box.rack = rack
        self.groups = groups
        pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats(pstats.SortKey.CUMULATIVE)
        ps.print_stats()
        print(s.getvalue())
    VGroup:
        HGroup:
            PushButton:
                text = "Reassign"
                clicked :: assign_boxes()
            CheckBox: compute_weights:
                text = "Compute weights"
        HGroup:
            padding = 0
            Label:
                text = "Rack"
            Looper:
                iterable = racks
                Label:
                    text = f"{loop.item.name}"
        Looper:
            iterable = range(MAX_COUNT)
            HGroup:
                padding = 0
                attr row = loop.index
                Label:
                    text = f"{loop.item}"
                Looper:
                    iterable << racks
                    Label:
                        attr rack: Rack = loop.item
                        attr boxes << groups.get(rack, [])
                        attr box << boxes[row] if row < len(boxes) else None
                        text << f"{box.name} {box.weight}" if box else ""
        Conditional:
            condition << compute_weights.checked
            HGroup:
                padding = 0
                Label:
                    text = "Weights"
                Looper:
                    iterable = racks
                    Label:
                        # This expr is bad news
                        text << "{}".format(
                            sum([
                                b.weight for b in boxes
                                if b.rack == loop.item
                            ])
                        )
```
---
From reading about CPU branch predictors, I think it might be better to use a single error-case branch in https://github.com/nucleic/atom/blob/main/atom/src/utils.h#L105-L108 ?
---
I'm not opposed to the change you have here, just want to caution you about micro-benchmarking. As soon as an observer object implements a rich comparison method, the cost of the comparison is dominated by the Python-level call.
The time spent in that function is dominated by calls to the Python C-API. Those two comparisons won't make a lick of difference in an actual program. I'd rather not micro-optimize for pathological cases at the expense of readability, especially in the absence of a real-world pathological case.
---
There's got to be something else going on there. How many observers are hooked up in that single vector, and where did they come from? That could probably be refactored into an app-specific data structure, so that you only have one subscription for the entire collection.
---
I took another trace of just a single page load (no actions) and there are 2677 observe calls. Each user has its own copy of the page (& observers), so that's probably the real problem...
---
~3000 calls is a lot. Observers are stored per-object, not per-attribute, the assumption being that you might have around a dozen or so, max, on an object. The data structure which stores the observers is optimized for space and dispatch efficiency, at the expense of add/remove operations. The add/remove is linear in the number of observers.
---
If you are adding/removing a ton of observers every time a user drag/drops something, then that logic should be corrected. I never got around to creating a dedicated data-table widget for Enaml, but I have made one for the web (maybe 80% complete):
---
Yeah, that one datagrid table is insane. Fortunately I only have like 20 x 40 cells.
---
I would still like to see a comment. Did you audit the rest of the code for similar cases?
---
I tried converting the observer pool to use a dict, and while it's quite a bit faster with the add/remove, it uses a lot more memory because there are huge peaks in observer counts due to the modification guards. Using the sortedmap might be a more balanced tradeoff, but I'm not sure if you'd be open to changing the observer pool?
---
We should probably audit cppy to use references.
---
I checked cppy and it is clean. enaml has one instance of suboptimal usage in signaling.cpp though.
---
Doing those changes is safe and meaningful, I think. Let's move the discussion around changes to the observer pool to a dedicated issue.

Avoid an xincref/xdecref for each argument of each call... see https://godbolt.org/z/WPdo6fG39
Also, is it necessary to mark the PyObject* version of safe_richcompare as inline?