
Avoid copying cppy::ptrs in safe_richcompare #239

Merged
MatthieuDartiailh merged 2 commits into nucleic:main from frmdstryr:safe-richcompare-copy
Jan 28, 2025

Conversation

@frmdstryr
Contributor

@frmdstryr frmdstryr commented Jan 24, 2025

Avoid an xincref/xdecref for each argument of each call... see https://godbolt.org/z/WPdo6fG39

Also, is it necessary to mark the PyObject* version of safe_richcompare as inline?

@codecov

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.67%. Comparing base (338630c) to head (c2bf418).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #239   +/-   ##
=======================================
  Coverage   97.67%   97.67%           
=======================================
  Files          24       24           
  Lines        1074     1074           
  Branches      162      162           
=======================================
  Hits         1049     1049           
  Misses         12       12           
  Partials       13       13           

@frmdstryr
Contributor Author

frmdstryr commented Jan 24, 2025

I see a 6% speedup from this single change... Edit: it appears I didn't benchmark it enough; it seems to be only a slight increase.

@MatthieuDartiailh
Member

MatthieuDartiailh commented Jan 24, 2025

C++ dark magic. Can you add a comment saying that by passing by reference (I assume that is what is happening) we avoid extra incref/decref pairs?

@frmdstryr
Contributor Author

I think it was just a typo, because I don't see it done anywhere else, but I can add a comment if you think it's necessary.

I found several places where hidden xincref/xdecrefs can be cleanly removed by avoiding cppy::ptr (I'm planning to do some more PRs eventually).

@frmdstryr
Contributor Author

frmdstryr commented Jan 24, 2025

I think every time an observer/topic is added it is doing a lot of hidden stuff too... https://godbolt.org/z/ejazxzPzh

Edit: I think using emplace_back instead of push_back should help there

@sccolbert
Member

Free functions defined in header files need to be marked inline to avoid linker errors.

@sccolbert
Member

Can you show an example of the benchmark you are using?

@frmdstryr
Contributor Author

I'm still using this table, which intentionally exercises adding/removing observers (due to the bad expression). Anything not doing a lot of observer updates will not see as much of a difference.

import random
from time import time
import cProfile, pstats, io
from atom.api import Atom, Str, Int, Instance, List
from enaml.widgets.api import (
    Window, Container, Label, Field, PushButton, VGroup, HGroup, CheckBox
)
from enaml.core.api import Looper, Conditional

MAX_COUNT = 30

class Rack(Atom):
    name = Str()

class Box(Atom):
    name = Str()
    weight = Int()
    rack = Instance(Rack)


enamldef Main(Window):
    attr boxes = [
        Box(name=f"{i}", weight=random.randint(5, 100))
        for i in range(500)
    ]
    attr racks = [
        Rack(name=f"{i}") for i in "ABCDEFGHIJKHIJKLMNOPQRSTUVWXYZ"
    ]

    # Boxes per rack
    attr groups = {} # dict[Rack, list[Box]]

    activated :: assign_boxes()

    func assign_boxes():
        pr = cProfile.Profile()
        pr.enable()
        t0 = time()
        groups = {}
        remaining = boxes[:]
        while remaining:
            box = remaining.pop()
            rack = random.choice(racks)

            if rack not in groups:
                group = groups[rack] = []
            else:
                group = groups[rack]
            if len(group) >= MAX_COUNT:
                continue
            group.append(box)
            box.rack = rack
        self.groups = groups
        pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats(pstats.SortKey.CUMULATIVE)
        ps.print_stats()
        print(s.getvalue())
    VGroup:
        HGroup:
            PushButton:
                text = "Reassign"
                clicked :: assign_boxes()
            CheckBox: compute_weights:
                text = "Compute weights"
        HGroup:
            padding = 0
            Label:
                text = "Rack"
            Looper:
                iterable = racks
                Label:
                    text = f"{loop.item.name}"
        Looper:
            iterable = range(MAX_COUNT)
            HGroup:
                padding = 0
                attr row = loop.index
                Label:
                    text = f"{loop.item}"
                Looper:
                    iterable << racks
                    Label:
                        attr rack: Rack = loop.item
                        attr boxes << groups.get(rack, [])
                        attr box << boxes[row] if row < len(boxes) else None
                        text << f"{box.name} {box.weight}" if box else ""
        Conditional:
            condition << compute_weights.checked
            HGroup:
                padding = 0
                Label:
                    text = "Weights"
                Looper:
                    iterable = racks
                    Label:
                        # This expr is bad news
                        text << "{}".format(
                            sum([
                                b.weight for b in boxes
                               if b.rack == loop.item
                            ])
                        )

@frmdstryr
Contributor Author

From reading about CPU branch predictors, I think it might be better to use a single error-case branch in https://github.com/nucleic/atom/blob/main/atom/src/utils.h#L105-L108 ?

@sccolbert
Member

I'm not opposed to the change you have here, just want to caution you about micro-benchmarking. As soon as an observer object implements a rich comparison method, the cost of the incref/decref will be wholly insignificant.

@sccolbert
Member

sccolbert commented Jan 24, 2025

From reading about CPU branch predictors, I think it might be better to use a single error-case branch in https://github.com/nucleic/atom/blob/main/atom/src/utils.h#L105-L108 ?

The time spent in that function is dominated by calls to the Python C-API. Those two comparisons won't make a lick of difference in an actual program. I'd rather not micro-optimize for pathological cases at the expense of readability, especially in the absence of a real-world pathological case.

@frmdstryr
Contributor Author

Here are the callgrind stats from the action on the actual application (without this change). It's a table in enaml-web doing a drag and drop from one cell to another with 9 users.
[screenshot: callgrind profile]

For what it's worth, the last change to enaml with the subscription observer already made it significantly better (it went from about 500 ms to 300 ms).

@sccolbert
Member

sccolbert commented Jan 24, 2025

There's got to be something else going on there. How many observers are hooked up in that single vector, and where did they come from?

That could probably be refactored into an app-specific data structure, so that you only have one subscription for the entire collection.

@frmdstryr
Contributor Author

I took another trace of just a single page load (no actions) and there are 2677 observe calls. Each user has its own copy of the page (and observers), so that's probably the real problem...

@sccolbert
Member

~3000 calls to observe spread over a couple hundred object instances is not too bad. If you have that many calls to attribute(s) owned by the same object, then you have a problem.

Observers are stored per-object, not per-attribute, the assumption being that you might have around a dozen or so, max, on an object.

The data structure which stores the observers is optimized for space and dispatch efficiency, at the expense of add/remove operations. The add/remove is O(n^2), so that can really add up quickly if you are (ab)using the observer system beyond its design.

@sccolbert
Member

If you are adding/removing a ton of observers every time a user drag/drops something, then that logic should be corrected.

In particular: I never intended Looper and Include to be used to create a data table.

I never got around to creating a dedicated data-table widget for Enaml, but I have made one for the web (maybe 80% complete):
https://phosphorjs.github.io/examples/datagrid/

@frmdstryr
Contributor Author

Yeah that one datagrid table is insane. Fortunately I only have like 20 x 40 cells.

@MatthieuDartiailh
Member

I would still like to see a comment. Did you audit the rest of the code for similar cases ?

@frmdstryr
Contributor Author

frmdstryr commented Jan 28, 2025

I tried converting the observer pool to use a dict, and while it's quite a bit faster with the add/remove, it uses a lot more memory because there are huge peaks in observer counts due to the modification guards.

Using the sortedmap might be a more balanced tradeoff, but I'm not sure if you'd be open to changing the observer pool?

@MatthieuDartiailh
Member

We should probably audit cppy to use references.
For the observerpool I will let @sccolbert comment on the tradeoffs.

@MatthieuDartiailh
Member

I checked cppy and it is clean. enaml has one instance of suboptimal usage in signaling.cpp, though.

@MatthieuDartiailh
Member

Doing those changes is safe and meaningful, I think. Let's move the discussion around changes to the observer pool to a dedicated issue.

@MatthieuDartiailh MatthieuDartiailh merged commit 60eca49 into nucleic:main Jan 28, 2025
18 checks passed
@frmdstryr frmdstryr deleted the safe-richcompare-copy branch February 2, 2025 22:04