
Avoid copying cppy::ptrs in safe_richcompare #239

Merged
MatthieuDartiailh merged 2 commits into nucleic:main from frmdstryr:safe-richcompare-copy
Jan 28, 2025

Conversation

@frmdstryr
Contributor

@frmdstryr frmdstryr commented Jan 24, 2025

Avoid an xincref/xdecref for each argument of each call... see https://godbolt.org/z/WPdo6fG39

Also, is it necessary to mark the PyObject* version of safe_richcompare as inline?

@codecov

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.67%. Comparing base (338630c) to head (c2bf418).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #239   +/-   ##
=======================================
  Coverage   97.67%   97.67%           
=======================================
  Files          24       24           
  Lines        1074     1074           
  Branches      162      162           
=======================================
  Hits         1049     1049           
  Misses         12       12           
  Partials       13       13           

@frmdstryr
Contributor Author

frmdstryr commented Jan 24, 2025

I see a 6% speedup from this single change... Edit: it appears I didn't benchmark it enough; it seems to be only a slight increase.

@MatthieuDartiailh
Member

MatthieuDartiailh commented Jan 24, 2025

C++ dark magic. Can you add a comment saying that by passing by reference (I assume that is what is happening) we avoid extra incref/decref pairs?

@frmdstryr
Contributor Author

I think it was just a typo, because I don't see it done anywhere else, but I can add a comment if you think it's necessary.

I found several places where hidden xincref/xdecrefs can be cleanly removed by avoiding cppy::ptr (I'm planning to do some more PRs eventually).

@frmdstryr
Contributor Author

frmdstryr commented Jan 24, 2025

I think every time an observer/topic is added it is doing a lot of hidden stuff too... https://godbolt.org/z/ejazxzPzh

Edit: I think using emplace_back instead of push_back should help there

@sccolbert
Member

Free functions defined in header files need to be marked inline to avoid linker errors.

@sccolbert
Member

Can you show an example of the benchmark you are using?

@frmdstryr
Contributor Author

I'm still using this table, which intentionally exercises adding/removing observers (due to the bad expression). Anything not doing a lot of observer updates will not see as much of a difference.

import random
from time import time
import cProfile, pstats, io
from atom.api import Atom, Str, Int, Instance, List
from enaml.widgets.api import (
    Window, Container, Label, Field, PushButton, VGroup, HGroup, CheckBox
)
from enaml.core.api import Looper, Conditional

MAX_COUNT = 30

class Rack(Atom):
    name = Str()

class Box(Atom):
    name = Str()
    weight = Int()
    rack = Instance(Rack)


enamldef Main(Window):
    attr boxes = [
        Box(name=f"{i}", weight=random.randint(5, 100))
        for i in range(500)
    ]
    attr racks = [
        Rack(name=f"{i}") for i in "ABCDEFGHIJKHIJKLMNOPQRSTUVWXYZ"
    ]

    # Boxes per rack
    attr groups = {} # dict[Rack, list[Box]]

    activated :: assign_boxes()

    func assign_boxes():
        pr = cProfile.Profile()
        pr.enable()
        t0 = time()
        groups = {}
        remaining = boxes[:]
        while remaining:
            box = remaining.pop()
            rack = random.choice(racks)

            if rack not in groups:
                group = groups[rack] = []
            else:
                group = groups[rack]
            if len(group) >= MAX_COUNT:
                continue
            group.append(box)
            box.rack = rack
        self.groups = groups
        pr.disable()
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats(pstats.SortKey.CUMULATIVE)
        ps.print_stats()
        print(s.getvalue())
    VGroup:
        HGroup:
            PushButton:
                text = "Reassign"
                clicked :: assign_boxes()
            CheckBox: compute_weights:
                text = "Compute weights"
        HGroup:
            padding = 0
            Label:
                text = "Rack"
            Looper:
                iterable = racks
                Label:
                    text = f"{loop.item.name}"
        Looper:
            iterable = range(MAX_COUNT)
            HGroup:
                padding = 0
                attr row = loop.index
                Label:
                    text = f"{loop.item}"
                Looper:
                    iterable << racks
                    Label:
                        attr rack: Rack = loop.item
                        attr boxes << groups.get(rack, [])
                        attr box << boxes[row] if row < len(boxes) else None
                        text << f"{box.name} {box.weight}" if box else ""
        Conditional:
            condition << compute_weights.checked
            HGroup:
                padding = 0
                Label:
                    text = "Weights"
                Looper:
                    iterable = racks
                    Label:
                        # This expr is bad news
                        text << "{}".format(
                            sum([
                                b.weight for b in boxes
                               if b.rack == loop.item
                            ])
                        )

@frmdstryr
Contributor Author

From reading about CPU branch predictors, I think it might be better to use a single error-case branch in https://github.com/nucleic/atom/blob/main/atom/src/utils.h#L105-L108 ?

@sccolbert
Member

I'm not opposed to the change you have here, just want to caution you about micro-benchmarking. As soon as an observer object implements a rich comparison method, the cost of the incref/decref will be wholly insignificant.

@sccolbert
Member

sccolbert commented Jan 24, 2025

From reading about CPU branch predictors, I think it might be better to use a single error-case branch in https://github.com/nucleic/atom/blob/main/atom/src/utils.h#L105-L108 ?

The time spent in that function is dominated by calls to the Python C-API. Those two comparisons won't make a lick of difference in an actual program. I'd rather not micro-optimize for pathological cases at the expense of readability, especially in the absence of a real-world pathological case.

@frmdstryr
Contributor Author

Here are the callgrind stats from the action on the actual application (without this change). It's a table in enaml-web doing a drag and drop from one cell to another with 9 users.
[screenshot: callgrind profile]

For what it's worth, the last change to enaml with the subscription observer already made it significantly better (it went from about 500 ms to 300 ms).

@sccolbert
Member

sccolbert commented Jan 24, 2025

There's got to be something else going on there. How many observers are hooked up in that single vector, and where did they come from?

That could probably be refactored into an app-specific data structure, so that you only have one subscription for the entire collection.

@frmdstryr
Contributor Author

I took another trace of just a single page load (no actions) and there are 2677 observe calls. Each user has its own copy of the page (and observers), so that's probably the real problem...

@sccolbert
Member

~3000 calls to observe spread over a couple hundred object instances is not too bad. If you have that many calls to attribute(s) owned by the same object, then you have a problem.

Observers are stored per-object, not per-attribute, the assumption being that you might have around a dozen or so, max, on an object.

The data structure which stores the observers is optimized for space and dispatch efficiency, at the expense of add/remove operations. The add/remove is O(n^2), so that can really add up quickly if you are (ab)using the observer system beyond its design.

@sccolbert
Member

If you are adding/removing a ton of observers every time a user drag/drops something, then that logic should be corrected.

In particular: I never intended Looper and Include to be used to create a data table.

I never got around to creating a dedicated data-table widget for Enaml, but I have made one for the web (maybe 80% complete):
https://phosphorjs.github.io/examples/datagrid/

@frmdstryr
Contributor Author

Yeah that one datagrid table is insane. Fortunately I only have like 20 x 40 cells.

@MatthieuDartiailh
Member

I would still like to see a comment. Did you audit the rest of the code for similar cases ?

@frmdstryr
Contributor Author

frmdstryr commented Jan 28, 2025

I tried converting the observer pool to use a dict, and while it's quite a bit faster with the add/remove, it uses a lot more memory because there are huge peaks in observer counts due to the modification guards.

Using the sortedmap might be a more balanced tradeoff, but I'm not sure if you'd be open to changing the observer pool?

@MatthieuDartiailh
Member

We should probably audit cppy to use references.
For the observerpool I will let @sccolbert comment on the tradeoffs.

@MatthieuDartiailh
Member

I checked cppy and it is clean. enaml has one instance of suboptimal usage in signaling.cpp, though.

@MatthieuDartiailh
Member

Doing those changes is safe and meaningful, I think. Let's move the discussion around changes to the observer pool to a dedicated issue.

@MatthieuDartiailh MatthieuDartiailh merged commit 60eca49 into nucleic:main Jan 28, 2025
18 checks passed
@frmdstryr frmdstryr deleted the safe-richcompare-copy branch February 2, 2025 22:04