@gvanrossum gvanrossum commented Aug 2, 2020

My mentee @xvxvxvxvxv noticed that iterating over array.array is
slightly faster than iterating over bytes. Looking at the source I
observed that arrayiter_next() calls getitem(ao, it->index++) whereas
striter_next() uses the idiom (paraphrased)

item = PyLong_FromLong(seq->ob_sval[it->it_index]);
if (item != NULL)
    ++it->it_index;
return item;

I'm not 100% sure but I think that the second version has less
opportunity for the CPU to overlap the index++ operation with the
rest of the code (which in both cases involves a call). So here I am
optimistically incrementing the index -- if the PyLong_FromLong() call
fails, this will leave the iterator pointing at the next byte, but
honestly I doubt that anyone would seriously consider resuming use of
the iterator after that kind of failure (it would have to be a
MemoryError). And the author of arrayiter_next() made the same
consideration (or never ever gave it a thought :-).
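
The behavioural difference described above can be modelled in pure Python. This is only a sketch of the two increment orders, not the C code itself: `Flaky` is a hypothetical stand-in for a `PyLong_FromLong()` call that fails, and `CautiousIter`, `OptimisticIter` and `drain` are illustrative names that do not exist in CPython.

```python
class Flaky:
    """Hypothetical stand-in for PyLong_FromLong() that fails on chosen calls."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        if self.calls in self.fail_on:
            raise MemoryError("simulated allocation failure")
        return int(value)


class CautiousIter:
    """Old order: advance the index only after the item was created."""

    def __init__(self, data, make_item):
        self.data, self.index, self.make_item = data, 0, make_item

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        item = self.make_item(self.data[self.index])
        self.index += 1  # skipped when make_item() raises
        return item


class OptimisticIter(CautiousIter):
    """New order: advance the index first, then create the item."""

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        i = self.index
        self.index += 1  # already advanced even if make_item() raises
        return self.make_item(self.data[i])


def drain(it):
    """Collect items, recording None where the simulated failure occurred."""
    out = []
    while True:
        try:
            out.append(next(it))
        except MemoryError:
            out.append(None)
        except StopIteration:
            return out
```

With a failure injected on the second call, resuming the cautious iterator retries the same byte, while the optimistic one silently skips it. That skipped byte is the trade-off accepted in this change.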

With this, a loop like

for _ in b: pass

is now slightly faster than the same thing over an equivalent array,
rather than slightly slower (in both cases a few percent).
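
A quick way to check this kind of claim locally is a small standard-library timing script. The sketch below assumes `timeit` is precise enough to show the trend. The helper names `time_iteration` and `compare` are made up for illustration, and a difference of a few percent can easily be lost in noise without a dedicated benchmarking tool.

```python
import array
import timeit


def time_iteration(seq, number=200):
    """Return the average seconds taken by one full `for _ in seq: pass` loop."""
    total = timeit.timeit("for _ in seq: pass", globals={"seq": seq}, number=number)
    return total / number


def compare(size=9999, number=200):
    """Time iteration over a bytes object and an array holding the same values."""
    data = b"a" * size
    arr = array.array("B", data)  # same values, one unsigned byte per item
    return {
        "bytes": time_iteration(data, number),
        "array": time_iteration(arr, number),
    }


if __name__ == "__main__":
    for kind, seconds in compare().items():
        print(f"{kind:>5}: {seconds * 1e6:.1f} us per loop")
```

Run it once on a build without the change and once with it; only the relative order of the two numbers is meaningful, not their absolute values.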
Member

@corona10 corona10 left a comment

@gvanrossum

I measured a 2-3% speed improvement with the following script.

Benchmark

./python.exe -m pyperf compare_to master.json pr-21705.json
Mean +- std dev: [master] 443 us +- 10 us -> [pr-21705] 432 us +- 10 us: 1.02x faster (-2%)

Script

import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench bytes iter",
              stmt="""for _ in datas: pass""",
              setup = """datas = b'a'*9999 """
)

@gvanrossum
Member Author

@corona10 Thanks! I'll take it, but I'll wait a bit for someone else to pipe up. Maybe there's a reason I've forgotten why we need to be extra careful about not incrementing the index until we're sure we have a result.

I also have a question about pyperf -- I'm not sure how I get JSON output from the script you show.

@corona10
Member

corona10 commented Aug 3, 2020

@gvanrossum
You can get the JSON file by running the attached script with the -o option :)

./python.exe bench_iter.py -o pr-21705.json

For more detail: https://github.com/psf/pyperf

@gvanrossum
Member Author

Thanks! That is truly magical -- apparently pyperf parses the command line even when it is not the __main__ program. I didn't see anything about this in the pyperf docs.
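
The "magic" is less surprising once you recall that any imported module can read `sys.argv`. The sketch below shows the general mechanism with a hypothetical `MiniRunner` class built on `argparse` and `timeit`. It is not pyperf's implementation, and only the `-o` option is taken from the command shown above.

```python
import argparse
import json
import timeit


class MiniRunner:
    """Hypothetical, much-simplified stand-in for a benchmark runner."""

    def __init__(self, argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("-o", "--output",
                            help="write results as JSON to this file")
        # argv=None makes argparse read sys.argv[1:], i.e. the options passed
        # to whichever script created the runner, even though this class
        # lives in an imported module.
        self.args = parser.parse_args(argv)
        self.results = {}

    def timeit(self, name, stmt, setup="pass", number=100):
        """Time stmt, store the per-loop average, and dump JSON if requested."""
        seconds = timeit.timeit(stmt, setup=setup, number=number)
        self.results[name] = seconds / number
        if self.args.output:
            with open(self.args.output, "w") as f:
                json.dump(self.results, f)
        return self.results[name]
```

A script that creates `MiniRunner()` without arguments and is started with `-o results.json` would write its timings to that file without containing any option handling of its own.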

@gvanrossum
Member Author

(And I now see that pyperf's README file has this up front -- but I didn't think to look there, I only looked on readthedocs. :-( )

@gvanrossum gvanrossum merged commit 488512b into master Aug 3, 2020
@gvanrossum gvanrossum deleted the faster-bytes-iter branch August 3, 2020 16:04
@bedevere-bot

@gvanrossum: Please replace # with GH- in the commit message next time. Thanks!

shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 4, 2020
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 20, 2020
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020