A (very) slight speed improvement for iterating over bytes #21705

gvanrossum · 2020-08-02T23:05:30Z

My mentee @xvxvxvxvxv noticed that iterating over array.array is
slightly faster than iterating over bytes. Looking at the source I
observed that arrayiter_next() calls getitem(ao, it->index++) whereas
striter_next() uses the idiom (paraphrased)

item = PyLong_FromLong(seq->ob_sval[it->it_index]);
if (item != NULL)
    ++it->it_index;
return item;

I'm not 100% sure but I think that the second version has less
opportunity for the CPU to overlap the index++ operation with the
rest of the code (which in both cases involves a call). So here I am
optimistically incrementing the index -- if the PyLong_FromLong() call
fails, this will leave the iterator pointing at the next byte, but
honestly I doubt that anyone would seriously consider resuming use of
the iterator after that kind of failure (it would have to be a
MemoryError). And the author of arrayiter_next() made the same
consideration (or never ever gave it a thought :-).
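To make the behavioral difference concrete, here is a minimal pure-Python sketch of the two idioms. It is a toy model, not the C code in CPython: `ByteIterModel`, `failing_once` and `resume_after_failure` are hypothetical names invented for illustration, and `make_item` merely stands in for PyLong_FromLong().

```python
class ByteIterModel:
    """Toy model of a bytes iterator; not CPython's actual implementation."""

    def __init__(self, data, make_item, optimistic):
        self.data = data
        self.make_item = make_item  # stands in for PyLong_FromLong()
        self.optimistic = optimistic
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        if self.optimistic:
            # New idiom: bump the index first, then build the item.
            i = self.index
            self.index += 1
            return self.make_item(self.data[i])
        # Old idiom: build the item, bump the index only on success.
        item = self.make_item(self.data[self.index])
        self.index += 1
        return item


def failing_once():
    """Return a hypothetical converter that raises MemoryError on its first call."""
    state = {"failed": False}

    def make_item(value):
        if not state["failed"]:
            state["failed"] = True
            raise MemoryError("simulated allocation failure")
        return value

    return make_item


def resume_after_failure(optimistic):
    """Trigger one simulated failure, then drain what is left of the iterator."""
    it = ByteIterModel(b"abc", failing_once(), optimistic)
    try:
        next(it)
    except MemoryError:
        pass
    return list(it)


if __name__ == "__main__":
    print(resume_after_failure(optimistic=False))  # old idiom retries the first byte
    print(resume_after_failure(optimistic=True))   # new idiom has already moved past it
```

In this model the only observable difference is which byte a resumed iterator yields after a failed item creation; the successful path produces the same sequence either way.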

With this, a loop like

for _ in b: pass

is now slightly faster than the same thing over an equivalent array,
rather than slightly slower (in both cases a few percent).
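For a rough local check of the bytes versus array comparison, something like the following stdlib-only sketch could be used. The function name `compare_iteration` and its default sizes are assumptions for illustration; differences of a few percent are easily lost in timeit noise, so a dedicated tool such as pyperf is the better choice for real measurements.

```python
import timeit
from array import array


def compare_iteration(size=9999, number=100, repeat=5):
    """Return (bytes_time, array_time): best-of-`repeat` seconds for `number` loops."""
    b = b"a" * size
    a = array("B", b)  # same values, stored as unsigned bytes

    def loop_bytes():
        for _ in b:
            pass

    def loop_array():
        for _ in a:
            pass

    # Take the minimum of several repeats to reduce the impact of noise.
    t_bytes = min(timeit.repeat(loop_bytes, number=number, repeat=repeat))
    t_array = min(timeit.repeat(loop_array, number=number, repeat=repeat))
    return t_bytes, t_array


if __name__ == "__main__":
    t_bytes, t_array = compare_iteration()
    print(f"bytes: {t_bytes:.4f} s  array: {t_array:.4f} s")
```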


corona10

@gvanrossum

I measured a 2-3% speed improvement with the following script.

Benchmark

./python.exe -m pyperf compare_to master.json pr-21705.json
Mean +- std dev: [master] 443 us +- 10 us -> [pr-21705] 432 us +- 10 us: 1.02x faster (-2%)

Script

import pyperf

# Time a bare loop over a 9999-byte bytes object.
runner = pyperf.Runner()
runner.timeit(name="bench bytes iter",
              stmt="for _ in datas: pass",
              setup="datas = b'a' * 9999")

gvanrossum · 2020-08-03T03:54:35Z

@corona10 Thanks! I'll take it, but I'll wait a bit for someone else to pipe up. Maybe there's a reason I've forgotten why we need to be extra careful about not incrementing the index until we're sure we have a result.

I also have a question about pyperf -- I'm not sure how I get JSON output from the script you show.

corona10 · 2020-08-03T04:25:06Z

@gvanrossum
You can get the JSON file by running the attached script with the -o option :)

./python.exe bench_iter.py -o pr-21705.json

For more detail: https://github.com/psf/pyperf

gvanrossum · 2020-08-03T04:29:12Z

Thanks! That is truly magical -- apparently pyperf parses the command line even when it is not the __main__ program. I didn't see anything about this in the pyperf docs.

gvanrossum · 2020-08-03T04:30:34Z

(And I now see that pyperf's README file has this up front -- but I didn't think to look there, I only looked on readthedocs. :-( )

bedevere-bot · 2020-08-03T16:04:17Z

@gvanrossum: Please replace # with GH- in the commit message next time. Thanks!

gvanrossum added the skip issue and skip news labels Aug 2, 2020

the-knights-who-say-ni added the CLA signed label Aug 2, 2020

bedevere-bot added the awaiting core review label Aug 2, 2020

corona10 approved these changes Aug 3, 2020

bedevere-bot added awaiting merge and removed awaiting core review labels Aug 3, 2020

gvanrossum merged commit 488512b into master Aug 3, 2020

gvanrossum deleted the faster-bytes-iter branch August 3, 2020 16:04

bedevere-bot removed the awaiting merge label Aug 3, 2020
