
Conversation

@jerch (Member) commented Oct 8, 2018

A first attempt to recycle buffer lines.

Can be tested by switching experimentalBufferLineImpl to TypedArray.

Note: Recycling is only active for the typed array version to keep the comparison fair (the JS array version is much slower with it). It is still buggy, with faulty behavior in some tests (hard to test at the moment, since the tests rely on the hardcoded JS array type). More to come...

The recycling is now active for both buffer line versions.

Part of #791

@jerch jerch added the work-in-progress Do not merge label Oct 8, 2018
@jerch jerch force-pushed the reuse_bufferlines branch from ceb955b to 33a3580 Compare October 9, 2018 21:58
@jerch (Member, Author) commented Oct 9, 2018

@Tyriar Here comes the new approach: a callback version of push. It was the fastest thing I could come up with. I first tried your trim idea with another method, but its runtime was much worse.

Still up for other ideas.

NB: I kinda screwed up the branch and had to reset it lol.
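For illustration, a rough sketch of what such a callback-style push could look like (names and shape are mine, not the actual xterm.js API):

```typescript
// Hypothetical sketch of a callback-style push: instead of receiving a
// finished line, push takes a factory callback and hands it the trimmed
// line (if any), so the caller can reuse that object instead of allocating.
class RecyclingList<T> {
  private _items: T[] = [];
  constructor(private _maxLength: number) {}

  push(create: (recycled?: T) => T): void {
    if (this._items.length === this._maxLength) {
      const trimmed = this._items.shift()!; // trim the oldest entry
      this._items.push(create(trimmed));    // caller may recycle it
    } else {
      this._items.push(create());           // nothing to recycle yet
    }
  }

  get(i: number): T { return this._items[i]; }
  get length(): number { return this._items.length; }
}
```

A caller would then do something like `list.push(old => old ? resetLine(old) : makeBlankLine())`, keeping the recycle decision inside the list.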

@jerch (Member, Author) commented Oct 10, 2018

Added an experimentalPushRecycling option, so it can be tested in the demo with both buffer line implementations. Also made a small change to BufferLine.copyFrom; the JSArray version now benefits from the recycling, too.

For my typical benchmark ls -lR /usr/lib I see the following runtime numbers (range over 5 runs):

  • no recycling:

    • JSArray: 1900 - 2300 ms (corresponds to current master)
    • TypedArray: 2200 - 2600 ms
  • with recycling:

    • JSArray: 1600 - 2000 ms
    • TypedArray: 1700 - 2100 ms

With that small change in BufferLine.copyFrom, the JSArray version is again the fastest. The TypedArray version shows a greater benefit from recycling and is close behind (~10% faster than JSArray without recycling, still ~5% slower than JSArray with recycling).

Edit: The numbers above are the total JS runtime for my benchmark, which also contains the renderer runtime and the websocket overhead (unbuffered in the server script). Currently I have no isolated input-only test setup, but we can approximate the boost by subtracting these numbers:

  • cost of the unbuffered websocket overhead: ~300 ms (tends to be bigger though, and also causes the wide ranges above)
  • cost of the renderer: ~700 ms (450 ms for drawImage plus some JS calls)

Final speedup for the input handler code, on average:

  • JSArray: 1100 vs. 800 => ~27% faster
  • TypedArray: 1400 vs. 900 => ~35% faster

Speedup master vs. recycled TypedArray: 1100 vs. 900 => ~18%
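The percentages above are the relative runtime reduction against the baseline, which can be written as:

```typescript
// Speedup as relative runtime reduction against the baseline, in percent.
function speedup(baselineMs: number, newMs: number): number {
  return ((baselineMs - newMs) / baselineMs) * 100;
}

// speedup(1100, 800) ≈ 27.3  (JSArray)
// speedup(1400, 900) ≈ 35.7  (TypedArray)
// speedup(1100, 900) ≈ 18.2  (master vs. recycled TypedArray)
```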

Guess I should do a real input chain benchmark to get more reliable numbers.

@jerch (Member, Author) commented Oct 10, 2018

Here are the numbers for the input chain alone (obtained with this script: https://gist.github.com/jerch/31f23538c5ca1a5079a78bbd627398ce; ./benchmark_data1 contains the output of ls -lR /usr/lib):

{ BufferLineType: 'JsArray',
  Recycling: false,
  Throughput: '10.45 MB/s',
  File: './benchmark_data1',
  Duration: 4596,
  Size: 50361113 }
{ BufferLineType: 'TypedArray',
  Recycling: false,
  Throughput: '11.44 MB/s',
  File: './benchmark_data1',
  Duration: 4199,
  Size: 50361113 }
{ BufferLineType: 'JsArray',
  Recycling: true,
  Throughput: '12.04 MB/s',
  File: './benchmark_data1',
  Duration: 3990,
  Size: 50361113 }
{ BufferLineType: 'TypedArray',
  Recycling: true,
  Throughput: '19.15 MB/s',
  File: './benchmark_data1',
  Duration: 2508,
  Size: 50361113 }

Seems the typed array + recycling doubles the throughput 😄
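For reference, the Throughput field is just Size over Duration; the reported values line up if "MB" is read as MiB (a reading I am inferring from the numbers, not from the script):

```typescript
// Derive the Throughput value (in "MB/s", assumed to mean MiB/s) from
// Size (bytes) and Duration (milliseconds), matching the output above.
function throughputMBs(sizeBytes: number, durationMs: number): number {
  return sizeBytes / 1048576 / (durationMs / 1000);
}

// throughputMBs(50361113, 4596) ≈ 10.45  (JsArray, no recycling)
// throughputMBs(50361113, 2508) ≈ 19.15  (TypedArray, recycling)
```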

The question here is what to make of these numbers, and why they differ that much from the numbers in the browser:

  • Testing real data from the pty is a bit wonky in the browser and leads to unreliable numbers due to the websocket in between (no clue why it behaves so nondeterministically; it causes all kinds of hiccups in browser tests).
  • Now the typed array version is always faster: kinda what I expected in the first place due to reduced GC pressure, but the browser always said - nope, it's slower, hmm. Maybe this is related to the rendering in the browser; the green boxes look quite different and show higher frame rates most of the time for the typed array.
  • Maybe v8 in nodejs does some fundamentally different things than Chrome's v8 - imho unlikely, but not impossible.
  • We have not yet sliced xterm.js into a separately working offscreen part, so the numbers could be totally wrong due to the missing DOM and internal errors - yeah, well, the data were eaten without errors. I also tested data files with more complicated stuff like the midnight commander startup - it won't run.

Perfwise I think the numbers from the nodejs tests are closer to the truth - they only measure the input chain that is affected most by the changes, while the browser tests seem to measure some side effects as well.

@jerch (Member, Author) commented Oct 22, 2018

@Tyriar Did a version without the callback, as you suggested (see last commit). Well, it seems to have several flaws:

  • it needs a precheck on the caller side whether a trim would occur; imho that's bad API design, as it leaks the trim functionality and will even lead to wrong results if the check is forgotten or faulty
  • it's actually slower than the callback variant (14 vs. 17 MB/s). With slight code changes it's faster (~19 MB/s).

I don't like the callback thing here either, it's just that I had no better idea how to do it without exposing too many internals. Still up for other ideas.

Edit: Whoops, forgot to push - see commit below.

@jerch jerch closed this Oct 26, 2018
@jerch jerch reopened this Oct 26, 2018
@jerch (Member, Author) commented Oct 26, 2018

@Tyriar Removed the callback variant; trimAndRecycle is a lot faster. The drawback is the precondition that it may only be used on a full circular list, or all hell will break loose. I commented it accordingly, so we should be on the safe side as long as we don't neglect the comments/docs.

Now it works as follows for recycling:

  • a terminal instance holds a blank-line blueprint (a buffer line instance with blank content)
  • the blueprint is read upon entering Terminal.scroll and compared against .cols and the current attrs; if they differ, the blueprint is recreated
  • any new line is fast-cloned from this blueprint (newLine = blueprint.clone();)
  • if the buffer is at max length, trimAndRecycle steps in: it sets the old trimmed line as the new active one and returns it, and this line gets its cell content from the blueprint by fast copy (trimAndRecycle().copyFrom(blueprint);)
  • done
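The steps above can be sketched roughly like this, with simplified stand-in classes (not the actual xterm.js BufferLine/CircularList implementations):

```typescript
// Simplified stand-ins for the recycling flow described above.
class Line {
  constructor(public cells: number[]) {}
  clone(): Line { return new Line(this.cells.slice()); }
  copyFrom(other: Line): void { this.cells = other.cells.slice(); }
}

class LineBuffer {
  private _lines: Line[] = [];
  constructor(private _maxLength: number) {}
  get isFull(): boolean { return this._lines.length === this._maxLength; }
  at(i: number): Line { return this._lines[i]; }
  push(line: Line): void { this._lines.push(line); }
  // Precondition: only call this on a full buffer - there must be a
  // trimmed line to hand back for reuse.
  trimAndRecycle(): Line {
    const trimmed = this._lines.shift()!; // drop the oldest line...
    this._lines.push(trimmed);            // ...and reattach it as the newest
    return trimmed;
  }
}

// Rough shape of the scroll path: recycle when full, clone otherwise.
function scroll(buffer: LineBuffer, blueprint: Line): void {
  if (buffer.isFull) {
    buffer.trimAndRecycle().copyFrom(blueprint); // reuse the line object
  } else {
    buffer.push(blueprint.clone());              // fast clone of the blank line
  }
}
```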

The blueprint is needed to have something to copy the cell content from without recreating a line every time. The blueprint alone already gives a nice speedup, since the cached blank line is unlikely to change very often.
Overall, the benchmark numbers for the input chain are:

  • JSArray no recycling: 7 - 9 MB/s
  • JSArray with recycling: 9 - 11 MB/s
  • TypedArray no recycling: 10 - 12 MB/s
  • TypedArray with recycling: 18 - 20 MB/s

Up for another review.

@Tyriar (Member) left a comment

Looks pretty solid, just a few things and I think we can merge this!

@jerch (Member, Author) commented Oct 30, 2018

@Tyriar Something like this should work:

  public pushFromBlueprint(blueprint: IBufferLine, allowRecycle?: boolean): void {
    if (this._length === this._maxLength) {
      this._startIndex = ++this._startIndex % this._maxLength;
      this.emit('trim', 1);
      if (allowRecycle) {
        (this._array[this._getCyclicIndex(this._length - 1)] as unknown as IBufferLine).copyFrom(blueprint);
      } else {
        this._array[this._getCyclicIndex(this._length - 1)] = blueprint.clone() as unknown as T;
      }
    } else {
      this._array[this._getCyclicIndex(this._length)] = blueprint.clone() as unknown as T;
      this._length++;
    }
  }

But this has several issues:

  • it basically removes the type system
  • the BufferLine impl is pulled into CircularList
  • it also pulls the blueprint thing into CircularList
  • not sure whether this.emit('trim', 1); is at the right position (same goes for the current push)

Note it is not possible to give push the final line object beforehand (cloned or recycled), since push has to decide first whether to recycle at all (the final object is either a clone or a copyFrom-recycled object). To encapsulate this nicely we imho have only two options: either the callback variant, or giving up the generic <T> and doing an inherited version of CircularList for IBufferLine to clean up the type system above.
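The second option could look roughly like this - a hypothetical sketch with simplified stand-in types, not a proposal from the PR:

```typescript
// Hypothetical sketch of option 2: a CircularList subclass specialized to
// IBufferLine, avoiding the `as unknown as T` casts of the generic variant.
// All types here are simplified stand-ins for the xterm.js ones.
interface IBufferLine {
  clone(): IBufferLine;
  copyFrom(other: IBufferLine): void;
}

class CircularList<T> {
  protected _array: (T | undefined)[];
  protected _startIndex = 0;
  protected _length = 0;
  constructor(protected _maxLength: number) {
    this._array = new Array<T | undefined>(_maxLength);
  }
  get length(): number { return this._length; }
  get(index: number): T | undefined {
    return this._array[this._getCyclicIndex(index)];
  }
  protected _getCyclicIndex(index: number): number {
    return (this._startIndex + index) % this._maxLength;
  }
}

class BufferLineList extends CircularList<IBufferLine> {
  public pushFromBlueprint(blueprint: IBufferLine, allowRecycle?: boolean): void {
    if (this._length === this._maxLength) {
      this._startIndex = (this._startIndex + 1) % this._maxLength;
      const slot = this._getCyclicIndex(this._length - 1);
      const line = this._array[slot];
      if (allowRecycle && line) {
        line.copyFrom(blueprint);          // reuse the trimmed line's memory
      } else {
        this._array[slot] = blueprint.clone();
      }
    } else {
      this._array[this._getCyclicIndex(this._length)] = blueprint.clone();
      this._length++;
    }
  }
}
```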

@jerch (Member, Author) commented Oct 30, 2018

@Tyriar The last commit reverts to a cleaner precheck version and does the recycling explicitly in Terminal.scroll. This seems much cleaner to me than trying to merge it into push.
It is slightly slower than pushAndRecycle, but less hazardous to use.

@jerch jerch closed this Oct 30, 2018
@jerch jerch reopened this Oct 30, 2018
@jerch (Member, Author) commented Nov 7, 2018

@Tyriar The last commit is a compromise between speed and code safety. I was not able to remove the double check of (this._length === this._maxLength) on the recycle control flow path. recycle now always works but returns undefined for non-full ring buffers; still, the full precheck is faster than handling the undefined (due to a deopt in the ring buffer when reading out of bounds).
Speed decreased only slightly (18.5 MB/s vs. 17.5 MB/s seems negligible).
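The two caller patterns being compared might look like this - a sketch of the described behavior with stand-in names, not the actual xterm.js code:

```typescript
// recycle() works in any state but returns undefined for a non-full ring
// buffer; per the measurements above, the full precheck (variant A) stays
// faster than branching on the undefined result (variant B).
interface ILine { clone(): ILine; copyFrom(other: ILine): void; }
interface IRing {
  readonly isFull: boolean;
  push(line: ILine): void;
  recycle(): ILine | undefined; // undefined while the ring is not full
}

// Variant A: full precheck - the non-null assertion is safe afterwards.
function scrollPrechecked(ring: IRing, blueprint: ILine): void {
  if (ring.isFull) {
    ring.recycle()!.copyFrom(blueprint);
  } else {
    ring.push(blueprint.clone());
  }
}

// Variant B: branch on the return value - measured slower here.
function scrollOnUndefined(ring: IRing, blueprint: ILine): void {
  const recycled = ring.recycle();
  if (recycled) {
    recycled.copyFrom(blueprint);
  } else {
    ring.push(blueprint.clone());
  }
}
```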

@jerch jerch self-assigned this Nov 15, 2018
@jerch jerch added this to the 3.9.0 milestone Nov 15, 2018
@Tyriar Tyriar changed the title first attempt to recycle buffer lines Recycle buffer lines Nov 18, 2018
@jerch jerch force-pushed the reuse_bufferlines branch from 9ee0b51 to 895d6fc Compare November 20, 2018 21:05
@jerch (Member, Author) commented Nov 20, 2018

Now averages ~18.5 MB/s - a nice improvement compared to ~7.5 MB/s in the current master.

@Tyriar (Member) left a comment

🎉

@jerch jerch merged commit b4faaef into xtermjs:master Nov 22, 2018