Add detailed view of latency by emilk · Pull Request #10173 · rerun-io/rerun

emilk · 2025-06-09T03:24:15Z

What

This adds a breakdown of where the end-to-end latency is coming from. It automatically appears in the top panel when latency is high. Hover it to see details:

Notable changes

The latency metrics are no longer opt-in. We always show them now if they get surprisingly high.

The measurements are made by adding extra metadata to the record batches at IPC encode/decode time.

We already have accurate timestamps at log-time (RowId) and batch-time (ChunkId).
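
To illustrate the metadata approach described above, here is a minimal sketch. It assumes the metadata is a string-to-string map (which is how arrow-rs represents schema metadata) and uses plain `std` types in place of a real record batch. The key names and the helpers `stamp`, `read_stamp`, and `transit_latency_ns` are hypothetical and are not the names used in this PR.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical metadata keys; the real key names are not shown in this thread.
const KEY_ENCODED_AT: &str = "example:ipc_encoded_at_ns";
const KEY_DECODED_AT: &str = "example:ipc_decoded_at_ns";

/// Nanoseconds since the Unix epoch according to the wall clock (0 if the clock is before the epoch).
fn now_ns() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// Store one timestamp in the metadata map. This is a single insert per chunk.
fn stamp(metadata: &mut HashMap<String, String>, key: &str, timestamp_ns: u64) {
    metadata.insert(key.to_owned(), timestamp_ns.to_string());
}

/// Read a timestamp back; `None` if the key is missing or not a number.
fn read_stamp(metadata: &HashMap<String, String>, key: &str) -> Option<u64> {
    metadata.get(key)?.parse().ok()
}

/// Time between encode and decode, if both stamps are present.
fn transit_latency_ns(metadata: &HashMap<String, String>) -> Option<u64> {
    let encoded = read_stamp(metadata, KEY_ENCODED_AT)?;
    let decoded = read_stamp(metadata, KEY_DECODED_AT)?;
    // Clocks on two machines may disagree, so saturate instead of underflowing.
    Some(decoded.saturating_sub(encoded))
}
```

In this sketch the sender would call `stamp(&mut metadata, KEY_ENCODED_AT, now_ns())` just before encoding and the receiver would stamp `KEY_DECODED_AT` just after decoding. Returning `Option` keeps each measurement optional, so a missing stamp simply removes that stage from the breakdown.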

Testing

Apply this diff:

diff --git a/tests/rust/log_benchmark/src/image.rs b/tests/rust/log_benchmark/src/image.rs
index e36478262a1..877c3e0bce4 100644
--- a/tests/rust/log_benchmark/src/image.rs
+++ b/tests/rust/log_benchmark/src/image.rs
@@ -26,8 +26,7 @@ fn prepare() -> Vec<u8> {
 fn execute(mut raw_image_data: Vec<u8>) -> anyhow::Result<()> {
     re_tracing::profile_function!();
 
-    let (rec, _storage) =
-        rerun::RecordingStreamBuilder::new("rerun_example_benchmark_").memory()?;
+    let rec = rerun::RecordingStreamBuilder::new("rerun_example_benchmark_").connect_grpc()?;
 
     for i in 0..NUM_LOG_CALLS {
         raw_image_data[i] += 1; // Change a single pixel of the image data, just to make sure we transmit something different each time.

Then run this in one terminal:

pixi run rerun-release

and in another:

cargo run -p log_benchmark --release -- --benchmarks image --profile

Findings

In a test where I encode a bunch of uncompressed 1024² RGBA images (tests/rust/log_benchmark/src/image.rs), I found that by far the slowest part is the IPC encoding of Arrow data (yes, this is a release build):

Zooming in:

This suggests that if we simply encode the chunks in parallel on N cores instead, we could get up to an N-times speedup (as long as the actual logging thread is not the bottleneck).
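
A rough sketch of what parallel encoding could look like, using only the standard library. Here `encode_chunk` is a hypothetical stand-in for the real IPC encode step, and `max_threads` is an assumed knob for capping how many cores the encoder may use; neither comes from the Rerun codebase.

```rust
use std::thread;

/// Stand-in for the expensive per-chunk IPC encode step (hypothetical, not the real encoder).
fn encode_chunk(chunk: &[u8]) -> Vec<u8> {
    chunk.iter().map(|b| b.wrapping_add(1)).collect()
}

/// Encode all chunks on at most `max_threads` threads, preserving the input order.
fn encode_parallel(chunks: &[Vec<u8>], max_threads: usize) -> Vec<Vec<u8>> {
    if chunks.is_empty() {
        return Vec::new();
    }
    // Never use zero threads, and never more threads than there are chunks.
    let num_threads = max_threads.clamp(1, chunks.len());
    // Ceiling division so every chunk lands in exactly one group.
    let group_size = (chunks.len() + num_threads - 1) / num_threads;

    thread::scope(|scope| {
        // Spawn every worker first, so they all run concurrently...
        let handles: Vec<_> = chunks
            .chunks(group_size)
            .map(|group| {
                scope.spawn(move || group.iter().map(|c| encode_chunk(c)).collect::<Vec<_>>())
            })
            .collect();
        // ...then join them in spawn order, which keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("encoder thread panicked"))
            .collect()
    })
}
```

Joining in spawn order matters if the encoded messages must be sent in the same order they were logged. With `max_threads` set to 1 this degrades to serial encoding on a single worker thread.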

This also gives a likely reason for the recent regression: we switched from arrow2 to arrow-rs.

Perhaps we could also spend some time optimizing arrow::ipc::writer::StreamWriter

github-actions · 2025-06-09T03:25:38Z

Web viewer built successfully. If applicable, you should also test it:

I have tested the web viewer

Result	Commit	Link	Manifest
✅	`a558811`	https://rerun.io/viewer/pr/10173	`+nightly` `+main`

Note: This comment is updated whenever you push a commit.

nikolausWest · 2025-06-09T09:23:53Z

Really great stuff! So great to see this breakdown so easily.

> This suggests that if we simply encode the chunks in parallel instead on N cores, we will get a N-times speedup (when the actual logging thread is not the bottleneck).

We should keep in mind here, though, that in the user's system it may not be desirable for the logging to take up all the resources. I don't know what the answer is, but this might need to be a setting (using many threads to improve latency is obviously great in an off-device bridge, but perhaps not on-device in production).

> Perhaps we could also spend some time optimizing arrow::ipc::writer::StreamWriter

Any improvements we can make to the single-threaded case would be great in both cases =)

jprochazk · 2025-06-09T09:37:35Z

I wonder how the changes from this draft affect the measurements here:

Improve encode perf #9826

Will try it myself soonish

jprochazk · 2025-06-09T10:06:49Z

> The latency metrics are no longer opt-in. We always show them now if they get surprisingly high.
> The measurements are made by adding extra metadata to the record batches at IPC encode/decode time.

I would expect this to be part of the "show performance metrics" setting. How much overhead does this add to encode/decode time?

emilk · 2025-06-09T16:16:10Z

> > The latency metrics are no longer opt-in. We always show them now if they get surprisingly high.
> > The measurements are made by adding extra metadata to the record batches at IPC encode/decode time.
>
> I would expect this to be part of the "show performance metrics" setting. How much overhead does this add to encode/decode time?

We need to insert the measurement on the SDK side, without knowing the settings for the viewer. But this is only a single time measurement per chunk, so the added overhead is minimal.

emilk · 2025-06-09T16:33:42Z

-        // Show queue latency on hover, as that is part of this.
-        // For instance, if the framerate is really bad we have less time to ingest incoming data,
-        // leading to an ever-increasing input queue.
-        let rx = app.msg_receive_set();
-        let queue_len = rx.queue_len();
-        let latency_sec = rx.latency_nanos() as f32 / 1e9;
-        // empty queue == unreliable latency
-        if 0 < queue_len {
-            response.on_hover_ui(|ui| {
-                ui.label(format!(
-                    "Queue latency: {}, length: {}",
-                    latency_text(latency_sec),
-                    format_uint(queue_len),
-                ));
-
-                ui.label(
-                    "When more data is arriving over network than the Rerun Viewer can ingest, a queue starts building up, leading to latency and increased RAM use.\n\
-                         We call this the queue latency.");
-            });
-        }


I removed the old "queue latency", which only measured the latency from receiving data until ingesting it. The new code now covers that in another way.

nikolausWest · 2025-06-09T16:34:08Z

Perhaps a bit silly, but I did a ChatGPT deep research run to look for opportunities to improve performance. There may be something of interest in there:

Performance of Arrow StreamWriter in Rerun.pdf


emilk · 2025-06-11T04:00:04Z

## Related

* Investigates #9812
* Closes #9973

#10222)

### Related

* Fixes test regression introduced in #10173

### What

Related internal slack discussion https://rerunio.slack.com/archives/C07T0LFT5BQ/p1749799544492489

* [x] pass full check ci

cc: @emilk

### Related

* #10222
* #10173
* #9812

### What

My previous PR added timestamps to the arrow metadata whenever we encoded or decoded arrow IPC. This caused all sorts of test failures.

In this PR, we instead only insert it in two places:

* The gRPC log sink, used by the SDK
* The gRPC client used by the viewer

### How to test

```sh
pixi run rerun-release
```

```sh
cargo run -p log_benchmark --release -- --benchmarks image --connect
```

### TODO

* [x] Run `main` CI

### Future work

* Add a timestamp for when a message hits the proxy server (`re_grpc_server`)
* Or better yet: #10229 🤦

emilk added 15 commits June 8, 2025 17:21

Refactor: move IngestionStatistics to own file

94caacf

Fix typo

80d5997

Refactor latency stats

a4d0461

Measure latency from log() call to batcher

6ec1ee6

Set up logging in log_benchmark

792fdfd

Driveby: simplify ptoobuf-arrowmsg-decoding

fc8bca7

Store when a sorbet batch was encoded as IPC as arrow metadata

3b50cbb

Add insert_metadata to re_arrow_util

3e8bd16

Add when the RecordBatch was decoded to the arrow metadata

7d5c369

Add struct TimestampMetadata

a824720

Add TimestampMetadata to the parsed sorbet schema

05d7713

More fine-grained latency estimation

24941f9

Add last step of the latency

d2bd6bc

Improve profiling scopes

0e58155

Make the log_benchmark image test more realistic

6aee4ea

emilk added the 🧑‍💻 dev experience, 📉 performance, and include in changelog labels Jun 9, 2025

emilk added 5 commits June 8, 2025 20:41

More images in benchmark

5b65e6e

Remove dead code

be66b83

Always show latency numbers if they go over one second

f703844

Add profiling of the logging side of the benchmark

1317b08

Rename options so its value resets

015a934

emilk and others added 11 commits June 10, 2025 18:37

Make individual latency measurements optional

0a7639f

Simpler schema cloning

27610d8

Make the log benchmark image sizes match

90e08fa

Improve comment around unwrap_used

f4c683c

Co-authored-by: Andreas Reich <andreas@rerun.io>

Better docstring

22eca67

Co-authored-by: Andreas Reich <andreas@rerun.io>

Lower the latency history a git

afc936c

Explain why we add the metadata to every IPC

2403036

Better naming of the constants

9a14b08

Slightly better module naming

2e1d6ff

Disable the timestamping during unit tests

24fecee

Refactor: break out long functioj

5c5b5a9

emilk added 2 commits June 10, 2025 21:06

Nicer flow-chart view of latency

07b4124

Cargo fmt

6ebb1de

Wumpf self-requested a review June 11, 2025 11:59

Merge branch 'main' into emilk/improve-latency-measurements

a558811

Wumpf approved these changes Jun 12, 2025

emilk merged commit c618728 into main Jun 12, 2025
47 checks passed

emilk deleted the emilk/improve-latency-measurements branch June 12, 2025 16:37

grtlr added a commit that referenced this pull request Jun 13, 2025

Revert removal of DecodeError::Chunk

bb77181

This was introduced via #10173 and led to breakage on the dataplatform side.

grtlr mentioned this pull request Jun 13, 2025

Revert removal of DecodeError::Chunk #10219

Merged

grtlr added a commit that referenced this pull request Jun 13, 2025

Revert removal of DecodeError::Chunk (#10219)

64f1adc

### Related * #10173 ### What The removal of this variant led to breakage on the dataplatform side.

grtlr added a commit that referenced this pull request Jun 13, 2025

Revert removal of DecodeError::Chunk (#10219)

b007051

### Related * #10173 ### What The removal of this variant led to breakage on the dataplatform side.

Wumpf mentioned this pull request Jun 13, 2025

Disable reading/writing timing metadata to/from encoded record batches #10222

Merged

1 task

emilk mentioned this pull request Jun 13, 2025

Re-enable latency measurements #10228

Merged

1 task

Wumpf mentioned this pull request Jun 16, 2025

Make log_time column implicit and potentially introduce ingest_time #1891

Closed

emilk mentioned this pull request Aug 27, 2025

Displaying real-time images results in significant memory growth #10717

Closed

Add detailed view of latency #10173: emilk merged 37 commits into main from emilk/improve-latency-measurements
