Skip to content

Commit 70da16a

Browse files
committed
fix(http/prom): record bodies when eos reached
this commit fixes a bug discovered by @alpeb, which was introduced in proxy v2.288.0. > The associated metric is `outbound_http_route_request_statuses_total`: > > ``` > $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10 > ``` > > The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`. > Before that the metrics looked like: > > ``` > $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96 > ``` > > So the difference is the non-empty value for `error=UNKNOWN` even > when `https_status` is 2xx, which `linkerd viz stat-outbound` > interprets as failed requests. in #3086 we introduced a suite of route- and backend-level metrics. that subsystem contains a body middleware that will report itself as having reached the end-of-stream by delegating directly down to its inner body's `is_end_stream()` hint. this is roughly correct, but is slightly distinct from the actual invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must call its `end_stream` helper to classify the outcome and increment the corresponding time series in the `outbound_http_route_request_statuses_total` metric family. in #3504 we upgraded our hyper dependency. while doing so, we neglected to include a call to `end_stream` if a data frame is yielded and the inner body reports itself as having reached the end-of-stream. this meant that instrumented bodies would be polled until the end is reached, but were being dropped before a `None` was encountered. this commit fixes this issue in two ways, to be defensive: * invoke `end_stream()` if a non-trailers frame is yielded, and the inner body now reports itself as having ended. this restores the behavior in place prior to #3504. see the relevant component of that diff, here: <https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274> * rather than delegating to the inner `<B as Body>::is_end_stream()` method, report the end-of-stream being reached by inspecting whether or not the inner response state has been taken. this is the state that directly indicates whether or not the `ResponseBody<B>` middleware is finished. X-ref: #3504 X-ref: #3086 X-ref: linkerd/linkerd2#8733 Signed-off-by: katelyn martin <[email protected]>
1 parent d6cc305 commit 70da16a

File tree

1 file changed

+5
-1
lines changed

1 file changed

+5
-1
lines changed

linkerd/http/prom/src/record_response.rs

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -268,6 +268,8 @@ where
268268
Some(Ok(frame)) => {
269269
if let trls @ Some(_) = frame.trailers_ref() {
270270
end_stream(this.state, Ok(trls));
271+
} else if this.inner.is_end_stream() {
272+
end_stream(this.state, Ok(None));
271273
}
272274
}
273275
Some(Err(error)) => end_stream(this.state, Err(error)),
@@ -278,7 +280,9 @@ where
278280
}
279281

280282
fn is_end_stream(&self) -> bool {
281-
self.inner.is_end_stream()
283+
// If the inner response state is still in place, the end of the stream has not been
284+
// classified and recorded yet.
285+
self.state.is_none()
282286
}
283287
}
284288

0 commit comments

Comments
 (0)