
Conversation

@sdemjanenko

When tracing a request, Newrelic should not stop the transaction if
the response is lazy-evaluated (e.g. Streaming). Rack's Body response
can render streaming responses by wiring an Enumerator into @body.
The server that writes to the client then calls response.each which
iterates over each part of the body (parts can be lazy-generated).

We are interested in tracing this code because DB queries, http requests
or other business logic may be running in this Enumerator.
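For context, a Rack body only has to respond to `#each`, so it can be an Enumerator whose chunks are generated lazily. A minimal sketch of the pattern being described (illustrative only, not code from this PR):

```ruby
# Minimal Rack app with a lazily evaluated body. The Enumerator's block
# does not run inside #call; it runs later, chunk by chunk, when the
# server iterates the body -- which is why the agent must not finish
# the transaction as soon as #call returns.
class StreamingApp
  def call(env)
    body = Enumerator.new do |yielder|
      5.times do |i|
        # Imagine a DB query or HTTP request happening here.
        yielder << "chunk #{i}\n"
      end
    end
    [200, { 'Content-Type' => 'text/plain' }, body]
  end
end
```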

@sdemjanenko force-pushed the streaming_middleware branch 2 times, most recently from 567be53 to c225bed on October 25, 2019 at 22:36
@sdemjanenko
Author

Sample trace:
[screenshot: transaction trace]

@justinfoote
Contributor

Hello @sdemjanenko, thanks for the contribution!

I'll pull the code down and give it a thorough review.

In the meantime, it appears that this change breaks quite a few tests on certain versions of ruby with certain versions of Rails. It would be great if you could see what you can do about these failures.

I'll also try to make sense of the failures while I'm reviewing these changes. I'll update you here if/when I find what's causing them.

Thanks again!

@sdemjanenko
Author

Is each test suite running in parallel? It could be that the next test starts even though the previous response has not been closed.

It's probably safest to wrap this new behavior in a config flag so that it is opt-in.
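A config-gated rollout might look something like this in newrelic.yml (the `streaming_tracing.enabled` flag name is hypothetical, purely to illustrate the opt-in idea; it is not an actual agent setting):

```yaml
# newrelic.yml -- hypothetical opt-in flag, not a real agent setting
common: &default_settings
  streaming_tracing:
    enabled: true
```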

@justinfoote
Contributor

The test suites each run in their own container.

It looks like there are two categories of failure here. The first are caused by a breaking change in sprockets v4.0.0 -- we've dealt with those in a separate branch that will be integrated soon, and we can ignore them for the purposes of this PR.

There are also a bunch of failures we cannot ignore. It looks like this change causes many failures in our unit test suite running in Rails 3.0, 3.1, 3.2, and 4.0.
This is an example: https://travis-ci.org/newrelic/rpm/jobs/603076064

To run these tests yourself, run this command from the main rpm directory:

`bundle exec rake test:env[rails30,rails31,rails32,rails40]`

Beyond that, I'd like to know a bit more about the problem that this PR solves. Can you include a screenshot of a transaction trace without the changes in this PR?
Also, can you provide a sample application that exercises this functionality?

Thanks!

@justinfoote
Contributor

@sdemjanenko, it's been a while since I've heard from you. Before we can review or merge this PR, I'd like to understand your use case a little more.

Let me know if you'd like us to proceed. Otherwise, I'll close this PR in about two days.

Thanks!

@sdemjanenko
Author

sdemjanenko commented Nov 20, 2019 via email

@justinfoote
Contributor

This sounds great! And we're happy to keep this PR open as long as you need. We just don't want to keep PRs around that have been forgotten about.

Thanks for all the effort you're putting into this!

sdemjanenko added a commit to sdemjanenko/newrelic_rpm_sample_streaming_app that referenced this pull request Nov 25, 2019
This is the proposed patch in github.com/newrelic/newrelic-ruby-agent/pull/302. To run
this patch set `NEWRELIC_PATCH=true` when calling `rails server`.
@sdemjanenko
Author

Here is a demo app: https://github.com/sdemjanenko/newrelic_rpm_sample_streaming_app

You will need to configure Newrelic to report to your own application. Once that is done, run `bundle exec rails server` and visit http://localhost:3000/demo; you should see text that looks like JSON slowly appear in chunks over ~10 seconds. If you look at your Newrelic trace, you will not see any calls to expensive_method_call.

Stop Rails and restart it with `NEWRELIC_PATCH=true bundle exec rails server`. When you reload the page, you will still see it appear in chunks. When you look at your Newrelic trace, you will now see a segment for body_each, and inside that you will see 100 calls to expensive_method_call. Moreover, the Newrelic trace will show a time of ~10 seconds, whereas before it showed a time of <10 ms.
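Based on that description, the demo endpoint presumably builds its body along these lines (a reconstruction, not the actual demo code; `expensive_method_call` is stubbed here, and the real app would sleep between chunks to stretch the response to ~10 seconds):

```ruby
require 'json'

# Stand-in for the demo app's expensive work (DB query, HTTP call, ...).
def expensive_method_call(i)
  { index: i }
end

# Lazily emits a JSON array in 100 chunks. Each expensive_method_call
# runs only when the server iterates the body, after the action returns.
def demo_body
  Enumerator.new do |yielder|
    yielder << '['
    100.times do |i|
      yielder << ',' unless i.zero?
      yielder << expensive_method_call(i).to_json
      # the real app sleeps ~0.1s here, so the response takes ~10 seconds
    end
    yielder << ']'
  end
end
```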

@rachelleahklein

Hi @sdemjanenko,

Thanks for the repro! It really helps us see the before-and-after impact of this work. Very nice.

We will resume reviewing your PR and will be in touch about its possible inclusion in an upcoming version of the agent. In the meantime, I encourage you to look into the test failures Justin mentioned:

There are also a bunch of failures we cannot ignore. It looks like this change causes many failures in our unit test suite running in Rails 3.0, 3.1, 3.2, and 4.0.
This is an example: https://travis-ci.org/newrelic/rpm/jobs/603076064

To run these tests yourself, run this command from the main rpm directory:

`bundle exec rake test:env[rails30,rails31,rails32,rails40]`

Again, thanks, and we look forward to continuing the conversation here!

@sdemjanenko
Author

Could I get an overview of how the test cases end up running? Specifically, I am looking at run_test_case in test/new_relic/agent/distributed_tracing_cross_agent_test.rb. It isn't clear to me what the test_case executes. I suspect the code that runs the test cases isn't wired up to call body.close, which would finish the first middleware segment.
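For reference, a conforming server consumes a Rack response roughly as sketched below; the patch hinges on `body.close` firing, so a test harness that never calls `#close` would leave the segment open. This is a self-contained illustration, not the agent's actual test harness:

```ruby
# FakeBody mimics a streaming Rack body: #each yields chunks, #close
# records that the server finished with it.
class FakeBody
  attr_reader :closed

  def each
    3.times { |i| yield "part #{i}" }
  end

  def close
    @closed = true
  end
end

# How a Rack-compliant server drives a response: iterate the body,
# then always call #close -- the hook the patch relies on to finish
# the middleware segment.
def serve(body)
  chunks = []
  begin
    body.each { |chunk| chunks << chunk }
  ensure
    body.close if body.respond_to?(:close)
  end
  chunks
end
```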

@rachelleahklein

Hi, @sdemjanenko. Upon closer inspection, it looks like the tests in question have to do with some very specific New Relic implementation details, so we will handle looking into those failures as well as we evaluate your PR. No action is needed on your part at this time.

We'll be in touch. Thanks again for all your work!

@sdemjanenko
Author

As I was looking through the code and tests I noticed that the middleware instrumentation may not get set up (if disable_middleware_instrumentation is set). From what I can tell in this case we probably also need to patch:
https://github.com/newrelic/rpm/blob/master/lib/new_relic/rack/agent_hooks.rb#L30

Does that analysis seem reasonable?

@mwlang
Contributor

mwlang commented Dec 13, 2019

@sdemjanenko, first let me just say I really enjoyed the Medium blog post on streaming chunks of data with an Enumerator. (Oops! I just re-read that article and realized it was written by Noel Rappin.) It was a great background piece that helped us understand your implementation here as well as your use case.

We reviewed your submission at length internally, including your question about patching the agent_hooks middleware.

What we noticed in our review is that the new functionality added to NewRelic::Agent::Instrumentation::MiddlewareTracing does not materially change the flow that is already there; it takes on some additional work around the end result and ensures the transaction/segment is closed when appropriate.

This strongly suggests that we can separate the new functionality into its own Rack app. If we can, we solve three major concerns at once:

  1. It avoids giving the MiddlewareTracing module's #call method additional, unrelated responsibility.
  2. The Rack app becomes a first-class citizen in the middleware stack and can do its intended processing regardless of the disable_middleware_instrumentation setting.
  3. It allows opting in or out of this level of traceability with a configuration setting.

Some initial thoughts into how this may shape up:

  • create a new Rack app, NewRelic::Rack::StreamingMonitoring, inheriting from AgentMiddleware
  • move the new functionality into this Rack app
  • install it into the middleware stack by adding it to the list in NewRelic::Control::Frameworks::Rails#init_config
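The heart of such a Rack app would be a body proxy that defers finishing the transaction until the server closes the body. A self-contained sketch of just that mechanism (in the real middleware the on-close callback would finish the New Relic transaction; all names here are illustrative, not agent API):

```ruby
# Wraps a Rack body and runs a callback when the server calls #close.
# A StreamingMonitoring middleware could return one of these from its
# call method, keeping the transaction open until streaming completes.
class CallbackBody
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
  end

  def each(&blk)
    @body.each(&blk)
  end

  def close
    @body.close if @body.respond_to?(:close)
    @on_close.call
  end
end
```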

These changes would make your PR much more modular and testable while better fitting into the overall design of the New Relic Agent. Additionally, we could introduce a configuration flag to opt-in/opt-out of this level of tracing.

Would you be willing to take on this refactoring effort? If so, we are more than happy to guide you through it and support your effort along the way.

@sdemjanenko
Author

@mwlang I was thinking the same thing. It definitely seems cleaner to have a clear middleware layer which controls this behavior. I just made a first pass at this middleware: sdemjanenko@28383db

One thing to note: it is important that this middleware controls the outer-most segment. In my testing, I noticed strange traces where an inner segment was not marked finished before the outer-most transaction was.

@mwlang
Contributor

mwlang commented Dec 16, 2019

@sdemjanenko I'm starting to look at the implementation you put together. Regarding your comment about inserting at the top of the middleware stack: would inserting it the same way browser monitoring is inserted, but just before browser monitoring, accomplish the desired processing order? I think the prerequisite here is simply that this new streaming tracer is the first NR middleware encountered in the stack, which ensures the inner segment is closed before the outer transaction. See the in-line comments on your commit.
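Concretely, that ordering constraint could be expressed with Rails' middleware API along these lines (a sketch of the idea only; the agent actually wires its middleware up in NewRelic::Control::Frameworks::Rails#init_config, and StreamingMonitoring is the class name proposed earlier in this thread):

```ruby
# Sketch: make the streaming tracer the first New Relic middleware in
# the stack by inserting it just before BrowserMonitoring.
Rails.application.config.middleware.insert_before(
  NewRelic::Rack::BrowserMonitoring,
  NewRelic::Rack::StreamingMonitoring
)
```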

@mwlang
Contributor

mwlang commented Dec 27, 2019

@sdemjanenko, it's been a while since I've heard from you. I'd love to hear your thoughts on our last exchange and if we can help with anything.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@rachelleahklein

Hi @sdemjanenko, we wanted to check in with you again on this pull request. We really appreciate the work you've done so far. Can we do anything to help you bring it to completion (i.e. addressing our inline comments on your most recent commit)? If you don't have the time to work on this at the moment, we can also mark it as "help wanted" to encourage contributions from others. Please let us know your preference.

Again, thank you so much for all you've done so far, and we're looking forward to hearing from you.

@mwlang closed this on Jun 30, 2020
@tannalynn
Contributor

Hello @sdemjanenko
We recently renamed our master branch to main. This closed all open pull requests, including this one.
Please reopen a pull request to our main branch for these changes.
Thank you!
