@clrudolphi
Contributor

@clrudolphi clrudolphi commented Aug 15, 2024

Adds support for creation of Cucumber Messages during the execution of Reqnroll generated tests.
See Cucumber Message for information about the output format.

🤔 What's changed?

Changes are in three main areas:

  1. Feature Generation: during feature-file code-behind code generation, the Gherkin toolchain is used to create the first few Messages for the feature. Factory methods are generated for each of these message types. These are attached to the FeatureInfo object.
  2. Cucumber Message Publication: written as an embedded plugin, this component listens for Reqnroll ExecutionEvents and converts them, and their attached state, to Cucumber Messages. These are 'published' to one or more consumers.
  3. Message Formatters: two embedded runtime plugins, these plugins register with the Publishing component to receive the Messages as they are released by the Publisher. One stores them directly to an .ndjson file, the other converts the messages into an HTML representation of the test run (using the Cucumber HTML Formatter).

Messages are strings serialized from the CucumberMessage Envelope format.
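As a rough illustration of the publish-to-consumers flow described above, here is a minimal sketch of a broker that hands each envelope to registered sinks, one of which writes newline-delimited JSON. The `Broker` class and `ndjson_sink` function are hypothetical names for illustration only; they are not the types used in this PR.

```python
import io
import json
from typing import Callable, Dict, List, TextIO

Envelope = Dict[str, object]  # one Cucumber Message envelope as a plain dict


class Broker:
    """Hypothetical stand-in for the publishing component."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[Envelope], None]] = []

    def register(self, sink: Callable[[Envelope], None]) -> None:
        self._sinks.append(sink)

    def publish(self, envelope: Envelope) -> None:
        # Every registered sink receives every envelope, in publication order.
        for sink in self._sinks:
            sink(envelope)


def ndjson_sink(stream: TextIO) -> Callable[[Envelope], None]:
    """Return a sink that writes one compact JSON document per line."""

    def write(envelope: Envelope) -> None:
        stream.write(json.dumps(envelope, separators=(",", ":")) + "\n")

    return write


buffer = io.StringIO()
broker = Broker()
broker.register(ndjson_sink(buffer))
broker.publish({"testRunStarted": {"timestamp": {"seconds": 0, "nanos": 0}}})
broker.publish({"testRunFinished": {"success": True,
                                    "timestamp": {"seconds": 1, "nanos": 0}}})
ndjson_text = buffer.getvalue()
```

An HTML sink would register in the same way and transform the envelopes instead of storing them verbatim.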

Configuration support is added as a new section of the reqnroll.json configuration file, without interfering with the existing configuration schema and config reading mechanisms. Configuration settings can also be overridden by use of environment variables.
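The override order (defaults, then the JSON section, then environment variables) can be sketched as follows. The section name `exampleFormatters`, the setting keys, and the `EXAMPLE_...` variable names are all hypothetical placeholders; the real names are defined by the PR's configuration documentation, not here.

```python
import json
from typing import Dict, Mapping

# Hypothetical mapping of environment variable name -> setting key.
ENV_OVERRIDES = {
    "EXAMPLE_FORMATTERS_DISABLED": "disabled",
    "EXAMPLE_FORMATTERS_OUTPUT_DIRECTORY": "outputDirectory",
}


def load_formatter_settings(config_text: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Merge defaults, the JSON section, and environment overrides (last wins)."""
    document = json.loads(config_text)
    settings = {"disabled": "false", "outputDirectory": "."}
    # Only the dedicated section is read; the rest of the file is ignored.
    section = document.get("exampleFormatters", {})
    settings.update({key: str(value) for key, value in section.items()})
    for variable, key in ENV_OVERRIDES.items():
        if variable in env:
            settings[key] = env[variable]
    return settings


example = load_formatter_settings(
    '{"exampleFormatters": {"outputDirectory": "reports"}}',
    {"EXAMPLE_FORMATTERS_DISABLED": "true"},
)
```

In real use the second argument would be `os.environ`; passing a plain mapping keeps the sketch easy to test.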

⚡️ What's your motivation?

To provide future compatibility with the broader Cucumber eco-system of tooling.
To provide the input to a future implementation of a Reqnroll replacement for LivingDocs.

🏷️ What kind of change is this?

  • ⚡ New feature (non-breaking change which adds new behaviour)

♻️ Anything particular you want feedback on?

Scope

  • What standard 'formatters' should be included out-of-the-box? So far I have implemented a 'Messages' formatter that simply serializes the Message envelopes to a file, and the 'html' formatter which generates a static html file from the envelopes.
    Other options might include:
    -- Publish (to Cucumber's publicly hosted anonymous report site; I have a prototype of this working so easy to add)
    -- json (an older format that preceded Messages, but might be useful for those using older reporting/tracking mechanisms)

Testing

  • There are two types of integration tests. BASIC tests simply execute a feature with very little validation. I wrote those during development so that I could quickly see what the code-generator was doing; and later used them as quick smoke tests. The other tests, most of which are based off of the Cucumber Messages CCK, use approval style testing in that the output of the run is compared against a golden copy of the expected output. Should BASIC tests be moved to approval style tests or simply eliminated now that they're not as valuable?

  • The Approval-style tests are largely based upon the CCK and broken into three categories: CCKScenarios (these pass), NonCompliantCCKScenarios (scenarios from the CCK that Reqnroll can't support), and NonCCKScenarios (scenarios that Reqnroll does support but for which there are no corresponding CCK Scenarios).

  • What generation scenarios are missing?

  • What runtime scenarios or combinations are missing?

  • Should the Attachments non-passing CCK scenario be reworked into a passing non-CCK scenario?

  • Should compatibility with attachment handling by all three Test Frameworks be tested?

  • Which areas are missing Unit Tests that need them?

  • The BASIC and Approval tests use the SystemTest framework, which runs using MsTest. Should some testing be conducted under the other Test Frameworks for compatibility testing?

  • Unit Tests: Unit tests have been created for the more complicated classes that have significant behavior. There are other data holding classes (some with simple behaviors) that do not have tests.

  • Writing the unit tests has highlighted a high-degree of coupling between classes and complicated flows. Would appreciate any ideas/suggestions for simplification of the design.

Runtime

  • Is the start-up mechanism (embedded plugins) acceptable? <<Answered: use of IRuntimePlugin has been eliminated>>
  • Is the coordination between sinks and the broker acceptable?
  • The File writing Sinks represent the first time in the CORE Reqnroll library that any code is doing physical IO. Is that OK or should it be in a separate project that is injected?
  • Is the way I've used subscriptions to plugin events acceptable? <<Moot, no longer using IRuntimePlugin>>
  • Is the way I've used subscriptions to ExecutionEvents acceptable?
  • Is the instancing strategy OK? (One tracking object per Feature, one test-case level tracking object per test case, split of definition from execution records)
  • What threading situations have I not handled?
  • Are the modifications to the ExecutionEvent classes acceptable? (Added constructor arguments, properties, etc)
  • Async adoption into ExecutionEvent listeners and further into TestExecutionEngine and TestRunner (specifically SkipScenarioAsync and SkipStepAsync); marked synchronous ExecutionEventListener as Obsolete. Is this OK?
  • Are the dependencies taken on FeatureContext and ScenarioContext acceptable?
  • Are the changes made to FeatureInfo and ScenarioInfo acceptable?
  • A certain amount of set-up busy work is required to transform from Gherkin.AST Messages classes to Io.Cucumber.Messages classes. Should we push Gherkin to adopt the Messages classes and do away with its own subset?
  • Can we rely on plugins that provide their own behavior/override of IReqnrollOutputHelper to invoke the base class behavior so that Messages code can properly hook in to the flow of information? (worried about Attachments)
  • There are several places where I added class constructor overrides that added parameters for Messages data. Should the old constructors be marked as Obsolete and internal RnR code that invokes them be refactored? (I really did this to avoid the need to rework a bunch of the existing tests, ex: FeatureInfo)
  • Serialization: The System.Text.Json serializer provides an option to specify which Text Encoder to use. The one that so far has provided the best interoperability results is the Web.JavascriptEncoder.UnsafeRelaxedJsonEscaping encoder. Is that acceptable?
  • Attachment handling: When binding/stepDefinition code calls ReqnrollOutputHelper.AddAttachment(), the only thing stored is the filename/path to the attachment. Later, as the TestRun is completed and messages are generated, the CucumberMessageFactory pulls the file content from the file path and Base64 encodes it.
    • Is that deferred action acceptable (or should it be pulled immediately)?
    • Should the content be pulled from the test execution framework TestContext rather than going directly to the file system?
    • The code reads the entire attachment into memory and then converts to Base64. Should this be refactored to use a buffer/stream approach so that large attachments don't affect system memory pressure?
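For the buffer/stream question above, one possible approach is to Base64-encode the attachment in chunks whose length is a multiple of 3 bytes, so that no padding appears in the middle of the output. This is only a sketch of the technique; `encode_attachment` is a hypothetical helper, not code from this PR.

```python
import base64
import io
from typing import BinaryIO, TextIO


def encode_attachment(source: BinaryIO, target: TextIO, chunk_size: int = 3 * 1024) -> None:
    """Stream Base64 text to target without loading the whole source in memory."""
    pending = b""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Encode only whole 3-byte groups; carry the remainder to the next round.
        usable = len(pending) - len(pending) % 3
        target.write(base64.b64encode(pending[:usable]).decode("ascii"))
        pending = pending[usable:]
    if pending:
        # The final 1 or 2 bytes are the only place padding may appear.
        target.write(base64.b64encode(pending).decode("ascii"))


data = bytes(range(256)) * 50
encoded = io.StringIO()
encode_attachment(io.BytesIO(data), encoded, chunk_size=1000)
```

Memory use is then bounded by the chunk size rather than by the attachment size, at the cost of the output still having to be written somewhere incrementally.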

Configuration

  • While Messages subsystem configuration is contained within the reqnroll.json file, it is kept in a separate section and read and parsed separately from regular reqnroll configuration. Is that acceptable or should it be merged with regular reqnroll configuration handling logic? Should the settings themselves be moved under an existing json section?
  • Anything that can be done now to better align with eventual use of Microsoft.Configuration?
  • The default ndjson file name is reqnroll_report.ndjson. Should the Assembly name be prefixed to this? (This might? be needed to avoid conflicts when running multiple assemblies simultaneously as described in parallel-by-process in the docs)

Generation

  • Are the intrusions into the generated test method code acceptable? (m_pickleID variable, pickleIndex in the row-test data arguments)
  • What adjustments should be made now to more easily duplicate this approach with the Roslyn generator?
  • Code gen is wrapped in a try/catch that swallows any exception and then nulls out the generated functions. This avoids code gen failure when something is wrong with GherkinDoc and Pickles but at the cost of transparency. Is that OK?
  • Any tracing I was doing to help during development has been removed. Is there a sanctioned way for RnR to emit internal tracing/debugging logs that doesn't interfere with regular tool output?

Compliance/Features

  • The Meta message should include information about the git branch/commit and the CI system in use. I am using the set of environment variables per build system that my research on the internet suggests should be used, but I have no way of testing/confirming that these are correct, as I don't have those build systems available.
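A minimal sketch of this kind of detection, limited to two build systems, is shown below. The variable names are the ones those platforms document as far as I know (`GITHUB_ACTIONS`, `GITHUB_SHA`, `GITHUB_REF_NAME` for GitHub Actions; `TF_BUILD`, `BUILD_SOURCEVERSION`, `BUILD_SOURCEBRANCHNAME` for Azure Pipelines), but they should be checked against current vendor documentation. The `detect_ci` function and the shape of its result are assumptions for illustration, not the PR's implementation.

```python
from typing import Dict, Mapping, Optional


def detect_ci(env: Mapping[str, str]) -> Optional[Dict[str, object]]:
    """Return a small CI description, or None when no known CI is detected."""
    if env.get("GITHUB_ACTIONS") == "true":
        return {
            "name": "GitHub Actions",
            "git": {
                "revision": env.get("GITHUB_SHA"),
                "branch": env.get("GITHUB_REF_NAME"),
            },
        }
    if env.get("TF_BUILD", "").lower() == "true":
        return {
            "name": "Azure Pipelines",
            "git": {
                "revision": env.get("BUILD_SOURCEVERSION"),
                "branch": env.get("BUILD_SOURCEBRANCHNAME"),
            },
        }
    return None


github = detect_ci({"GITHUB_ACTIONS": "true", "GITHUB_SHA": "abc123",
                    "GITHUB_REF_NAME": "main"})
local = detect_ci({})
```

Taking the environment as a parameter (rather than reading `os.environ` directly) makes it possible to unit test each build system's mapping without access to that build system.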

Other:

  • The IFileService and IDirectoryService (and implementations) in the Analytics/UserId folder are somewhat duplicative of the IFileSystem in Reqnroll.Utils. (And weirdly placed as they're not generally discoverable for use outside of Analytics). Would you mind some refactoring and consolidation into an abstraction of File & Directory operations that can be used throughout the projects?

📋 Checklist:

  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "[vNext]" section of the CHANGELOG, linking to this pull request & included my GitHub handle to the release contributors list.

@clrudolphi clrudolphi requested a review from gasparnagy August 15, 2024 16:21
@clrudolphi
Contributor Author

FYI, github Build is failing b/c one produced component (FileSinkPlugin) gets published only to my local Nuget store. It can then not be found by the build pipeline in github.

@clrudolphi clrudolphi added the enhancement New feature or request label Aug 17, 2024
@clrudolphi
Contributor Author

clrudolphi commented Aug 19, 2024

First end-to-end working scenario is operational.
Needed out of a PR review:

  • Overall code structure
  • Patterns that have been used (good/bad) and patterns that should have been used
  • Thread-safety: is the level of thread awareness and approach to thread-safety appropriate and adequate?

Next:

  • Copy the Cucumber Compatibility Kit scenarios into an integration test suite and build out the structure to build, invoke, and confirm compliance.

@clrudolphi clrudolphi force-pushed the feature-CucumberMessages branch from 364e466 to 1f2f1e6 Compare September 4, 2024 18:30
@clrudolphi
Contributor Author

Build is failing b/c I have a private build of Cucumber Messages nuget package. When the next release of that comes out, I will patch up the nuget references in this branch.

@clrudolphi clrudolphi self-assigned this Sep 16, 2024
@gasparnagy
Contributor

Build is failing b/c I have a private build of Cucumber Messages nuget package. When the next release of that comes out, I will patch up the nuget references in this branch.

This is fine. We have a feed at https://www.myget.org/feed/Packages/reqnroll-unstable where we could publish interim releases (of dependencies as well), and I think this feed is added to the nuget sources, so the build would be able to resolve them through it. But I have never tested this model.

@clrudolphi clrudolphi force-pushed the feature-CucumberMessages branch from 5962ab0 to 4ccd7ae Compare September 28, 2024 22:07
@gasparnagy
Contributor

I have found a fix for the System.Text.Json loading problem: we need to ensure that the necessary System.Text.Json related assemblies are packaged with our generator, so that they can be loaded in frameworks that do not have this version.

In order to do that, you need to add these lines to the Reqnroll.Tools.MsBuild.Generation.nuspec file (to the "files" section):

    <!-- Ensure that the needed version of System.Text.Json and its dependencies are available for the generator. -->
    <file src="bin\$config$\netstandard2.0\System.Text.Json.dll" target="tasks\$Reqnroll_Core_Tools_TFM$" />
    <file src="bin\$config$\netstandard2.0\Microsoft.Bcl.AsyncInterfaces.dll" target="tasks\$Reqnroll_Core_Tools_TFM$" />
    <file src="bin\$config$\netstandard2.0\System.Text.Encodings.Web.dll" target="tasks\$Reqnroll_Core_Tools_TFM$" />

@gasparnagy
Contributor

@clrudolphi while working on the diagnosis of this, I noticed that there is now a generic catch in UnitTestFeatureGenerator.PersistStaticCucumberMessagesToFeatureInfo (line 247) that hides different assembly loading issues (e.g. if a dependency of the System.Text.Json is missing). As a result it hides the error and an empty pickle collection is initialized that causes an index out of range error when calling PickleJar.CurrentPickleId. (You can test if you comment out the last two lines of the fix I pasted.)

We would need to find a way to propagate these errors (should we really catch?), or at least somehow make sure that it is possible to figure out what is wrong without debugging. (Now I found them by putting a Debugger.Launch(); inside that catch block.)

@clrudolphi
Contributor Author

Thanks for pointing that out.
The intent of the catch is to prevent pickling problems from interfering with test generation.
I'll improve the error handling here by logging the exception and adding guards to the use of the pickle information later during code generation.
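The "log the exception, then guard later use" idea can be sketched like this. Both functions are hypothetical illustrations of the pattern; they are not the generator's actual code.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("example.generation")


def try_build_pickle_ids(build: Callable[[], List[str]]) -> Optional[List[str]]:
    """Run the builder; on failure, log the full exception and return None."""
    try:
        return build()
    except Exception:
        # logger.exception records the stack trace, so the cause (for example a
        # missing assembly) is visible without attaching a debugger.
        logger.exception("Static Cucumber Messages could not be generated")
        return None


def current_pickle_id(pickle_ids: Optional[List[str]], index: int) -> Optional[str]:
    """Guarded lookup: never raises when generation failed or the index is off."""
    if pickle_ids is None or not 0 <= index < len(pickle_ids):
        return None
    return pickle_ids[index]


def failing_builder() -> List[str]:
    raise RuntimeError("simulated assembly load failure")


failed = try_build_pickle_ids(failing_builder)
succeeded = try_build_pickle_ids(lambda: ["p1", "p2"])
```

Returning `None` instead of an empty collection lets later code tell "generation failed" apart from "the feature has no pickles".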

@clrudolphi
Contributor Author

I believe this is now ready for a shakedown and code review. I've fixed the issues that I'm aware of and completed my punch-list. Documentation is updated. I've tested it with multiple Features running in parallel and desk-checked it for thread safety for the upcoming Scenario parallel execution model.

@clrudolphi
Contributor Author

It looks like Cucumber Messages schemas are changing in an upcoming release PR102. This will require some changes to logic within the Messages implementation.
If you can, please go ahead and provide review comments on the subsystem as it exists now; as I'm sure there is plenty to be improved.

@clrudolphi clrudolphi marked this pull request as ready for review October 24, 2024 15:36
@clrudolphi
Contributor Author

It looks like Cucumber Messages schemas are changing in an upcoming release PR102. This will require some changes to logic within the Messages implementation. If you can, please go ahead and provide review comments on the subsystem as it exists now; as I'm sure there is plenty to be improved.

Messages v27 and CCK updates have been published. Today's commits update our Messages implementation to support this release. This adds a HookType enumeration for Hooks and a TestRunStartedId attribute to the TestRunStarted and TestRunFinished messages.

@clrudolphi
Contributor Author

Behavior Question: Retry and Undefined Steps.
According to the Cucumber Compatibility Kit, when a test fails because there is an UNDEFINED step, it is expected that there will be no Messages output for that Test Case (aka, Scenario/TestRow) after the initial failure. That is, they implicitly expect that the Cucumber retry mechanism will recognize that the failure was due to an undefined step and thus will not retry.

When experimenting with xRetry.Reqnroll, I find that this framework does not behave that way. It does not recognize the undefined step failure and proceeds with retries (up to its configured limit).
(I haven't yet attempted this with the other retry implementations for Reqnroll.)

Question: which component should be responsible for ensuring compatibility with Cucumber Messages expectations? Should we expect each Retry implementation to behave as expected by Cucumber? Or, should the Messages implementation recognize the Undefined step situation and ignore any subsequent retry executions of that Test Case?
The former is certainly the "proper" way, but requires cooperation from multiple projects and maintainers. The latter is a cover-up and a hack, but I think I can make it work.

NB - curiously, the CCK is silent on the expected behavior of tests containing PENDING steps. One would think the same rule would apply (no need to retry something doomed to fail).

@clrudolphi
Contributor Author

clrudolphi commented Jan 23, 2025

Retry and TestCaseFinished messages -- non-compliance with CCK.

When a TestCase completes, Messages writes a TestCaseFinished message, such as this:

{"testCaseFinished":{"testCaseStartedId":"54","timestamp":{"seconds":1737648321,"nanos":361094000},"willBeRetried":false}}

Note the "willBeRetried" boolean flag at the end. Ideally, we would be aware at run-time whether failed test case executions might be retried and emit an appropriate value here. Unfortunately, the responsibility for retries is outside the control of Reqnroll: it is either built in to the test framework (such as with VSTest and nUnit) or the responsibility of a Reqnroll plugin.

Since we'll never know beforehand whether a failed execution will be retried, this message field is hard-coded to "false".

The current implementation writes out a Test Case's worth of Messages upon receipt of a ScenarioFinishedEvent. A retry of that testcase involves Reqnroll publishing another set of ScenarioStarted, StepStarted/Finished, ScenarioFinished. So we recognize that a retry is being attempted when a ScenarioStarted event is processed for a scenario that we've already executed. But by then the Messages for that prior execution have already been written out. With some extensive work, we might be able to queue them up and only write them upon receipt of the AfterTestRun event (and take that opportunity to patch up 'willBeRetried' values for any test cases that were retried.)

How important is it for our Messages implementation to be compliant on this point? Is it worth the refactoring work?

EDIT: Resolved. 'willBeRetried' supported by deferring publication until end of test run allowing us to identify retries and patch up the 'willBeRetried' value.
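A minimal sketch of the deferred patch-up, assuming the envelopes are held in memory in emission order until the end of the test run and that each `testCaseStarted` carries `id` and `testCaseId` while each `testCaseFinished` carries `testCaseStartedId` (as in the Cucumber Messages schema). The function name is hypothetical and this is not the PR's implementation.

```python
from typing import Dict, List


def patch_will_be_retried(envelopes: List[dict]) -> List[dict]:
    """Set willBeRetried=True on every attempt except the last per test case."""
    test_case_of: Dict[str, str] = {}     # testCaseStarted id -> testCaseId
    last_attempt_of: Dict[str, str] = {}  # testCaseId -> id of the latest attempt
    for envelope in envelopes:
        started = envelope.get("testCaseStarted")
        if started is not None:
            test_case_of[started["id"]] = started["testCaseId"]
            last_attempt_of[started["testCaseId"]] = started["id"]
    for envelope in envelopes:
        finished = envelope.get("testCaseFinished")
        if finished is None:
            continue
        started_id = finished["testCaseStartedId"]
        test_case_id = test_case_of.get(started_id)
        if test_case_id is not None:
            # An attempt was retried if a later attempt exists for the same test case.
            finished["willBeRetried"] = last_attempt_of[test_case_id] != started_id
    return envelopes


run = patch_will_be_retried([
    {"testCaseStarted": {"id": "54", "testCaseId": "10", "attempt": 0}},
    {"testCaseFinished": {"testCaseStartedId": "54", "willBeRetried": False}},
    {"testCaseStarted": {"id": "55", "testCaseId": "10", "attempt": 1}},
    {"testCaseFinished": {"testCaseStartedId": "55", "willBeRetried": False}},
])
```

The trade-off is the one discussed above: nothing can be written out until the run has finished.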

@Code-Grump
Contributor

Code-Grump commented Jan 23, 2025

First of all, those are great finds. Good job on such thorough investigation.

Onto the problem itself: ideally, a good API design prevents invalid states, but in this instance, I would suggest we allow plug-ins to publish messages that would be considered invalid. We're treating Cucumber Messages as a mechanism for providing visibility of test execution and if the plugin causes invalid messages to be published, it's on the plugin to decide whether it's important to its users to be fully compatible with the spec. It may be completely compatible with the tools in use by the plugin users!

The alternative would be to fail, either loudly causing a runtime failure for a user of the retry plugin, or silently and make the whole problem really opaque. I don't believe either of those are desirable failure modes, as both cause issues for the end-user that either stop them running tests or give them mysteriously incomplete output.

@clrudolphi
Contributor Author

clrudolphi commented Jan 24, 2025

we might be able to queue them up and only write them upon receipt of the AfterTestRun event (and take that opportunity to patch up 'willBeRetried' values for any test cases that were retried.)

An alternative that occurred to me is to create an Interface that retry plugins must implement that we would call upon during handling of ScenarioFinished event. The interface would allow the retry plugins the opportunity to tell us whether a failed test case would be retried. I would implement a Null implementation of this interface and register it in the global container. Retry plugins would implement the interface and replace the null impl with their own in the global container.

Thoughts? Too complicated?

EDIT: resolved - see comment below

Updated the configuration.md file.
Modified the constants file to make use of 'Directory' consistent across the configuration and constants.
…uired refactoring and duplication of the retry feature and expected results.
…o that actual Messages are sorted by PickleID of the test case(Scenario). This will ensure that Attempt counts are properly compared.
clrudolphi and others added 4 commits July 21, 2025 07:53
Renamed test class to reflect naming of Disabled environment variable.
This commit introduces the `FormattersForcedDisabledOverrideProvider` class, which implements the `IFormattersConfigurationDisableOverrideProvider` interface and always returns `true` for the `Disabled` method.

Additionally, the `BindingProviderService.cs` file has been updated to register this new provider in the dependency injection container so that the formatters do not run when only binding discovery is being run.
…rnalDataPlugin, the Location.Line of each row is the line of the HeaderRow (if not zero/null) or the last line of the Steps (plus 1).
@gasparnagy
Contributor

@clrudolphi I have reviewed the "Anything particular you want feedback on?" section and as far as I can tell, the solutions that are chosen for the PR are good for now. Some notes.

  • The delayed loading of the attachment file is OK. If you run the tests, you anyway only see the last version of a file. We will see anyway if this causes a problem.
  • The attachment buffering performance is also something I would not deal with right now. Let's see if it causes problems with the real usage.
  • "The IFileService and IDirectoryService (and implementations) in the Analytics/UserId folder are somewhat duplicative of the IFileSystem in Reqnroll.Utils" - we can handle this separately later.
  • "Can we rely on plugins that provide their own behavior/override of IReqnrollOutputHelper to invoke the base class behavior" - yes, this is not intended to be overwritten by plugins in normal situations.
  • For many other topics, the current solution seems to be fine, but we will see with the real usage if we need adjustments.

I keep doing some testing / review on initialization and performance, but I will make comments on it separately.

Introduce `DeterministicIdGenerator` for generating deterministic GUIDs using SHA-1 hashing and UUIDv5.  This supports the need for Test Class generation to render the exact same source string if the Feature text hasn't changed.

Update `CucumberMessagesConverter` to make `ConvertToCucumberMessagesSource` static for improved usability.
Modify `UnitTestFeatureGenerator` to utilize the static method.
Add `UUIDv5` class for UUID version 5 generation.
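To show why UUID version 5 gives the "same input, same ID" property the commit message relies on, here is a small sketch using Python's standard `uuid` module. The namespace value and the way the name is built from feature text plus a position counter are assumptions for illustration; the actual `DeterministicIdGenerator` may derive its input differently.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/feature-ids")


def deterministic_id(feature_text: str, position: int) -> str:
    """UUIDv5 hashes namespace + name with SHA-1, so equal inputs give equal IDs."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, f"{position}:{feature_text}"))


first = deterministic_id("Feature: Login", 0)
again = deterministic_id("Feature: Login", 0)
other = deterministic_id("Feature: Login", 1)
```

Because the IDs depend only on the inputs, regenerating code for an unchanged feature file yields byte-identical output.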
Introduce ShortGuidIdGenerator class implementing IIdGenerator to generate unique, URL-safe Base64 strings. Update DefaultDependencyProvider to register the new generator and remove the old GuidIdGenerator. Add unit tests to ensure ID uniqueness, non-emptiness, and safe disposal behavior.
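The idea of a short, URL-safe Base64 ID can be sketched as below: take the 16 raw bytes of a random GUID, encode them with the URL-safe alphabet, and drop the padding. This illustrates the general technique only; the details of `ShortGuidIdGenerator` are not shown in this thread.

```python
import base64
import uuid


def short_id() -> str:
    """Encode a random UUID's 16 bytes as 22 URL-safe Base64 characters."""
    raw = uuid.uuid4().bytes
    # 16 bytes encode to 24 characters ending in "==", which is stripped here.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


sample_ids = [short_id() for _ in range(1000)]
```

Compared with the 36-character hyphenated GUID form, this saves space in every message that carries an ID.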
@fredrikeriksson

  • The attachment buffering performance is also something I would not deal with right now. Let's see if it causes problems with the real usage.

I keep doing some testing / review on initialization and performance, but I will make comments on it separately.

I have some feedback regarding this, useful or not but I hope it gives some pointers :)

I've run around 800 tests on 20 cores in under 1 second according to the stats in the html output (latest ci package), so at the moment performance doesn't seem to be an issue at all.
Granted, they're all unit tests of domain requirements, so they're quick in nature and run in parallel, but at the same time they should stress the solution better than slow tests, I guess.

This was, by the way, executed with TUnit and "Start Without Debugging".
I expect this to increase by a couple of hundred in the coming days, so I'll keep monitoring this.

@fredrikeriksson

fredrikeriksson commented Jul 23, 2025

I also found that there is no ordering of features in the html output, which is a tad annoying.
One would at least expect regular string sorting, but numerical ordering would be a bonus, albeit one that requires more effort until .NET 10 arrives, unless something equivalent already exists in Reqnroll, of course.
StringSort: feature10 feature2
NumericOrdering: feature2 feature10

Interesting. I'm not sure what ordering is used (this is controlled by the Cucumber/React components). We're borrowing several libraries and components from Cucumber. The Cucumber approach to this is the Stream metaphor: things appear in the order they are processed. My experience with this via my testing is that the Features are displayed in some sort of time order, but I'm not sure whether that is Feature completion or Feature start. We 'simulate' contemporaneous ordering by sorting the Cucumber Messages by timestamp. Because of the way Reqnroll works we cannot publish messages contemporaneously; instead we wait until the test run is complete.

It seems the react component just uses the same ordering in the produced .ndjson file, I just compared the rendered html with that file.

So this should mean that you actually can control the ordering as you already wait for the whole run to complete?
If so you would probably make it configurable, i.e. opt in for full name ordering, or just default to it?
Most tests probably run in parallel anyway, so the order would otherwise be randomized every run.

Another approach would be to raise this as a feature request at cucumber/react-components to allow controlling ordering via the component, somehow.
That could be better for the overall ecosystem?
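The difference between the StringSort and NumericOrdering examples above can be reproduced with a small "natural sort" key that compares digit runs as numbers. This is a generic sketch of the technique, not something either Reqnroll or the Cucumber React components provide; `natural_key` is a hypothetical helper.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and digit runs; digit runs compare numerically."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions are always the digit runs, so keys
    # from different names line up as text, number, text, ... and compare safely.
    return [int(part) if index % 2 else part.casefold()
            for index, part in enumerate(parts)]


features = ["feature10", "feature2", "feature1"]
string_sorted = sorted(features)
numeric_sorted = sorted(features, key=natural_key)
```

Here `string_sorted` places `feature10` before `feature2`, while `numeric_sorted` places `feature2` first.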

@clrudolphi
Contributor Author

It seems the react component just uses the same ordering in the produced .ndjson file,

I'm not surprised. The assumption about ordering of the Messages is that they are emitted as a stream, that is, in time-series order of when the event occurred. So the list of Features is presumably ordered in the sequence in which the test engine undertook to execute the first test within each Feature. That is not guaranteed to be in alphabetical order or even the order in which Features appear in a project. That is up to the executing Test framework.

I like the idea of submitting a feature request to Cucumber/ReactComponents. I'm sure you're not the first to see this.

Updated FeatureExecutionTracker constructor to check for GherkinDocument instead of Pickles in the FeatureCucumberMessageInfo structure, which should be faster.
Refactored GenerateStaticMessages to use List.Add instead of yield return.

In ContainerBuilder.cs, made the Cucumber message publisher initialization conditional based on configuration settings.
Contributor

@gasparnagy gasparnagy left a comment

This is good to merge. Any further changes can be done via additional PRs.
Thx @clrudolphi for the great work!

@gasparnagy gasparnagy merged commit c66ede3 into main Jul 28, 2025
6 checks passed
@304NotModified 304NotModified deleted the feature-CucumberMessages branch July 28, 2025 15:57
@gasparnagy gasparnagy restored the feature-CucumberMessages branch August 21, 2025 21:05
@gasparnagy gasparnagy deleted the feature-CucumberMessages branch August 22, 2025 08:28
8 participants