
@IvanAnishchuk IvanAnishchuk commented Nov 23, 2025

This PR provides a major part of a new testing framework: a spec wrapper with automatic test execution tracing and smart beacon state tracking. It is based on #4603 and takes into account the lessons learned in #4724.

This change is fully backwards-compatible: a new test/infra module was added and no preexisting code was changed.

Key Changes:

Smart State Tracking: The RecordingSpec now automatically detects state context switches by tracking the root hash of the state argument. It prepends a load_state operation only when the state has actually changed (including out-of-band mutations in tests), removing the need for manual yields and the like.

Pydantic Serialization: Leverages Pydantic field_validator and model_dump to automatically handle type coercion (e.g., converting bytes to hex strings, int subclasses to primitives) and sanitization.

Lean Proxy: Reduced RecordingSpec (based on wrapt) to a thin wrapper that strictly handles function interception and flow control.

Testing: Added unit tests for the new tools and example spec tests based on the new infra. All tests pass and lint has been run.

Fixes #4603
Related to #4724
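
For readers following the "Smart State Tracking" item, here is a minimal sketch of the idea, not the PR's implementation. It assumes a hypothetical FakeState and FakeSpec, uses a plain `__getattr__` proxy as a stand-in for wrapt, and replaces SSZ merkleization with a SHA-256 hash. The step field names (`op`, `method`, `args`, `root`) are illustrative only.

```python
import hashlib
from typing import Any, Callable, Optional


class FakeState:
    """Hypothetical stand-in for a BeaconState; only the root matters here."""

    def __init__(self, slot: int = 0) -> None:
        self.slot = slot

    def hash_tree_root(self) -> bytes:
        # Real SSZ objects merkleize; a plain hash is enough for this sketch.
        return hashlib.sha256(str(self.slot).encode()).digest()


class FakeSpec:
    """Hypothetical spec module with a single state-mutating function."""

    @staticmethod
    def process_slots(state: FakeState, slot: int) -> None:
        state.slot = slot


class RecordingProxy:
    """Forwards attribute access to the spec and records each call."""

    def __init__(self, spec: Any) -> None:
        self._spec = spec
        self._last_root: Optional[bytes] = None
        self.trace: list = []

    def __getattr__(self, name: str) -> Callable[..., Any]:
        fn = getattr(self._spec, name)

        def traced(state: FakeState, *args: Any) -> Any:
            root = state.hash_tree_root()
            if root != self._last_root:
                # The state is new or was mutated outside the proxy.
                self.trace.append({"op": "load_state", "root": "0x" + root.hex()})
            result = fn(state, *args)
            self.trace.append({"op": "spec_call", "method": name, "args": list(args)})
            # Remember the post-call root so the next call can compare against it.
            self._last_root = state.hash_tree_root()
            return result

        return traced


spec = RecordingProxy(FakeSpec())
state = FakeState()
spec.process_slots(state, 1)    # first call: load_state, then spec_call
spec.process_slots(state, 2)    # state unchanged since the last call: spec_call only
state.slot = 99                 # out-of-band mutation made by the test
spec.process_slots(state, 100)  # load_state is recorded again
ops = [step["op"] for step in spec.trace]
```

Comparing the root before each call against the root remembered after the previous call is what lets the sketch notice mutations made directly by the test, without any manual yields.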

@IvanAnishchuk
Author

The assert_state operation is probably one of the bigger items not implemented here yet. It's not extremely hard to write a helper for that, I just wasn't sure


# Classes that should be treated as tracked SSZ objects in the trace.
# Maps class name -> context collection name.
CLASS_NAME_MAP: dict[str, str] = {
Member

Author

View - got it, will change.

Author

But there are int subclasses and the like that subclass View and technically can be SSZ-serialized, but that should probably be stored directly in the trace for simplicity and compactness (Slot is one example I saw in tests). Any suggestions on how to handle those?
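
One possible way to treat such values, sketched here under assumptions rather than taken from the PR: coerce anything that is an int subclass to a plain int and write bytes as 0x-prefixed hex, leaving only container-like objects to be stored as artifacts. `Slot` below is a hypothetical stand-in for the SSZ type, and `to_trace_value` is an illustrative helper name. The PR does this kind of coercion through Pydantic validators; a plain function is used so the sketch runs without dependencies.

```python
from typing import Any


class Slot(int):
    """Hypothetical stand-in for an SSZ uint type that subclasses int."""


def to_trace_value(value: Any) -> Any:
    """Coerce a Python value into a plain, YAML-friendly form."""
    if isinstance(value, bool):
        return value  # bool is an int subclass; keep it as a bool
    if isinstance(value, int):
        return int(value)  # strips subclasses such as Slot
    if isinstance(value, (bytes, bytearray)):
        return "0x" + bytes(value).hex()
    if isinstance(value, (list, tuple)):
        return [to_trace_value(item) for item in value]
    if isinstance(value, dict):
        return {str(key): to_trace_value(item) for key, item in value.items()}
    return value  # anything else is left for the caller to handle


params = to_trace_value({"slot": Slot(5), "root": b"\x01\x02", "flags": [True, Slot(1)]})
```

Checking `int` before any SSZ-specific check is the design choice that keeps values like Slot inline in the trace instead of turning them into separate files.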


class ContextObjectsModel(BaseModel):
    """
    Defines the SSZ objects (artifacts) loaded in the 'context' block.
Member

The mapping should be by hash_tree_root. All View types have this method. There is no need to make the type part of the filename, and no need to keep the mapping: computing the hash tree root gives you the filename.

"""

metadata: dict[str, Any] = Field(..., description="Test run metadata (fork, preset, etc.)")
context: ContextModel = Field(default_factory=ContextModel)
Member

I don't understand what this is for. It is not in the YAML example.

Author

Leftover complexity (there was originally a way to customize artifact names so a mapping to keep track of them was necessary). Probably not required if we match everything by hash, will remove.

_artifacts: dict[str, Container] = PrivateAttr(default_factory=dict)

def register_object(self, obj: Any) -> ContextVar | None:
"""
Member

I don't think there is a need to register. The obj.hash_tree_root().hex() should be the name. Also, just store the filename. Perhaps in the model we can store a type of object that contains the filename, like SSZSerialised(filename).

Author

@IvanAnishchuk IvanAnishchuk Nov 27, 2025

Yeah, makes sense, having three separate ways to spell the hash is unnecessary. Will simplify.
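
A minimal sketch of the suggestion in this thread, under stated assumptions: `FakeView` is a hypothetical stand-in for an SSZ View (its hash is a plain SHA-256, not a real merkle root), the `.ssz` extension and the `write_artifact` helper are illustrative, and `SSZSerialised` is written as a frozen dataclass rather than a Pydantic type so the example has no dependencies.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SSZSerialised:
    """Reference to an SSZ artifact on disk, stored in the trace by filename."""

    filename: str


class FakeView:
    """Hypothetical stand-in for an SSZ View with the two methods used below."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def encode_bytes(self) -> bytes:
        return self._payload

    def hash_tree_root(self) -> bytes:
        # Real Views merkleize; a plain SHA-256 keeps the sketch runnable.
        return hashlib.sha256(self._payload).digest()


def write_artifact(obj: FakeView, output_dir: Path) -> SSZSerialised:
    """Write obj under a name derived from its root and return the reference."""
    filename = obj.hash_tree_root().hex() + ".ssz"
    path = output_dir / filename
    if not path.exists():  # identical objects share one file
        path.write_bytes(obj.encode_bytes())
    return SSZSerialised(filename)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = write_artifact(FakeView(b"state-a"), out)
    second = write_artifact(FakeView(b"state-a"), out)
    file_count = len(list(out.iterdir()))
```

Because the name is derived from the content, no registry or name mapping is needed, and writing the same object twice yields the same reference and a single file.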


return context_name

def dump_to_dir(self, output_dir: str, config: dict[str, Any] = None) -> None:
Member

Is the normal pydantic way of saving to have this as a method of the model, or to do it externally in another function? We should do it the pydantic way.

Author

Let me check... I'm more used to seeing it as a method, but let's do it the idiomatic way 👍

print(f"ERROR: Failed to write YAML {path}: {e}")


class ConfigModel(BaseModel):
Member

This is not used. Why is it here?

config: dict[str, Any] = Field(..., description="Dictionary of config constants")


class MetaModel(BaseModel):
Member

This is not used. Why is it here?


def decorator(fn: Callable):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
Member

This is too long. Are you sure we need to do all this? I think we just need to wrap spec. If we do need it, move some things into other functions.

Author

Yeah, let's simplify further. On it 👷‍♂️ Thank you for feedback, this helps a lot!

Member

I see what is happening, you are trying to work out details about the tests to decide the name of where to store it.

The thing is that the decorator is not the best place for this. It is the test runner, so it is better if the decorator just returns the trace as an object and in the runner we do the saving to file.

Author

Okay, I think I understand... Should the result be returned in a format compatible with the default Dumper (a generator of triplet tuples), or should this come with a custom runner/dumper that supports unwrapping pydantic objects?

Member

Not compatible with the dumper. I think we need to return the pydantic model, and then make sure that the parts up the call stack that expect yields also let this type pass through. Then in this function: https://github.com/ethereum/consensus-specs/blob/master/tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py#L87 we handle the result differently depending on whether it is an iterator/generator or the pydantic model. If it is a pydantic model, we dump it there. In this function, test_case contains all the meta info about the test we are running, so you can calculate the folder from it.
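
The dispatch described here could look roughly like the sketch below. It is not the code in gen_runner.py: `collect_outputs` is a hypothetical helper, `TraceModel` is a dataclass standing in for the pydantic model, and the part names in the triplets are made up for illustration.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TraceModel:
    """Hypothetical stand-in for the pydantic trace model."""

    steps: list = field(default_factory=list)


def collect_outputs(test_fn: Callable[[], Any]) -> list:
    """Normalise a test's result into (name, kind, value) parts."""
    result = test_fn()
    if isinstance(result, TraceModel):
        # New-style test: the whole trace arrives as one object to dump.
        return [("trace", "data", {"steps": result.steps})]
    if inspect.isgenerator(result):
        # Classic test: parts are yielded one at a time.
        return list(result)
    raise TypeError(f"unsupported test result: {type(result).__name__}")


def classic_test():
    yield ("pre", "ssz", b"\x00")
    yield ("post", "ssz", b"\x01")


def traced_test():
    return TraceModel(steps=[{"op": "load_state"}])


classic_parts = collect_outputs(classic_test)
traced_parts = collect_outputs(traced_test)
```

Keeping the type check in the runner means the decorator only has to return the trace object, and the output folder can be derived where the test metadata is already available.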

Author

Got it.

So far I have made it functional with the decorator just yielding things (it's in a separate branch for now, works but looks a little weird), it should be trivial enough to modify that function to detect return type instead and make the decorator just return the instance.

I also addressed all the other points (or almost, still rechecking) and aligned format details, etc. with the description in the original issue as well as I could. Should be ready for another review soon.

params: dict[str, Any] = Field(
    default_factory=dict, description="Arguments passed to the function"
)
result: Any | None = Field(
Member

Some of these fields don't match the issue. This one, for example, should be assert_output.

Member

Also, method is missing. I think we need a structure with an abstract base class, where op defines the subclass.
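
A rough sketch of that structure, using dataclasses instead of Pydantic so it runs without dependencies. The op values load_state and spec_call appear elsewhere in this PR; the field name `state_root`, the `STEP_TYPES` registry and `step_from_dict` are assumptions made for illustration and may not match the format in the issue.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import ClassVar


class Step(ABC):
    """Base class for trace steps; each subclass fixes its own op value."""

    op: ClassVar[str]

    @abstractmethod
    def to_dict(self) -> dict:
        ...


@dataclass
class LoadState(Step):
    op: ClassVar[str] = "load_state"
    state_root: str = ""

    def to_dict(self) -> dict:
        return {"op": self.op, "state_root": self.state_root}


@dataclass
class SpecCall(Step):
    op: ClassVar[str] = "spec_call"
    method: str = ""
    params: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"op": self.op, "method": self.method, "params": self.params}


# Registry keyed by op, so the op field alone selects the subclass.
STEP_TYPES = {cls.op: cls for cls in (LoadState, SpecCall)}


def step_from_dict(data: dict) -> Step:
    """Pick the subclass from the op field and build it from the rest."""
    fields = {key: value for key, value in data.items() if key != "op"}
    return STEP_TYPES[data["op"]](**fields)


step = step_from_dict({"op": "spec_call", "method": "process_slots", "params": {"slot": 1}})
```

With Pydantic, the same shape is usually expressed as a discriminated union on the op field, which gives validation of the per-op fields for free.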

Represents a function call ('op'), its inputs, and its outcome.
"""

op: str = Field(..., description="The spec function name, e.g., 'process_slots'")
Member

This is wrong, op is not this

pydantic models for the spec trace
core spec tracing logic
use wrapt to wrap the spec and intercept the calls
tracing decorator
some basic unit tests for the trace recorder
some converted test examples
use 0x prefix for hex bytes in trace
a README with a short explanation how tracing works
add "method" to StepModel for spec_call op
remove unneeded things
address some more requirements, format, etc.
new approach - decorator just generates data for dumper
add the auto-assert of state in the end of test trace
adjust assert/load tracing logic according to the issue
rename record_spec_trace -> spec_trace
test fixes
more simplicity
some cleanup
this still uses generator functions but adds a new data type
functional but probably could be implemented better
@cesareduardogarciaportillo-lang

https://discord.gg/the-arena My apologies, I hope this is not misread as rudeness on my part, but I don't understand even 40% of another language, much less topics as specific as programming and systems. I sincerely admire your work as programmers; it is something I admit I don't know how to do. I do sometimes have good ideas, though, and for the good of the team I like to insist that things be done in the best possible way. If you are interested, you can check out this group.

@IvanAnishchuk IvanAnishchuk requested a review from leolara December 5, 2025 23:08
@IvanAnishchuk
Author

@leolara if you could just take another glance? :)

I'm not sure about the best way to integrate object-returning tests into the same harness as generator tests. What I have works with reftests and has minimal impact on existing code, but it doesn't look quite right. I also don't want to rewrite half the test framework to support this, at least not without some guidance.

Development

Successfully merging this pull request may close these issues.

Spec test sequence trace, a general test vector format for the Ethereum consensus layer

3 participants