Skip to content

Conversation

@procdump
Copy link

@procdump procdump commented Nov 21, 2025

Problem

With solana-sbpf 0.13.1 it's possible to collect register tracing data in a consistent way. Yet a testing harness like mollusk does not provide a default mechanism to collect this data in a useful format for later ingestion.
The idea of this PR is to provide such mechanism in a way consumer tools could take advantage of - particularly willing to analyze the tracing data and generate statistics on top of it.

Solution

The PR takes advantage of the invocation-inspect-callback feature where a callback could be called prior to or after the processed instruction. A default post instruction callback for storing the register tracing data in files is provided and it basically persists the register sets, sbpf instructions and an identification sha256 hash of the executable that was actually used for the generation of this data. The motivation behind providing the sha256 identification is that files could grow in number and it's necessary for the consumer to be able to distinguish which shared object to use when analyzing the tracing data in a deterministic way. Also in order to let consumers instantly take advantage of this feature it's written in a way that register tracing gets enabled as long as the SBF_TRACE_DIR is provided denoting the directory in which the tracing data will be persisted. The motivation behind this approach is that it will let users instantly get out the tracing data of their tests which can eventually be passed to statistics generators such as code coverage tools, disassemblers, etc.

Review

Before getting further I'd be happy to get some feedback if you find this approach feasible or have a better idea how it may be done.

Copy link
Collaborator

@buffalojoec buffalojoec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this is still in draft, so just a few fly-by things I noticed!

This is looking really good, though. Make sure to add some tests and a README section for the new feature!

#[cfg(feature = "fuzz-fd")]
pub slot: u64,

pub enable_register_tracing: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is always going to be false when the feature is disabled, I think it should be private, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it private prevents such manual instantiations where we get a compiler complaint that the enable_register_tracing is private:

..Default::default()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah crud. What about just using the environment variable everywhere for the logic gate? The problem is, if someone flips this to true on Mollusk directly, but doesn't provide the SBF_TRACE_DIR, it's going to disable tracing anyway. This would be confusing for people.

Alternatively, we could choose a default dir under target if we really want this to be public. However, doing something like this:

let mut mollusk = Mollusk::default();
mollusk.enable_register_tracing = true;
...

...would still not add the callback and not activate tracing, as per the current implementation.

Copy link
Author

@procdump procdump Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah crud. What about just using the environment variable everywhere for the logic gate? The problem is, if someone flips this to true on Mollusk directly, but doesn't provide the SBF_TRACE_DIR, it's going to disable tracing anyway. This would be confusing for people.

You're right - it would be confusing. There's another case that also concerns me - it's related to the environment variable only approach. As I'm going to write some tests for this feature it's that I would need to set the envvar from within the test itself. Setting envvars is unsafe as tests are run in the same process but in different threads. Hence having multiple tests playing with the SBF_TRACE_DIR var is UB. This is the reason I added with_register_tracing in the first place - just having a constructor that has the ability to instantiate mollusk with register tracing but without the need to play with the envvar.

Alternatively, we could choose a default dir under target if we really want this to be public. However, doing something like this:

I think it's probably a good idea to have a default path.

let mut mollusk = Mollusk::default();
mollusk.enable_register_tracing = true;
...

...would still not add the callback and not activate tracing, as per the current implementation.

I think I came up with some sort of a possible solution - I can bury the enable_register_tracing bool in the callback structure - thus nobody would be able to set it without actually providing a handler - I'll try to rework it to see the final outcome and will show it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting envvars is unsafe as tests are run in the same process but in different threads. Hence having multiple tests playing with the SBF_TRACE_DIR var is UB.

Why wouldn't the SBF_TRACE_DIR variable just be readonly across all threads?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I came up with some sort of a possible solution - I can bury the enable_register_tracing bool in the callback structure - thus nobody would be able to set it without actually providing a handler - I'll try to rework it to see the final outcome and will show it here.

Cool! Ping me when you throw it up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed the changes. I've also added one test implementing a custom register tracing callback counting the number of JMP class instructions executed. Tests help me reason about the API and its consumption later. On one side it's now that you can tweak existing codebases with the SBF_TRACE_DIR and on the other side it's possible to enable register tracing without the need to tweak the same environment variable at all - hence the with_register_tracing constructor.

What I don't like up to now is that you can instantiate Mollusk::default() and change the callback expecting this will enable register tracing. Unfortunately it won't because it's the ProgramCache that also needs to have register tracing tweaked. Yet doesn't such post-fact changes also not work for the rest of the Mollusk fields where there's a cross dependency between some of them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting envvars is unsafe as tests are run in the same process but in different threads. Hence having multiple tests playing with the SBF_TRACE_DIR var is UB.

Why wouldn't the SBF_TRACE_DIR variable just be readonly across all threads?

It could be but if we need to enable tracing for one test and skip it for another the envvar is not very convenient IMO.

Comment on lines 686 to 705
/// Create a new Mollusk instance just like the `new` method but
/// with register tracing enabled using a default callback.
#[cfg(feature = "register-tracing")]
pub fn with_register_tracing(program_id: &Pubkey, program_name: &str) -> Self {
let mut mollusk = Mollusk {
enable_register_tracing: true,
..Self::default()
};
let program_cache = ProgramCache::new(
&mollusk.feature_set,
&mollusk.compute_budget,
mollusk.enable_register_tracing,
);
mollusk.program_cache = program_cache;
mollusk.invocation_inspect_callback =
Box::new(register_tracing::DefaultRegisterTracingCallback {});
mollusk.add_program(program_id, program_name, &DEFAULT_LOADER_KEY);
mollusk
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this new constructor? The existing Mollusk::new will call into Self::default, and the logic gate where you check for the SBF_TRACE_DIR environment variable will be evaluated. So people can just use Mollusk::new!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this new constructor? The existing Mollusk::new will call into Self::default, and the logic gate where you check for the SBF_TRACE_DIR environment variable will be evaluated. So people can just use Mollusk::new!

I'd be glad to have your review up to now and if you prefer I'll rework it to have everything gated behind the environment variable - I have in mind something like this:

diff --git a/harness/src/lib.rs b/harness/src/lib.rs
index 815ec07..b89830f 100644
--- a/harness/src/lib.rs
+++ b/harness/src/lib.rs
@@ -689,32 +689,6 @@ impl Mollusk {
         mollusk
     }
 
-    /// Create a new Mollusk instance just like the `new` method but
-    /// with register tracing enabled using a default callback.
-    ///
-    /// If `SBF_TRACE_DIR` is set it will override the passed `sbf_trace_dir`.
-    #[cfg(feature = "register-tracing")]
-    pub fn with_register_tracing(
-        program_id: &Pubkey,
-        program_name: &str,
-        sbf_trace_dir: &str,
-    ) -> Self {
-        let mut mollusk = Mollusk::default();
-        mollusk.invocation_inspect_callback =
-            Box::new(register_tracing::DefaultRegisterTracingCallback {
-                sbf_trace_dir: std::env::var("SBF_TRACE_DIR").unwrap_or(sbf_trace_dir.into()),
-            });
-        mollusk.program_cache = ProgramCache::new(
-            &mollusk.feature_set,
-            &mollusk.compute_budget,
-            mollusk
-                .invocation_inspect_callback
-                .is_register_tracing_callback(),
-        );
-        mollusk.add_program(program_id, program_name, &DEFAULT_LOADER_KEY);
-        mollusk
-    }
-
     /// Add a program to the test environment.
     ///
     /// If you intend to CPI to a program, this is likely what you want to use.
diff --git a/harness/src/register_tracing.rs b/harness/src/register_tracing.rs
index ddb3cc0..7c18b1e 100644
--- a/harness/src/register_tracing.rs
+++ b/harness/src/register_tracing.rs
@@ -85,7 +85,7 @@ impl InvocationInspectCallback for DefaultRegisterTracingCallback {
 
     // This callback is specifically implemented to handle register tracing.
     fn is_register_tracing_callback(&self) -> bool {
-        true
+        std::env::var("SBF_TRACE_DIR").is_ok()
     }
 }
 
diff --git a/harness/tests/register_tracing.rs b/harness/tests/register_tracing.rs
index ba9e85c..0c30d15 100644
--- a/harness/tests/register_tracing.rs
+++ b/harness/tests/register_tracing.rs
@@ -87,11 +87,11 @@ fn test_custom_register_tracing_callback() {
     }
 
     std::env::set_var("SBF_OUT_DIR", "../target/deploy");
+    std::env::set_var("SBF_TRACE_DIR", "/tmp/sbf_trace_dir");
 
     let program_id = Pubkey::new_unique();
     let payer_pk = Pubkey::new_unique();
-    let mut mollusk =
-        Mollusk::with_register_tracing(&program_id, "test_program_primary", "sbf_trace_dir");
+    let mut mollusk = Mollusk::new(&program_id, "test_program_primary");
 
     // Phase 1 - basic register tracing test.
 

procdump and others added 4 commits November 24, 2025 09:09
Bury the logic in the callback structure - if there's a register tracing
callback then have the functionality enabled. With the current
with_register_tracing constructor it's now possible
to switch on register tracing without playing with envvars. This is
shown in the provided test in this commit.
Tweaking the SBF_TRACE_DIR envvar, however, is useful for existing
codebases where Mollusk users are willing to enable register tracing
instantly without making any changes to their code.
@buffalojoec
Copy link
Collaborator

@procdump I see why this is such a pain. We have to coordinate initial cache compilation as well as per-invocation tracing, all while giving users the ability to customize behavior.

I threw a few commits up to see what you think. Basically I've implemented the following:

  • Register tracing is always enabled for programs being added to the cache if the register-tracing feature is enabled. This is fine, since if you trace the flag all the way through Executable::load it's actually unused for ELF loading. It's only used for the VM.
  • Since the above is true, users can update the callback using the field directly and not have any issue, since it will be called on-the-fly during process_instruction. Thus I've eliminated some of the setup steps involved you had in Mollusk:default.
  • Added your enable_register_tracing field back to Mollusk, for disabling the trace during instruction invocation.

With those changes, we can lose the extra constructor and simplify a few things.

Considering what I said in bullet 1, we actually could just remove it from add_program completely. What do you think?

1. fixes a comment to be above the statement it pertains to
2. removes with_register_tracing
3. changes the register_tracing test to use the enable_register_tracing
variable for switching on/off instead of playing with the callback.
Comment on lines +147 to +148
mollusk.enable_register_tracing = false;

Copy link
Author

@procdump procdump Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your enable_register_tracing field back to Mollusk, for disabling the trace during instruction invocation.

There's a subtle discrepancy here. I've reworked the test so that I can show you what I mean and eventually you can debug it yourself.
Let's say you want to set enable_register_tracing to false and you'd expect that this will disable the trace during instruction invocation because in process_instruction_inner the false value is passed through InvokeContext:

            let _enable_register_tracing = false;
            #[cfg(feature = "register-tracing")]
            let _enable_register_tracing = self.enable_register_tracing;

            let program_runtime_environments: ProgramRuntimeEnvironments =
                ProgramRuntimeEnvironments {
                    program_runtime_v1: Arc::new(
                        create_program_runtime_environment_v1(
                            &runtime_features,
                            &execution_budget,
                            /* reject_deployment_of_broken_elfs */ false,
                            /* debugging_features */ _enable_register_tracing,
                        )
                        .unwrap(),
                    ),
                    program_runtime_v2: Arc::new(create_program_runtime_environment_v2(
                        &execution_budget,
                        /* debugging_features */ _enable_register_tracing,
                    )),
                };

            let mut invoke_context = InvokeContext::new(
                &mut transaction_context,
                &mut program_cache,
                EnvironmentConfig::new(
                    Hash::default(),
                    /* blockhash_lamports_per_signature */ 5000, // The default value
                    &callback,
                    &runtime_features,
                    &program_runtime_environments,
                    &program_runtime_environments,
                    &sysvar_cache,
                ),
                self.logger.clone(),
                self.compute_budget.to_budget(),
                self.compute_budget.to_cost(),
            );

However register tracing still remains enabled surprisingly. The reason for this goes deep into the runtime - particularly in the execution of the instruction within the VM (let's look in the case of being into interpreted mode):
https://github.com/anza-xyz/sbpf/blob/c0c1197edbd1379a2f1ae86140fec30ec46b08e9/src/interpreter.rs#L176
https://github.com/anza-xyz/sbpf/blob/c0c1197edbd1379a2f1ae86140fec30ec46b08e9/src/interpreter.rs#L190
Basically it reads enable_register_tracing from the Executable instead of InvokeContext's ProgramRuntimeEnvironments.

Due to the circumstances upper the register tracing unit test fails now. Isn't it more appropriate to read the flag from InvokeContext's EnvironmentConfig's program_runtime_v2's Config?
With such a change we'd be able to switch on/off tracing as you'd suggested.
Otherwise it's confusing for users that if they tweak the flag it won't influence the tracing.

NB: I've pushed the amended test on purpose so that you see that it fails.

Considering what I said in bullet 1, we actually could just remove it from add_program completely. What do you think?

Doesn't this suggestion depend on fixing the issue explained upper?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants