Vouch provides comprehensive metrics to check the health and performance of its activities. This document describes the metrics available for Prometheus and similar monitoring systems.
The metrics server listens on the address provided by the metrics.address configuration value, and makes metrics available at the /metrics endpoint.
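As a rough illustration of reading this endpoint without a full Prometheus setup, the sketch below fetches the page and parses the standard Prometheus text exposition format into simple tuples. The base URL passed to `fetch_metrics` is an assumption (substitute whatever your metrics.address value resolves to), both function names are hypothetical helpers and not part of Vouch, and the parser is deliberately simplified: it does not unescape label values.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: name{label="value"} 1.0
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(base_url):
    """Download the raw metrics page from base_url + '/metrics'."""
    # base_url is an assumption, for example "http://localhost:8081".
    with urlopen(base_url + "/metrics", timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

In practice a Prometheus server or an existing client library would normally do this work; the sketch is only meant to show the shape of the data.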
There are a number of metrics that provide general information about Vouch. Specifically:
- vouch_release contains the version of Vouch, in the version label
- vouch_ready is set to 1 when Vouch is ready to start attesting, and 0 otherwise. If this number stays at 0 it implies a configuration or connection issue that should be addressed
- vouch_epochs_processed_total is set to the number of epochs for which Vouch has been attesting. This number resets to 0 when Vouch restarts, and increments every time Vouch starts to process an epoch; if it fails to increment it implies that Vouch has stopped processing
- vouch_start_time_secs is the unix timestamp of the time that Vouch started. This value will remain the same throughout a run of Vouch; if it increments it implies that Vouch has restarted.
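The checks described for the general metrics could be combined along the following lines. This is a minimal sketch, assuming two scrapes have already been reduced to plain dictionaries of metric name to value and that they were taken more than one epoch apart (32 slots of 12 seconds, so roughly 6.4 minutes). The function name and the problem strings are hypothetical.

```python
def check_general_health(previous, current):
    """Compare two scrapes (dicts of metric name -> value) and list problems."""
    problems = []
    if current["vouch_ready"] != 1:
        problems.append("not ready")
    if current["vouch_start_time_secs"] != previous["vouch_start_time_secs"]:
        # A changed start time means Vouch restarted between the scrapes,
        # so the epoch counter is not comparable and is not checked.
        problems.append("restarted")
    elif current["vouch_epochs_processed_total"] <= previous["vouch_epochs_processed_total"]:
        problems.append("epochs not advancing")
    return problems
```

An empty list means none of the three conditions was detected for that pair of scrapes.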
In addition, high level metrics track the latest slot for which Vouch carried out a successful operation:
- vouch_attestation_process_latest_slot: the latest slot for which Vouch carried out an attestation. As long as Vouch has validators, this is expected to increment approximately every 6.5 minutes, although it is possible to have to wait up to 13 minutes due to the random assignment of slots to validators
- vouch_attestationaggregation_process_latest_slot: the latest slot for which Vouch carried out an attestation aggregation. This is a relatively infrequent occurrence
- vouch_beaconblockproposal_process_latest_slot: the latest slot for which Vouch carried out a proposal. This is a relatively infrequent occurrence
- vouch_synccommitteeaggregation_process_latest_slot: the latest slot for which Vouch carried out a sync committee aggregation. This is a very infrequent occurrence
- vouch_synccommitteemessage_process_latest_slot: the latest slot for which Vouch generated a sync committee message. This is a very infrequent occurrence
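One way to act on the 13 minute upper bound for attestations is to compare the reported slot with the current slot of the chain. The sketch below assumes the caller already knows the current slot (for example from a beacon node; how it is obtained is outside this example) and uses the 12 second slot length. The function name and the default threshold parameter are hypothetical.

```python
SECONDS_PER_SLOT = 12


def attestation_is_stale(latest_slot, current_slot, max_gap_seconds=13 * 60):
    """Return True if the last attested slot is older than max_gap_seconds.

    latest_slot is the value of vouch_attestation_process_latest_slot;
    current_slot is supplied by the caller and is an assumption of this sketch.
    """
    gap_seconds = (current_slot - latest_slot) * SECONDS_PER_SLOT
    return gap_seconds > max_gap_seconds
```

The same comparison does not suit the aggregation, proposal and sync committee metrics, since those duties are infrequent and a long gap is normal.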
There are also counts for each process. The specific metrics are:
- vouch_beaconblockproposal_process_requests_total: number of beacon block proposal processes;
- vouch_attestation_process_requests_total: number of attestation processes;
- vouch_beaconcommitteesubscription_process_requests_total: number of beacon committee subscription processes; and
- vouch_attestationaggregation_process_requests_total: number of attestation aggregation processes.
All of the metrics have the label "result" with the value either "succeeded" or "failed". Any increase in the latter values implies the validator is not completing all of its activities, and should be investigated.
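To spot an increase in the "failed" values, two scrapes can be compared counter by counter. The following is a sketch only: it assumes each scrape has been reduced to a dictionary keyed by (metric name, result label), and it treats a counter that went down as having been reset by a restart. The function name is hypothetical.

```python
def failed_increases(previous, current):
    """Return {metric name: increase} for 'failed' counters that grew.

    Each argument maps (metric name, result label) -> counter value.
    """
    increases = {}
    for (name, result), value in current.items():
        if result != "failed":
            continue
        before = previous.get((name, result), 0.0)
        # A value lower than before is assumed to mean the counter was
        # reset by a restart, so the whole current value counts as new.
        delta = value - before if value >= before else value
        if delta > 0:
            increases[name] = delta
    return increases
```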
Vouch keeps track of the number of accounts for which it is validating in the vouch_accountmanager_accounts_total metric. This metric has one label, state, which can take one of the following values:
- unknown: the validator is not known to the Ethereum 2 network
- pending_initialized: the validator is known to the Ethereum 2 network but not yet in the queue to be activated
- pending_queued: the validator is in the queue to be activated
- active_ongoing: the validator is active
- active_exiting: the validator is active but stopping its duties
- exited_unslashed: the validator has exited without being slashed
- exited_slashed: the validator has exited after being slashed
- withdrawal_possible: the validator's funds are eligible for withdrawal (although withdrawal is not possible in phase 0)
Vouch will attest for accounts that are either active_ongoing or active_exiting. Any increase in active_exiting should be matched with valid exit requests. Any increase in active_slashed suggests a problem with the validator setup that should be investigated as a matter of urgency.
Vouch uses marks to show the point in time within a slot at which it completes its various operations. The mark is made after the operation has submitted any results of its work to its beacon nodes, and so can be used to confirm that Vouch is acting in a timely fashion. Each mark is a histogram from 0 to 12 seconds, in 0.1 second increments. The marks are as follows:
- vouch_attestation_mark_seconds is the time in the slot at which the attestation(s) for the slot have been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 6 second mark. Any significant number of these after the 7 second mark suggests that part of the validating infrastructure may be slow, and should be investigated
- vouch_beaconblockproposal_mark_seconds is the time in the slot at which the block for the slot has been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 1 second mark. Any significant number of these after the 2 second mark suggests that part of the validating infrastructure may be slow, and should be investigated
- vouch_synccommitteemessage_mark_seconds is the time in the slot at which the sync committee message(s) for the slot have been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 6 second mark. Any significant number of these after the 7 second mark suggests that part of the validating infrastructure may be slow, and should be investigated
- vouch_attestationaggregation_mark_seconds is the time in the slot at which the attestation aggregation message(s) for the slot have been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 9 second mark. Any significant number of these after the 10 second mark suggests that part of the validating infrastructure may be slow, and should be investigated. It would also be expected that all of these messages would be after the 8 second mark. Any number of these before the 8 second mark suggests a problem with system time, and should be investigated
- vouch_synccommitteeaggregation_mark_seconds is the time in the slot at which the sync committee aggregation message(s) for the slot have been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 9 second mark. Any significant number of these after the 10 second mark suggests that part of the validating infrastructure may be slow, and should be investigated. It would also be expected that all of these messages would be after the 8 second mark. Any number of these before the 8 second mark suggests a problem with system time, and should be investigated
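The "significant number after the N second mark" checks above can be evaluated directly from a mark histogram. The sketch below assumes the histogram has been reduced to a dictionary of bucket upper bound to cumulative count, as Prometheus histogram buckets are cumulative, with the +Inf bucket stored under `float("inf")`. The function names are hypothetical, and the threshold must be one of the bucket boundaries.

```python
INF = float("inf")


def fraction_after(buckets, threshold):
    """Fraction of observations later than threshold seconds into the slot.

    buckets maps upper bound -> cumulative count and must contain both the
    threshold itself and the +Inf bucket; a KeyError is raised otherwise.
    """
    total = buckets[INF]
    if total == 0:
        return 0.0
    return (total - buckets[threshold]) / total


def fraction_at_or_before(buckets, threshold):
    """Fraction of observations at or before threshold seconds into the slot."""
    total = buckets[INF]
    if total == 0:
        return 0.0
    return buckets[threshold] / total
```

For example, `fraction_after(buckets, 7.0)` applied to vouch_attestation_mark_seconds gives the share of attestations submitted after the 7 second mark, while `fraction_at_or_before(buckets, 8.0)` applied to an aggregation mark gives the share that arrived suspiciously early.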
Performance metrics provide a mechanism to understand how quickly Vouch is carrying out its activities. The following information is provided:
- vouch_attestation_process_duration_seconds: time taken to carry out the attestation process
- vouch_attestationaggregation_process_duration_seconds: time taken to carry out the attestation aggregation process
- vouch_beaconblockproposal_process_duration_seconds: time taken to carry out the beacon block proposal process
- vouch_beaconcommitteesubscription_process_duration_seconds: time taken to carry out the beacon committee subscription process
- vouch_synccommitteeaggregation_process_duration_seconds: time taken to carry out the sync committee aggregation process
- vouch_synccommitteemessage_process_duration_seconds: time taken to carry out the sync committee message process
- vouch_synccommitteesubscription_process_duration_seconds: time taken to carry out the sync committee subscription process
These metrics are provided as histograms, with buckets in increments of 0.1 seconds up to 2 seconds.
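Because these are bucketed histograms, percentiles can only be estimated. The sketch below shows one common approach, linear interpolation within the bucket that contains the requested rank, similar in spirit to what Prometheus does for histogram quantiles. It assumes the buckets are given as a sorted list of (upper bound, cumulative count) pairs ending with the +Inf bucket; the function name is hypothetical.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, with float('inf') as the final bound.
    Returns None when the histogram holds no observations.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # The rank falls past the last finite bucket, so the best
                # available answer is that bucket's upper bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            # Interpolate linearly within the bucket containing the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With buckets only reaching 2 seconds, any estimate that lands on the final finite bound should be read as "at least this long" and not as a precise value.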
A major part of Vouch's work is in the strategy section, where it selects the appropriate data to sign. The vouch_strategy_operation_duration_seconds metric records, for each provider of the data, the time taken to obtain and evaluate it. This is a histogram with buckets in increments of 0.1 seconds up to 4 seconds. It has three labels:
- strategy is the strategy for the operation
- provider is the provider for the operation
- operation is the operation that took place (e.g. "beacon block proposal")
Operations metrics provide information about Vouch's internal operations. These are generally lower-level information that can be useful to monitor activities for fine-tuning of server parameters, comparing one instance to another, etc.
Vouch's job scheduler provides a number of metrics. The specific metrics are:
- vouch_scheduler_jobs_scheduled_total: number of jobs scheduled. This is expected to increment periodically throughout Vouch's runtime
- vouch_scheduler_jobs_cancelled_total: number of jobs cancelled. This increments when chain reorganizations occur, and pre-scheduled jobs are no longer valid
- vouch_scheduler_jobs_started_total: number of jobs started. This has a label trigger which can be "timer" if the job ran due to reaching its designated start time or "signal" if the job ran due to being triggered before its designated start time
Each of the above metrics also has a class label which defines the general class of the job running. Possible values include:
- Aggregate attestations: jobs relating to aggregating attestations
- Aggregate sync committee messages: jobs relating to aggregating sync committee messages
- Attest: jobs relating to attesting
- Epoch: jobs relating to operations run in preparation for or at the start of epochs
- Generate sync committee messages: jobs relating to generating sync committee messages
- Prepare for sync committee messages: jobs relating to preparation of sync committee message generation
- Propose: jobs relating to proposing blocks
- Refresh accounts: jobs relating to updating internal account information
Client operations metrics provide information about the response time of beacon nodes, as well as whether the requests to them succeeded or failed. This can be used to understand how quickly and how well beacon nodes are responding to requests. For example, if Vouch is using multiple beacon nodes in different data centres, these metrics show how network latency affects each node's response times.
vouch_client_operation_duration_seconds is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It has two labels:
- proposer is the endpoint for the operation
- operation is the operation that took place (e.g. "beacon block proposal")
There is also a companion metric vouch_client_operation_requests_total, which is a simple count of the number of operations that have taken place. It has three labels:
- proposer is the endpoint for the operation
- operation is the operation that took place (e.g. "beacon block proposal")
- result is the result of the operation, either "succeeded" or "failed"
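The three labels make it straightforward to derive a failure rate per endpoint and operation. The sketch below assumes the samples of vouch_client_operation_requests_total have been collected as (labels, value) pairs, with the label names exactly as listed above. The function name and the endpoint names in the usage example are hypothetical.

```python
from collections import defaultdict


def failure_rates(samples):
    """Return {(endpoint, operation): failed / total} for each combination.

    samples is an iterable of (labels dict, counter value) pairs taken from
    vouch_client_operation_requests_total.
    """
    totals = defaultdict(float)
    failures = defaultdict(float)
    for labels, value in samples:
        key = (labels["proposer"], labels["operation"])
        totals[key] += value
        if labels["result"] == "failed":
            failures[key] += value
    # Combinations with no requests at all are left out of the result.
    return {key: failures[key] / total for key, total in totals.items() if total > 0}
```

A usage example with made-up endpoints:

```python
samples = [
    ({"proposer": "node-a:5052", "operation": "attestation data", "result": "succeeded"}, 90.0),
    ({"proposer": "node-a:5052", "operation": "attestation data", "result": "failed"}, 10.0),
]
rates = failure_rates(samples)  # one entry, keyed by ("node-a:5052", "attestation data")
```

Since the inputs are counters accumulated since startup, rates over a recent window would need the difference between two scrapes instead of the raw values.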
vouch_strategy_operation_used provides details of the outcome of strategies, where one piece of data is obtained from a number of providers. It has three labels:
- operation is the operation that took place (e.g. "beacon block proposal")
- provider is the provider of the information selected by the strategy
- strategy is the strategy used to select the outcome
Network metrics provide information about the network from Vouch's point of view. Although these are not under Vouch's control, they have an impact on the performance of the validator. The specific metrics are:
- vouch_block_receipt_delay_seconds: the delay between the start of a slot and the arrival of the block for that slot. This metric is provided as a histogram, with buckets in increments of 0.1 seconds up to 12 seconds. This has a label epoch_slot which is the position of the slot in the epoch (0 through 31, inclusive)
- vouch_attestationaggregation_coverage_ratio: the ratio of the number of attestations included in the aggregate to the total number of attestations for the aggregate. This metric is provided as a histogram, with buckets in increments of 0.1 up to 1.
- vouch_synccommitteeaggregation_coverage_ratio: the ratio of the number of sync committee messages included in the aggregate to the total number of members of the sync committee for the aggregate. This metric is provided as a histogram, with buckets in increments of 0.1 up to 1.
Relay metrics provide information about the performance, both individually and comparatively, of the block relays configured for use.
vouch_relay_auction_block_duration_seconds is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to obtain the best bid from competing relays. There is also a companion metric vouch_relay_auction_block_duration_seconds_count, which is a simple count of the number of operations that have taken place.
vouch_relay_auction_block_used_total provides the number of blocks used. It has two labels:
- provider is the address of the relay from which the winning bid comes
- category is the categorization of the builder from which the winning bid comes. This is free-form text, and supplied by the user in the builder configuration (defaults to "standard" if no category is supplied)
vouch_relay_builder_bid_delta_meth_bucket is provided as a histogram, with buckets in increments of 10 milliEther up to 1 Ether. It provides details of the difference in value between the winning bid and the bid from the given provider. It has a single label:
- provider is the address of the relay from which a losing bid comes
There is also a companion metric vouch_relay_builder_bid_delta_meth_count, which is a simple count of the number of operations that have taken place.
vouch_relay_builder_bid_duration_seconds_bucket is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to serve builder bid requests from beacon nodes. There is also a companion metric vouch_relay_builder_bid_duration_seconds_count, which is a simple count of the number of operations that have taken place.
vouch_relay_execution_config_duration_seconds_bucket is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to obtain the execution configuration from the local or remote source. There is also a companion metric vouch_relay_execution_config_duration_seconds_count, which is a simple count of the number of operations that have taken place.
vouch_relay_validator_registrations_duration_seconds_bucket is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to serve validator registration requests from beacon nodes. There is also a companion metric vouch_relay_validator_registrations_duration_seconds_count, which is a simple count of the number of operations that have taken place.
Sync Committee Verification metrics can be enabled using the controller.verify-sync-committee-inclusion flag in the configuration. This gives more insight into participation in Sync Committee duties:
- vouch_synccommitteeverification_current_assigned is a gauge that is set to the current number of Vouch validators that are participating in Sync Committee duty.
- vouch_synccommitteeverification_mismatches_total is a counter that increments for each Sync Committee participating validator when Vouch receives a head event where the parent block root does not match the root Vouch broadcast in the Sync Committee messages.
- vouch_synccommitteeverification_found_total is a counter that increments for each Vouch validator that has been included in the SyncAggregate. This is not incremented if we already detected a root mismatch or if we didn't record the Sync Committee head (expected after a restart)
- vouch_synccommitteeverification_missing_total is a counter that increments for each Vouch validator that has NOT been included in the SyncAggregate. This is not incremented if we already detected a root mismatch or if we didn't record the Sync Committee head (expected after a restart)
- vouch_synccommitteeverification_get_head_failures_total is a counter that increments if we fail to get the head block to verify validator inclusion in the SyncAggregate. This is incremented once per slot we fail.
Note: It is expected that the sum of (found_total + missing_total + mismatches_total) == number of Sync Committee participating validators for a given slot.
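The relationship in the note can be checked mechanically. The sketch below assumes two scrapes taken exactly one slot apart, each reduced to a dictionary of metric name to value, which is an idealization that is hard to guarantee in practice. The function name is hypothetical.

```python
PREFIX = "vouch_synccommitteeverification_"


def unaccounted_validators(previous, current):
    """Return assigned validators not counted as found, missing or mismatched.

    previous and current are scrapes (dicts of metric name -> value) assumed
    to be exactly one slot apart.
    """
    counted = sum(
        current[PREFIX + name] - previous[PREFIX + name]
        for name in ("found_total", "missing_total", "mismatches_total")
    )
    return current[PREFIX + "current_assigned"] - counted
```

A result of 0 matches the expectation in the note. A non-zero result is not necessarily an error: as described above, the counters are not incremented when the Sync Committee head was not recorded (expected after a restart) or when fetching the head block failed.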