Skip to content

Redesigning bdk_chain around a "ChainOracle" #895

@LLFourn

Description

@LLFourn

BDK currently depends heavily on a SparseChain to tell it whether transactions are in the chain or not. This proposal is to redesign it around a more generic ChainOracle trait which can tell you whether a (height, block_hash) is in the chain or not and use this information to decide whether a transaction is in the chain. We can do this because for each transaction we can always know a block header and hash that implies that it is in the chain.

A SparseChain can also fill the role of a ChainOracle but this change would allow us use other sources of chain data to fulfil the need. For example with our pending nakamoto cbf implementation we could use the on-chain block headers as the ChainOracle.

This proposal is to remove SparseChain as a core mechanism and instead pass in a impl ChainOracle into methods that need to know about the current state of the chain.

trait ChainOracle {
    type Error;
    fn is_in_best_chain(&self, block_height: u32, block_hash: BlockHash) -> Result<bool, Self::Error>;
}

Removing transactions from SparseChain

I think we should remove transactions from the sparse chain. Without needing to "re-org" the chain there is no real need to have transaction location so closely coupled to the blocks.

Instead of storing the "position" of a transaction within SparseChain we could easily store it in TxGraph in a monotone way. Imagine a field like this in TxGraph:

positions: HashSet<(Txid, (u32, BlockHash))>

Then when you want to figure out if txid is in the chain you can query the heights and hashes that it has been seen in and see if any of those blocks are in the canonical chain.

Note that we don't need this position to be the blockhash the transaction is in. We could call this the "anchor block" -- if it's in the chain then this txid is in the chain. This means we don't need to make additional queries to electrum/esplora to figure out the block hash that it's actually in.

So maybe it could be more like this:

positions: HashSet<(Txid, A)>

where A does the Anchor trait which has one method to return the block hash and height whose presence implies the existence -- it may be the height the transaction is actually at or it may be simple the tip that implies its existence. This allows different applications to store different data about their transaction's positions.

Indexing the chain data

Now that the chain is out of the way and outside of the transaction system we can put the indexing of transactions next to the graph.

pub struct IndexedTxGraph<I> {
    graph: TxGraph<T>,
    indexer: I
}

// Where I does a trait like this
pub trait TxIndexer {
    // index a whole tx
    fn index_tx(&mut self, tx: &Transaction);
    fn index_txout(&mut self, outpoint: OutPoint, txout: &TxOut);
}

Getting a UTXO set

How do we get a a UTXO set from a set of TxOuts (and outpoints) owned by the wallet, a transaction graph and information about the blocks in the longest chain. The first thing to note is that we can built a predicate is_in_chain(chain: &_, tx_graph: &TxGraph, txid) from the information about the longest chain. We simply iterate through all positions of the txid (stored in TxGraph) and return true if one of
those blocks exist in the longest chain. Keep in mind that the longest chain can be modelled a SparseChain or could be a header chain etc.

The naive algorithm

Given is_in_chain we can easily create a predicate is_unspent for any OutPoint. With is_unspent we can just call it on every OutPoint in that we own to find a utxo set.

fn is_unspent(chain, tx_graph, txout) {
     if !is_in_chain(chain, tx_graph, txid_of_txout) {
           return false;
     }
     for spending_txid in tx_graph.spends(outpoint_of_txout) {
         if is_in_chain(chain, tx_graph, spending_txid) {
             return false;
         }
     }
    true
}

Note that this algorithm has complexity of O(T) where T is the number of txouts you've ever owned. It also makes multiple calls to is_in_chain per outpoint. This might be slow especially if you have to do disk access or query an external service. To make it a bit faster you could cache the result of is_in_chain and apply positive results to each of the transaction's parents (a child being in chain implies a parent).

Optimized algorithms for later development

There should be faster algorithms for discovering this that don't require iterating over the entire txout set. For example, the TxGraph could index the set of UTXOs. From there you can visit each one and check if the tx containing it is in the chain. If so, add it to the UTXO set, if not add the parent outputs of the transaction to the candidates list.

How to handle unconfirmed transactions

We can model unconfirmed transactions as those with no anchors to blocks that are in the chain. We of course need to distinguish between those unconfirmed transactions that are invalid because they double spend (or their parents double spend) a tx that is in the chain and those that could be in the mempool.

To the naive algorithm above, when you encounter a output that is not in the chain you can add it to "unconfirmed" list and then from the unconfirmed list decide which of these outputs can exist in the mempool by resolving conflicts in the graph.

Work plan

While writing this document I discovered how I felt about priorities:

  1. Create a type that incorporates a TxGraph but also has the "anchor" positions. It could just be done in TxGraph itself (e.g. TxGraph<Anchor>).
  2. Create a ChainOracle trait which has a single method to return whether a tx is confirmed given a (block hash, height).
  3. Add *_in_chain functions that exist on ChainGraph to TxGraph but take a something implementing the ChainOracle trait.
  4. Create a type that has a TxGraph and something implementing the TxIndexer trait.
    Every time a tx is inserted into this type it will add it to the underlying graph and the TxIndexer (e.g. IndexedTxGraph<Anchor, Indexer>). Implement TxIndexer for KeychainTxOutIndex and SpkTxOutIndex.
  5. Move the functionality from KeychainTracker to an impl of IndexedTxGraph<Anchor, KeychainTxOutIndex>. wd
    For things that need chain data like full_utxos you'll need to pass in a ChainOracle.
  6. Create a new type that is SparseChain without the transactions (SparseChain2). Implement ChainOracle for SparseChain2.
  7. Make esplora and electrum return a TxGraph<A> and a SparseChain2 as their update
  8. Design bdk's Wallet around this
  9. Remove SparseChain (or rather rename SparseChain2 to SparseChain), ChainGraph and KeychainTracker

Downsides

The downsides of this idea are:

  • It's harder to use the information about what's in the mempool from esplora and electrum. Right now we are able to evict transactions that conflict with things that we find in the mempool from the SparseChain. With this change we don't really have a way to use this information so which of the conflicting unconfirmed txs you choose as canonical will be somewhat arbitrary. To fix this we could consider having the "last-time-seen" in mempool as part of the TxGraph position for that. This can be monotone so whenever you see it again the in the mempool you replace the existing value if it's larger. Then when you are trying to figure out who's in the mempool and who isn't (from your limited view) you can choose the one that was seen most recently between two conflicts.

  • It's hard to see how this improves Wallet. This gives much greater flexibility to users of bdk_chain but I'm not sure if we'll be able to realize this improvements for users of Wallet. To do so we'd have to not model the canonical chain inside wallet and leave it to the user to provide a impl ChainOracle. The biggest problem here is that if the user was using a SparseChain they'd have to persist changes to it separately which muddies the "batteries included" design goal of Wallet .

  • Some ChainOracle's will have to return Result<bool, _> rather than just bool for whether the block hash and height are in the chain. Some of them might even want to be async! So we're kind of back with offering multiple method for each method that needs to know what's in chain e.g. balance, try_balance and try_balance_async!

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions