-
Notifications
You must be signed in to change notification settings - Fork 438
Description
BDK currently depends heavily on a SparseChain to tell it whether transactions are in the chain or not. This proposal is to redesign it around a more generic ChainOracle trait which can tell you whether a (height, block_hash) is in the chain or not and use this information to decide whether a transaction is in the chain. We can do this because for each transaction we can always know a block header and hash that implies that it is in the chain.
A SparseChain can also fill the role of a ChainOracle but this change would allow us use other sources of chain data to fulfil the need. For example with our pending nakamoto cbf implementation we could use the on-chain block headers as the ChainOracle.
This proposal is to remove SparseChain as a core mechanism and instead pass in a impl ChainOracle into methods that need to know about the current state of the chain.
trait ChainOracle {
type Error;
fn is_in_best_chain(&self, block_height: u32, block_hash: BlockHash) -> Result<bool, Self::Error>;
}Removing transactions from SparseChain
I think we should remove transactions from the sparse chain. Without needing to "re-org" the chain there is no real need to have transaction location so closely coupled to the blocks.
Instead of storing the "position" of a transaction within SparseChain we could easily store it in TxGraph in a monotone way. Imagine a field like this in TxGraph:
positions: HashSet<(Txid, (u32, BlockHash))>
Then when you want to figure out if txid is in the chain you can query the heights and hashes that it has been seen in and see if any of those blocks are in the canonical chain.
Note that we don't need this position to be the blockhash the transaction is in. We could call this the "anchor block" -- if it's in the chain then this txid is in the chain. This means we don't need to make additional queries to electrum/esplora to figure out the block hash that it's actually in.
So maybe it could be more like this:
positions: HashSet<(Txid, A)>where A does the Anchor trait which has one method to return the block hash and height whose presence implies the existence -- it may be the height the transaction is actually at or it may be simple the tip that implies its existence. This allows different applications to store different data about their transaction's positions.
Indexing the chain data
Now that the chain is out of the way and outside of the transaction system we can put the indexing of transactions next to the graph.
pub struct IndexedTxGraph<I> {
graph: TxGraph<T>,
indexer: I
}
// Where I does a trait like this
pub trait TxIndexer {
// index a whole tx
fn index_tx(&mut self, tx: &Transaction);
fn index_txout(&mut self, outpoint: OutPoint, txout: &TxOut);
}Getting a UTXO set
How do we get a a UTXO set from a set of TxOuts (and outpoints) owned by the wallet, a transaction graph and information about the blocks in the longest chain. The first thing to note is that we can built a predicate is_in_chain(chain: &_, tx_graph: &TxGraph, txid) from the information about the longest chain. We simply iterate through all positions of the txid (stored in TxGraph) and return true if one of
those blocks exist in the longest chain. Keep in mind that the longest chain can be modelled a SparseChain or could be a header chain etc.
The naive algorithm
Given is_in_chain we can easily create a predicate is_unspent for any OutPoint. With is_unspent we can just call it on every OutPoint in that we own to find a utxo set.
fn is_unspent(chain, tx_graph, txout) {
if !is_in_chain(chain, tx_graph, txid_of_txout) {
return false;
}
for spending_txid in tx_graph.spends(outpoint_of_txout) {
if is_in_chain(chain, tx_graph, spending_txid) {
return false;
}
}
true
}Note that this algorithm has complexity of O(T) where T is the number of txouts you've ever owned. It also makes multiple calls to is_in_chain per outpoint. This might be slow especially if you have to do disk access or query an external service. To make it a bit faster you could cache the result of is_in_chain and apply positive results to each of the transaction's parents (a child being in chain implies a parent).
Optimized algorithms for later development
There should be faster algorithms for discovering this that don't require iterating over the entire txout set. For example, the TxGraph could index the set of UTXOs. From there you can visit each one and check if the tx containing it is in the chain. If so, add it to the UTXO set, if not add the parent outputs of the transaction to the candidates list.
How to handle unconfirmed transactions
We can model unconfirmed transactions as those with no anchors to blocks that are in the chain. We of course need to distinguish between those unconfirmed transactions that are invalid because they double spend (or their parents double spend) a tx that is in the chain and those that could be in the mempool.
To the naive algorithm above, when you encounter a output that is not in the chain you can add it to "unconfirmed" list and then from the unconfirmed list decide which of these outputs can exist in the mempool by resolving conflicts in the graph.
Work plan
While writing this document I discovered how I felt about priorities:
- Create a type that incorporates a
TxGraphbut also has the "anchor" positions. It could just be done inTxGraphitself (e.g.TxGraph<Anchor>). - Create a
ChainOracletrait which has a single method to return whether a tx is confirmed given a (block hash, height). - Add
*_in_chainfunctions that exist onChainGraphtoTxGraphbut take a something implementing theChainOracletrait. - Create a type that has a
TxGraphand something implementing theTxIndexertrait.
Every time a tx is inserted into this type it will add it to the underlying graph and theTxIndexer(e.g.IndexedTxGraph<Anchor, Indexer>). ImplementTxIndexerforKeychainTxOutIndexandSpkTxOutIndex. - Move the functionality from
KeychainTrackerto an impl ofIndexedTxGraph<Anchor, KeychainTxOutIndex>. wd
For things that need chain data likefull_utxosyou'll need to pass in aChainOracle. - Create a new type that is
SparseChainwithout the transactions (SparseChain2). ImplementChainOracleforSparseChain2. - Make esplora and electrum return a
TxGraph<A>and aSparseChain2as their update - Design
bdk's Wallet around this - Remove
SparseChain(or rather renameSparseChain2toSparseChain),ChainGraphandKeychainTracker
Downsides
The downsides of this idea are:
-
It's harder to use the information about what's in the mempool from esplora and electrum. Right now we are able to evict transactions that conflict with things that we find in the mempool from the
SparseChain. With this change we don't really have a way to use this information so which of the conflicting unconfirmed txs you choose as canonical will be somewhat arbitrary. To fix this we could consider having the "last-time-seen" in mempool as part of the TxGraph position for that. This can be monotone so whenever you see it again the in the mempool you replace the existing value if it's larger. Then when you are trying to figure out who's in the mempool and who isn't (from your limited view) you can choose the one that was seen most recently between two conflicts. -
It's hard to see how this improves
Wallet. This gives much greater flexibility to users ofbdk_chainbut I'm not sure if we'll be able to realize this improvements for users ofWallet. To do so we'd have to not model the canonical chain inside wallet and leave it to the user to provide aimpl ChainOracle. The biggest problem here is that if the user was using aSparseChainthey'd have to persist changes to it separately which muddies the "batteries included" design goal ofWallet. -
Some
ChainOracle's will have to returnResult<bool, _>rather than justboolfor whether the block hash and height are in the chain. Some of them might even want to be async! So we're kind of back with offering multiple method for each method that needs to know what's in chain e.g.balance,try_balanceandtry_balance_async!
Metadata
Metadata
Assignees
Labels
Type
Projects
Status