Redesigning bdk_chain around a "ChainOracle"

BDK currently depends heavily on a `SparseChain` to tell it whether transactions are in the chain or not. This proposal is to redesign it around a more generic `ChainOracle` trait which can tell you whether a `(height, block_hash)` is in the chain or not and use this information to decide whether a transaction is in the chain. We can do this because for each transaction we can always know a block header and hash that implies that it is in the chain.

A `SparseChain` can also fill the role of a `ChainOracle` but this change would allow us use other sources of chain data to fulfil the need. For example with our pending `nakamoto` cbf implementation we could use the on-chain block headers as the `ChainOracle`.

This proposal is to remove `SparseChain` as a core mechanism and instead pass in a `impl ChainOracle` into methods that need to know about the current state of the chain.

```rust
trait ChainOracle {
    type Error;
    fn is_in_best_chain(&self, block_height: u32, block_hash: BlockHash) -> Result<bool, Self::Error>;
}
```

## Removing transactions from `SparseChain`

I think we should remove transactions from the sparse chain. Without needing to "re-org" the chain there is no real need to have transaction location so closely coupled to the blocks.

Instead of storing the "position" of a transaction within `SparseChain` we could easily store it in `TxGraph` in a monotone way. Imagine a field like this in `TxGraph`:

```
positions: HashSet<(Txid, (u32, BlockHash))>
```

Then when you want to figure out if `txid` is in the chain you can query the heights and hashes that it has been seen in and see if any of those blocks are in the canonical chain.

Note that we don't need this position to be the blockhash the transaction is in. We could call this the "anchor block" -- if it's in the chain then this txid is in the chain. This means we don't need to make additional queries to electrum/esplora to figure out the block hash that it's actually in.

So maybe it could be more like this:

```rust
positions: HashSet<(Txid, A)>
```

where `A` does the `Anchor` trait which has one method to return the block hash and height whose presence implies the existence -- it may be the height the transaction is actually at or it may be simple the tip that implies its existence. This allows different applications to store different data about their transaction's positions.


## Indexing the chain data

Now that the chain is out of the way and outside of the transaction system we can put the indexing of transactions next to the graph.

```rust
pub struct IndexedTxGraph<I> {
    graph: TxGraph<T>,
    indexer: I
}

// Where I does a trait like this
pub trait TxIndexer {
    // index a whole tx
    fn index_tx(&mut self, tx: &Transaction);
    fn index_txout(&mut self, outpoint: OutPoint, txout: &TxOut);
}
```

## Getting a UTXO set

How do we get a a UTXO set from a set of TxOuts (and outpoints) owned by the wallet, a transaction graph and information about the blocks in the longest chain. The first thing to note is that we can built a predicate `is_in_chain(chain: &_, tx_graph: &TxGraph, txid)` from the information about the longest chain. We simply iterate through all positions of the txid (stored in `TxGraph`) and return true if one of
those blocks exist in the longest chain. Keep in mind that the longest chain can be modelled a `SparseChain` or could be a header chain etc.

### The naive algorithm

Given `is_in_chain` we can easily create a predicate `is_unspent` for any `OutPoint`. With `is_unspent` we can just call it on every `OutPoint` in that we own to find a utxo set.

``` rust
fn is_unspent(chain, tx_graph, txout) {
     if !is_in_chain(chain, tx_graph, txid_of_txout) {
           return false;
     }
     for spending_txid in tx_graph.spends(outpoint_of_txout) {
         if is_in_chain(chain, tx_graph, spending_txid) {
             return false;
         }
     }
    true
}
```

Note that this algorithm has complexity of `O(T)` where `T` is the number of txouts you've ever owned. It also makes multiple calls to `is_in_chain` per outpoint. This might be slow especially if you have to do disk access or query an external service. To make it a bit faster you could cache the result of `is_in_chain` and apply positive results to each of the transaction's parents (a child being in chain implies a parent).

### Optimized algorithms for later development

There should be faster algorithms for discovering this that don't require iterating over the entire txout set. For example, the `TxGraph` could index the set of UTXOs. From there you can visit each one and check if the tx containing it is in the chain. If so, add it to the UTXO set, if not add the parent outputs of the transaction to the candidates list.

## How to handle unconfirmed transactions

We can model unconfirmed transactions as those with no anchors to blocks that are in the chain. We of course need to distinguish between those unconfirmed transactions that are invalid because they double spend (or their parents double spend) a tx that is in the chain and those that *could* be in the mempool.

To the naive algorithm above, when you encounter a output that is not in the chain you can add it to "unconfirmed" list and then from the unconfirmed list decide which of these outputs can exist in the mempool by resolving conflicts in the graph.

## Work plan

While writing this document I discovered how I felt about priorities:

1. Create a type that incorporates a `TxGraph` but also has the "anchor" positions. It could just be done in `TxGraph` itself (e.g. `TxGraph<Anchor>`).
2. Create a `ChainOracle` trait which has a single method to return whether a tx is confirmed given a (block hash, height).
3. Add `*_in_chain` functions that exist on `ChainGraph` to `TxGraph` but take a something implementing the `ChainOracle` trait.
4. Create a type that has a `TxGraph` and something implementing the `TxIndexer` trait.
   Every time a tx is inserted into this type it will add it to the underlying graph and the `TxIndexer` (e.g. `IndexedTxGraph<Anchor, Indexer>`). Implement `TxIndexer` for `KeychainTxOutIndex` and `SpkTxOutIndex`.
5. Move the functionality from `KeychainTracker` to an impl of `IndexedTxGraph<Anchor, KeychainTxOutIndex>`. wd
   For things that need chain data like `full_utxos` you'll need to pass in a `ChainOracle`.
6. Create a new type that is `SparseChain` without the transactions (`SparseChain2`). Implement `ChainOracle` for `SparseChain2`.
7. Make esplora and electrum return a `TxGraph<A>` and a `SparseChain2` as their update
8. Design `bdk`'s  Wallet around this
9. Remove `SparseChain` (or rather rename `SparseChain2` to `SparseChain`), `ChainGraph` and `KeychainTracker`


## Downsides

The downsides of this idea are:

- It's harder to use the information about what's in the mempool from esplora and electrum. Right now we are able to evict transactions that conflict with things that we find in the mempool from the `SparseChain`. With this change we don't really have a way to use this information so which of the conflicting unconfirmed txs you choose as canonical will be somewhat arbitrary. To fix this we could consider having the "last-time-seen" in mempool as part of the TxGraph position for that. This can be monotone so whenever you see it again the in the mempool you replace the existing value if it's larger. Then when you are trying to figure out who's in the mempool and who isn't (from your limited view) you can choose the one that was seen most recently between two conflicts.

- It's hard to see how this improves `Wallet`. This gives much greater flexibility to users of `bdk_chain` but I'm not sure if we'll be able to realize this improvements for users of `Wallet`. To do so we'd have to not model the canonical chain inside wallet and leave it to the user to provide a `impl ChainOracle`. The biggest problem here is that if the user was using a `SparseChain` they'd have to persist changes to it separately which muddies the "batteries included" design goal of `Wallet`  .

- Some `ChainOracle`'s will have to return `Result<bool, _>` rather than just `bool` for whether the block hash and height are in the chain. Some of them might even want to be async! So we're kind of back with offering multiple method for each method that needs to know what's in chain e.g. `balance`, `try_balance` and `try_balance_async`!





Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redesigning bdk_chain around a "ChainOracle" #895

Removing transactions from `SparseChain`

Indexing the chain data

Getting a UTXO set

The naive algorithm

Optimized algorithms for later development

How to handle unconfirmed transactions

Work plan

Downsides

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Redesigning bdk_chain around a "ChainOracle" #895

Description

Removing transactions from SparseChain

Indexing the chain data

Getting a UTXO set

The naive algorithm

Optimized algorithms for later development

How to handle unconfirmed transactions

Work plan

Downsides

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Removing transactions from `SparseChain`