Conversation
…e tree insertion work with commit queue.
…sed node sharing.
…built using that tree by deferring the removal.
|
I started looking at the PR, but it is big a big one (I did not look at the bench part yet), so I am summing up here my initial questions.
|
|
Thanks for the comment. Will try and answer.
The TreeReader stuff is there because we’re using Value table addresses directly as node addresses. Hence we need some way for a client to signal that it’s using nodes from an existing tree and the Db shouldn’t touch them (move or remove). This is what the TreeReader does.
The reference count table and cache are purely optimizations so yes could be in a separate PR. They only store reference counts > 1 so are expected to be sparse (most nodes have ref count of 1).
Yeah, this is just an optimization so we don’t have to store (and access) ref counts for most of the nodes at all. The first implementation just used the already existing ref counting that Value tables have.
Only the roots have entries in the index table.
When commiting a tree it ‘claims’ value table entries immediately (in commit) which can be done using the table header data and the free list without reading or writing any table data or log overlays. You’re right that the ordered list isn’t actually used yet! That would be used in the future for optimizing trees by laying out nodes contiguously.
Thanks, I’ll have a look at that! |
|
thanks for the replies,
I don't know if we need this, an error sounds fine to me. But ok, I guess I got it: if user try to access at a given address A and in between it got deleted and used again (very likelly due to the way we use the address back), then it will get an incorrect result but would not know it. I was just thinking it would get an error but I was wrong. I guess the tree viewer make sense (I could think of checking root presence in some cache every time we access node, but it is certainly better to just update an atomic over reader when we drop the tree, and then just return an error to get_node if tree_viewer is deprecated. So for a start I would find it easier to go with a treeviewer that do not defer commit, commit will just make all treeviewer session invalid.
I did not think about it, can be sparse indeed.
I understand that putting the reference count in their own table can be good, but I would have expected the table to get indexed indentically to the values tables, is it?
Then what happens for the reference count?? I am quite sure I don't get the reference count model, and the need to reindex it.
👍 |
Right. The client would have to cope with potentially failed commits? I think the Db would still need to track which TreeReaders were active when a dereference happened.
It uses a hash of the Value table address. This is because only a small proportion of nodes need a reference count. So didn’t want a ref count entry for every node.
Roots act like a standard Hash column entry. They have an index table entry and a Value table entry with potentially ref counting or not depending on if the client code wanted ref counted roots.
|
cheme
left a comment
There was a problem hiding this comment.
I think I finally get how the rc table works .
I tried to slimdown the comment I did at the time, especially the incorrect ones, but some may have slip through.
admin/src/lib.rs
Outdated
| db_options.columns.push(info_column); | ||
| } | ||
|
|
||
| let info_column = &mut db_options.columns[1]; |
There was a problem hiding this comment.
could use multitree_bench::INFO_COLUMN rather than 1 and 2 .
src/db.rs
Outdated
| #[derive(Debug, PartialEq, Eq)] | ||
| pub enum NodeChange { | ||
| /// (address, value, compressed value, compressed) | ||
| NewValue(u64, RcValue, RcValue, bool), |
There was a problem hiding this comment.
Isn't compressed defined per column ? (if so would be better to not have it here)
There was a problem hiding this comment.
This is whether the specific value was actually compressed or not (eg, if compressed length isn't smaller than the initial value then it is not compressed).
Though values are never compressed right now as discussed elsewhere so I could just remove it for now and store a single value.
src/table.rs
Outdated
| self.last_removed.store(next_removed, Ordering::Relaxed); | ||
| if let Some(mut free_entries) = free_entries_guard { | ||
| let last = free_entries.stack.pop().unwrap(); | ||
| assert_eq!(last, last_removed); |
There was a problem hiding this comment.
Wouldn't it be possible to skip reading last_removed here? (and remove assert_eq).
Otherwise would replace assert_eq by a debug_assert here.
There was a problem hiding this comment.
Is this actually used (or is claim_next_free mainly use)?
There was a problem hiding this comment.
Made it debug_assert. next_free does need to maintain the free list correctly as root nodes can share value tables with child nodes and root nodes still use standard committing (with next_free).
src/table.rs
Outdated
| let last_removed = self.last_removed.load(Ordering::Relaxed); | ||
| let index = if last_removed != 0 { | ||
| let last = free_entries.stack.pop().unwrap(); | ||
| assert_eq!(last, last_removed); |
There was a problem hiding this comment.
| assert_eq!(last, last_removed); | |
| debug_assert!(last == last_removed); |
| } | ||
| } | ||
| } | ||
| // TODO: Remove TreeReader from Db. |
There was a problem hiding this comment.
Not too sure about the meaning of this TODO?
Thanks. Will have a look at them and fix/reply. |
There was a problem hiding this comment.
Ideally this should've been a --tree option to the strees command, rather than a separate command. I'm fine with this version for now, but consider merging them in the future.
src/column.rs
Outdated
| 1 + data.len() + num_children as usize * 8 | ||
| } | ||
|
|
||
| pub fn pack_node_data(data: Vec<u8>, child_data: Vec<u8>, num_children: u8) -> Vec<u8> { |
There was a problem hiding this comment.
Right, the optimal layout should be [data][children][num_children]. This way unpacking function can just shrink the input vec and avoid a memmove. Packing would also be appending to data and returning it.
src/table.rs
Outdated
|
|
||
| pub fn complete_plan(&self, log: &mut LogWriter) -> Result<()> { | ||
| let _free_entries_guard = if let Some(free_entries) = &self.free_entries { | ||
| Some(free_entries.write()) |
There was a problem hiding this comment.
Yes, read should work here. Done.
src/column.rs
Outdated
| let mut data_buf = [0u8; 8]; | ||
| data_buf.copy_from_slice(&address.to_le_bytes()); | ||
| data.append(&mut data_buf.to_vec()); |
There was a problem hiding this comment.
| let mut data_buf = [0u8; 8]; | |
| data_buf.copy_from_slice(&address.to_le_bytes()); | |
| data.append(&mut data_buf.to_vec()); | |
| data.extend_from_slice(&address.to_le_bytes()); |
src/db.rs
Outdated
| } | ||
| if !self.options.columns[col as usize].append_only && external_call { | ||
| return Err(Error::InvalidConfiguration( | ||
| "get_node can only be called on a column with append_only option.".to_string(), |
There was a problem hiding this comment.
Should be allowed for now. We won't be using the TreeReader in substrate initially
There was a problem hiding this comment.
There won’t be any guarantee that the NodeAddress is still valid though, as it might have been removed. The Db won’t even be able to warn if it has happened.
Is there some external guarantee that this won’t happen?
There was a problem hiding this comment.
Yes, in substrate there's a higher level state pinning mechanism. It prevents the tree from being deleted while there are active readers.
There was a problem hiding this comment.
Ok. Should I add an option so the client has to choose to forego any checks/guarantees?
There was a problem hiding this comment.
Added the option.
| old_bytes: usize, | ||
| old_id: u64, | ||
| new_id: Option<u64>, | ||
| ) -> Result<()> { |
There was a problem hiding this comment.
Alternative solution is to keep nodes referenced by TreeReaders in a shared memory storage. Once the node that's referenced by any of the live nodes in the cache is deleted from the disk, it is evicted to the memory storage. Something to cosider doing later.
For now I'd agree with @cheme. It looks like we can get away with simply detecting reading a dereferenced tree and producing an error.
| *ops += 1; | ||
| } | ||
| for change in self.node_changes.iter() { | ||
| match change { |
There was a problem hiding this comment.
Would it possible to move all that code and write_dereference_children_plan to the column module?
There was a problem hiding this comment.
I tried this but it's a bit messy as the code potentially needs to create TreeReaders and they need DbInner.
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod test { |
There was a problem hiding this comment.
Would be good to get some tests working
I think at the end of the review I got accustomed to this usage, so ok for keeping key then. |
cheme
left a comment
There was a problem hiding this comment.
Looks like most concerns are addressed.
Maybe the one with passing db two time in function:
I drafted this branch here for it:
https://github.com/MattHalpinParity/parity-db/compare/multi_tree...cheme:parity-db:cheme/multi-tree?expand=1
specifically MattHalpinParity@f2c8dce
but it is a bit verbose (lot of .0 added), maybe just changing proto of
fn fn_name(&self, db: &Arc, .....
to
fn fn_name(db: &Arc, .....
would be better (even if then we have some Self::fn_name calls instead of self.fn_name calls).
(I will be updating to this branch on one of my polkadot sdk, and try to open a draft pr as soon as)
| return Err(Error::InvalidValueData) | ||
| } | ||
| let data_len = data.len() - (child_buf_len + 1); | ||
| let mut children = Children::new(); |
There was a problem hiding this comment.
| let mut children = Children::new(); | |
| let mut children = Children::with_capacity(num_children); |
| tier_index: &mut HashMap<usize, usize>, | ||
| node_values: &mut Vec<NodeChange>, | ||
| data: &mut Vec<u8>, | ||
| ) -> Result<()> { |
There was a problem hiding this comment.
Would almost propose to return impl Iterator<Item = Address> by using Iterator::from_fn (not passing data). This way we can keep the pack_data function (next to unpack it makes it easy to check format is consistent).
But I am not sure if it make things easier to follow here.
I'd rather just use |
Issue #199
This introduces a multitree option for columns with related tree commit operations as well as querying functions.