👍 TBH: I have trouble getting the whole picture. Does this still work with leveldb now that it is synchronous? Why was it asynchronous in the first place? Is it now slower when used with leveldb and faster with maps? |
I’m still determining how we want to handle leveldb features and functionality, hence my prior questions around them. Previously every trie query used leveldb.get and .set, which are rather slow async operations, at least compared to a native Map implementation, especially when walking down a trie. These changes fully convert the lib to use Maps as its trie data store, except for some code that preserves initializing a Trie with a leveldb file. One remaining feature that would bring it closer to prior feature parity is saving to the passed-in leveldb on every op to keep it in sync. However, I think a user of the lib could do that themselves if it was important to them. Is anyone using it like that right now? (We wouldn’t even need to accept leveldb as an init parameter if we provided a helper func to convert a leveldb to a Map, which would also solve the init race condition detailed in the todo above.) I think I could find more optimizations now that the code runs in a sync environment, but this PR does a pretty good job of quickly converting the lib from how it used to function while also reducing complexity (e.g. I was able to fully refactor out the walkController). |
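A rough sketch of what such a "convert a store to a Map" helper could look like, for illustration only. The `AsyncSource` interface, `loadIntoMap`, and `toHex` are hypothetical names that do not come from the PR or from the leveldb API; the sketch only assumes the source can list its keys and fetch values asynchronously.

```typescript
// Hypothetical async source standing in for a leveldb-like store.
// This is NOT the real leveldb/levelup API, just the minimum the sketch needs.
interface AsyncSource {
  keys(): Promise<Uint8Array[]>
  get(key: Uint8Array): Promise<Uint8Array | undefined>
}

// Encode bytes as a lowercase hex string so they can be used as Map keys.
function toHex(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i++) {
    out += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16)
  }
  return out
}

// Read every entry once, up front, and return a synchronous Map.
// After this resolves, lookups no longer need to await anything.
async function loadIntoMap(source: AsyncSource): Promise<Map<string, Uint8Array>> {
  const map = new Map<string, Uint8Array>()
  const keys = await source.keys()
  for (const key of keys) {
    const value = await source.get(key)
    if (value !== undefined) {
      map.set(toHex(key), value)
    }
  }
  return map
}
```

Because the caller awaits the helper before constructing the trie, initialization would no longer race with the first queries. The cost is that the whole store has to fit in memory.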
3bbb6d7 to
1381cf1
this assumes that the tree at runtime is only backed by an in-memory synchronously readable store
and removes the possibility of lazily reading from disk/db
this is not an appropriate assumption for the state tree
this breaks compatibility with existing features in ganache
This will prevent the ethereumjs-vm from being used with any chain with a non trivial amount of data (long running ganache instance, public testnet, mainnet)
@ryanio I appreciate your effort here trying to simplify, improve perf, and remove the aging leveldb
but I OPPOSE this change, do not change the interface to make get/sets sync
|
In my opinion, the correct change here is to identify a different generic async datastore interface, like https://github.com/ipfs/interface-datastore/ |
|
@ryanio how does perf differ with a leveldb interface wrapper over Map? |
|
@kumavis thanks for your comments/feedback. Could you link me to how ganache uses MPT or ethereumjs-vm in a long running context? Are you passing in a leveldb? I believe the ethereumjs-vm StateManager would remain async and could still interface with external db stores. You are right that these changes don’t allow for lazy reading of a large trie, which was one of my main questions before getting started, but it doesn’t seem to be a primary use case of MPT so the performance enhancement is preferred?
Hm what do you mean “with a leveldb interface wrapper over Map”? |
An implementation of the DB class (merkle-patricia-tree/src/db.ts, Line 21 in 7583aa9) but around a Map. Perf comparison vs an in-memory leveldb instance. |
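To make the suggestion concrete, here is a minimal sketch of an async key-value interface backed by a Map. The interface shape and the names `AsyncDB`, `MapDB`, and `hexKey` are assumptions for illustration; they are not the actual DB class in db.ts nor the interface-datastore API.

```typescript
// Hypothetical async key-value interface; method names are illustrative only.
interface AsyncDB {
  get(key: Uint8Array): Promise<Uint8Array | null>
  put(key: Uint8Array, value: Uint8Array): Promise<void>
  del(key: Uint8Array): Promise<void>
}

// Map compares object keys by identity, so two byte arrays with equal
// contents would be different keys. Encoding to a hex string avoids that.
function hexKey(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i++) {
    out += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16)
  }
  return out
}

// Keeps the async surface that callers already depend on, while the
// storage underneath is a plain in-memory Map.
class MapDB implements AsyncDB {
  private store = new Map<string, Uint8Array>()

  async get(key: Uint8Array): Promise<Uint8Array | null> {
    const value = this.store.get(hexKey(key))
    return value === undefined ? null : value
  }

  async put(key: Uint8Array, value: Uint8Array): Promise<void> {
    this.store.set(hexKey(key), value)
  }

  async del(key: Uint8Array): Promise<void> {
    this.store.delete(hexKey(key))
  }
}
```

With this shape, a disk-backed store and the Map-backed store could be swapped behind the same interface, and the perf comparison asked about above would measure the cost of the async wrapper itself.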
I've pinged a ganache dev to comment
databases in general don't load everything into memory before use. if this is a toy, or only used for small tries, it's fine, but it was not originally built for that purpose. is this refactor primarily guided by optimization? as an aside, i still think there are potentially fat perf gains to be had by making the hashing lazy, as there are cases where we do a lot of hashing for values that become stale before we commit / query the hash. this is mostly speculation, but may offer practical perf gains for the vm |
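The lazy-hashing idea could be sketched roughly as follows. This is an illustration under assumptions, not how the library works: `LazyHashedValue` and `standInHash` are hypothetical names, and the hash is a cheap FNV-1a stand-in rather than the keccak256 the trie actually uses.

```typescript
// Counts how often the (stand-in) hash function actually runs.
let hashCalls = 0

// Stand-in hash: 32-bit FNV-1a. The real trie hashes with keccak256;
// this is only here so the sketch has no dependencies.
function standInHash(data: Uint8Array): number {
  hashCalls++
  let h = 0x811c9dc5
  for (let i = 0; i < data.length; i++) {
    h ^= data[i]
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

// Holds a value and computes its hash only when asked for it.
class LazyHashedValue {
  private value: Uint8Array
  private cached: number | null = null

  constructor(value: Uint8Array) {
    this.value = value
  }

  // Writes only invalidate the cached hash; no hashing happens here.
  set(value: Uint8Array): void {
    this.value = value
    this.cached = null
  }

  // The hash is computed on first request after a write and then reused.
  hash(): number {
    if (this.cached === null) {
      this.cached = standInHash(this.value)
    }
    return this.cached
  }
}
```

If a value is overwritten several times before anyone asks for its hash, the intermediate values are never hashed at all, which is the case described above where hashes become stale before a commit or query.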
|
Hey @ryanio, great PR. As it turns out, @msieczko and I were working on the same thing. Our code includes some improvements on the test and benchmark side as well as the async to sync change. Maybe we can discuss this together. I sent you an email (using the address from your commits). |
|
Ganache needs to reliably persist the data to disk somehow; it is totally possible for users to create a multi-gigabyte trie database while using Ganache, and this is a use case we'd like to continue to support. I don't think you'll be able to fit a 4 GB+ trie in a That said, multi-GB databases are likely a very extreme edge case. Possibly relevant: in the future Ganache will actually have two modes of db storage: disk and in-memory.
For perspective, users have requested the ability to use a geth database with ganache. IIRC, Mainnet's pruned state trie is well over 20GB. I'd love for Ganache to be able to support a full (non-archive) state in the future. |
|
@kumavis @davidmurdoch Just to give you a bit more context: In our understanding Ganache and Ethereumjs-vm are used mostly for this use case and we would like to optimise for that. Do you know any actual projects that use ethereumjs-vm for storing gigabytes of data? |
|
Hi @marekkirejczyk, on the EthereumJS side we also have our ethereumjs-client project, where we want to keep the flexibility of storing larger amounts of data to disk. Besides that, @kumavis and @davidmurdoch have mentioned significant current use cases and usage scenarios. So removing leveldb support from this library won't be an option, and we have to think of other possible ways of moving forward here, if we decide that it's worth it. The PR from @ryanio is nevertheless super useful as a basis for discussion. Possible options:
@s1na also mentioned in our internal chat that it might be a mid-term goal worth following to support the flat DB structure from turbo-geth, will quote him here directly (Sina hope that's ok):
Maybe that's also something to really consider seriously, this would likely rather happen on the StateManager level (to be extracted to its own package from the VM along with the next v5 release). |
|
Closing this for now. I think there are strong use cases for both async and sync uses of this library, so I will investigate how easily we may be able to add a Sync API to the existing implementation without creating a maintenance burden. The flat DB structure from turbo-geth also seems like a great optimization to keep in mind as high priority for the async side of things. |
This PR tests removing leveldb dependencies in merkle-patricia-tree.
By using native maps as suggested from EthWork's performance review, we get the benefit of extremely fast and synchronous key-value lookups. Keys are encoded as hex strings.
Changes
Map in place of leveldb
readStream, scratchReadStream
semaphore, prioritizedTaskExecutor, walkController
ToDo
Diff files
db.ts, baseTrie.ts
Performance
Benchmark result run for v3.0.0 using memdown:
Benchmark result run for this PR: