-
Notifications
You must be signed in to change notification settings - Fork 21.6k
core/state, cmd/geth: Streaming json output for command #15475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Just as a curiosity, Go does have a capability to generate huge json's in a streaming way and also to parse it as such (https://golang.org/pkg/encoding/json/#example_Decoder_Decode_stream). It needs a bit of manual work, but at least on the Go side we could keep the dump a valid json without needing to store it in memory. Wouldn't that perhaps be a better choice? |
|
@karalabe JSON-object-per-line is a pretty common and easily parsed format, though. |
caner1234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
ff4d6c6 to
3e6a0fa
Compare
|
I have now modified this PR to add I would prefer to keep it cc @Arachnid |
|
Perhaps since the |
|
Well, that kind of makes sense, however, there are quite a lot of options that could potentially be interesting to add, depending on what you want to use it for at that particular time. Such as "only with this segment of code" or those two you mentioned. However, the idea behind --nocode and --nostorage is that having those options there makes the dump a lot faster, simply because we don't have to do additional lookups into databases for Aside from that, there's really not much we can do that will speed up the dump -- remaining things like filtering for low balance can be done in a post-processing step pretty easily by a bash script, and it will be roughly the same speed as if it was done within geth anyway. |
cmd/geth/chaincmd.go
Outdated
| fmt.Printf("%s\n", state.Dump()) | ||
| excludeCode := ctx.GlobalIsSet(utils.ExcludeCodeFlag.Name) | ||
| excludeStorage := ctx.GlobalIsSet(utils.ExcludeStorageFlag.Name) | ||
| if ctx.GlobalIsSet(utils.IterativeOutputFlag.Name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobalIsSet only returns whether the flag is set, not whether the boolean value is true. If I set --iterative=false, then GlobalIsSet will return true. You're looking for GlobalBool.
cmd/utils/flags.go
Outdated
| } | ||
| IterativeOutputFlag = cli.BoolFlag{ | ||
| Name: "iterative", | ||
| Usage: "Print streaming json iteratively as json objects, delimited by lines", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps json -> JSON
delimited by lines -> delimited by newlines
cmd/utils/flags.go
Outdated
| } | ||
| ExcludeStorageFlag = cli.BoolFlag{ | ||
| Name: "nostorage", | ||
| Usage: "When dumping state, do not include storage (saves db lookups)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "When dumping state" is needed, since the flag is only used during dump.
karalabe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've enumaretad some changes. I can imagine there will be some more, but lets get these fixed to have an overview of how the code looks like afterwards.
cmd/utils/flags.go
Outdated
| } | ||
| ExcludeCodeFlag = cli.BoolFlag{ | ||
| Name: "nocode", | ||
| Usage: "When dumping state, do not include code (saves db lookups)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "When dumping state" is needed, since the flag is only used during dump.
core/state/dump.go
Outdated
| "github.com/ethereum/go-ethereum/rlp" | ||
| "github.com/ethereum/go-ethereum/trie" | ||
| "io" | ||
| "os" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do the same goimports fixup as previously.
core/state/dump.go
Outdated
| json, err := json.MarshalIndent(self.RawDump(), "", " ") | ||
| // RawDump returns the entire state an a single large object | ||
| func (self *StateDB) RawDump(excludeCode, excludeStorage bool) Dump { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please drop this newline.
core/state/dump.go
Outdated
| Storage map[string]string `json:"storage"` | ||
| } | ||
|
|
||
| // For output in a collected format, as one large map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the format Dump is/represents ...
core/state/dump.go
Outdated
| Storage map[string]string `json:"storage"` | ||
| } | ||
|
|
||
| // For line-by-line json output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the format "IterativeDump is ..."
core/state/dump.go
Outdated
|
|
||
| // IterativeDump dumps out accounts as json-objects, delimited by linebreaks on stdout | ||
| func (self *StateDB) IterativeDump(excludeCode, excludeStorage bool) { | ||
| self.performDump(newIterativeDump(os.Stdout), excludeCode, excludeStorage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good design to hard code the output stream here. Please pass it as a parameter. Bonus points if you specify a *json.Encoder as a parameter to make it more obvious what it does.
core/state/dump.go
Outdated
| CodeHash string `json:"codeHash"` | ||
| Code string `json:"code"` | ||
| Storage map[string]string `json:"storage"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating everything, you could achieve a similar effect by adding
Address *string `json:"address,omitempty"`
to DumpAccount.
core/state/dump.go
Outdated
| return &Dump{ | ||
| Accounts: make(map[string]DumpAccount), | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we only ever use this method once, I don't think it's worth the namespace pollution of adding one extra method. Lets just inline it.
core/state/dump.go
Outdated
| }) | ||
| } | ||
|
|
||
| func (self *StateDB) performDump(c collector, excludeCode, excludeStorage bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just call this dump instead.
|
Thanks for the pointers @karalabe , I think I've addressed all those concerns now. |
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like these defensive code paths. They introduce weird unexpected behavior and promote bad use. I don't think we should check if output is nil. If the programmer called the method with invalid parameters, crash the thing.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, I think we should only use an indent of 2 characters. 4 seems a bit excessive tbh.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting construct here. If we don't have the preimage, won't this return common.Address{}? In that case, in the old dump, we override a ton of accounts with one another for which we don't have the preimage. In the new case, we'll have a ton of accounts with 0x0..0 address.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's cleaner if here you only pass the address, and don't try to convert it. It will simplify code later.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets pass common.Address instead of string. It's a cleaner API.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use map[common.Address]DumpAccount instead of map[string]DumpAccount?
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use map[common.Hash][]byte instead of map[string]string?
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addr will always be set if you do it like this, so it beats the purpose of omitempty. A better solution is to create the DumpAccount first without this field set, and then set it only if addr != 0x0..0.
core/state/dump.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(*json.Encoder)(&self).Encode(struct {
Root common.Hash `json:"root"`
}{root})84e8934 to
91e2e5e
Compare
|
Sorry for taking so long, I have addressed the concerns now and rebased on master. |
|
...and now it's out of sync again ;). Do you still want this? |
|
Yes, I do. I'll rebase it again one of these days... |
|
We've been debating whether the |
|
Makes sense, however, isn't that basically what the dump.go is? |
91e2e5e to
81c358f
Compare
|
I just rebased and squashed. Regarding
It's pretty simple to just pipe the output to whatever handler you want, e..g a simple python script that does whatever it is you want to do, whether it's count all ether or send it to another database. Example 1 Example 2 |
|
Pushed a new change, so that tests pass (?) and so that stream-consumers can get the hashed key if the address (preimage) is missing. In case ppl want to e.g. investigate code, it might be ok to not have the preimage but still see the code. |
|
Is there any update on this PR? |
a60ccb3 to
4528c8f
Compare
karalabe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Currently, doing a dump of the entire state results in a ~9G large json file, which is very heavy to process.
This PR makes it possible to dump out the state in a more machine-friendly manner, whereby the dump output consists of a stream of json-objects, the first line containing the state
root, and each line afterwards representing an account in the trie. Thus, a consumer of the resulting output does not need to have the full 9 Gb in memory during processing, nor does thegethexport node.Examples
Existing functionality:
New functionality: