Conversation
Codecov Report
@@ Coverage Diff @@
## main #1586 +/- ##
=======================================
Coverage 50.96% 50.96%
=======================================
Files 92 92
Lines 5751 5751
=======================================
Hits 2931 2931
Misses 2520 2520
Partials 300 300 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
this is dope! I appreciate the similarities to the tendermint e2e test, and see how we can add the desired tests.
Are we wanting to eventually enable celestia-node to use this as well? I could see them being able to reuse a lot of the code here.
mainly just a had a few questions, but other than that I think it LGTM, and we should merge and start adding tests/iterating as needed
| services: | ||
| {{- range .Nodes }} | ||
| {{ .Name }}: | ||
| labels: | ||
| e2e: true | ||
| container_name: {{ .Name }} | ||
| image: ghcr.io/celestiaorg/celestia-app:{{ index .Versions 0 }} | ||
| entrypoint: ["/bin/celestia-appd"] | ||
| command: ["start"] | ||
| init: true |
There was a problem hiding this comment.
my local docker setup might be wonk, beacuse I had to add
user: root in order to get the test to work. If not, the test will appear to be waiting, but none of the nodes actually start because they do not have permission to write to the volume.
does anyone know what I should do to fix that locally? change the group permissions I have for docker? cc @sweexordious
There was a problem hiding this comment.
Hmm. The docker image runs using user 1001 however I made the permissions of the volume 0o777 so everyone should have read write execute access.
Yeah Potentially. Matt mentioned having a common repository for e2e testing of various node types in a system. If we are to do this I'd definitely recommend reusing parts of this code. |
rach-id
left a comment
There was a problem hiding this comment.
Awesome stuff 🎉
When I run this locally, the validators start correctly, but when I want to stop them using ctrl + c, everything hangs until I kill the terminal session.
When this happens, I check the running docker images and I see none, so probably this is not related to docker runtime.
Is that the correct way to stop the framework?
Is anyone having the same issue or it's just me?
It should catch the sig term and just cancel the context. Let me dig in a little. In any case running |
|
cc @celestiaorg/testing for visibility |
|
Few questions from my side why we opted for in-house framework instead of using interchaintest?state-sync example why not just re-use their implementation of chain creation using docker-compose and networking?It has the same capabilities as we have rn. Same with network pruning/setup imho, it looks like they have took care of docker networks cleanup which is clumsy if we want more control over implementation - why not reuse compose-spec like libp2p does?Seeing the matrix versioning schema that libp2p does - we can def reuse the hard work they pushed. Pros of aboveBoth have CI/CD implementation already meaning that we can speed up dev time and contribute back if something we really need |
|
My main concern over this PR - we are signing up on maintaining long-term, while there are tools that can help us achieve the same without maintaining it (interchaintest is constantly being updated, same is for libp2p/test-plan) |
|
Hey,
The simple answer is we have a modified fork of the SDK which is not compatible with Secondly, interchaintest is very IBC orientated. This means some of the initial goals of upgrade testing (we will have a different upgrade mechanism to most cosmos chains), compatibility testing and sync testing don't seem to natively fit their framework. Lastly, for an open source library, I think there documentation and tutorials around using their library are lacking (although the inline commenting is relatively helpful). As well as maintenance we have to also consider onboarding costs to understanding a new (and large) codebase (when something simpler might suffice).
I'm sympathetic to this. Ideally we reach our testing goals with little duplication and overlap. When we spoke prior to this it seemed like testground was not suitable. Rene tells me that robusta uses a different more light-weight framework that may be ideal for the outlined purposes. I'm not sure about the other (libp2p) library you mention but I definitely think that what's done here is quite generic: It's simply managing a series of binaries within a network and using the RPC endpoints to make assertions. What I'm more concerned about is that it seems like these testing goals are perhaps the same across the other teams (node have their "swamp" testing) which seems, at least to me, to heavily overlap with what core/app is doing. If we have a "testing" team then perhaps efforts from individual teams can be consolidated |
rootulp
left a comment
There was a problem hiding this comment.
Overall LGTM. I'm encountering an error while trying to run locally
$ ./build/e2e -f networks/simple.toml
Setting up network simple-53126
Spinning up testnet
Starting validator01 at height 0 on http://localhost:4202
Error: starting network: exec: "docker compose": executable file not found in $PATHI have docker compose installed in my $PATH:
$ docker-compose --version
Docker Compose version v2.15.1
$ which docker-compose
/usr/local/bin/docker-compose- Should we list that installing docker compose is a prerequisite (potentially in README.md)?
- Am I missing a different prerequisite step?
| github.com/dgraph-io/badger/v2 v2.2007.4 // indirect | ||
| github.com/dgraph-io/ristretto v0.1.0 // indirect | ||
| github.com/dgryski/go-farm v0.0.0-20200201041132-a6ae2369ad13 // indirect | ||
| github.com/docker/docker v20.10.21+incompatible |
There was a problem hiding this comment.
[informational] I was concerned by the +incompatible suffix but it seems ok per https://stackoverflow.com/a/57356777
testing/e2e/pkg/rpc.go
Outdated
| return nil, errors.New("network is not running") | ||
| } | ||
|
|
||
| // return heights in ascending order |
There was a problem hiding this comment.
[nit] this line claims to sort heights in ascending order but the function godoc comment claims to return heights in descending order.
// GetHeights loops through all running nodes and returns an array of heights
// in order of highest to lowest.
func GetHeights(ctx context.Context, testnet *Testnet) ([]int64, error) {
|
There are 2 topics that got confused and we clarified them here. The original idea behind this PR is for teams to do nightly tests using minimal e2e framework. I thought that this is only for core/app and integration ones executed on every PR. Going forward, we separated into 2 topics:
Nightly tests@celestiaorg/devops will implement infra features for robusta-nightlies after @cmwaters shares the notion doc with us. Integration testsThis PR should be used on Integration level per PR run. In addition, we need to migrate it to a standalone repo, so @celestiaorg/celestia-node and @rollkit can use to fulfil their testing strategies In addition, we will reuse the compose-spec that libp2p/test-plan has to make it CI/CD ready out-of-the-box Thanks for the input and clarifications 🚀 |
staheri14
left a comment
There was a problem hiding this comment.
Nice work on this PR. 👍
| "path/filepath" | ||
| ) | ||
|
|
||
| // execute executes a shell command. |
There was a problem hiding this comment.
| // execute executes a shell command. | |
| // exec executes a shell command. |
| Key crypto.PrivKey | ||
| } | ||
|
|
||
| func LoadTestnet(manifest Manifest, file string) (*Testnet, error) { |
There was a problem hiding this comment.
[nit-optional] I think filePath is closer to what this parameter means.
| func LoadTestnet(manifest Manifest, file string) (*Testnet, error) { | |
| func LoadTestnet(manifest Manifest, filePath string) (*Testnet, error) { |
testing/e2e/pkg/testnet.go
Outdated
|
|
||
| // Address returns an RPC endpoint address for the node. | ||
| func (n Node) AddressRPC() string { | ||
| return fmt.Sprintf("%v:26657", n.IP.String()) |
There was a problem hiding this comment.
[nit-optional]: Introducing a constant variable to represent the port number 26657 would enhance the code's future maintainability and readability.
| } | ||
|
|
||
| func (n Node) IsValidator() bool { | ||
| return n.SelfDelegation != 0 |
There was a problem hiding this comment.
[optional] Since a non-zero SelfDelegation is the only requirement for being recognized as a validator, why not use a boolean type for SelfDelegation instead?
There was a problem hiding this comment.
Because we may want to set the self delegation to an actual number which will reflect the initial voting power of the validator
|
morge? |
Sorry for missing this @rootulp. |
96fae19
Ref: #1256
This PR puts together the base pieces for an e2e testing suite. It includes a CLI that can:
Further work aims to support: