Conversation

@charlielewisme (Collaborator) commented Dec 19, 2022

Increase reliability for polygon_export_dag

Task export_geth_traces takes up to 8 hours. If it fails just before completing, data is very late. This PR helps by splitting the task into 24 tasks, each covering 1 hour of block time. For simplicity, I also split the downstream tasks (extract_contracts & extract_tokens) into 24 tasks each. In the load_dag, rather than adding 24 wait operators for each of these 3 entities, I consolidated all the wait operators into one.
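The hourly split described above can be sketched roughly as follows. This is not code from the PR; the function name and the downstream wiring are hypothetical, just to illustrate one day of block time becoming 24 one-hour [start, end) windows, with each extract task paired to its export task's window:

```python
from datetime import datetime, timedelta

def hourly_windows(day_start: datetime) -> list[tuple[datetime, datetime]]:
    """Split one day of block time into 24 one-hour [start, end) windows,
    one per export_geth_traces task (hypothetical helper)."""
    return [
        (day_start + timedelta(hours=h), day_start + timedelta(hours=h + 1))
        for h in range(24)
    ]

# One export task per window; extract_contracts / extract_tokens
# would reuse the same window as their upstream export task.
windows = hourly_windows(datetime(2022, 12, 18))
```

A single consolidated wait operator in the load_dag would then wait on all 24 outputs per entity rather than on 24 separate sensors.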

Greater efficiency could be achieved by calling get_block_range() just once and chunking the block range into 24 chunks (equal in terms of block_number, rather than block_timestamp). In this case, though, I've chosen the naive approach of making the call 24 times, since it's already very fast and the code is simpler.
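The alternative mentioned above (one get_block_range() call, then 24 chunks equal in block count) could look something like this. A minimal sketch, not from the PR; the function name is hypothetical:

```python
def chunk_block_range(start_block: int, end_block: int, n: int = 24):
    """Split the inclusive range [start_block, end_block] into n contiguous
    chunks of (nearly) equal block count (hypothetical helper)."""
    total = end_block - start_block + 1
    base, rem = divmod(total, n)
    chunks, lo = [], start_block
    for i in range(n):
        # The first `rem` chunks absorb the remainder, one block each.
        hi = lo + base + (1 if i < rem else 0) - 1
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks
```

Chunking by block_number this way gives evenly sized tasks even when block production rate varies within the day, which is the efficiency argument made above.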

Testing

  • ran the export and load dags in a dev environment for 2022-12-18 -> 2022-12-20
  • checked files in GCS and confirmed they look as expected
  • ran a full row-by-row diff test comparing dev and prod partitions in BigQuery; confirmed everything is exactly the same, except "total_supply" for 18 rows in tokens
  • ran load_dag with load_all_partitions=True (to check that nothing is broken there)
  • re-ran diff tests

@charlielewisme force-pushed the feat/split-task-export-geth-traces branch from fb1f643 to 333fd7e on December 21, 2022 11:21
@charlielewisme marked this pull request as ready for review December 21, 2022 14:09
@gulshngill (Collaborator) left a comment

lgtm but I have a couple of questions.

  • Would we need to scale prod composer to prevent tasks from being killed randomly?
  • I notice that we sometimes hit the rate limit for traces (for example here). It's not a blocker since it seems to work after retrying, but I remember we had an issue in evmchain-etl where the code did not handle retries correctly and we had duplicate blocks. Should we reduce the number of active tasks?

@TimNooren removed their assignment Dec 21, 2022
@charlielewisme (Collaborator, Author) commented

> lgtm but I have a couple of questions.
>
>   • Would we need to scale prod composer to prevent tasks from being killed randomly?
>   • I notice that we sometimes hit the rate limit for traces (for example here). It's not a blocker since it seems to work after retrying, but I remember we had an issue in evmchain-etl where the code did not handle retries correctly and we had duplicate blocks. Should we reduce the number of active tasks?

@gulshngill Thanks for the review.

  1. The review composer does have a slightly different config from Nansen prod; nothing major, but it may be worth looking at in case the auto-scaling acts up with so many small tasks. If necessary, we can raise our lower worker limit to stop the auto-scaling from getting too busy.
  2. Our node has been holding up surprisingly well (I'm running the EIP1559 backfill plus our normal production runs, all off the same node, and it hasn't fallen over). Again, I might take a look once we're in production, but I'm fairly sure it's okay for now.

I take your point, though -- the main purpose here is reliability, not performance... although the performance increase is pretty huge... so we can definitely afford to slow it down a bit if need be. If you're okay with it, I suggest we leave it as is and revisit if necessary.

@gulshngill (Collaborator) commented

Yup, it's something we can definitely adjust later on too 👍

@charlielewisme merged commit 6ca2c6f into main Dec 21, 2022
@charlielewisme deleted the feat/split-task-export-geth-traces branch December 21, 2022 15:39