Description
Summary
Add a RetryStacIO (probably inheriting from DefaultStacIO) that retries network requests in a configurable way. It'll probably be modeled on urllib3, though I am not suggesting we add urllib3 as a dependency. E.g.
from pystac import Item
from pystac.stac_io import RetryStacIO
url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t/items/AST_L1T_00310012006175412_20150516104359"
item = Item.from_file(url, RetryStacIO())  # This will be retried with default settings
item = Item.from_file(url, RetryStacIO(total=3, backoff_factor=2)) # Retry can be configured
# If you want to enable retries for all PySTAC operations
from pystac import StacIO
StacIO.set_default(RetryStacIO)
Motivation
When doing a large number of operations with PySTAC, sometimes in parallel, it's possible to overwhelm servers or otherwise get transient errors. My specific example involved Item.to_dict -- I forgot to specify transform_hrefs=False (clapback to #546 (comment)), and did this ~1 million times:
d = item.to_dict(include_self_href=False)
Under the hood, this does a network request to resolve the root, and those requests would sometimes error out. I solved my specific problem by disabling href transforms:
d = item.to_dict(include_self_href=False, transform_hrefs=False)
But a retry-enabled PySTAC would be useful in any large-scale/batch processing context.
Alternatives
We could add urllib3 as a dependency and use its Retry explicitly. I don't think that's a terrible thing -- urllib3 is widely used and popular. requests is built on it, so adding urllib3 could represent a gradual slide towards just biting the bullet and adding requests too.
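For comparison, here is a rough, standard-library-only sketch of the retry loop a RetryStacIO could wrap around its read call. The names read_with_retries and read_url are hypothetical, not part of PySTAC. The total and backoff_factor parameters only loosely mirror urllib3's Retry, and the exact backoff formula is an assumption. It retries on any URLError, which includes HTTPError, so a real implementation would probably want to be pickier about status codes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def read_with_retries(
    read: Callable[[str], str],
    href: str,
    total: int = 5,
    backoff_factor: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call read(href), retrying up to `total` more times on URLError.

    Hypothetical helper: waits backoff_factor * 2**attempt seconds
    between attempts (simple exponential backoff).
    """
    attempt = 0
    while True:
        try:
            return read(href)
        except urllib.error.URLError:
            if attempt >= total:
                # Retries exhausted: surface the last error to the caller.
                raise
            sleep(backoff_factor * (2 ** attempt))
            attempt += 1


def read_url(href: str) -> str:
    """Plain urllib read, standing in for whatever the StacIO does today."""
    with urllib.request.urlopen(href) as response:
        return response.read().decode("utf-8")
```

A RetryStacIO could then call something like read_with_retries(read_url, href, total=3, backoff_factor=2) from whichever method it overrides to fetch remote hrefs. The sleep argument is injectable mainly so the backoff can be tested without actually waiting.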