
New feature: StacIO that retries network requests #958

@gadomski

Description


Summary

Add a RetryStacIO (probably inheriting from DefaultStacIO) that retries network requests in a configurable way. It'll probably be modeled on urllib3, though I am not suggesting we add urllib3 as a dependency. E.g.

from pystac import Item
from pystac.stac_io import RetryStacIO
url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t/items/AST_L1T_00310012006175412_20150516104359"
item = Item.from_file(url, RetryStacIO())  # Retried with the default settings
item = Item.from_file(url, RetryStacIO(total=3, backoff_factor=2))  # Retries can be configured

# If you want to enable retries for all PySTAC operations
from pystac import StacIO
StacIO.set_default(RetryStacIO)
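The retry loop itself could look something like this stdlib-only sketch (all names here are hypothetical, not proposed pystac API); the backoff formula mirrors urllib3's `backoff_factor * 2 ** (attempt - 1)` pattern:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, backoff_factor: float) -> float:
    # urllib3-style exponential backoff: factor * 2^(attempt - 1) seconds.
    return backoff_factor * (2 ** (attempt - 1))


def call_with_retry(
    fn: Callable[[], T],
    total: int = 5,
    backoff_factor: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Call fn(), retrying on OSError (which covers transient network
    # failures) up to `total` attempts, sleeping between tries.
    for attempt in range(1, total + 1):
        try:
            return fn()
        except OSError:
            if attempt == total:
                raise
            sleep(backoff_delay(attempt, backoff_factor))
    raise AssertionError("unreachable")
```

A RetryStacIO could then override DefaultStacIO's read method to route network reads through a helper like this, passing `total` and `backoff_factor` from its constructor.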

Motivation

When doing a large number of operations with PySTAC, sometimes in parallel, it's possible to overwhelm servers or otherwise get transient errors. My specific example involved Item.to_dict -- I forgot to specify transform_hrefs=False (callback to #546 (comment)), and did this ~1 million times:

d = item.to_dict(include_self_href=False)

Under the hood, this does a network request to resolve the root, and those requests would sometimes error out. I solved my specific problem by disabling href transforms:

d = item.to_dict(include_self_href=False, transform_hrefs=False)

But a retry-enabled PySTAC would be useful in any large-scale/batch processing context.

Alternatives

We could add urllib3 as a dependency and use its Retry explicitly. I don't think that's a terrible thing -- urllib3 is widely used and popular. requests is built on it, so adding urllib3 could represent a gradual slide towards just biting the bullet and adding requests too.
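For reference, using urllib3's Retry directly might look like this (a sketch assuming urllib3 is installed; the status_forcelist values are illustrative, not a proposal):

```python
import urllib3
from urllib3.util.retry import Retry

# Retry transient failures and common throttling/gateway statuses,
# with exponential backoff between attempts.
retries = Retry(total=3, backoff_factor=2, status_forcelist=[429, 502, 503, 504])
http = urllib3.PoolManager(retries=retries)
# response = http.request("GET", url)  # retried per the policy above
```

This keeps the retry policy declarative, which is part of the appeal of modeling RetryStacIO's configuration on urllib3 even without the dependency.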
