Skip to content

Conversation

@Pijukatel
Copy link
Collaborator

@Pijukatel Pijukatel commented Jan 6, 2025

Description

Align Python implementation of use_state wirth JS

Add lock when accessing the cache or kvs to prevent race conditions.
Add unit tests for race conditions.

Breaking

Remove key argument from use_state and hard code it to BasicCrawler.CRAWLEE_STATE_KEY.
(Change in experimental feature is not marked as hard breaking with !.)

Issues

Remove key argument from use_state and hardcode it instead.
Add unit tests for race conditions.
Add lock when accessing the cache or kvs to prevent race conditions.
@Pijukatel Pijukatel added bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team. labels Jan 6, 2025
It is pretty uggly without Barrier.
@Pijukatel Pijukatel changed the title Align Python implementation of use_state wirth JS fix!: Align Python implementation of use_state wirth JS Jan 6, 2025
@Pijukatel Pijukatel changed the title fix!: Align Python implementation of use_state wirth JS fix: Align Python implementation of use_state wirth JS Jan 7, 2025
@github-actions github-actions bot added this to the 105th sprint - Tooling team milestone Jan 7, 2025
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jan 7, 2025
@Pijukatel Pijukatel marked this pull request as ready for review January 7, 2025 07:29
@B4nan
Copy link
Member

B4nan commented Jan 7, 2025

the PR title should be informative about what the change does, "align with JS version" is not gonna tell much to our users, they don't know the JS version

@janbuchar
Copy link
Collaborator

May not be related, but I observed flaky behavior in state handling here

@Pijukatel
Copy link
Collaborator Author

May not be related, but I observed flaky behavior in state handling here

I will check it. It is supposed to be autosaved when specific event happens. It is supposed to happen every 50ms, so 1s seemed like a long enough reserve. Maybe there is some hard block in some cases and no amount of reserve time is enough?
https://github.com/apify/crawlee-python/blob/master/tests/unit/storages/test_key_value_store.py#L29

@Pijukatel Pijukatel changed the title fix: Align Python implementation of use_state wirth JS fix: Avoid use_state race conditions. Remove key argument to use_state. Jan 8, 2025
Copy link
Collaborator

@Mantisus Mantisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit, but I want to re-read the tests after my comments are resolved.

assert await autosaved_within_deadline(key=key_name, expected_value={'hello': 'new_world'})


async def test_get_auto_saved_value_auto_save_race_conditions(key_value_store: KeyValueStore) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this reliably reproduce the error if you remove the lock?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried stress testing it 1000x times and it failed with old implementation all the time. But I acknowledge that it is not guaranteed and in theory it can pass. Same as in reality if such delay in kvs access occurs, it will most likely end in race condition, but in theoretical case it does not have to.
I did try coming up with test without such weakness, but I failed to find better approach.

This test should not have any false positives and should not be flaky, but has theoretical chance for false negative. In CI we run the same test for various Python implementations which makes already purely theoretical chance completely negligible.

@vdusek vdusek changed the title fix: Avoid use_state race conditions. Remove key argument to use_state. fix: Avoid use_state race conditions. Remove key argument to use_state Jan 9, 2025
@bigcartelo

This comment was marked as outdated.

@Pijukatel Pijukatel requested a review from janbuchar January 10, 2025 08:19
Copy link
Collaborator

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Pijukatel Pijukatel merged commit 000b976 into master Jan 10, 2025
23 checks passed
@Pijukatel Pijukatel deleted the align-use-state-with-JS-implementation branch January 10, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Align use_state with JS implementation

7 participants