[GSK-1597] Push typo perturbation stochasticity #1362

rabah-khalek · 2023-09-01T12:38:23Z

I had to hash the typo transformation so that the same random perturbation can be retrieved when the user goes back and forth in the debugger.

The problem is illustrated by two screenshots on Linear.

linear · 2023-09-01T12:38:27Z

GSK-1597 Text perturbation random seed is not fixed

Push feature

Co-authored-by: Mathieu Roques <[email protected]>
Co-authored-by: Andrey Avtomonov <[email protected]>
Co-authored-by: Kevin Messiaen <[email protected]>
Co-authored-by: Rabah Abdul Khalek <[email protected]>
Co-authored-by: Henrique Chaves <[email protected]>
Co-authored-by: Henrique Chaves <[email protected]>

python-client/giskard/push/perturbation.py

andreybavt · 2023-09-01T16:02:48Z

python-client/giskard/push/perturbation.py

+                f"{', '.join(map(lambda x: repr(x), ds_slice_copy.df.values))}".encode("utf-8")
+            )
+            if _hash not in hashed_typo_transformations.keys():
+                hashed_typo_transformations.update({_hash: ds_slice_copy.transform(t)})


This cache will never be emptied as long as the worker is running. Can you make it a fixed-size LRU cache?

Why cache only typo transformations and not the rest?

> This cache will never be emptied as long as the worker is running. Can you make it a fixed-size LRU cache?

I think it's better to keep the cache for as long as possible. Using, for instance, @lru_cache(maxsize=16) would be confusing, as the user might at some point lose a push notification about a typo transformation without knowing why. Is there a problem if the cache is never emptied (in this specific case)?

> Why cache only typo transformations and not the rest?

Among the following transformations, the typo transformation is the only one that is random, so it is the only one that needs to be cached:

TextGenderTransformation, TextLowercase, TextPunctuationRemovalTransformation, TextTitleCase, TextTypoTransformation, TextUppercase
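To illustrate the distinction drawn above, here is a minimal sketch. `add_typos` is a hypothetical toy function written for this example, not Giskard's TextTypoTransformation; it only shows that a case transformation is reproducible by construction, while a random perturbation is reproducible only when its generator is seeded identically.

```python
import random


def add_typos(text: str, rate: float, rng: random.Random) -> str:
    """Hypothetical toy typo perturbation: randomly swap adjacent letters."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


text = "the quick brown fox jumps over the lazy dog"

# Deterministic transformation: identical on every call, nothing to stabilise.
upper_a = text.upper()
upper_b = text.upper()

# Random transformation: two calls agree only if both generators
# start from the same seed.
seeded_a = add_typos(text, 0.3, random.Random(1234))
seeded_b = add_typos(text, 0.3, random.Random(1234))
```

Without a fixed seed (or a cache of the first result), each visit to the same slice could show a different perturbation.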

> Is there a problem if the cache is never emptied (in this specific case)?

It will lead to unpredictable RAM consumption: the longer a worker runs, the more RAM it will consume.

> the user might at some point lose a push notification about a typo transformation

That shouldn't happen. You'd just call hashed_typo_transformations[_hash] = ds_slice_copy.transform(t) more often than you would with an unlimited cache. With a limited cache, every cache miss simply recomputes the transformation and refreshes the cache.
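The behaviour described in this exchange can be sketched with a small bounded LRU cache. This is an illustration under assumptions, not the code from the PR: `BoundedCache` and `get_or_compute` are hypothetical names, and the cached values are plain strings rather than transformed datasets.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical fixed-size LRU cache: evicts the least recently used entry."""

    def __init__(self, maxsize: int = 16):
        self.maxsize = maxsize
        self._data = OrderedDict()
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        # Cache miss: recompute the value and refresh the cache.
        self.misses += 1
        value = compute()
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value

    def __len__(self):
        return len(self._data)


cache = BoundedCache(maxsize=2)
for key in ["a", "b", "c", "a"]:
    # "a" is evicted when "c" arrives, so the last lookup is a miss
    # that recomputes the value instead of failing.
    cache.get_or_compute(key, lambda k=key: k.upper())
```

Memory stays bounded by `maxsize`, and nothing is lost on eviction beyond the cost of recomputing. One caveat: for a random transformation, a recomputed entry would generally differ from the evicted one unless the transformation is seeded, which is consistent with the later commit in this PR that replaces the hashing logic with a seed-based approach.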

Co-authored-by: Andrey Avtomonov <[email protected]>

sonarqubecloud · 2023-09-05T16:45:52Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

100.0% Coverage
0.0% Duplication

andreybavt · 2023-09-06T08:34:01Z

python-client/giskard/push/perturbation.py

+        kwargs = {}
+        if _is_typo_transformation:
+            hashed_seed = hash(f"{', '.join(map(lambda x: repr(x), ds_slice_copy.df.values))}".encode("utf-8"))
+            positive_hashed_seed = hashed_seed % ((sys.maxsize + 1) * 2)


@rabah-khalek could you add a comment explaining why we're doing this? In a week we won't remember it by heart.
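For readers wondering about the arithmetic in the diff above, here is a hedged sketch of what the two lines appear to do. The function names and the list of strings standing in for `ds_slice_copy.df.values` are assumptions made for this example. The built-in `hash()` can return a negative integer, and the modulo maps it into a non-negative range, presumably because some seeding APIs reject negative values.

```python
import hashlib
import sys


def seed_from_builtin_hash(values) -> int:
    """Mirror of the diff's approach, applied to a plain list of rows."""
    payload = ", ".join(repr(v) for v in values).encode("utf-8")
    hashed_seed = hash(payload)  # may be negative
    # Python's % with a positive modulus always returns a non-negative result,
    # so this lands in [0, 2 * (sys.maxsize + 1)).
    return hashed_seed % ((sys.maxsize + 1) * 2)


def seed_from_sha256(values) -> int:
    """Alternative (an assumption, not what the PR does): a digest-based seed
    that is also stable across interpreter restarts."""
    payload = ", ".join(repr(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


rows = ["hello world", "another row"]  # hypothetical stand-in for the slice values
seed = seed_from_builtin_hash(rows)
stable_seed = seed_from_sha256(rows)
```

Note that `hash()` of `bytes` is salted per interpreter process unless `PYTHONHASHSEED` is set, so the first variant is stable within a running worker but not across restarts. Whether that matters depends on how long the debugger session is expected to outlive a worker.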

hashed TextTypoTransformation

5767a4f

Merge branch 'feature/merging-push-feature' into GSK-1597-text-perturbation-stochasticity

fbc2c99

rabah-khalek requested a review from andreybavt September 1, 2023 12:50

rabah-khalek changed the title ~~hashed TextTypoTransformation~~ [GSK-1597] Test typo perturbation stochasticity Sep 1, 2023

rabah-khalek changed the title ~~[GSK-1597] Test typo perturbation stochasticity~~ [GSK-1597] Push typo perturbation stochasticity Sep 1, 2023

rabah-khalek added the Python (Pull requests that update Python code) and push-feature labels Sep 1, 2023

rabah-khalek changed the base branch from feature/merging-push-feature to main September 1, 2023 14:58

Merge branch 'main' into GSK-1597-text-perturbation-stochasticity

54d9855

rabah-khalek changed the base branch from main to feature/merging-push-feature September 1, 2023 15:32

andreybavt suggested changes Sep 1, 2023

Update python-client/giskard/push/perturbation.py

9d3de8b

Co-authored-by: Andrey Avtomonov <[email protected]>

rabah-khalek requested a review from andreybavt September 4, 2023 10:16

rabah-khalek added 2 commits September 5, 2023 17:22

removed hashing logic, using seed-based approach instead

cb4f38f

Merge branch 'feature/merging-push-feature' into GSK-1597-text-perturbation-stochasticity

f0a139e

andreybavt reviewed Sep 6, 2023

added comment about hashed_seed

c6354ee

andreybavt approved these changes Sep 6, 2023

andreybavt merged commit b4b1967 into feature/merging-push-feature Sep 6, 2023

rabah-khalek added a commit that referenced this pull request Sep 6, 2023

merging #1362

c69ccaa
