Deduplicate productions #505
79b3f39 to
f31eae5
Compare
src/pyk/kast/outer.py
Outdated
prods = _prods
_LOGGER.warning(f'Discarding {len(prods) - len(_prods)} equivalent productions')
Wouldn't this always log "Discarding 0 equivalent productions"?
Fixed. Sorry, I copy-pasted the code above without making sure that it works. I also fixed the original code.
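The problem pointed out above can be reproduced in isolation. The following is a minimal sketch, not pyk code: plain strings stand in for productions, and `dedupe`, `discarded_buggy` and `discarded_fixed` are hypothetical helpers. It shows why the count has to be computed before the name is rebound.

```python
def dedupe(items):
    """Keep the first occurrence of each item, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def discarded_buggy(prods):
    _prods = dedupe(prods)
    prods = _prods                    # rebinding first ...
    return len(prods) - len(_prods)   # ... makes this difference always 0


def discarded_fixed(prods):
    _prods = dedupe(prods)
    discarded = len(prods) - len(_prods)  # count before rebinding
    prods = _prods
    return discarded


sample = ['isInt', 'isInt', 'foo']
buggy = discarded_buggy(sample)
fixed = discarded_fixed(sample)
```

With the sample list, the buggy variant reports nothing discarded even though one duplicate was dropped, while the fixed variant reports the dropped duplicate.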
|
@virgil-serbanuta do you have a minimal example where multiple productions are produced? I guess that causes a crash, because in several places we assert that a single production is produced, so we should add a test of that to our integration tests, to make sure the filtering is working correctly. I haven't seen a case where this happens yet, but maybe I haven't used a fancy enough import structure yet. Maybe we also just need to change the way we recursively collect sentences from imported modules instead of deduplicating the productions after the fact. Right now |
a.k: Search for in a-kompiled/compiled.json |
|
I also added a test which fails on master. |
98e32ce to
719126d
Compare
|
I still don't understand how we are getting duplicate productions in the first place, which is what I think needs to be addressed rather than just filtering them out. Here is where we gather the productions: Line 985 in 8b8f791
Which uses the gathering of the modules: Line 981 in 8b8f791
Which uses the gathering of the module names: Line 966 in 8b8f791
Which as far as I can tell, should only gather each module a single time (as long as it gets the same name). So in the example of your test, the module |
|
@ehildenb I am not importing INT at all. In this case, B has one definition of isInt and C has the second one. These definitions are generated automatically by A imports B and C, so they are included in the loop in |
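The two positions can be reconciled with a small model. This is only a sketch under assumptions: the `modules` dictionary, its layout, and the `reachable` helper are hypothetical, and how the `isInt` definitions are actually generated is not shown in this thread. The point is that visiting each module exactly once does not prevent two different modules from each contributing an equivalent production.

```python
# Hypothetical module table: A imports B and C, and both B and C
# carry their own (automatically generated) isInt production.
modules = {
    'A': {'imports': ['B', 'C'], 'productions': []},
    'B': {'imports': [], 'productions': ['isInt']},
    'C': {'imports': [], 'productions': ['isInt']},
}


def reachable(name, modules):
    """Collect the names of all transitively imported modules, each once."""
    seen = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(modules[current]['imports'])
    return seen


names = reachable('A', modules)
prods = [prod for name in names for prod in modules[name]['productions']]
```

Here `names` contains A, B and C once each, yet `prods` still holds two `isInt` entries, one from B and one from C.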
src/pyk/kast/outer.py
Outdated
# Automatically defined symbols like isInt may get multiple
# definitions in different modules.
unique_no_att: list[tuple[KProduction, KProduction]] = []
for prod in prods:
After fixing _freeze, a Counter-based solution should work (not thoroughly tested):
prod_cnt = Counter(prod.let_att(remove_source_map_from_katt(prod.att)) for prod in prods)
try:
    prod = single(prod_cnt)
except ValueError as err:
    raise ValueError(f'Expected a single production for label {klabel}, not: {list(prod_cnt)}') from err
num_eq_prods = prod_cnt[prod] - 1
if num_eq_prods:
    _LOGGER.warning(f'Discarding {num_eq_prods} equivalent productions')
self._production_for_klabel[klabel] = prod
This version always returns a production without source map. If retaining it is important (in the case when there is a single production), a dict-based variant can be implemented with.
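One possible shape for a dict-based variant that keeps the source information is sketched below. It is an illustration under assumptions, not the reviewer's proposal and not pyk's API: `Prod` is a hypothetical stand-in for `KProduction`, attributes are modelled as a tuple of key/value pairs, and `strip_source`, `group_equivalent` and `pick_single` are made-up names.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


# Hypothetical stand-in for a production; not pyk's KProduction.
@dataclass(frozen=True)
class Prod:
    klabel: str
    att: tuple[tuple[str, str], ...] = ()


def strip_source(prod: Prod) -> Prod:
    """Return a copy of prod without the (assumed) 'source' attribute."""
    return replace(prod, att=tuple((k, v) for k, v in prod.att if k != 'source'))


def group_equivalent(prods: list[Prod]) -> dict[Prod, list[Prod]]:
    """Map each source-free representative to the original productions it covers."""
    groups: dict[Prod, list[Prod]] = {}
    for prod in prods:
        groups.setdefault(strip_source(prod), []).append(prod)
    return groups


def pick_single(klabel: str, prods: list[Prod]) -> Prod:
    groups = group_equivalent(prods)
    if len(groups) != 1:
        raise ValueError(f'Expected a single production for label {klabel}, not: {list(groups)}')
    (originals,) = groups.values()
    # Returning the first original keeps its source attribute intact.
    return originals[0]


b = Prod('isInt', (('function', ''), ('source', 'b.k')))
c = Prod('isInt', (('function', ''), ('source', 'c.k')))
chosen = pick_single('isInt', [b, c])
```

The dictionary keys play the role of the Counter keys, while the values keep the original productions, so the one that is returned still carries its source attribute.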
Done.
if len(_prods) < len(prods):
    _LOGGER.warning(f'Discarding {len(prods) - len(_prods)} equivalent productions')
len(_prods) is the number of keys in counter. counter is a Mapping[KProduction, int] where the key is the representative of the equivalence class (i.e. the production without the source attribute), and the value is the number of productions that fall into the equivalence class (i.e. those that equal the representative when dropping the source attribute).
Yes, I think that's correct. I'm not sure why you explained this, you probably wanted to point out something that I'm still missing.
All I'm saying here is that if I have n productions, which I group into m equivalence classes, and I keep one production from each class, then I dropped all the other productions, i.e. I dropped n-m productions.
dropped n-m productions
I see, thanks for the clarification.
For me, the following order seems more logical though:
- If there's more than one equivalence class, fail.
- Otherwise, emit the log message about the number of discarded equivalent productions from the single class.
By the way, using this n-m trick, the Counter is unnecessary for both orderings: you can just use a set instead (converting to list is not necessary).
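The ordering suggested above, combined with the set-based n-m count, could look roughly like this. It is a sketch under assumptions: productions are modelled as hashable `(label, source)` tuples, `normalize` is a hypothetical callback that drops the source information, and `single_up_to_source` is a made-up name.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def single_up_to_source(klabel, prods, normalize):
    """Return the unique source-free production for klabel, or raise."""
    # The set holds one representative per equivalence class (m of them).
    classes = {normalize(prod) for prod in prods}
    # Step 1: fail if there is more than one equivalence class (or none).
    if len(classes) != 1:
        raise ValueError(f'Expected a single production for label {klabel}, not: {sorted(classes)}')
    # Step 2: log how many equivalent productions were dropped (n - m, with m == 1).
    discarded = len(prods) - len(classes)
    if discarded:
        _LOGGER.warning(f'Discarding {discarded} equivalent productions')
    (representative,) = classes
    return representative


prods = [('isInt', 'b.k'), ('isInt', 'c.k')]
result = single_up_to_source('isInt', prods, normalize=lambda prod: prod[0])
```

Because the failure check comes first, the warning is only ever emitted for the single surviving class.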
I changed it to a set.
To me, this code looks like a series of filters (remove 'unparseAvoid', remove equivalent productions), and other filters may be added in the future. It makes sense to log each filter's activity and then move to the next step (for the last filter, the next step is to check how many productions are left and return).
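The "series of filters" view can be sketched as a small pipeline in which every stage logs what it removed before the final count check. Everything here is an assumption made for illustration: productions are plain dictionaries, and `drop_unparse_avoid`, `drop_equivalent` and `run_filters` are hypothetical names, not the functions in `outer.py`.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def drop_unparse_avoid(prods):
    """Filter 1: drop productions carrying an 'unparseAvoid' attribute."""
    return [p for p in prods if 'unparseAvoid' not in p['att']]


def drop_equivalent(prods):
    """Filter 2: keep one production per class, ignoring the 'source' attribute."""
    seen = set()
    kept = []
    for p in prods:
        key = (p['klabel'], frozenset((k, v) for k, v in p['att'].items() if k != 'source'))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def run_filters(klabel, prods, filters):
    """Apply each filter in order, logging its activity, then expect one survivor."""
    for name, keep in filters:
        kept = keep(prods)
        if len(kept) < len(prods):
            _LOGGER.warning(f'{name}: discarding {len(prods) - len(kept)} productions for label {klabel}')
        prods = kept
    if len(prods) != 1:
        raise ValueError(f'Expected a single production for label {klabel}, not: {prods}')
    return prods[0]


prods = [
    {'klabel': 'isInt', 'att': {'function': '', 'source': 'b.k'}},
    {'klabel': 'isInt', 'att': {'function': '', 'source': 'c.k'}},
    {'klabel': 'isInt', 'att': {'unparseAvoid': '', 'source': 'd.k'}},
]
filters = [('unparseAvoid', drop_unparse_avoid), ('equivalent', drop_equivalent)]
result = run_filters('isInt', prods, filters)
```

Adding another filter later only means appending one more `(name, function)` pair to the list; the logging and the final check stay unchanged.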
Co-authored-by: Tamás Tóth <[email protected]>
|
@virgil-serbanuta please get in the habit of adding a description to the main PR comment body (which gets included in the git changelog). It helps the reviewer, and it helps people who need to look through the history to find out why a change was introduced. |
Co-authored-by: devops <[email protected]> Co-authored-by: Tamás Tóth <[email protected]>
No description provided.