Skip to content
This repository was archived by the owner on Apr 25, 2024. It is now read-only.

Conversation

@virgil-serbanuta
Copy link
Contributor

No description provided.

@virgil-serbanuta virgil-serbanuta force-pushed the fix-deduplicate-productions branch from 79b3f39 to f31eae5 Compare June 19, 2023 22:57
@virgil-serbanuta virgil-serbanuta marked this pull request as ready for review June 19, 2023 23:10
Comment on lines 1051 to 1052
prods = _prods
_LOGGER.warning(f'Discarding {len(prods) - len(_prods)} equivalent productions')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this always log

Discarding 0 equivalent productions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Sorry, I copy-pasted the code above without making sure that it works. I also fixed the original code.

@ehildenb
Copy link
Member

@virgil-serbanuta do you have a minimal example where multiple productions are produced? I guess that causes a crash, because in several places we assert that a single production is produced, so w eshould add a test of that to our integration tests, to make sure the filtering is working correctly.

I haven't seen a case where this happens yet, but maybe I haven't used fancy enough import structure yet. Maybe we also just need to change the way we recursively collect sentences from imported modules instead of deduplicating the productions after the fact. Right now KDefinition.sentences, for example, just grabs all the sentences in all the modules. It could instead, for example, walk the import DAG, build a traversal that makes sense given the import structure that only visits each module once, then collect the sentences from the modules in that order.

@virgil-serbanuta
Copy link
Contributor Author

virgil-serbanuta commented Jun 20, 2023

@virgil-serbanuta do you have a minimal example where multiple productions are produced?

a.k:

module A
  imports B
  imports C
endmodule

module B
  syntax Int
endmodule

module C
  syntax Int
endmodule
kompile a.k --backend haskell --emit-json

Search for

{"node":"KProduction","klabel":{"node":"KLabel","name":"isInt","params":[]}

in a-kompiled/compiled.json

@virgil-serbanuta
Copy link
Contributor Author

I also added a test which fails on master.

@virgil-serbanuta virgil-serbanuta force-pushed the fix-deduplicate-productions branch from 98e32ce to 719126d Compare June 20, 2023 16:24
@ehildenb
Copy link
Member

I still don't understand how we are getting duplicate productions in the first place, which is what I think needs to be addressed rather than just filtering them out.

Here is where we gather the productions:

def productions(self) -> tuple[KProduction, ...]:
.

Which uses the gathering of the modules:

def modules(self) -> tuple[KFlatModule, ...]:
.

Which uses the gathering of the module names:

def module_names(self) -> tuple[str, ...]:

Which as far as I can tell, should only gather each module a single time (as long as it gets the same name). So in the example of your test, the module INT should only be collected a single time, not twice as it traverses both sides of the import path.

@virgil-serbanuta
Copy link
Contributor Author

@ehildenb I am not importing INT at all.

In this case, B has one definition of isInt and C has the second one. These definitions are generated automatically by kompile. If I would import INT in A, I would also get a third definition (though I'm not sure what would happen if I imported it in B or C).

A imports B and C, so they are included in the loop in module_names, so we get two definitions in productions.

# Automatically defined symbols like isInt may get multiple
# definitions in different modules.
unique_no_att: list[tuple[KProduction, KProduction]] = []
for prod in prods:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After fixing _freeze, a Counter-base solution should work (not thoroughly tested):

prod_cnt = Counter(prod.let_att(remove_source_map_from_katt(prod.att)) for prod in prods)
try:
    prod = single(prod_cnt)
except ValueError as err:
    raise ValueError(f'Expected a single production for label {klabel}, not: {list[prod_cnt]}') from err

num_eq_prods = prod_cnt[prod] - 1
if num_eq_prods:
    _LOGGER.warning(f'Discarding {num_eq_prods} equivalent productions')

self._production_for_klabel[klabel] = prod

This version always returns a production without source map. If retaining it is important (in the case whe there is a single production), a dict-based variant can be implemented with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +1041 to +1042
if len(_prods) < len(prods):
_LOGGER.warning(f'Discarding {len(prods) - len(_prods)} equivalent productions')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(_prods) is the number of keys in counter. counter is a Mapping[KProduction, int] where the key is the representative of the equivalence class (i.e. the production without the source attribute), and the value is the number of productions that fall into the equivalence class (i.e. those that equal the representative when dropping the source attribute).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's correct. I'm not sure why you explained this, you probably wanted to point out something that I'm still missing.

All I'm saying here is that if I have 'n' production, which I group in 'm' equivalence classes, and I keep one production form each class, this means that I dropped all the other productions, i.e. I dropped n-m productions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped n-m productions

I see, thanks for the clarification.

For me, the following order seems more logical though:

  • If there's more than one equivalence class, fail.
  • Otherwise, emit the log message about the number of discarded equivalent productions from the single class.

By the way, using this n-m trick, the Counter is unnecessary for both orderings: you can just use a set instead (converting to list is not necessary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to a set.

To me, this code looks like a series of filters (remove 'unparseAvoid', remove equivalent productions), and other filters may be added in the future. To me, it makes sense to log each filter's activity, then move to the next step (and, for the last filter, the next step is to check how many productions are left and return)

Co-authored-by: Tamás Tóth <[email protected]>
@rv-jenkins rv-jenkins merged commit bdba8da into master Jun 21, 2023
@rv-jenkins rv-jenkins deleted the fix-deduplicate-productions branch June 21, 2023 22:25
@ehildenb
Copy link
Member

@virgil-serbanuta please get in the habit of adding a description to the main PR comment body (which gets included in the git changelog). It helps the reviewer, and it helps people who need to look through the history to find out why a change was introduced.

Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants