Skip to content

Reading lockfiles if specified in the runcard#818

Closed
siranipour wants to merge 2 commits intomasterfrom
correcting_lockfiles
Closed

Reading lockfiles if specified in the runcard#818
siranipour wants to merge 2 commits intomasterfrom
correcting_lockfiles

Conversation

@siranipour
Copy link
Contributor

I think the first time I did this i forgot to realise that if you specified the lockfile in the runcard then you should load in the rules in the lockfile

@siranipour siranipour requested review from Zaharid and voisey June 30, 2020 11:46
@Zaharid
Copy link
Contributor

Zaharid commented Jul 1, 2020

Shouldn't we get the rules in the lockfile when doing something like this?

pdf: NNPDF31_nlo_as_0118
theoryid: 52
use_cuts: internal

dataset_input:
    dataset: ATLASTTBARTOT

actions_:
    - plot_fancy

@voisey
Copy link
Contributor

voisey commented Jul 1, 2020

Intuitively use_cuts: internal makes me think you should have to specify the cuts in the runcard

@siranipour
Copy link
Contributor Author

Because that runcard doesn't explicitly override the rules, the rule reading is done by

def default_filter_rules_input():

Which isn't written into the lockfile under the current implementation right?

@Zaharid
Copy link
Contributor

Zaharid commented Jul 1, 2020

I guess I am saying it should be written in the runcard. The idea is that we can change the rules that are hardcoded in the source (so the "internal" cuts) while having a way to reproduce the runcard as it was when ran.

@siranipour
Copy link
Contributor Author

The thing is if default_filter_rules is not in the runcard, then it isn't parse'd.

If it's not parsed, then the configparser.record_from_defaults wrapper is never called.

So the result of load_defaults_{spec} is never appended to the runcard.

and thereby the whole lockfiles tool chain is avoided!! We skip straight to the production rule which doesn't handle any lockfile writing... Honestly, I think the lockfiles mechanism needs some work... It doesn't seem possible to do what we're trying to do with the current implementation.

@siranipour siranipour marked this pull request as draft July 24, 2020 13:49
@siranipour siranipour marked this pull request as ready for review January 19, 2021 10:55
@siranipour siranipour requested a review from wilsonmr January 19, 2021 10:55
@siranipour siranipour force-pushed the correcting_lockfiles branch from 8649edb to fe48936 Compare January 19, 2021 10:57
@siranipour
Copy link
Contributor Author

This needs to be merged for lockfiles to work correctly

Copy link
Contributor

@wilsonmr wilsonmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than not recording the defaults if we don't specify a lockfile, looks like it's doing what we want!

Comment on lines +1073 to 1077
elif default_filter_rules is not None:
# For when a rules spec is specified
filter_rules = self.load_default_default_filter_rules(default_filter_rules)
else:
filter_rules = default_filter_rules_input()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in thinking that these filter rules in validphys.cuts.filters.yaml are the exact same as in validphys.cuts.lockfiles.31_filters.lock.yaml?

if so I really think this block should be (please double check indents are correct, editing on github is horrible):

Suggested change
elif default_filter_rules is not None:
# For when a rules spec is specified
filter_rules = self.load_default_default_filter_rules(default_filter_rules)
else:
filter_rules = default_filter_rules_input()
else:
if default_filter_rules is None:
# no defaults file specified, use <whatever>
default_filter_rules = self.parse_default_filter_rules("<whatever>")
# save loaded defaults to lockfile.
filter_rules = self.load_default_default_filter_rules(default_filter_rules)

If the defaults are not the same then make a legacy defaults lockfile and read that instead. That way if the defaults ever get changed then the lockfile still contains a record of what they were at the time.Also in the future we can change the default to be 40 or whatever we have come up with by then and old lockfiles still remain valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean validphys.cuts.filters.yaml at the moment is identical to validphys.cuts.lockfiles.31_filters.lock.yaml, but it need not be in general.

The way I thought it was meant to be was that we have a copy of the 3.1 rules, but the current iteration of the rules is always in validphys.cuts.filters.yaml and we make a new lock file at every release, e.g 4.0 or 4.1 etc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? I did a hasty print and they looked slightly different but I didn't bother working out how (the prints were different lengths).

I don't see the advantage to have some defaults loaded outside of the lockfile system when you have implemented the lockfile system. If I run a report where I use cuts internal but don't specify the set of defaults to use then it falls back on this. But those defaults never get saved to the lockfile for that particular report. So in the future if anybody ever edits validphys.cuts.filters.yaml then the lockfile for my report is invalidated.

Also if you use lockfiles to load the fallback option then you can even change what the fallback option is (now it's 3.1 maybe but in the future could be something else) but all lockfiles will still tell you exactly what set of rules were used at the time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes they're actually different because some datasets were being added as we moved towards 4.0.

Okay I'll commit this suggestion, my heads a bit of a spin with the amount of defaults im seeing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If validphys.cuts.filters.yaml has been edited whilst also being the fallback default then do you see how that demonstrates why I think it should be a lockfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. Where should we specify the fallback default then? Some sort of global variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when would validphys.cuts.filters.yaml ever be used then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh this quickly gets messy and confusing and my suggestion above still isn't the right way to do things. I think the best demonstration of how to do the fallback is with the data_grouping mechanism. That's all handled in its own production rule but you can apply the same ordering and logic in this function.

  1. So first you check if filter_rules is None, if not you can use that (which is already done)
  2. Else Check if default_filter_rules is None, if it is then the user didn't specify it and we must set it such that the fallback defaults is loaded, I'm never good with names but probably that should be called current or something better - see the note on this at the end of the next point though
  3. Check if default_filter_rules_recorded_spec_ is None, if not then we are running from a lockfile and we return the defaults recorded at the time. The point is that we need to have a value for default_filter_rules because we use: filter_rules = default_filter_rules_recorded_spec_[default_filter_rules] if the lockfile was produced from a fallback then we need to set default_filter_rules to be the fallback key. Also this highlights that the fallback key must never change or else old lockfiles will be broken, in that sense I don't think the value of the fallback default_filter_rules should be global, and perhaps there should be a comment next to it saying don't change this.
  4. Else we load the defaults as per filter_rules = self.load_default_default_filter_rules(default_filter_rules) which ensures they get saved to the lockfile.

Just to reiterate, the defaults associated with the fallback value of default_filter_rules can be changed (i.e the contents of the yaml file) however the actual value must never change i.e if default_filter_rules falls back to current now then it basically always has to. In the same way as the data grouping must now always fallback to standard_report

A minor comment is that the files from which the defaults are being loaded from i.e validphys.cuts.lockfiles.31_filters.lock.yaml are not lockfiles, they are just the defaults. The lockfiles are where those defaults get saved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validphys.cuts.filters.yaml would be moved to validphys.cuts.lockfiles.<whatever the fallback default_filter_rules value is>_filters.lock.yaml and would get loaded whenever default_filter_rules is not specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1162 to 1168
elif default_filter_settings is not None:
# If the user requests to read from a pre existing settings lockfile
filter_defaults = self.load_default_default_filter_settings(default_filter_settings)
defaults_loaded = True
elif not filter_defaults:
filter_defaults = default_filter_settings_input()
defaults_loaded = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise. Let's keep a record of all the defaults.

try:
theory_parameters = theoryid.get_description()
except AttributeError:
raise ConfigError("Missing theoryid for processing rules")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this supposed to fire?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: from validphys.api import API

In [2]: API.rules(use_cuts="internal")
...

ConfigError: Missing theoryid for processing rules

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite follow on the discussion and don't see why theoryid shouldn't be required rather than optional. Also the test really should be theoryid is None which is way more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either theoryid should be optional or rules should be later on. If I have use_cuts: fromfit or use_cuts: nocuts I shouldn't have to supply a theoryid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since this function already returns None when cuts are not interal I thought the easiest change is to just move when we get the theory description and only require it in the case that cuts are internal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it seems much more reasonable to make rules optional wherever it is actually optional, no?

Copy link
Contributor

@wilsonmr wilsonmr Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps yes, that would be much easier if we had the part of #1057 which improves massively the way dataset and experiment (EDIT: and data) are produced since they both take rules as well as the cuts production rule.

@siranipour siranipour force-pushed the correcting_lockfiles branch from ee39132 to 9105785 Compare February 9, 2021 15:16
@Zaharid
Copy link
Contributor

Zaharid commented Feb 17, 2021

(I had started the message by saying that this is all overcomplicated and I am changing it to say I am going to propose something even more complicated now). Mostly thinking aloud so far.

First of all we should decide what do we want the semantics to be. Do we want that the "NNPDF40" cuts are completely immutable, so we may end up writing NNPDF40_final_final_v2_submission_fit_2023-01_4 to specify the cuts or do we want the NNPDF40 label to allow for bugfixes. Seems to me the later. E.g. I may want to write explicitly NNPDF31 and trust that those are the best bugfixed cuts for the "epoch" rather than having to go and check what the final tag is (see also #224).

With that in mind, let us say that there are certain keys that are "defaultable". These can involve relatively complex settings (filter rules, default cfactors, variants and the like) or simler settings (the default family for the cuts "NNPDF40").

We would like to not write explicitly those choices to the runcard but have then in the lockfile so we do not need to relt that they stay immutable in the code.

E.g. we don't really want to change what you need to write in the runcard every time we find a minor bug in some dataset, yet we want to be able to tell whether a given report was produced with the bug or not (ideally in an automated way, but let`s go step by step).

It seems to me a lot of the complication comes from the fact that we now have "keyed" defaults. Indeed Michael point 3 is where I start getting confused.

We declare that a defaultable can take no parameters whatsoever conceptually. In the context of filters, let's call that default_filter_rules. Now conceptually that is going to correspond to all defaults recorded anywhere. So it would be e.g. a dict of the form {"NNPDF31": <rueles for 3.1>, "NNPDF40": <rules for 4.0>} and so on.
The whole of the dictionary. Or even better

{"NNPDF31": [<rules for 3.1 version 1 >, ..., <rules for 3.1 version last >  ], "NNPDF40": [<rules for 4.0 version 1>, ... , <rules for 4.0 version last>]}

The there is default_filter_rules_key or some such which is the same but for the string NNPDF40.

The logic is then: If that key, default_filter_rules_recorded_spec_, is present in the runcard then we read the data from there. If not we call a function that reads them from the code, and another function that writes them to the lockfile. The last part being the only one that needs any sort of support from reportengine. The

So in the end we have something like:

def produce_filter_rules(explicit_filter_rules=None, defaul_filter_rules=None, filter_epoch=None, filter_version=-1):
    if explicit_filter_rules is not None:
        return explicit_filter_rules
    return default_filter_rules[filter_epoch][filter_version]

def produce_default_filter_rules(self, default_filter_rules_recorded_spec_=None):
   if default_filter_rules_recorded_spec_ is not None:
        return default_filter_rules_recorded_spec_
    # return  load from vp and write to lockfile

def filter_epoch(self, filter_epoch_recorded_spec_=None):
    if filter_epoch_recorded_spec_ is not None:
        return filter_epoch_recorded_spec_
    # "return  'NNPDF40', maybe loading it from some defaults fire and write to lockfile "


PS: This is indeed confusing. Writing this comment took me more than an hour and still not sure I got it right.

@Zaharid
Copy link
Contributor

Zaharid commented Feb 17, 2021

I think in the example above one can get away without defaults for filter_version because all the specification gets written explicitly. But I had to think about it.

@wilsonmr
Copy link
Contributor

wilsonmr commented Feb 17, 2021

hmm ok so (I think) you're suggesting that the defaults get held in a single mappable and absolutely all of them get dumped to lockfile everytime. Then I think we would no longer require load_default_* functions to be a special function and instead would want to tell the record_defaults what the defaults mappable was (either an explicit dictionary or a from_yaml=<path>) I think in total this would greatly reduce the verbosity of the lockfile system!

Then I think I agree with everything.

My only question (which I was trying to address in 3. with the fallback is when you do

default_filter_rules[filter_epoch]

and if filter_epoch=None then what would be the mechanism for choosing which epoch we want? My suggestion is that in the default_filter_rules there is always a latest default which may get mutated over time but the key remains the same and then your lockfile will always produce the same results even if latest changed (because those defaults are in the lockfile) whereas if we were to say

if filter_epoch is None:
    filter_epoch = "NNPDF4.0" # for now

then that line would undoubtedly want to be changed in the future but bthat would invalidate the lockfile ran before. i.e consider this: it's the year 2030, NNPDF10.0 has just been released and Professor @siranipour is tired of the defaults still being for "NNPDF4.0" so changes it to

if filter_epoch is None:
    filter_epoch = "NNPDF10.0" # finally!

but then if you run on the lockfile which was produced before those defaults were added to default_filter_rules you'd get a key error, and if you ran on a lockfile produced after the defaults were added but implicitly used filter_epoch = "NNPDF4.0" then the results would change.

Finally every release you would get the settings for latest and semi-immortalise them under the key of that epoch and you accept that they may change, which doesn't matter as you point out, but probably change less frequently and is there purely as a convenience for when people want to run legacy fits or w/e

@Zaharid
Copy link
Contributor

Zaharid commented Feb 17, 2021

I was trying to address that with the functions below it, but didn't do it very well. The answer is that filter_epoch would be itself a "defaultable" thing that obeys the same rules.

@Zaharid
Copy link
Contributor

Zaharid commented Feb 17, 2021

Overall I think we made it a bit too general in reportengine and added a few levels of indirection than then we can't really understand ourselves (admittedly that was always with the hope that those things may be used as building blocks for more obvious approaches). I think is sensible to just declare defaults are stateless (also because anything else leads to headaches when considering the interaction with namespaces).

@siranipour siranipour marked this pull request as draft March 17, 2021 15:49
@siranipour siranipour closed this Mar 24, 2021
@scarrazza scarrazza deleted the correcting_lockfiles branch August 27, 2021 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants