Enhancement/validations on update #541
Merged
nasaul merged 13 commits into Nixtla:main on Jan 19, 2026
Conversation
nasaul (Contributor) requested changes on Jan 5, 2026
This is a good start for this issue; however, it requires some more thought about what validation the function is actually doing.
- Should we validate at the aggregate level or with an individual-series approach? If we validate each series individually, we should be using something like:

  ```python
  for uid in df[self.id_col].unique():
      if uid in self.uids:
          # offset(...) stands for a one-step frequency offset
          expected_start = self.last_dates[uid] + offset(self.freq)
          actual_start = df[df[self.id_col] == uid][self.time_col].min()
          if actual_start != expected_start:
              raise ValueError(f"Series {uid} starts at {actual_start}, expected {expected_start}")
  ```

- I think that we should focus on the tests first and build up the functionality from there. The tests should include:
- Valid continuous updates
- Invalid gaps in data
- Invalid starting dates
- New series
- Different frequencies
- Both pandas and polars DataFrames
Feel free to discuss whether the proposed tests are good enough or whether we should also focus on other validations.
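A minimal, self-contained sketch of the continuity checks from that list; the `validate_update` helper and its column defaults are illustrative, not mlforecast's actual API:

```python
import pandas as pd

def validate_update(df, last_dates, id_col="unique_id", time_col="ds", freq="D"):
    """Check that each known series resumes exactly one freq step after its last date."""
    offset = pd.tseries.frequencies.to_offset(freq)
    for uid, group in df.groupby(id_col):
        if uid not in last_dates:
            continue  # new series: nothing stored to compare against
        expected_start = last_dates[uid] + offset
        actual_start = group[time_col].min()
        if actual_start != expected_start:
            raise ValueError(
                f"Series {uid} starts at {actual_start}, expected {expected_start}"
            )

last = {"id1": pd.Timestamp("2026-01-05")}

# valid continuous update: resumes one day after the stored last date
ok = pd.DataFrame({"unique_id": ["id1"], "ds": [pd.Timestamp("2026-01-06")], "y": [1.0]})
validate_update(ok, last)  # passes silently

# invalid starting date: begins three days after the stored last date
bad = pd.DataFrame({"unique_id": ["id1"], "ds": [pd.Timestamp("2026-01-08")], "y": [2.0]})
try:
    validate_update(bad, last)
    raised = False
except ValueError:
    raised = True
```

The same helper covers the "new series" case by simply skipping ids that are not in `last_dates`; a polars variant would need the same logic over `df.partition_by(id_col)`.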
nasaul (Contributor) reviewed on Jan 14, 2026
Overall looks good, however in order to merge you should address the following:
- Polars categorical encoding mismatch (core.py) - All 5 Polars tests fail. The join operation fails because the categorical columns have different encodings. Needs string casting before join, like the pandas branch does.
- Type hint is wrong (core.py) - Says pd.DataFrame but should be DataFrame since it handles both pandas and polars
- Add a docstring for the validate_input parameter in both forecast.py and core.py
…m/janrth/mlforecast into enhancement/validations_on_update t pull
nasaul (Contributor) reviewed on Jan 16, 2026
Hey Jan, I've updated the tests to capture different frequencies and the solution doesn't hold. We have to use the offset from utilsforecast (ufp.offset_times) to make it work.
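The underlying problem is that a fixed timedelta is wrong for calendar frequencies, which is what a frequency-aware offset (such as utilsforecast's `ufp.offset_times`) solves. A sketch of the idea using plain pandas offsets (illustrative, not the merged patch):

```python
import pandas as pd

last_dates = pd.Series(pd.to_datetime(["2026-01-01", "2026-02-01"]))

# A fixed timedelta breaks for calendar frequencies: adding 30 days to a
# month start does not land on the next month start.
naive = last_dates + pd.Timedelta(days=30)

# A frequency-aware offset steps to the next period correctly.
offset = pd.tseries.frequencies.to_offset("MS")  # month-start frequency
expected_starts = last_dates + offset
```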
…ate update function
Tries to solve #358
Validates that the update df has the expected shape, so that each unique_id starts from the last ds seen in the previous df and contains the expected number of ds values.
For each unique_id, the number of ds date points in the observed update df is counted and compared to the expected number of date points, which is calculated from the estimated start and end dates. The estimated start date is the stored series' last date + offset(freq).
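The counting described above might look roughly like this for a single series; `check_series` and `expected_n_periods` are illustrative names, not the merged code:

```python
import pandas as pd

def expected_n_periods(start, end, freq):
    """Number of timestamps from start to end inclusive at the given frequency."""
    return len(pd.date_range(start=start, end=end, freq=freq))

def check_series(update_df, last_date, freq="D", time_col="ds"):
    offset = pd.tseries.frequencies.to_offset(freq)
    expected_start = last_date + offset          # one freq step after the stored last date
    end = update_df[time_col].max()
    n_expected = expected_n_periods(expected_start, end, freq)
    n_observed = update_df[time_col].nunique()
    return n_observed == n_expected              # False means a gap or a wrong start

update = pd.DataFrame({"ds": pd.to_datetime(["2026-01-06", "2026-01-07", "2026-01-08"])})
ok = check_series(update, pd.Timestamp("2026-01-05"))   # continuous update
gappy = update.drop(index=1)                            # 2026-01-07 missing
bad = check_series(gappy, pd.Timestamp("2026-01-05"))   # gap detected
```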
There is an option to turn off the validate_input step. While overall the performance is pretty fast, it might be a bit annoying if one has hundreds of millions of rows.
Initially I started by just checking whether the first ds of the update is in the future for each unique_id, but then I felt this was not checking much and implemented a stronger logic. The issue itself is a bit vague and I am open to any changes, as I implemented it based on my interpretation of the task.
Description
Tries to implement more checks on update df
Checklist: