Incorrect task_n_future_covariates Counting and Misordered Covariate Rows in validate_and_prepare_single_dict_task #345

@HarvestStars

Describe the bug

There are two logic bugs in the function validate_and_prepare_single_dict_task, which can lead to incorrect covariate alignment and wrong task_n_future_covariates counting.

  1. Incorrect task_n_future_covariates counting
    The function currently sets:

    task_n_future_covariates = len(task_future_covariates_list)

    However, task_future_covariates_list is built by iterating over all past_covariates keys (both past-only and known-future).
    As a result, task_n_future_covariates equals the total number of covariates rather than the number of known-future covariates.
    The wrong value then breaks the downstream function _construct_slice and its task_n_past_only_covariates, which are used to build batches during training.

  2. Covariate row order not guaranteed
    Even though the keys are sorted, the current logic does not ensure that "known-future" covariates appear as the last rows in task_future_covariates_tensor. This also misleads _construct_slice, which selects the future covariates by taking the last rows:

        task_future_covariates = task_past_tensor[
            -task_n_future_covariates:, slice_idx : slice_idx + self.prediction_length
        ]
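
To make the two failure modes concrete, here is a minimal sketch in plain Python. It is not the library code: the key lists and the helper last_rows are hypothetical stand-ins that only mimic how a `tensor[-n:, ...]` slice picks the trailing rows.

```python
# Toy stand-in for the stacked covariate tensor: one row per covariate key.
# Keys are in sorted order, as described above; only "holiday" is known in the future.
sorted_keys = ["holiday", "temp"]
future_keys = ["holiday"]


def last_rows(keys, n):
    """Return the keys of the last n rows, mimicking tensor[-n:, ...]."""
    return keys[-n:]


# Bug 1: counting every covariate treats all rows as known-future covariates.
buggy_count = len(sorted_keys)
picked_with_buggy_count = last_rows(sorted_keys, buggy_count)

# Bug 2: even with the right count, sorted order leaves "temp" in the last row.
correct_count = len(future_keys)
picked_with_sorted_order = last_rows(sorted_keys, correct_count)

print(picked_with_buggy_count)   # ['holiday', 'temp']
print(picked_with_sorted_order)  # ['temp']
```

In both cases the slice returns rows that are not (only) the known-future covariates, which is why the count and the row order have to be fixed together.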

Expected behavior

  • task_n_future_covariates should reflect only the number of known-future covariates (len(task_future_covariates_keys)).
  • Both tensors (task_context_tensor, task_future_covariates_tensor) should have consistent row ordering:
    • Past-only covariates first
    • Known-future covariates last
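
These expectations could be written down as a small check. The sketch below assumes a hypothetical helper, check_covariate_layout, that is not part of the library; it only encodes the two bullet points above on plain key lists.

```python
def check_covariate_layout(ordered_keys, future_keys, n_future):
    """Raise AssertionError if the layout violates the expected behavior."""
    # The count must cover known-future covariates only.
    assert n_future == len(future_keys), "count must equal the number of known-future covariates"

    # Index from the front: a slice starting at -0 would return every row.
    split = len(ordered_keys) - n_future
    head, tail = ordered_keys[:split], ordered_keys[split:]

    # Known-future covariates must occupy exactly the trailing rows.
    assert set(tail) == set(future_keys), "known-future covariates must be the last rows"
    # Everything before them must be past-only.
    assert not set(head) & set(future_keys), "past-only covariates must come first"


# Passes: past-only "temp" first, known-future "holiday" last.
check_covariate_layout(["temp", "holiday"], ["holiday"], 1)
```

Calling it with the sorted order `["holiday", "temp"]`, or with a count of 2, raises an AssertionError.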

To reproduce

You can reproduce the issue using the following minimal example:

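import numpy as np

# validate_and_prepare_single_dict_task is assumed to be imported from the library under test.
task = {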
    "target": np.arange(10),
    "past_covariates": {
        "temp": np.arange(10),
        "holiday": np.array(["yes", "no", "no", "yes", "no", "yes", "no", "no", "no", "yes"]),
    },
    "future_covariates": {
        "holiday": np.array(["no", "yes", "yes", "no"]),
    },
}

out = validate_and_prepare_single_dict_task(task, idx=0, prediction_length=4)
print(out[-1])  # task_n_future_covariates

Expected result: 1 (only holiday has future values)
Current result: 2 (counts both temp and holiday)


Proposed fix

PR #344 separates past-only and known-future covariates explicitly and fixes the counting logic:

# Separate keys to ensure correct ordering
past_only_keys = [k for k in task_past_covariates_keys_all if k not in task_future_covariates_keys]
ordered_covariate_keys = past_only_keys + task_future_covariates_keys

# Build tensors in the same order
for key in ordered_covariate_keys:
    ...

# Correct counting logic
task_n_future_covariates = len(task_future_covariates_keys)
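
For illustration, the same ordering and counting logic can be run standalone on the keys from the reproduction example. This is a sketch under assumptions, not the code from the PR: order_covariates is a hypothetical helper, and sorting each group is assumed only because the issue notes that the keys are sorted.

```python
def order_covariates(past_keys, future_keys):
    """Return (ordered_keys, n_past_only, n_future) with known-future keys last."""
    future_set = set(future_keys)
    # Past-only covariates: present in the past but with no future values.
    past_only = sorted(k for k in past_keys if k not in future_set)
    # Known-future covariates go at the end so a trailing-row slice selects them.
    known_future = sorted(future_keys)
    ordered = past_only + known_future
    return ordered, len(past_only), len(known_future)


# Keys taken from the reproduction example above.
ordered, n_past_only, n_future = order_covariates(["temp", "holiday"], ["holiday"])
print(ordered, n_past_only, n_future)  # ['temp', 'holiday'] 1 1
```

With this layout, the last n_future rows are exactly the known-future covariates, and the count matches the expected result of 1.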

Labels: bug