
Add Time-MQA TSQA dataset #101

Merged
WenjieDu merged 4 commits into main from copilot/add-time-mqa-tsqa-dataset on Apr 9, 2026

Copilot AI (Contributor) commented Mar 10, 2026

Integrates the TSQA (Time Series Question Answering) pretraining dataset from the Time-MQA paper (ACL 2025) into TSDB. The dataset is hosted on HuggingFace at Time-MQA/TSQA.

Changes

  • tsdb/database.py — New "tsqa" entry using an hf:// URL prefix to distinguish HuggingFace-hosted datasets from direct file downloads
  • tsdb/utils/downloading.py — download_and_extract() now short-circuits on hf:// URLs: it creates the local directory and defers the actual download to the loader
  • tsdb/loading_funcs/tsqa.py — New loader that calls datasets.load_dataset("Time-MQA/TSQA", cache_dir=local_path) and returns a dict keyed by split name (train, validation, test) with one pandas DataFrame per split. The datasets package is an optional dependency; if it is not installed, the loader raises a clear ImportError.
  • tsdb/loading_funcs/__init__.py / tsdb/data_processing.py — Wired up load_tsqa into the standard dispatch chain
  • dataset_profiles/tsqa/README.md — Citation metadata
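The hf:// short-circuit described in the list above can be sketched as follows. This is a minimal illustration, not the merged code: the PR does not show the exact URL string stored in tsdb/database.py, so "hf://Time-MQA/TSQA" is an assumed value, and is_hf_url, hf_repo_id and download_and_extract_sketch are hypothetical names. The regular direct-download branch is deliberately left out.

```python
import os
import tempfile

# Assumed prefix, mirroring the hf:// convention described in the PR.
HF_PREFIX = "hf://"


def is_hf_url(url: str) -> bool:
    """Return True if the URL refers to a Hugging Face Hub dataset."""
    return url.startswith(HF_PREFIX)


def hf_repo_id(url: str) -> str:
    """Strip the prefix, e.g. 'hf://Time-MQA/TSQA' -> 'Time-MQA/TSQA'."""
    if not is_hf_url(url):
        raise ValueError(f"not an hf:// URL: {url!r}")
    return url[len(HF_PREFIX):]


def download_and_extract_sketch(url: str, saving_path: str) -> str:
    """Hypothetical stand-in for download_and_extract().

    For hf:// URLs it only creates the local directory and returns it;
    the loader later populates it via datasets.load_dataset(cache_dir=...).
    """
    if is_hf_url(url):
        os.makedirs(saving_path, exist_ok=True)
        return saving_path
    # The direct file-download branch is out of scope for this sketch.
    raise NotImplementedError("direct-download branch omitted in this sketch")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "tsqa")
        print(download_and_extract_sketch("hf://Time-MQA/TSQA", target))
        print(os.listdir(target))  # prints []: nothing is downloaded at this stage
```

Keeping the download step a no-op for hf:// URLs means the generic download code never needs to import datasets; only the dataset-specific loader does.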

Usage

```python
import tsdb

# requires: pip install datasets
data = tsdb.load("tsqa")
# data.keys() → dict_keys(['train', 'test', ...])
# data['train'] → pandas.DataFrame
```
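The call tsdb.load("tsqa") reaches load_tsqa through the dispatch chain wired up in tsdb/loading_funcs/__init__.py and tsdb/data_processing.py. The sketch below illustrates that name-to-loader routing with a dictionary registry; this is a swapped-in technique for illustration only (TSDB's own dispatch may be structured differently, e.g. as an if/elif chain), and LOADERS, register, load_tsqa_stub and load are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: dataset name -> loader that takes a local path.
LOADERS: Dict[str, Callable[[str], Any]] = {}


def register(name: str) -> Callable[[Callable[[str], Any]], Callable[[str], Any]]:
    """Decorator that records a loader under a dataset name."""
    def wrap(fn: Callable[[str], Any]) -> Callable[[str], Any]:
        LOADERS[name] = fn
        return fn
    return wrap


@register("tsqa")
def load_tsqa_stub(local_path: str) -> Dict[str, str]:
    # Placeholder for the real load_tsqa, which would call
    # datasets.load_dataset("Time-MQA/TSQA", cache_dir=local_path).
    return {"train": f"train split cached under {local_path}"}


def load(name: str, local_path: str) -> Any:
    """Route a dataset name to its loader, failing loudly on unknown names."""
    try:
        loader = LOADERS[name]
    except KeyError:
        raise ValueError(f"unknown dataset: {name!r}") from None
    return loader(local_path)


if __name__ == "__main__":
    print(load("tsqa", "/tmp/tsdb/tsqa"))
```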

This introduces the hf:// convention as a lightweight extension point for future HuggingFace-hosted datasets, without making datasets a base dependency.

Warning

I tried to connect to the following addresses, but was blocked by firewall rules:

  • huggingface.co


Original prompt

This section details the original issue you should resolve

<issue_title>Add Time-MQA TSQA dataset</issue_title>
<issue_description>1. Dataset description

TSQA from Time-MQA, a pretraining dataset for time-series LLMs, should be included in TSDB.

@inproceedings{kong2025timemqa,
title={Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement},
author={Kong, Yaxuan and Yang, Yiyuan and Hwang, Yoontae and Du, Wenjie and Zohren, Stefan and Wang, Zhangyang and Jin, Ming and Wen, Qingsong},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025}
}

2. Check open-source status

  • The dataset is publicly available

3. Provide useful information for the implementation

The dataset is available at https://huggingface.co/datasets/Time-MQA/TSQA</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 10, 2026 17:40
Co-authored-by: WenjieDu <17807970+WenjieDu@users.noreply.github.com>
Co-authored-by: WenjieDu <17807970+WenjieDu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Time-MQA TSQA dataset to TSDB" to "Add Time-MQA TSQA dataset" on Mar 10, 2026
WenjieDu marked this pull request as ready for review on April 9, 2026 13:55

sonarqubecloud Bot commented Apr 9, 2026

WenjieDu merged commit 322ca6f into main on Apr 9, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Add Time-MQA TSQA dataset

2 participants