wsc get_train_examples doesn't match wsc data format #1245

@eritain

Description

In jiant/jiant/tasks/lib/wsc.py, _create_examples() fails at the statement span1_idx=line["span1_index"] (line 177) with KeyError: 'span1_index', because the code does not match the structure of the JSON task data.

The statement should be span1_idx=line["target"]["span1_index"] and similarly for the next 3 statements.
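To make the proposed change concrete, here is a minimal sketch of the nested lookup. It is not the actual jiant code: the helper name extract_span_fields and the sample record are hypothetical, and the three field names other than span1_index are assumed from the usual SuperGLUE WSC layout rather than confirmed against wsc.py.

```python
from typing import Any, Dict


def extract_span_fields(line: Dict[str, Any]) -> Dict[str, Any]:
    """Return the four span fields from one WSC JSON record.

    Looks inside line["target"] when that key is present (the layout
    reported in this issue) and falls back to the top level otherwise.
    """
    target = line.get("target", line)
    return {
        "span1_idx": target["span1_index"],
        "span1_text": target["span1_text"],   # assumed field name
        "span2_idx": target["span2_index"],   # assumed field name
        "span2_text": target["span2_text"],   # assumed field name
    }


# Made-up record in the assumed nested layout, for illustration only.
sample = {
    "text": "Alice thanked Bob because he helped.",
    "target": {
        "span1_index": 2,
        "span1_text": "Bob",
        "span2_index": 4,
        "span2_text": "he",
    },
    "idx": 0,
    "label": True,
}

print(extract_span_fields(sample))
```

The fallback to the top level is only there so the same helper tolerates a flat record; if the downloaded data is always nested, indexing line["target"] directly is simpler.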

To Reproduce

  1. Install jiant v2 from any recent commit (any in which wsc.py hasn't been touched since 3bd801c).
  2. I doubt this matters, but I am running Python 3.8 on a recent Linux, on a 40-core, 80-thread Skylake CPU with 384 GB of RAM and a VT100/16GB GPU.
  3. In the Python REPL, run:
from jiant.proj.simple import runscript as run
import jiant.scripts.download_data.runscript as downloader
downloader.download_data(["wsc"], "/home/rasmussen.63/wsc-speed/tasks")
args = run.RunConfiguration(
   run_name="wsc-speed",
   exp_dir="wsc-speed",
   data_dir="wsc-speed/tasks",
   model_type="roberta-base",
   tasks="wsc",
   train_batch_size=16,
   num_train_epochs=3
)
run.run_simple(args)

Watch stderr come back to you until it stops at

Tokenizing Task 'wsc' for phases 'train,val,test'
WSCTask
  [train]: /home/rasmussen.63/wsc-speed/tasks/data/wsc/train.jsonl
  [val]: /home/rasmussen.63/wsc-speed/tasks/data/wsc/val.jsonl
  [test]: /home/rasmussen.63/wsc-speed/tasks/data/wsc/test.jsonl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rasmussen.63/jiant/jiant/proj/simple/runscript.py", line 148, in run_simple
    tokenize_and_cache.main(
  File "/home/rasmussen.63/jiant/jiant/proj/main/tokenize_and_cache.py", line 165, in main
    examples=task.get_train_examples(),
  File "/home/rasmussen.63/jiant/jiant/tasks/lib/wsc.py", line 160, in get_train_examples
    return self._create_examples(lines=read_json_lines(self.train_path), set_type="train")
  File "/home/rasmussen.63/jiant/jiant/tasks/lib/wsc.py", line 177, in _create_examples
    span1_idx=line["span1_index"],
KeyError: 'span1_index'
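
A quick way to confirm the mismatch is to look at the top-level keys of the first record in the downloaded file. This is a small sketch using only the standard library; the helper name first_record_keys is hypothetical, and the path in the usage comment is the one from the log above.

```python
import json
from typing import List


def first_record_keys(path: str) -> List[str]:
    """Return the sorted top-level keys of the first record in a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        first = json.loads(f.readline())
    return sorted(first.keys())


# Usage (path taken from the log above):
# print(first_record_keys("/home/rasmussen.63/wsc-speed/tasks/data/wsc/train.jsonl"))
```

If the output lists "target" but not "span1_index", the span fields are nested one level down, which is consistent with the KeyError.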

Expected behavior
WSCTask should initialize an Example from the downloaded data.
