tokenize_and_cache cooks up wacky paths #1281

**Describe the bug**
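Supplying a relative path to the data downloader lays a trap for `tokenize_and_cache.py`: the relative path ends up in the generated task config, and `tokenize_and_cache.py` then appears to prepend the config's own directory to it, producing a path that does not exist.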

**To Reproduce**
Call `jiant/scripts/download_data/runscript.py` to download some task data. Use a relative `--output_path` such as `experiment/tasks`.

Download a model, including its tokenizer.

Call `jiant/proj/main/tokenize_and_cache.py` to preprocess the task data for the model. Use a relative `--task_config_path` such as `experiment/tasks/configs/taskname_config.json`. It will die: 

```
FileNotFoundError: [Errno 2] No such file or directory: 'experiment/tasks/configs/experiment/tasks/data/taskname/train.jsonl'
```
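The doubled path in the error is consistent with the stored relative data path being joined onto the directory of the task config. The sketch below reproduces that string; `resolve_against_config_dir` is a hypothetical stand-in for illustration, not jiant's actual code, and it assumes POSIX-style paths.

```python
import posixpath

# Paths taken from the reproduction steps above.
config_path = "experiment/tasks/configs/taskname_config.json"
stored_data_path = "experiment/tasks/data/taskname/train.jsonl"


def resolve_against_config_dir(config_path, data_path):
    """Hypothetical helper: resolve a data path relative to the config's directory.

    This is an assumption about what happens internally, inferred only
    from the error message.
    """
    return posixpath.join(posixpath.dirname(config_path), data_path)


# A relative data path gets the config directory prepended to it.
doubled = resolve_against_config_dir(config_path, stored_data_path)
print(doubled)

# An absolute data path survives unchanged, because join() discards
# earlier components when a later one is absolute.
absolute = resolve_against_config_dir(config_path, "/home/user/" + stored_data_path)
print(absolute)
```

If this is indeed the mechanism, it would also explain why absolute paths work.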

**Expected behavior**
`tokenize_and_cache` should formulate the correct path, `experiment/tasks/data/taskname/train.jsonl`.

**Additional context**
Giving an absolute path to the downloader allows `tokenize_and_cache` to formulate the correct path and produce correct outputs. Hand-patching absolute paths into `experiment/tasks/configs/taskname_config.json` after the downloader creates it, but before `tokenize_and_cache` uses it, appears to work too.
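The hand-patching workaround can be scripted. This is a minimal sketch, assuming the task config is a JSON file with a top-level `"paths"` object mapping names to path strings (an assumption about the config layout, so check it against your generated file). `absolutize_task_config` is a hypothetical helper, not part of jiant.

```python
import json
import os


def absolutize_task_config(config_path, base_dir=None):
    """Rewrite relative entries under "paths" in a task config as absolute paths.

    base_dir is the directory the relative paths were meant to be relative to,
    i.e. the working directory the downloader was run from (defaults to the
    current working directory).
    """
    base_dir = os.path.abspath(base_dir or os.getcwd())
    with open(config_path) as f:
        config = json.load(f)

    # Assumed layout: {"paths": {"train": "...", ...}, ...}
    paths = config.get("paths", {})
    for key, value in paths.items():
        if isinstance(value, str) and not os.path.isabs(value):
            paths[key] = os.path.normpath(os.path.join(base_dir, value))

    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

Run it from the same directory the downloader was run from, after the download and before `tokenize_and_cache`, for example `absolutize_task_config("experiment/tasks/configs/taskname_config.json")`.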

At a minimum, or while working on a better solution, stick a warning on all examples of using the downloader, including `README.md` and `guides/tutorials/quick_start_main.md`. For extra credit, stick it in the source of both `download_data/runscript.py` and `tokenize_and_cache.py` as a comment. But the ideal thing would be to patch `tokenize_and_cache` to handle relative paths correctly. Forcing the downloader to build absolute paths before writing the task config would be OK too.
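For the downloader-side option, the idea is to normalize `--output_path` once, before any path is written to the task config. A rough sketch follows; `build_task_paths` and the `val`/`test` phase names are hypothetical, and only the `data/taskname/train.jsonl` layout is taken from the error above.

```python
import os


def build_task_paths(output_path, task_name, phases=("train", "val", "test")):
    """Hypothetical helper: build absolute data paths for a task config.

    Calling os.path.abspath on output_path up front means a relative
    --output_path such as "experiment/tasks" never reaches the config.
    """
    data_dir = os.path.join(os.path.abspath(output_path), "data", task_name)
    return {phase: os.path.join(data_dir, f"{phase}.jsonl") for phase in phases}


paths = build_task_paths("experiment/tasks", "taskname")
for phase, path in paths.items():
    print(phase, path)
```

The trade-off is that configs with absolute paths stop being portable if the data directory is later moved, which is one argument for fixing relative-path handling in `tokenize_and_cache` instead.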
