Description
Describe the bug
Supplying a relative path to the data downloader lays a trap for tokenize_and_cache.py.
To Reproduce
Call jiant/scripts/download_data/runscript.py to download some task data. Use a relative --output_path such as experiment/tasks.
Download a model, including its tokenizer.
Call jiant/proj/main/tokenize_and_cache.py to preprocess the task data for the model. Use a relative --task_config_path such as experiment/tasks/configs/taskname_config.json. It will die:
FileNotFoundError: [Errno 2] No such file or directory: 'experiment/tasks/configs/experiment/tasks/data/taskname/train.jsonl'
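The doubled prefix in that path suggests the relative data path stored in the task config is being joined onto the directory of the config file. I have not traced this in the jiant source, so the helper below is a hypothetical stand-in, not jiant's actual code. It only shows that this kind of join reproduces the exact path in the error:

```python
import posixpath


def resolve_relative_to_config(config_path, data_path):
    """Hypothetical resolver: treat a relative data path as relative to the config's directory."""
    if posixpath.isabs(data_path):
        return data_path
    return posixpath.join(posixpath.dirname(config_path), data_path)


# Paths taken from the reproduction steps above.
config_path = "experiment/tasks/configs/taskname_config.json"
stored_path = "experiment/tasks/data/taskname/train.jsonl"  # what a relative --output_path would leave in the config

doubled = resolve_relative_to_config(config_path, stored_path)
print(doubled)
```

With an absolute stored path the helper returns it unchanged, which matches the observation that absolute paths work.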
Expected behavior
tokenize_and_cache formulates the correct path experiment/tasks/data/taskname/train.jsonl.
Additional context
Giving an absolute path to the downloader allows tokenize_and_cache to formulate the correct path and produce correct outputs. Hand-patching absolute paths into experiment/tasks/configs/taskname_config.json after the downloader creates it, but before tokenize_and_cache uses it, appears to work too.
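For anyone who wants to script the hand-patching workaround, here is a rough sketch. It assumes the task config is JSON with a top-level "paths" mapping from split name to file path; check your generated config before relying on it. The function names are mine, not part of jiant.

```python
import json
import os


def absolutize_paths(config, base_dir):
    """Return a copy of config whose relative "paths" entries are made absolute.

    base_dir is the directory the relative paths were written against,
    i.e. the working directory the downloader was run from.
    Assumes every value under "paths" is a string path.
    """
    base_dir = os.path.abspath(base_dir)
    patched = dict(config)
    patched["paths"] = {
        split: path if os.path.isabs(path)
        else os.path.normpath(os.path.join(base_dir, path))
        for split, path in config.get("paths", {}).items()
    }
    return patched


def patch_config_file(config_file, base_dir):
    """Rewrite a task config file in place with absolute data paths."""
    with open(config_file) as f:
        config = json.load(f)
    with open(config_file, "w") as f:
        json.dump(absolutize_paths(config, base_dir), f, indent=2)
```

Run patch_config_file("experiment/tasks/configs/taskname_config.json", ".") from the same directory the downloader was run from, after the download and before tokenize_and_cache.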
At a minimum, or while working on a better solution, stick a warning about relative --output_path on all examples of using the downloader, including README.md and guides/tutorials/quick_start_main.md. For extra credit, put the same warning in the source of both download_data/runscript.py and tokenize_and_cache.py as a comment. But the ideal fix would be to patch tokenize_and_cache to handle relative paths correctly. Forcing the downloader to build absolute paths before writing the task config would be OK too.