Replies: 1 comment
IMO it should be possible to have both ways, because otherwise there are more steps to annotate a run (I'd have to create a workflow plus a run, even if I just want to annotate a single run). Some metadata for e.g. the execution of a script is better than none, and every extra step (in this case, creating a workflow as well as the run) adds to inertia. I see having reusable workflows (prospective provenance) as the maximum goal, something only a fraction of all ARCs will have or even need, while a reproducible run that tracks retrospective provenance (i.e. 'what happened in my hyper-specific script?') is something that has immediate benefits and a lower barrier to entry.
In the current ARC specification, every run MUST contain a run.cwl file. This is necessary because, in CWL, this file is the means to specify which CWL workflow should be executed for a given run.

Now we have the situation that there are, in principle, multiple ways to orchestrate and run command line tools and (CWL) workflows.
1. Orchestrate the tool or script in a workflow.cwl. Then have the run.cwl only as a kind of wrapper for this workflow.cwl
2. Orchestrate the tool or script directly in the run.cwl

How Option 1 and 2 compare visually:
Direct Orchestration of a Script or Tool
flowchart TD
    subgraph "Option 1"
        tool[CLI Tool]
        workflow[workflow.cwl]
        run[run.cwl]
    end
    subgraph "Option 2"
        srun[run.cwl]
        script["Script (Capsule)"]
    end
    workflow --"orchestrates"--> tool
    run --"orchestrates"--> workflow
    srun --"orchestrates"--> script

Complex Orchestration of Workflows
flowchart TD
    subgraph "Option 1"
        tool1[CLI Tool]
        workflow1[workflow.cwl]
        tool2[CLI Tool]
        workflow2[workflow.cwl]
        tworkflow[workflow.cwl]
        trun[run.cwl]
    end
    subgraph "Option 2"
        stool1[CLI Tool]
        sworkflow1[workflow.cwl]
        stool2[CLI Tool]
        sworkflow2[workflow.cwl]
        strun[run.cwl]
    end
    workflow1 --"orchestrates"--> tool1
    workflow2 --"orchestrates"--> tool2
    tworkflow --"orchestrates"--> workflow1
    tworkflow --"orchestrates"--> workflow2
    trun --"orchestrates"--> tworkflow
    sworkflow1 --"orchestrates"--> stool1
    sworkflow2 --"orchestrates"--> stool2
    strun --"orchestrates"--> sworkflow1
    strun --"orchestrates"--> sworkflow2

Creating a workflow increases the reusability of the computational routine by a lot, and I think this will be the most common case for generic tools like proteomics pipelines. Experts would create workflows and ship them to users.
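To make Option 1 concrete, here is a minimal sketch of a run.cwl that does nothing but wrap a reusable workflow. The relative path, the step name `analysis`, and the input/output names are hypothetical, and the sketch assumes the referenced workflow.cwl declares an input `input_data` and an output `result`.

```yaml
# run.cwl (Option 1): thin wrapper around a reusable workflow.
# All names and paths below are illustrative assumptions.
cwlVersion: v1.2
class: Workflow

requirements:
  # Needed when the wrapped workflow.cwl is itself of class Workflow.
  SubworkflowFeatureRequirement: {}

inputs:
  input_data: File

outputs:
  result:
    type: File
    outputSource: analysis/result

steps:
  analysis:
    # Hypothetical location of the shared workflow inside the ARC.
    run: ../../workflows/my-workflow/workflow.cwl
    in:
      input_data: input_data
    out: [result]
```

The run only binds concrete inputs to the workflow, so the workflow.cwl itself stays generic and can be shipped to other users unchanged.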
On the other hand, some cwl files will probably be far from generic. This is e.g. the case for scripts specifically created to evaluate the data of a specific experimental setup. In this case I'd say it would be okay to just put the script directly into the run folder (maybe as a code capsule) and annotate its execution in the run.cwl. The reproducibility is still given through the run.cwl.

My question would now be whether we should encourage users to make use of both options, i.e. allow orchestration directly in the run? I'd say both have their application.
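A minimal sketch of what Option 2 could look like, where the run.cwl directly describes the execution of a script that lives in the run folder. The script name `analysis.py`, the `python3` interpreter, and the output file `result.csv` are assumptions made up for illustration, not something the ARC specification prescribes.

```yaml
# run.cwl (Option 2): the run orchestrates a script directly.
# Script name, interpreter, and output file are illustrative assumptions.
cwlVersion: v1.2
class: CommandLineTool

baseCommand: [python3]

inputs:
  script:
    type: File
    # Hypothetical script stored next to this run.cwl in the run folder.
    default:
      class: File
      location: ./analysis.py
    inputBinding:
      position: 1
  input_data:
    type: File
    inputBinding:
      position: 2

outputs:
  result:
    type: File
    outputBinding:
      # Assumes the script writes result.csv into its working directory.
      glob: result.csv
```

There is no separate workflow.cwl here, so there is one file less to create, at the cost of the script not being reusable outside this run.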
@kMutagene @muehlhaus @caroott @floWetzels @chgarth @dnlbauer