MUST runs reference at least one workflow? #15

HLWeil · 2025-03-26T10:09:29Z

HLWeil
Mar 26, 2025
Maintainer

Disclaimer: By "run" and "workflow" I mean ARC runs and ARC workflows. When I mean other entities with the same name, I will qualify them, e.g. CWL workflow.

In the current ARC specification, every run MUST contain a run.cwl file. This is necessary because, in CWL, this file is the means of specifying which CWL workflow should be executed for a given run.
We now have the situation that there are, in principle, multiple ways to orchestrate and run command line tools and (CWL) workflows:

1. Define a workflow which orchestrates one or many other command line tools or (CWL) workflows in a workflow.cwl. Then have the run.cwl only as a kind of wrapper for this workflow.cwl.
2. Orchestrate tools and (CWL) workflows directly in the run.cwl.
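
To make the wrapper idea of the first option concrete, here is a minimal sketch of what such a run.cwl could look like. The run and workflow folder names, the input `raw_data`, and the output `result` are hypothetical, and the sketch assumes the referenced workflow.cwl exposes exactly that input and output.

```yaml
# runs/my-run/run.cwl (hypothetical run name)
cwlVersion: v1.2
class: Workflow

requirements:
  # Needed because the single step below runs another CWL Workflow.
  SubworkflowFeatureRequirement: {}

inputs:
  raw_data: File          # hypothetical input, passed through unchanged

outputs:
  result:
    type: File
    outputSource: analysis/result

steps:
  analysis:
    # Hypothetical relative path from the run folder to the ARC workflow.
    run: ../../workflows/my-workflow/workflow.cwl
    in:
      raw_data: raw_data
    out: [result]
```

The run.cwl here adds no logic of its own; it only forwards inputs to the workflow.cwl and collects its outputs.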

How Options 1 and 2 compare visually:

Direct Orchestration of a Script or Tool

```mermaid
flowchart TD

subgraph "Option 1"
    tool[CLI Tool]
    workflow[workflow.cwl]
    run[run.cwl]
end

subgraph "Option 2"
    srun[run.cwl]
    script["Script (Capsule)"]
end

workflow --"orchestrates"--> tool
run --"orchestrates"--> workflow

srun --"orchestrates"--> script
```

Complex Orchestration of Workflows

```mermaid
flowchart TD

subgraph "Option 1"
    tool1[CLI Tool]
    workflow1[workflow.cwl]
    tool2[CLI Tool]
    workflow2[workflow.cwl]
    tworkflow[workflow.cwl]
    trun[run.cwl]
end

subgraph "Option 2"
    stool1[CLI Tool]
    sworkflow1[workflow.cwl]
    stool2[CLI Tool]
    sworkflow2[workflow.cwl]
    strun[run.cwl]
end

workflow1 --"orchestrates"--> tool1
workflow2 --"orchestrates"--> tool2
tworkflow --"orchestrates"--> workflow1
tworkflow --"orchestrates"--> workflow2
trun --"orchestrates"--> tworkflow

sworkflow1 --"orchestrates"--> stool1
sworkflow2 --"orchestrates"--> stool2
strun --"orchestrates"--> sworkflow1
strun --"orchestrates"--> sworkflow2
```
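
As a rough illustration of the "Option 2" side of the complex case, a run.cwl that orchestrates two existing workflows directly might look like the sketch below. All folder names, inputs, and outputs are hypothetical. The sketch also assumes that the second workflow consumes the output of the first, which the diagram does not prescribe.

```yaml
# runs/my-run/run.cwl (hypothetical run name)
cwlVersion: v1.2
class: Workflow

requirements:
  # Needed because both steps run CWL Workflows rather than tools.
  SubworkflowFeatureRequirement: {}

inputs:
  raw_data: File          # hypothetical input

outputs:
  report:
    type: File
    outputSource: second/report

steps:
  first:
    # Hypothetical path to the first ARC workflow.
    run: ../../workflows/workflow-a/workflow.cwl
    in:
      raw_data: raw_data
    out: [intermediate]
  second:
    # Hypothetical path to the second ARC workflow.
    run: ../../workflows/workflow-b/workflow.cwl
    in:
      # Assumption: workflow-b takes the output of workflow-a as input.
      intermediate: first/intermediate
    out: [report]
```

Under Option 1, the same `steps` section would instead live in a separate top-level workflow.cwl, and the run.cwl would reference only that file.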

Creating a workflow makes the computational routine considerably more reusable, and I think this will be the most common case for generic tools like proteomics pipelines. Experts would create workflows and ship them to users.
On the other hand, some CWL files will probably be far from generic. This is the case, for example, for scripts created specifically to evaluate the data of one experimental setup. In this case I'd say it would be okay to put the script directly into the run folder (maybe as a code capsule) and annotate its execution in the run.cwl. Reproducibility is still ensured by the run.cwl.
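
For the script-in-the-run-folder case, one possible shape of such a run.cwl is sketched here. The script name `evaluate.py`, the input `measurements`, and the output file `summary.csv` are hypothetical. Whether a run.cwl of class `CommandLineTool` should be permitted is part of what this discussion is trying to settle, so treat this as an illustration rather than as what the specification allows.

```yaml
# runs/my-run/run.cwl (hypothetical run name)
cwlVersion: v1.2
class: CommandLineTool

baseCommand: [python3]

inputs:
  script:
    type: File
    # Assumption: the script sits next to this run.cwl in the run folder.
    default:
      class: File
      location: evaluate.py
    inputBinding:
      position: 1
  measurements:
    type: File            # hypothetical data file for this specific experiment
    inputBinding:
      position: 2

outputs:
  summary:
    type: File
    outputBinding:
      # Assumption: the script writes summary.csv to its working directory.
      glob: summary.csv
```

The execution is then described by one file, with no separate workflow.cwl to create or maintain.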

My question now is whether we should encourage users to make use of both options, i.e. allow orchestration directly in the run. I'd say both have their applications.

@kMutagene @muehlhaus @caroott @floWetzels @chgarth @dnlbauer

kMutagene · 2025-03-26T11:43:51Z

kMutagene
Mar 26, 2025
Maintainer

IMO it should be possible to have both ways, because otherwise there are more steps to annotate a run (I'd have to create a workflow and a run, even if I just want to annotate a single run).

Some metadata for e.g. the execution of a script is better than none, and every extra step (in this case, creating a workflow in addition to the run) adds to inertia.

I see having reusable workflows (prospective provenance) as the maximum goal, something only a fraction of all ARCs will have or even need, while a reproducible run that tracks retrospective provenance (i.e. 'what happened in my hyper-specific script?') is something that has immediate benefits and a lower barrier to entry.
