5 changes: 5 additions & 0 deletions src/components/NoWrapTable.astro
@@ -0,0 +1,5 @@


<div style={{ overflowX: 'auto', whiteSpace: 'nowrap' }}>
<slot />
</div>
293 changes: 293 additions & 0 deletions src/content/docs/core-concepts/inputs-outputs.mdx
@@ -0,0 +1,293 @@
---
title: Linking inputs and outputs
lastUpdated: 2026-02-02
authors:
- dominik-brilhaus
sidebar:
order: 1
---

import { FileTree } from '@astrojs/starlight/components';
import { Tabs, TabItem } from '@astrojs/starlight/components';
import Mermaid from '@components/mdx/Mermaid.astro'
import NoWrapTable from "@components/NoWrapTable.astro"

A key objective of the ARC is to make each finding or result traceable to the specific biological experiment it came from. This requires linking dataset files to their corresponding individual samples. To accomplish this, the ARC describes an experiment as a sequence of processes with defined **inputs** and **outputs**.


## Example

Consider the example experiment from the [Start Here guide](/nfdi4plants.knowledgebase/start-here) where six *Arabidopsis thaliana* plants were exposed to cold stress, and the sugar content was measured as a response. The ARC structure for this experiment could look like this:

<FileTree>
- AthalianaColdStressSugar
  - studies
    - AthalianaColdStress
      - protocols
        - plant-sampling.md
  - assays
    - SugarContent
      - dataset
        - sugar_result.csv
      - protocols
        - sugar_extraction.md
        - sugar_measurement.md
      - isa.assay.xlsx
      - README.md
  - ...

</FileTree>

The ARC contains one study (`AthalianaColdStress`) and one assay (`SugarContent`). The study includes a protocol for plant sampling describing how the plants were grown and treated, while the assay contains protocols for sugar extraction and sugar measurement. The dataset file `sugar_result.csv` holds the measured sugar content data.

## Annotation tables describe processes

The following three annotation tables describe the three consecutive processes:
- Plant Sampling (part of the Study `AthalianaColdStress`),
- Sugar Extraction (part of the Assay `SugarContent`), and
- Sugar Measurement (part of the Assay `SugarContent`).

Each table starts with an `Input` column specifying the input entity (sample, material or data) for the respective process, followed by a `ProtocolREF` column indicating the protocol used, and ends with an `Output` column specifying the output entity resulting from the process.

The annotation tables (and, in effect, the studies and assays) are linked by reusing the identifiers of the `Input` and `Output` entities (samples, materials or dataset files) across processes: the `Output` of one process becomes the `Input` of the next.


export const Highlight = ({ children }) => (
  <span
    style={{
      backgroundColor: '#fff59d',
      padding: '0 0.25rem',
      borderRadius: '0.125rem',
    }}
  >
    {children}
  </span>
)

In this example we follow <Highlight>one line of highlighted samples</Highlight> through the processes:

<Tabs>

<TabItem label="1. Plant Sampling">

| `Input`[Source Name] | `ProtocolREF` | [...] | `Output`[Sample Name] |
| -------------------- | --------------------------- | ---- | -------------------- |
| <Highlight>Cold1</Highlight> | ./protocols/plant-sampling.md | ... | <Highlight>Cold1_leaf</Highlight> |
| Cold2 | ./protocols/plant-sampling.md | ... | Cold2_leaf |
| Cold3 | ./protocols/plant-sampling.md | ... | Cold3_leaf |
| RT1 | ./protocols/plant-sampling.md | ... | RT1_leaf |
| RT2 | ./protocols/plant-sampling.md | ... | RT2_leaf |
| RT3 | ./protocols/plant-sampling.md | ... | RT3_leaf |

</TabItem>

<TabItem label="2. Sugar Extraction">

| `Input`[Sample Name] | `ProtocolREF` | [...] | `Output`[Sample Name] |
| -------------------- | ----------------------------- | ---- | -------------------- |
| <Highlight>Cold1_leaf</Highlight> | ./protocols/sugar_extraction.md | ... | <Highlight>Cold1_sugar-ext</Highlight> |
| Cold2_leaf | ./protocols/sugar_extraction.md | ... | Cold2_sugar-ext |
| Cold3_leaf | ./protocols/sugar_extraction.md | ... | Cold3_sugar-ext |
| RT1_leaf | ./protocols/sugar_extraction.md | ... | RT1_sugar-ext |
| RT2_leaf | ./protocols/sugar_extraction.md | ... | RT2_sugar-ext |
| RT3_leaf | ./protocols/sugar_extraction.md | ... | RT3_sugar-ext |

</TabItem>

<TabItem label="3. Sugar Measurement">


<NoWrapTable>

| `Input` [Sample Name] | `ProtocolREF` | [...] | `Output` [Data] |
| --------------------- | ----------------------------- | --- | -------------------------------------------------- |
| <Highlight>Cold1_sugar-ext</Highlight> | ./protocols/sugar_measurement.md | ... | <Highlight>./assays/SugarContent/dataset/sugar_result.csv</Highlight> |
| Cold2_sugar-ext | ./protocols/sugar_measurement.md | ... | ./assays/SugarContent/dataset/sugar_result.csv |
| Cold3_sugar-ext | ./protocols/sugar_measurement.md | ... | ./assays/SugarContent/dataset/sugar_result.csv |
| RT1_sugar-ext | ./protocols/sugar_measurement.md | ... | ./assays/SugarContent/dataset/sugar_result.csv |
| RT2_sugar-ext | ./protocols/sugar_measurement.md | ... | ./assays/SugarContent/dataset/sugar_result.csv |
| RT3_sugar-ext | ./protocols/sugar_measurement.md | ... | ./assays/SugarContent/dataset/sugar_result.csv |

</NoWrapTable>

</TabItem>

</Tabs>

:::note
For simplicity, only the columns needed to introduce the linking concept are shown here. The `[...]` indicates additional columns in the actual tables, which typically hold further metadata and annotations from the protocols.
:::
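
The linking rule can be checked mechanically. The following is a minimal sketch in Python, not part of any ARC tooling: it assumes each annotation table has been reduced to plain `(input, output)` pairs, and the helper name `unlinked_inputs` is hypothetical. The dataset path is shortened to its file name for brevity.

```python
# Hypothetical in-memory form of the three annotation tables above.
# Each row is reduced to an (input, output) pair; all other columns are omitted.
SAMPLES = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]
DATASET = "sugar_result.csv"  # shortened; the table holds the full path

plant_sampling = [(s, f"{s}_leaf") for s in SAMPLES]
sugar_extraction = [(f"{s}_leaf", f"{s}_sugar-ext") for s in SAMPLES]
sugar_measurement = [(f"{s}_sugar-ext", DATASET) for s in SAMPLES]


def unlinked_inputs(upstream, downstream):
    """Return the inputs of `downstream` that no row of `upstream` produces."""
    produced = {output for _, output in upstream}
    return [inp for inp, _ in downstream if inp not in produced]


# Every input of a later process is an output of the preceding one.
print(unlinked_inputs(plant_sampling, sugar_extraction))     # []
print(unlinked_inputs(sugar_extraction, sugar_measurement))  # []
```

Because the link relies on exact identifier matches, a misspelled input such as `Cold1-leaf` instead of `Cold1_leaf` would be reported by this check as unlinked.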


## Linking samples to data

By reusing sample and data identifiers across different parts of the ARC, we can link the samples through the lab processes in studies and assays to the data produced from them.

The tables above contain all the information visualized in the following flowchart, which shows how the study and assay processes are connected:

<Mermaid>
```mermaid
flowchart LR
linkStyle default stroke:#2d3e50,stroke-width:2px;
classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;

subgraph study1[Study:AthalianaColdStress]
s1[Plants] ---p1[plant-sampling]--> s2[Leaves]
end

subgraph assay1[Assay:SugarContent]
s2 ---p2[SugarExtraction]--> s3[Sugar extracts]
s3 ---p3[SugarMeasurement]--> d1@{ shape: doc, label: sugar_result.csv}
end
class study1 studyStyle;
class assay1 assayStyle;
class p1,p2,p3 processStyle;
class s1,s2,s3 sampleStyle;
class d1 dataStyle;
```

</Mermaid>

Zooming in on the sample level, we can follow the samples through the processes from their biological origin to the data:

<Mermaid>
```mermaid

%%{init: {
"flowchart": {
"nodeSpacing": 40,
"rankSpacing": 30
}
}}%%


flowchart LR
linkStyle default stroke:#2d3e50,stroke-width:2px;
classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef cold1 fill:#fff59d,color:#2d3e50,font-weight:bold;

subgraph study1["Study:AthalianaColdStress"]

subgraph p1[Plant Sampling]

subgraph in1[Plants]
is1a[Cold1]
is1b[Cold2]
is1c[Cold3]
is1d[RT1]
is1e[RT2]
is1f[RT3]
end

subgraph out1[Leaves]
os2a[Cold1_leaf]
os2b[Cold2_leaf]
os2c[Cold3_leaf]
os2d[RT1_leaf]
os2e[RT2_leaf]
os2f[RT3_leaf]
end

is1a --> os2a
is1b --> os2b
is1c --> os2c
is1d --> os2d
is1e --> os2e
is1f --> os2f

end
end

os2a[Cold1_leaf] --> is2a[Cold1_leaf]
os2b[Cold2_leaf] --> is2b[Cold2_leaf]
os2c[Cold3_leaf] --> is2c[Cold3_leaf]
os2d[RT1_leaf] --> is2d[RT1_leaf]
os2e[RT2_leaf] --> is2e[RT2_leaf]
os2f[RT3_leaf] --> is2f[RT3_leaf]

subgraph assay1[Assay: SugarContent]

subgraph p2[Sugar Extraction]

subgraph in2[Leaves]
is2a
is2b
is2c
is2d
is2e
is2f
end

subgraph out2[Sugar extracts]
os3a[Cold1_sugar-ext]
os3b[Cold2_sugar-ext]
os3c[Cold3_sugar-ext]
os3d[RT1_sugar-ext]
os3e[RT2_sugar-ext]
os3f[RT3_sugar-ext]
end

is2a --> os3a
is2b --> os3b
is2c --> os3c
is2d --> os3d
is2e --> os3e
is2f --> os3f

end

os3a --> is3a[Cold1_sugar-ext]
os3b --> is3b[Cold2_sugar-ext]
os3c --> is3c[Cold3_sugar-ext]
os3d --> is3d[RT1_sugar-ext]
os3e --> is3e[RT2_sugar-ext]
os3f --> is3f[RT3_sugar-ext]

subgraph p3[Sugar Measurement]

subgraph in3[Sugar extracts]
is3a
is3b
is3c
is3d
is3e
is3f
end

is3a --> d1@{ shape: doc, label: sugar_result.csv}
is3b --> d1
is3c --> d1
is3d --> d1
is3e --> d1
is3f --> d1

end

end

class study1 studyStyle;
class assay1 assayStyle;
class p1,p2,p3 processStyle;
class in1,in2,in3,out1,out2 sampleStyle;
class d1 dataStyle;
class is1a,os2a,is2a,os3a,is3a cold1;
```

</Mermaid>
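
The sample-level edges in such a chart follow directly from the table rows. As a rough illustration only, the Python sketch below turns `(input, output)` pairs into Mermaid flowchart lines. The function name `mermaid_edges` and the row format are assumptions made for this example, and unlike the chart above it draws one node per identifier instead of separate input and output nodes.

```python
# Hypothetical (input, output) rows for the highlighted Cold1 sample.
rows_by_process = [
    [("Cold1", "Cold1_leaf")],            # Plant Sampling
    [("Cold1_leaf", "Cold1_sugar-ext")],  # Sugar Extraction
    [("Cold1_sugar-ext", "sugar_result.csv")],  # Sugar Measurement
]


def mermaid_edges(tables):
    """Build Mermaid flowchart text from lists of (input, output) rows."""
    ids = {}

    def node(label):
        # Give each distinct label a simple id (n0, n1, ...) and quote the label.
        if label not in ids:
            ids[label] = f"n{len(ids)}"
        return f'{ids[label]}["{label}"]'

    lines = ["flowchart LR"]
    for rows in tables:
        for source, target in rows:
            lines.append(f"    {node(source)} --> {node(target)}")
    return "\n".join(lines)


print(mermaid_edges(rows_by_process))
```

Since a reused identifier maps to the same node id, the output of one process and the input of the next collapse into a single node, which is exactly the link the annotation tables express.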


## Tracing back the data to its biological origin

Using this approach, we can trace the dataset file back to its specific biological origin. For example, the sugar content measurement for the sample `Cold1_sugar-ext` can be traced back through the processes to the original plant `Cold1`. Seen from the other direction (i.e. starting from the data), all metadata recorded in the preceding annotation tables, such as the protocols used, the conditions applied, and the sample origins, helps put the data in `sugar_result.csv` into context. This linkage is therefore crucial for understanding the data and for ensuring its reliability and reproducibility in scientific research.
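
Tracing back amounts to following the `Output` to `Input` links in reverse. A minimal sketch in Python, assuming the annotation tables are available as plain `(input, output)` pairs; the function name `trace_back` is hypothetical and not an ARC API, and the dataset path is shortened to its file name.

```python
SAMPLES = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]

# Hypothetical (input, output) rows of the three processes, in order.
tables = [
    [(s, f"{s}_leaf") for s in SAMPLES],                  # Plant Sampling
    [(f"{s}_leaf", f"{s}_sugar-ext") for s in SAMPLES],   # Sugar Extraction
    [(f"{s}_sugar-ext", "sugar_result.csv") for s in SAMPLES],  # Sugar Measurement
]


def trace_back(entity, tables):
    """Return all upstream entities of `entity`, nearest first (breadth-first)."""
    parents = {}
    for rows in tables:
        for source, target in rows:
            parents.setdefault(target, []).append(source)

    lineage, seen = [], set()
    queue = list(parents.get(entity, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        lineage.append(current)
        queue.extend(parents.get(current, []))
    return lineage


print(trace_back("Cold1_sugar-ext", tables))  # ['Cold1_leaf', 'Cold1']
print(len(trace_back("sugar_result.csv", tables)))  # 18
```

Starting from `sugar_result.csv`, the trace returns all 18 upstream identifiers (six extracts, six leaves, six plants), because every row of the measurement table points to the same file.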

:::note
To go a level deeper and trace each **data point** in `sugar_result.csv` back to its sample, the ARC employs the concept of [fragment selectors and DataMAPs](/nfdi4plants.knowledgebase/arctrl/datamap).
:::
5 changes: 4 additions & 1 deletion src/content/docs/guides/review-arc.mdx
@@ -101,7 +101,10 @@ For each study and assay in your ARC, check that:
To check the overall consistency of your ARC, make sure that the full connection from sample to raw data to processed data is there.
<Steps>
1. For every output, can you trace its origins back through the ARC? Is the provenance of all data fully described?
- A mermaid graph is a quick way to visualize this provenance, showing how the inputs and outputs from your studies and assays are connected. Learn more about the arcIsaProcessMermaid tool [here](https://www.nuget.org/packages/arcIsaProcessMermaid/#readme-body-tab)
:::tip
Check out [this article](/nfdi4plants.knowledgebase/core-concepts/inputs-outputs) on how inputs and outputs – specifically samples and dataset files – are linked in an ARC.
:::
2. A mermaid graph is a quick way to visualize this provenance, showing how the inputs and outputs from your studies and assays are connected. Learn more about the arcIsaProcessMermaid tool [here](https://www.nuget.org/packages/arcIsaProcessMermaid/#readme-body-tab)
- The `arc-summary.md` lists the files that are properly linked in the ARC. This is provided as a downloadable artifact by the CI/CD job "Create ARC json".

</Steps>