RM Anova - 6/6 Nodes #15

Merged: ahmed-elghazi merged 6 commits into main from AE_RmANOVA on Mar 25, 2026

@ahmed-elghazi ahmed-elghazi commented Mar 16, 2026

Summary

Adds a Repeated Measures ANOVA node for KNIME for the one-factor repeated-measures case.

The node compares the same participants across multiple conditions or time points, supports both long and wide input layouts, and provides:

  • A Basic summary output for quick interpretation
  • An Advanced output with the full ANOVA breakdown, sphericity checks, and corrected results

Design decisions

Why statsmodels alone was not selected

statsmodels.stats.anova.AnovaRM was evaluated first, but its output is intentionally minimal: the ANOVA table contains only the F value, the numerator and denominator degrees of freedom, and the uncorrected p-value. The KNIME node requires additional reporting fields such as:

  • Mauchly’s test of sphericity
  • Greenhouse–Geisser epsilon
  • Greenhouse–Geisser corrected p-value
  • Sum of squares / Mean squares
  • Partial eta squared
  • A fixed KNIME-friendly output schema
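For comparison, here is a minimal sketch of what `AnovaRM` reports on a small made-up dataset. The column names (`subject`, `condition`, `score`) and the values are illustrative assumptions, not data from this PR.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: 3 subjects measured under 3 conditions.
long_df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "condition": ["a", "b", "c"] * 3,
    "score": [1.0, 2.0, 4.0, 2.0, 4.0, 5.0, 3.0, 3.0, 6.0],
})

result = AnovaRM(long_df, depvar="score", subject="subject",
                 within=["condition"]).fit()

# One row per within-subject factor; only F, df and p-value columns.
table = result.anova_table
print(table)
```

The table has no sums of squares, effect size, or sphericity diagnostics, which is the gap the node fills.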

Why pingouin was not selected

pingouin introduced a dependency conflict within the knime-python-base 5.10.0 environment. Specifically, it requires pandas >= 2.1.1, whereas our stable environment pins pandas to 2.0.3. Upgrading pandas would break compatibility with existing nodes in the scientific stack.

Implementation approach

The node computes the missing statistics manually using NumPy and SciPy to maintain environment stability.
Main pieces:

  • Reshape/validate long and wide input
  • Compute ANOVA components (SS, MS, F)
  • Compute Mauchly’s W, Greenhouse–Geisser epsilon, and corrected p-values
  • Calculate partial eta squared effect size
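The reshape/validate step could look roughly like the sketch below. The function name `long_to_wide` and the specific checks are assumptions made for illustration; the node's actual validation rules may differ.

```python
import pandas as pd

def long_to_wide(df, subject, condition, value):
    """Pivot long data into a subjects x conditions table (hypothetical helper)."""
    # Each (subject, condition) cell must hold exactly one observation.
    if df.duplicated([subject, condition]).any():
        raise ValueError("duplicate subject/condition rows")
    wide = df.pivot(index=subject, columns=condition, values=value)
    # pivot leaves NaN where a subject has no value for a condition.
    if wide.isna().any().any():
        raise ValueError("every subject needs a value for every condition")
    if wide.shape[0] < 2 or wide.shape[1] < 2:
        raise ValueError("need at least 2 subjects and 2 conditions")
    return wide

# Illustrative data only.
long_df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "condition": ["a", "b", "c"] * 3,
    "score": [1.0, 2.0, 4.0, 2.0, 4.0, 5.0, 3.0, 3.0, 6.0],
})
wide = long_to_wide(long_df, "subject", "condition", "score")
print(wide)
```

Once both layouts are normalized to one subjects x conditions table, the remaining steps only have to handle a single shape.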

Core formulas

For a one-way repeated-measures ANOVA with $n$ participants and $k$ conditions:

Degrees of freedom

$$df_{\text{factor}} = k - 1$$

$$df_{\text{error}} = (n - 1)(k - 1)$$

Mean squares

$$MS_{\text{factor}} = \frac{SS_{\text{factor}}}{df_{\text{factor}}}$$

$$MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}$$

F statistic

$$F = \frac{MS_{\text{factor}}}{MS_{\text{error}}}$$

Partial eta squared

$$\eta_p^2 = \frac{SS_{\text{factor}}}{SS_{\text{factor}} + SS_{\text{error}}}$$
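The formulas above can be sketched in NumPy as follows. The sums of squares use the standard textbook partition (total = factor + subject + error), which is not spelled out in this description, and the function name `rm_anova_components` is hypothetical.

```python
import numpy as np
from scipy import stats

def rm_anova_components(data):
    """One-way RM ANOVA on an (n subjects x k conditions) array (sketch)."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Standard partition: SS_total = SS_factor + SS_subject + SS_error.
    ss_factor = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_subject = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_factor - ss_subject

    df_factor = k - 1
    df_error = (n - 1) * (k - 1)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    f_stat = ms_factor / ms_error

    return {
        "ss_factor": ss_factor, "ss_error": ss_error,
        "df_factor": df_factor, "df_error": df_error,
        "ms_factor": ms_factor, "ms_error": ms_error,
        "F": f_stat,
        # Upper tail of the F distribution gives the uncorrected p-value.
        "p": stats.f.sf(f_stat, df_factor, df_error),
        "eta_p2": ss_factor / (ss_factor + ss_error),
    }

# Illustrative data: rows are subjects, columns are conditions.
res = rm_anova_components([[1, 2, 4], [2, 4, 5], [3, 3, 6]])
print(res)
```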

Greenhouse–Geisser correction

If sphericity is violated, degrees of freedom are adjusted using $\epsilon_{GG}$:

$$df_{\text{factor}}^{\prime} = \epsilon_{GG} \cdot df_{\text{factor}}$$

$$df_{\text{error}}^{\prime} = \epsilon_{GG} \cdot df_{\text{error}}$$

The corrected p-value is then computed from the observed F statistic using the adjusted degrees of freedom.
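A minimal sketch of the sphericity diagnostics and the correction, assuming the usual textbook definitions: $\epsilon_{GG}$ from the double-centered covariance matrix, and Mauchly's W with its chi-square approximation. The function names are hypothetical, and edge cases (zero variance, very small $n$) are not handled the way a production node would need.

```python
import numpy as np
from scipy import stats

def mauchly_and_gg(data):
    """Mauchly's W and Greenhouse-Geisser epsilon for an (n x k) array (sketch)."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    d = k - 1
    s = np.cov(x, rowvar=False)        # k x k sample covariance of conditions
    c = np.eye(k) - 1.0 / k            # centering matrix I - 11'/k
    s_c = c @ s @ c                    # double-centered covariance
    eps = np.trace(s_c) ** 2 / (d * np.sum(s_c ** 2))

    # s_c has one zero eigenvalue; the other d are those of the
    # covariance of any set of orthonormal contrasts.
    eig = np.clip(np.linalg.eigvalsh(s_c)[1:], 0.0, None)
    w = np.prod(eig) / (eig.sum() / d) ** d

    chi2_df = k * (k - 1) // 2 - 1
    if chi2_df == 0 or w <= 0:
        # k == 2 (sphericity holds trivially) or degenerate covariance.
        return {"W": w, "chi2": np.nan, "chi2_df": chi2_df,
                "p": np.nan, "eps_gg": eps}
    chi2 = -(n - 1 - (2 * d ** 2 + d + 2) / (6 * d)) * np.log(w)
    return {"W": w, "chi2": chi2, "chi2_df": chi2_df,
            "p": stats.chi2.sf(chi2, chi2_df), "eps_gg": eps}

def gg_corrected_p(f_stat, df_factor, df_error, eps):
    """p-value of the observed F under epsilon-scaled degrees of freedom."""
    return stats.f.sf(f_stat, eps * df_factor, eps * df_error)

# Illustrative data: 5 subjects x 3 conditions.
diag = mauchly_and_gg([[1, 2, 4], [2, 4, 5], [3, 3, 6], [4, 6, 6], [5, 5, 9]])
print(diag)
print(gg_corrected_p(21.0, 2, 4, 0.5))
```

Because the F statistic itself is unchanged, only the reference distribution shifts; with $\epsilon_{GG} = 1$ the corrected and uncorrected p-values coincide.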

Output behavior

Basic output

Concise summary: corrected p-value, effect size, and significance conclusion.

Advanced output

Full statistical breakdown including factor/error rows, Mauchly’s test values, and both uncorrected and corrected p-values.

Testing

Reused the pytest-based validation approach to verify:

  • Long/Wide format input
  • Significant vs. non-significant cases
  • Corrected p-value accuracy against R/SPSS benchmarks
  • Schema/field name consistency

Notes

This implementation maintains alignment with the KNIME-supported dependency set while delivering the diagnostics missing from the built-in statsmodels output.

@ahmed-elghazi ahmed-elghazi changed the title from "Ae rm anova" to "RM Anova - 6/6 Nodes" on Mar 18, 2026
…de all metrics desired, and pingouin required pandas >= 2.1.1, incompatible with KNIME bundled pandas 2.0.3. Upgrade led to statsmodels breaking
@ahmed-elghazi ahmed-elghazi merged commit 7f6b6ad into main Mar 25, 2026
4 checks passed