RM Anova - 6/6 Nodes by ahmed-elghazi · Pull Request #15 · utdal/knime-utd-statistics

ahmed-elghazi · 2026-03-16T06:34:21Z

Summary

Adds a Repeated Measures ANOVA node for KNIME, covering the one-factor repeated-measures case.

The node compares the same participants across multiple conditions or time points, supports both long and wide input layouts, and provides:

A Basic summary output for quick interpretation
An Advanced output with the full ANOVA breakdown, sphericity checks, and corrected results

Design decisions

Why statsmodels alone was not selected

statsmodels.stats.anova.AnovaRM was evaluated first, but its output is intentionally minimal: it returns only the core fields (F value, numerator and denominator df, and p-value). The KNIME node requires additional reporting fields:

Mauchly’s test of sphericity
Greenhouse–Geisser epsilon
Greenhouse–Geisser corrected p-value
Sum of squares / Mean squares
Partial eta squared
A fixed KNIME-friendly output schema

Why pingouin was not selected

pingouin introduced a dependency conflict within the knime-python-base 5.10.0 environment: it requires pandas >= 2.1.1, whereas our stable environment pins pandas to 2.0.3. Upgrading pandas was tried and broke statsmodels, so it would put the existing nodes in the scientific stack at risk.

Implementation approach

The node computes the missing statistics manually using NumPy and SciPy to maintain environment stability.
Main pieces:

Reshape/validate long and wide input
Compute ANOVA components (SS, MS, F)
Compute Mauchly’s W, Greenhouse–Geisser epsilon, and corrected p-values
Calculate partial eta squared effect size
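Of the pieces listed above, Mauchly's W is the one without a formula in this description. The following is a minimal sketch of the textbook computation, assuming a complete n x k score matrix with k >= 3 and n >= k. The function name `mauchly_sphericity` and the returned keys are hypothetical and are not taken from the node's code.

```python
import numpy as np
from scipy import linalg, stats


def mauchly_sphericity(data):
    """Mauchly's W with its chi-square approximation (illustrative helper).

    `data` is a complete (n participants x k conditions) array,
    assumed to have k >= 3 and n >= k.
    """
    y = np.asarray(data, dtype=float)
    n, k = y.shape
    p = k - 1

    # Covariance of the k conditions, projected onto k-1 orthonormal
    # contrasts (a basis of the vectors whose entries sum to zero).
    cov = np.cov(y, rowvar=False)
    contrasts = linalg.null_space(np.ones((1, k))).T  # shape (k-1, k)
    t = contrasts @ cov @ contrasts.T

    # W is the determinant divided by the (k-1)-th power of the mean
    # contrast variance; W = 1 means the sample is perfectly spherical.
    w = np.linalg.det(t) / (np.trace(t) / p) ** p

    # Chi-square approximation of -log(W).
    scale = (n - 1) - (2 * p**2 + p + 2) / (6 * p)
    chi2 = -scale * np.log(w)
    dof = p * (p + 1) // 2 - 1
    p_value = stats.chi2.sf(chi2, dof)
    return {"W": float(w), "chi2": float(chi2), "df": dof, "p": float(p_value)}
```

The sketch does not guard against degenerate input: a singular contrast covariance gives W = 0 and an infinite chi-square value, so a production node would likely need to validate the data first.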

Core formulas

For a one-way repeated-measures ANOVA with $n$ participants and $k$ conditions:

Degrees of freedom

$$df_{\text{factor}} = k - 1$$

$$df_{\text{error}} = (n - 1)(k - 1)$$

Mean squares

$$MS_{\text{factor}} = \frac{SS_{\text{factor}}}{df_{\text{factor}}}$$

$$MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}$$

F statistic

$$F = \frac{MS_{\text{factor}}}{MS_{\text{error}}}$$
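The formulas above take the sums of squares as given. A minimal NumPy sketch of the whole chain, assuming the standard partition of total variation into subjects, factor, and error, and a complete n x k score matrix, might look like this. The function name `rm_anova_table` and the dictionary keys are illustrative, not the node's actual schema.

```python
import numpy as np


def rm_anova_table(data):
    """One-way repeated-measures ANOVA components (illustrative helper).

    `data` is a complete (n participants x k conditions) array.
    """
    y = np.asarray(data, dtype=float)
    n, k = y.shape
    grand_mean = y.mean()

    # Standard partition: SS_total = SS_subjects + SS_factor + SS_error.
    ss_factor = n * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_subjects = k * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_factor - ss_subjects

    df_factor = k - 1
    df_error = (n - 1) * (k - 1)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error

    return {
        "SS_factor": float(ss_factor),
        "SS_error": float(ss_error),
        "df_factor": df_factor,
        "df_error": df_error,
        "MS_factor": float(ms_factor),
        "MS_error": float(ms_error),
        "F": float(ms_factor / ms_error),
    }


# Toy data: 3 participants measured under 3 conditions.
table = rm_anova_table([[1, 2, 3], [2, 4, 6], [3, 6, 9]])
print(table["F"])
```

For this toy matrix the sums of squares are 24 (factor) and 4 (error) on 2 and 4 degrees of freedom, so the printed F is 12.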

Partial eta squared

$$\eta_p^2 = \frac{SS_{\text{factor}}}{SS_{\text{factor}} + SS_{\text{error}}}$$
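Because each sum of squares equals its mean square times its degrees of freedom, the same effect size can be written in terms of the reported F statistic. This is a short derivation from the definitions above, offered as a possible consistency check on an output row rather than as a description of how the node computes it:

$$\eta_p^2 = \frac{MS_{\text{factor}} \cdot df_{\text{factor}}}{MS_{\text{factor}} \cdot df_{\text{factor}} + MS_{\text{error}} \cdot df_{\text{error}}} = \frac{F \cdot df_{\text{factor}}}{F \cdot df_{\text{factor}} + df_{\text{error}}}$$

As an illustration with made-up values (not taken from the node's tests): $SS_{\text{factor}} = 24$, $SS_{\text{error}} = 4$, $n = 3$, and $k = 3$ give degrees of freedom of 2 and 4 and $F = 12$, and both forms yield $\eta_p^2 = 24/28 \approx 0.857$.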

Greenhouse–Geisser correction

If sphericity is violated, degrees of freedom are adjusted using $\epsilon_{GG}$:

$$df_{\text{factor}}^{\prime} = \epsilon_{GG} \cdot df_{\text{factor}}$$

$$df_{\text{error}}^{\prime} = \epsilon_{GG} \cdot df_{\text{error}}$$

The corrected p-value is then computed from the observed F statistic using the adjusted degrees of freedom.
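The description does not spell out how $\epsilon_{GG}$ itself is obtained. A minimal sketch, assuming the usual estimate from the double-centered sample covariance matrix of the k conditions, followed by the corrected p-value from SciPy's F distribution, is shown below. The function names are hypothetical and the node's own implementation may differ.

```python
import numpy as np
from scipy import stats


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon from an (n x k) score matrix (illustrative)."""
    y = np.asarray(data, dtype=float)
    k = y.shape[1]
    cov = np.cov(y, rowvar=False)

    # Double-center the covariance matrix (remove row and column means).
    centered = (
        cov
        - cov.mean(axis=0, keepdims=True)
        - cov.mean(axis=1, keepdims=True)
        + cov.mean()
    )
    eps = np.trace(centered) ** 2 / ((k - 1) * np.sum(centered**2))

    # Theory bounds epsilon to [1/(k-1), 1]; clipping only absorbs
    # floating-point rounding.
    return float(np.clip(eps, 1.0 / (k - 1), 1.0))


def corrected_p_value(f_stat, n, k, eps):
    """P-value of the observed F with both df scaled by epsilon."""
    df_factor = eps * (k - 1)
    df_error = eps * (n - 1) * (k - 1)
    return float(stats.f.sf(f_stat, df_factor, df_error))


# Toy data whose condition columns are exactly proportional, which is the
# most extreme sphericity violation: epsilon lands on its lower bound 1/(k-1).
scores = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
eps = greenhouse_geisser_epsilon(scores)
p_corrected = corrected_p_value(12.0, n=3, k=3, eps=eps)  # F = 12 assumed given
```

With `eps = 1.0` the second function reduces to the uncorrected p-value, which is one way to report both values from a single code path.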

Output behavior

Basic output

Concise summary: corrected p-value, effect size, and significance conclusion.

Advanced output

Full statistical breakdown including factor/error rows, Mauchly’s test values, and both uncorrected and corrected p-values.

Testing

Reused the pytest-based validation approach to verify:

Long/Wide format input
Significant vs. non-significant cases
Corrected p-value accuracy against R/SPSS benchmarks
Schema/field name consistency
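Alongside external benchmarks, one check that needs no reference values is the two-condition case, where the repeated-measures F must equal the squared paired t statistic. The sketch below is a hypothetical pytest-style test built on that identity. It is not claimed to be part of this PR's test suite, and `rm_anova_f` is an illustrative stand-in for the node's computation.

```python
import numpy as np
from scipy import stats


def rm_anova_f(data):
    """F statistic of a one-way repeated-measures ANOVA (illustrative stand-in)."""
    y = np.asarray(data, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_factor = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_subjects = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_error = np.sum((y - grand) ** 2) - ss_factor - ss_subjects
    return (ss_factor / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))


def test_two_conditions_match_paired_t():
    # Synthetic scores for 20 participants under 2 conditions (seeded).
    rng = np.random.default_rng(42)
    y = rng.normal(loc=[0.0, 0.5], scale=1.0, size=(20, 2))

    f_stat = rm_anova_f(y)
    paired = stats.ttest_rel(y[:, 0], y[:, 1])

    # With k = 2, F equals t squared and the two p-values coincide.
    assert np.isclose(f_stat, paired.statistic**2)
    assert np.isclose(stats.f.sf(f_stat, 1, y.shape[0] - 1), paired.pvalue)
```

The identity only covers k = 2, so it validates the sums-of-squares path but says nothing about the sphericity correction, which still needs reference values for k >= 3.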

Notes

This implementation stays within the KNIME-supported dependency set while delivering the diagnostics that the statsmodels AnovaRM output does not provide.


ahmed-elghazi assigned dsaam94 and Copilot Mar 16, 2026

ahmed-elghazi changed the title ~~Ae rm anova~~ RM Anova - 6/6 Nodes Mar 18, 2026

ahmed-elghazi added 4 commits March 24, 2026 21:10

Starter code, still need to update the README + include testing workflows + pytests

d3b2354

Lint fix + build fix. Use manual math since statsmodel does not provide all metrics desird, and pingouin required pandas >= 2.1.1, incompatible with KNIME bundled pandas 2.03. Upgrade led to statsmodel breaking

6f579bc

Formatting changes, renaming schema, tested against statsmodel locally and difference is negligible.

d5c4045

Removed wide format, rewrote description

3b1f271

ahmed-elghazi force-pushed the AE_RmANOVA branch from fe8be05 to 3b1f271 Compare March 25, 2026 02:14

ahmed-elghazi added 2 commits March 24, 2026 21:17

chore: regenerate pixi.lock to fix duplicate entry error

24f585a

fixed CI build issue by pinning knime-extension-bundling to >= 5.10

e1e0353

ahmed-elghazi merged commit 7f6b6ad into main Mar 25, 2026
4 checks passed

ahmed-elghazi merged 6 commits into main from AE_RmANOVA


3 participants
