Skip to content

COMP: Use itk::Math::MatrixExponential instead of removed vnl_matrix_exp#1452

Open
hjmjohnson wants to merge 1 commit into
SuperElastix:mainfrom
hjmjohnson:migrate-to-itk-matrix-exponential
Open

COMP: Use itk::Math::MatrixExponential instead of removed vnl_matrix_exp#1452
hjmjohnson wants to merge 1 commit into
SuperElastix:mainfrom
hjmjohnson:migrate-to-itk-matrix-exponential

Conversation

@hjmjohnson

Copy link
Copy Markdown

VXL removed core/vnl/vnl_matrix_exp, which AffineLogTransform included via <vnl/vnl_matrix_exp.h> (InsightSoftwareConsortium/ITK#6452). This migrates the two call sites to itk::Math::MatrixExponential, an ITK-supported Eigen-backed replacement.

Depends on InsightSoftwareConsortium/ITK#6454 (adds itk::Math::MatrixExponential / itkMatrixExponential.h). Hold merge until an ITK containing it is required.

Change
-#include <vnl/vnl_matrix_exp.h>             →  #include "itkMatrixExponential.h"
-vnl_matrix_exp(... .GetVnlMatrix())         →  itk::Math::MatrixExponential(...)
-vnl_matrix_exp(A_bar)                        →  itk::Math::MatrixExponential(A_bar)

Call-site argument types (vnl_matrix_fixed, vnl_matrix) are unchanged; the new function provides matching overloads. Both AffineLogTransform and AffineLogStackTransform components rebuild clean against the new header.

@hjmjohnson hjmjohnson force-pushed the migrate-to-itk-matrix-exponential branch from 2e19386 to c33e80a Compare June 16, 2026 22:04
@hjmjohnson

Copy link
Copy Markdown
Author

Performance impact of migrating AffineLogTransform from vnl_matrix_exp to itk::Math::MatrixExponential. Freshly benchmarked on this transform's two call patterns (-O3 -DNDEBUG, best-of-11 min ns/call). Net: ~1.7–2× faster matrix-exp work per parameter-update, more accurate, results unchanged in practice.

Per-call timing + net AffineLogTransform impact
operation vnl_matrix_exp itk (Eigen) Eigen vs vnl
log-domain 2×2 (SetParameters) ~99 ns ~410 ns 4.1× slower
log-domain 3×3 (SetParameters) ~150 ns ~510 ns 3.4× slower
A_bar 4×4 (Jacobian loop) ~1349 ns ~695 ns 1.95× faster
A_bar 6×6 (Jacobian loop) ~2681 ns ~1333 ns 2.0× faster

Per SetParameters (1 log-domain exp + d² augmented A_bar exps in PrecomputeJacobianOfSpatialJacobian):

old (vnl) new (Eigen) net
2D 95 + 4·1331 = 5419 ns 405 + 4·692 = 3173 ns ~1.7× faster
3D 149 + 9·2676 = 24233 ns 503 + 9·1315 = 12338 ns ~2.0× faster

The d² A_bar Jacobian loop dominates and Eigen wins there, so net it's faster despite the single small log-domain exp being slower. Matrix-exp is a negligible slice of full registration wall-time, so end-to-end timing is effectively unchanged.

Numerical equivalence + accuracy

For near-identity log-domain matrices the absolute output difference between the two methods is ~1e-12 — far below any registration tolerance, so registration results are unchanged. Eigen (Higham scaling-and-squaring) is also ~2–3 orders of magnitude more accurate than vnl's Taylor series on the exp(A)·exp(−A)=I identity at every norm, and stays robust where the Taylor series degrades for large norms.

This PR depends on itk::Math::MatrixExponential landing in ITK (InsightSoftwareConsortium/ITK#6454).

@N-Dekker

Copy link
Copy Markdown
Member

Thanks Hans. Would this compile for both ITK 5 and 6, when we add something like #if ITK >= 6 and put the original code in the #else?

@hjmjohnson hjmjohnson force-pushed the migrate-to-itk-matrix-exponential branch from c33e80a to 5aeef7f Compare June 19, 2026 02:15
@hjmjohnson

Copy link
Copy Markdown
Author

@N-Dekker Good catch — yes, and a guard is necessary: elastix's minimum is ITK 5.4.1 (find_package(ITK 5.4.1)), which has vnl_matrix_exp.h but not itkMatrixExponential.h, so the single-path version broke the minimum-version build. Added in 5aeef7f.

Why __has_include rather than #if ITK_VERSION_MAJOR >= 6

InsightSoftwareConsortium/ITK#6454 (which adds itk::Math::MatrixExponential) has no milestone yet, so the first tagged 6.x release containing it is unknown — keying on the header's presence is robust to that. #6454 also restored the (now-deprecated) vnl_matrix_exp shim, so current ITK main takes the new Eigen path while ITK 5.4.x falls back to vnl.

#ifndef ELX_HAS_ITK_MATRIX_EXPONENTIAL
#  if __has_include("itkMatrixExponential.h")
#    define ELX_HAS_ITK_MATRIX_EXPONENTIAL 1
#  else
#    define ELX_HAS_ITK_MATRIX_EXPONENTIAL 0
#  endif
#endif

Verified both branches compile locally: the Eigen path builds the full elastix/transformix against ITK main; the vnl fallback (forced via -DELX_HAS_ITK_MATRIX_EXPONENTIAL=0) compiles clean against the restored shim.

@hjmjohnson hjmjohnson marked this pull request as ready for review June 19, 2026 02:20
Comment thread Components/Transforms/AffineLogTransform/itkAffineLogTransform.hxx Outdated

@N-Dekker N-Dekker left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved unconditionally, but feel free to comment on my question about the initial #ifndef ELX_HAS_ITK_MATRIX_EXPONENTIAL 😃

@N-Dekker

Copy link
Copy Markdown
Member

@hjmjohnson Maybe we should still check if it won't break our regression tests, as it might yield slightly different result, right? But for now, I guess we have no choice, because without this PR, elastix + ITK6 won't build anymore 🤷

@N-Dekker

Copy link
Copy Markdown
Member

Performance impact of migrating AffineLogTransform from vnl_matrix_exp to itk::Math::MatrixExponential. Freshly benchmarked on this transform's two call patterns (-O3 -DNDEBUG, best-of-11 min ns/call). Net: ~1.7–2× faster matrix-exp work per parameter-update, more accurate, results unchanged in practice.

@hjmjohnson This sounds very promising. Can you possibly share the benchmark code?

@hjmjohnson

Copy link
Copy Markdown
Author

Verified this change against an ITK-v6-main downstream build forest: the elastix test suite (163 tests) is behaviorally identical to elastix main with vnl_matrix_exp and to the ITK-5.4 baseline — 161/163 pass, the same 2 pre-existing failures, identical runtime. The itk::Math::MatrixExponential switch costs nothing in test behavior and is the only variant that survives ITK_FUTURE_LEGACY_REMOVE=ON.

3-way test comparison (163 tests each)
Configuration ITK elastix source matrix-exp path Build Passed Failed
1 release-5.4 main vnl/vnl_matrix_exp.h (native VXL) clean 161/163 #24, #163
2 v6 main main vnl/vnl_matrix_exp.h (ITK shim → Eigen) clean 161/163 #24, #163
3 v6 main this PR itk::Math::MatrixExponential (Eigen, direct) clean 161/163 #24, #163

The same two tests fail in all three configurations (both pre-existing, ITK-version-independent):

  • #24 elastix_run_example_COMPARE_IM — baseline-image tolerance mismatch (9993 pixels differ > 5, 50 allowed)
  • #163 TransformixTest

Runtimes were ~23.4–23.6 s in all three runs.

What this establishes
  1. ITK v6 does not regress elastix — configs 1 vs 2 are equivalent; the ITKv6 vnl_matrix_exp.h compatibility shim (Eigen-backed MatrixExponential) is a drop-in for native VXL vnl_matrix_exp.
  2. This PR is behaviorally equivalent — configs 2 vs 3 show that switching AffineLogTransform from vnl_matrix_exp to itk::Math::MatrixExponential produces no test differences (both resolve to the same Eigen Higham scaling-and-squaring under ITKv6).
  3. Forward-compatibility — configs 1 and 2 (#include <vnl/vnl_matrix_exp.h>) would #error under ITK_FUTURE_LEGACY_REMOVE=ON; only this PR builds in that configuration.

VXL removed core/vnl/vnl_matrix_exp.h, breaking AffineLogTransform's
include of <vnl/vnl_matrix_exp.h> (InsightSoftwareConsortium/ITK#6452).
Switch to itk::Math::MatrixExponential (itkMatrixExponential.h), an
ITK-supported Eigen-backed replacement using Higham scaling-and-squaring.

elastix's minimum is ITK 5.4.1, which lacks itkMatrixExponential.h, so
guard the include and both call sites on __has_include and fall back to
vnl_matrix_exp for older ITK. ITK#6454 adds the function and restores the
vnl_matrix_exp shim. Call-site argument types (vnl_matrix_fixed,
vnl_matrix) are unchanged.
@hjmjohnson hjmjohnson force-pushed the migrate-to-itk-matrix-exponential branch from 5aeef7f to df58ef4 Compare June 19, 2026 17:17
@hjmjohnson

Copy link
Copy Markdown
Author

@hjmjohnson This sounds very promising. Can you possibly share the benchmark code?

Happy to, @N-Dekker. The full performance and accuracy write-up (with the per-size timing table and the exp(A)·exp(−A)=I accuracy comparison) lives on the ITK side, where itk::Math::MatrixExponential was added: InsightSoftwareConsortium/ITK#6454performance comment.

The code is below. Two files: a consolidated timing harness (vnl vs Eigen across the sizes AffineLogTransform exercises) and the correctness/equivalence check that uses the exact call types from this PR.

One honesty note on reading the numbers: the published table's small (2×2/3×3 log-domain) rows used fixed-size types (itk::Matrix / vnl_matrix_fixed), where vnl's compile-time path is very fast; the harness below uses dynamic vnl_matrix uniformly for brevity, so its small-size absolute ns differ (the per-call heap alloc inflates both sides). The conclusion is unchanged: Eigen wins on the larger augmented-A_bar matrices that dominate PrecomputeJacobianOfSpatialJacobian, so the net per-SetParameters cost drops. Absolute ns are machine-dependent (these were Apple Silicon, -O3 -DNDEBUG).

Timing harness — vnl_matrix_exp vs itk::Math::MatrixExponential (best-of-11)
// Build: c++ -O3 -DNDEBUG -std=c++17 <ITK + VNL include/link flags> bench_matrix_exp.cxx
#include "itkMatrixExponential.h"
#include <vnl/vnl_matrix.h>
#include <vnl/vnl_matrix_exp.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

using clk = std::chrono::steady_clock;

template <typename F>
double best_of(F && f, int reps = 11)
{
  double best = 1e300;
  for (int r = 0; r < reps; ++r)
    best = std::min(best, f());
  return best;
}

template <unsigned D>
void run(const char * label)
{
  std::mt19937                           rng(1);
  std::uniform_real_distribution<double> ud(-0.3, 0.3); // small-norm, like log-domain inputs
  const int                              pool = 256;
  std::vector<vnl_matrix<double>>        in(pool, vnl_matrix<double>(D, D));
  for (auto & m : in)
    for (unsigned i = 0; i < D; ++i)
      for (unsigned j = 0; j < D; ++j)
        m(i, j) = ud(rng);

  const int       N = 200000;
  volatile double sink = 0;

  auto timeVnl = [&] {
    auto t0 = clk::now();
    for (int k = 0; k < N; ++k)
      sink += vnl_matrix_exp(in[k & (pool - 1)])(0, 0);
    auto t1 = clk::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / N;
  };
  auto timeItk = [&] {
    auto t0 = clk::now();
    for (int k = 0; k < N; ++k)
      sink += itk::Math::MatrixExponential(in[k & (pool - 1)])(0, 0);
    auto t1 = clk::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / N;
  };

  const double vnl = best_of(timeVnl);
  const double itk = best_of(timeItk);
  std::printf("%-8s vnl_matrix_exp=%8.1f ns   itk(Eigen)=%8.1f ns   speedup=%.2fx\n",
              label, vnl, itk, vnl / itk);
  (void)sink;
}

int main()
{
  std::printf("best-of-11 min ns/call, small-norm inputs\n");
  run<2>("2x2"); // log-domain (2D)
  run<3>("3x3"); // log-domain (3D)
  run<4>("4x4"); // augmented A_bar (2D Jacobian loop)
  run<6>("6x6"); // augmented A_bar (3D Jacobian loop)
  return 0;
}

Representative run (Apple Silicon, dynamic vnl_matrix):

2x2  vnl_matrix_exp= 504.9 ns  itk(Eigen)= 465.4 ns  speedup=1.08x
3x3  vnl_matrix_exp= 741.4 ns  itk(Eigen)= 562.6 ns  speedup=1.32x
4x4  vnl_matrix_exp=1013.0 ns  itk(Eigen)= 722.2 ns  speedup=1.40x
6x6  vnl_matrix_exp=2231.8 ns  itk(Eigen)=1342.4 ns  speedup=1.66x
Correctness / equivalence check — exact AffineLogTransform call types
#include "itkMatrixExponential.h"
#include <vnl/vnl_matrix_exp.h>
#include <iostream>
#include <iomanip>
int main()
{
  using itk::Math::MatrixExponential;
  int fails = 0;

  // itk::Matrix 3x3 vs restored vnl_matrix_exp on the same data
  itk::Matrix<double, 3, 3> A;
  A(0,0)=4;A(0,1)=1;A(0,2)=-2; A(1,0)=3;A(1,1)=-5;A(1,2)=1; A(2,0)=0;A(2,1)=2;A(2,2)=6;
  auto itkExp = MatrixExponential(A);
  vnl_matrix<double> vexp = vnl_matrix_exp(A.GetVnlMatrix().as_matrix());
  double err = 0;
  for (unsigned i=0;i<3;++i) for (unsigned j=0;j<3;++j) err = std::max(err, std::abs(itkExp(i,j)-vexp(i,j)));
  std::cout << "itk::Matrix vs vnl_matrix_exp  max_err=" << std::scientific << err << (err<1e-7?"  PASS":"  FAIL") << "\n";
  if (err>=1e-7) ++fails;

  // vnl_matrix_fixed overload (exactly elastix's GetVnlMatrix() type)
  vnl_matrix_fixed<double, 2, 2> G; G(0,0)=0;G(0,1)=-0.9;G(1,0)=0.9;G(1,1)=0;
  auto gf = MatrixExponential(G);
  double c = std::cos(0.9), s = std::sin(0.9);
  double e2 = std::max(std::max(std::abs(gf(0,0)-c), std::abs(gf(0,1)+s)), std::max(std::abs(gf(1,0)-s), std::abs(gf(1,1)-c)));
  std::cout << "vnl_matrix_fixed rotation       max_err=" << e2 << (e2<1e-9?"  PASS":"  FAIL") << "\n";
  if (e2>=1e-9) ++fails;

  // dynamic vnl_matrix overload (exactly elastix's A_bar type)
  vnl_matrix<double> N(2,2); N(0,0)=0;N(0,1)=1;N(1,0)=0;N(1,1)=0;
  auto nf = MatrixExponential(N);
  double e3 = std::max(std::max(std::abs(nf(0,0)-1), std::abs(nf(0,1)-1)), std::max(std::abs(nf(1,0)), std::abs(nf(1,1)-1)));
  std::cout << "vnl_matrix nilpotent            max_err=" << e3 << (e3<1e-12?"  PASS":"  FAIL") << "\n";
  if (e3>=1e-12) ++fails;

  std::cout << (fails ? "\nRESULT: FAIL\n" : "\nRESULT: ALL PASS\n");
  return fails;
}

@hjmjohnson

Copy link
Copy Markdown
Author

@N-Dekker I do not have merge rights on elastix. Take this to any terminal state that you wish.

NOTE: if ITK_FUTURE_LEGACY_REMOVE is used, <vnl/vnl_matrix_exp.h>. The intent is that <vnl/vnl_matrix_exp.h> is removed in ITKv7.

@N-Dekker N-Dekker self-requested a review June 22, 2026 10:54
@N-Dekker

N-Dekker commented Jun 22, 2026

Copy link
Copy Markdown
Member

Thanks @hjmjohnson Before merging, I would like to see if it affects the results of our regression test, at the CI. Ijust started elastix builds with both ITK-v6.0b02 and ITK-v6-main. Afterwards, I'll try cherry-picking your PR on the second branch.


Update - cherry-picking your PR:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants