Openmp threaded linsolver by hnil · Pull Request #7194 · OPM/opm-simulators

hnil · 2026-06-22T12:44:36Z

Add all the infrastructure needed to make OpenMP as fast as MPI. The missing piece is AMG, but with the patch in AMGCL it is close. After that, it is the pre/post step that dominates the difference.

Replace the sequential A.mv / A.usmv (and the hand-rolled interior-row loops in GhostLastMatrixAdapter and WellModelGhostLastMatrixAdapter) with an index-based loop over output rows carrying `#pragma omp parallel for`. Each output row y[i] is written by exactly one thread and the matrix is read-only, so the per-row reduction order is unchanged and the result is bit-identical to the sequential apply. Falls back to the serial loop when _OPENMP is not defined. This is the highest value/line change for the single-node (pure-OpenMP) target and leaves the MPI (GhostLast*) path functionally intact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
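The row-parallel apply described in this commit can be sketched as follows. This is a minimal illustration on a scalar CSR matrix, not the code in the PR: `CsrMatrix` and `threadedMv` are hypothetical names, and the actual change operates on Dune block matrices and the GhostLast adapters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix, used only for this sketch.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> rowPtr;  // size rows + 1
    std::vector<std::size_t> colIdx;  // column index of each stored entry
    std::vector<double> values;       // value of each stored entry
};

// y = A * x with one output row per loop iteration.
// Each y[i] is written by exactly one thread and A, x are only read,
// so the summation order inside a row is the same as in the serial loop.
inline void threadedMv(const CsrMatrix& A,
                       const std::vector<double>& x,
                       std::vector<double>& y)
{
    assert(A.rowPtr.size() == A.rows + 1);
    assert(y.size() == A.rows);
    // Signed loop index for compatibility with older OpenMP versions.
    const long n = static_cast<long>(A.rows);
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            sum += A.values[k] * x[A.colIdx[k]];
        }
        y[i] = sum;
    }
}
```

Without OpenMP the pragma is compiled out and the same loop runs serially, which mirrors the fallback mentioned above.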

Add ThreadedScalarProduct.hpp (ThreadedSeqScalarProduct), an OpenMP dot/norm with a reduction, gated behind a block-count threshold (50k) so it is harmless on small systems where fork/join overhead would dominate. Wire it into FlexibleSolver as the sequential scalar product in place of Dune::SeqScalarProduct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
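A threshold-gated reduction of this kind might look roughly like the sketch below. It works on plain `std::vector<double>` rather than Dune block vectors, and `kParallelThreshold` and `thresholdedDot` are hypothetical names; only the 50k value is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant name; the commit message gives the value as 50k blocks.
constexpr std::size_t kParallelThreshold = 50000;

// Dot product with an OpenMP reduction. The `if` clause keeps the loop on a
// single thread for small vectors, where fork/join cost would dominate.
inline double thresholdedDot(const std::vector<double>& a,
                             const std::vector<double>& b)
{
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : sum) schedule(static) \
    if (a.size() >= kParallelThreshold)
#endif
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Unlike the row-parallel SpMV, a floating-point reduction generally combines partial sums in an order that depends on the number of threads, so the last bits of the result can vary with the thread count.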

Add DILU2.hpp (MultithreadDILU2): a multicolor variant of the OpenMP DILU preconditioner. The stock MultithreadDILU uses a level-set (wavefront) schedule whose many thin levels produce one barrier per level (~120 per apply on a typical grid), which does not scale and regresses past a few threads. DILU2 instead reorders the unknowns by a graph coloring of the sparsity pattern: rows of one color are independent, so the triangular solves need only `#colors` parallel sweeps (a handful for grid graphs). Apply scaling reaches ~3.7x at 8 threads where wavefront DILU regresses. Convergence per iteration differs slightly (the factorization is on the color-permuted matrix), but the coloring -- and therefore the iteration count -- is fixed regardless of thread count, so results are reproducible. Register the "dilu2" creator in the serial and MPI preconditioner factories and accept --linear-solver=dilu2 in setupPropertyTree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
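To illustrate why a coloring yields only `#colors` sweeps, here is a minimal sketch of a sequential greedy coloring of the sparsity graph. It is an assumption that DILU2 uses something comparable; the commit message does not name the coloring algorithm, and `Graph`, `greedyColoring` and `rowsByColor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjacency list of the symmetric sparsity pattern, diagonal excluded.
using Graph = std::vector<std::vector<std::size_t>>;

// Greedy coloring: each row takes the smallest color not already used by a
// colored neighbor. It runs sequentially, so the result does not depend on
// the number of threads used later in the sweeps.
inline std::vector<int> greedyColoring(const Graph& g)
{
    const std::size_t n = g.size();
    std::vector<int> color(n, -1);
    std::vector<char> used;
    for (std::size_t i = 0; i < n; ++i) {
        // A row with d neighbors never needs a color above d.
        used.assign(g[i].size() + 1, 0);
        for (std::size_t j : g[i]) {
            assert(j < n);
            if (color[j] >= 0 && static_cast<std::size_t>(color[j]) < used.size()) {
                used[color[j]] = 1;
            }
        }
        int c = 0;
        while (used[c]) {
            ++c;
        }
        color[i] = c;
    }
    return color;
}

// Group row indices by color. Rows inside one group share no matrix entry,
// so a triangular sweep can process each group with a parallel loop and
// needs one synchronization per color instead of one per wavefront level.
inline std::vector<std::vector<std::size_t>> rowsByColor(const std::vector<int>& color)
{
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < color.size(); ++i) {
        const std::size_t c = static_cast<std::size_t>(color[i]);
        if (c >= groups.size()) {
            groups.resize(c + 1);
        }
        groups[c].push_back(i);
    }
    return groups;
}
```

On a 1D chain of five unknowns this gives two colors, with rows 0, 2, 4 in one group and rows 1, 3 in the other.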

Add AmgclPreconditioner.hpp, a Dune PreconditionerWithUpdate wrapping AMGCL's smoothed-aggregation AMG on the builtin OpenMP backend, for use as the scalar (1x1) pressure-stage solver in CPR. AMGCL's shared-memory backend threads the V-cycle far better than Hypre's OpenMP path on a single node, which makes a fully threaded OpenMP CPR possible (Dune-AMG's MPI-specific code blocks the OpenMP route). update() uses a fast numeric-only re-setup (cached Galerkin, reusing the transfer operators) and reports hasPerfectUpdate()==false so the CPR reuse interval periodically refreshes the aggregation. Registered as the "amgcl" creator for 1x1 systems in both the serial factory and the MPI factory (per-rank, wrapped as a restricted-additive-Schwarz block preconditioner). AMGCL is header-only and optional: CMake enables it (HAVE_AMGCL) only when -DAMGCL_ROOT points at an AMGCL clone, so builds without it are unaffected. Note: the fast numeric re-setup relies on a rebuild() entry point added in a small patch to AMGCL's amg.hpp; that patch lives in the AMGCL clone (AMGCL_ROOT) and should be upstreamed or carried as a tracked patch alongside this branch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a developer note covering the threaded SpMV / scalar product, the dilu vs dilu2 smoother trade-off, building with AMGCL (-DAMGCL_ROOT, HAVE_AMGCL) for the "amgcl" CPR pressure stage, and single-node performance caveats. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The AMGCL CPR pressure stage (AmgclPreconditioner) relies on a fast numeric-only Galerkin re-setup hooked into AMGCL's existing rebuild() path. Carry that change as a tracked patch against upstream AMGCL (amgcl/amg.hpp) plus a README describing what it does and how to apply it, rather than vendoring a forked AMGCL. Building without AMGCL is unaffected; this is only needed to build the "amgcl" preconditioner. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…DILU AmgclPreconditioner now forwards strong_threshold to coarsening.eps_strong for the ruge_stuben (classical AMG) coarsening, matching how it already forwards to aggregation coarsenings. Lets the classical-AMG strength be tuned toward Hypre BoomerAMG's typical 0.5 (AMGCL default is 0.25). Also refresh patches/amgcl-numeric-galerkin-rebuild.patch to additionally carry an experimental multicolor DILU relaxation (relaxation/dilu.hpp + runtime wiring). NOTE: DILU is experimental — it is unstable on AMGCL's smoothed_aggregation/ruge_stuben coarse operators (use spai0); kept for completeness and possible future well-conditioned aggregations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

hnil and others added 7 commits June 19, 2026 09:34

hnil marked this pull request as draft June 22, 2026 12:44

hnil added the manual:irrelevant label ("This PR is a minor fix and should not appear in the manual") Jun 23, 2026

hnil wants to merge 7 commits into OPM:master from hnil:openmp-threaded-linsolver
