@thomasmelvin thomasmelvin commented Jan 22, 2026

PR Summary

Sci/Tech Reviewer: @tommbendall
Code Reviewer: @mo-rickywong

Add a number of solver optimisations.
The principal performance improvement in this pull request comes from splitting the application of the mixed operator into two separate new kernels:

  1. science/gungho/source/kernel/solver/apply_mixed_u_operator_kernel_mod.F90 computes the lhs for the horizontal wind components. This is done in the broken W2h space so that write access can be used, avoiding any colouring or halo swaps. After this call, the broken W2h lhs needs to be assembled in the continuous W2h space, which reintroduces a single halo swap.
  2. science/gungho/source/kernel/solver/apply_mixed_wp_operator_kernel_mod.F90 computes the lhs for the vertical wind and pressure components. As these fields lie on horizontally discontinuous spaces, no colouring or halo exchanges are needed.
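
To make the "broken then assembled" idea concrete, here is a minimal sketch on a toy 1D periodic mesh. It is not the LFRic kernel code: the function names, the mesh layout and the "half to each face" local operator are all hypothetical, and it assumes that assembly is a plain sum of the entries that meet at a shared face. In the broken layout every cell owns its own copy of its face entries, so each cell only ever writes to its own data; the shared faces are combined in a separate pass.

```python
# Toy 1D periodic mesh: cell c has a left face c and a right face (c + 1) % n.
# All names and the local operator are illustrative assumptions, not LFRic code.

def apply_broken(cell_values):
    """Each cell writes only its own (left, right) face entries: no shared writes."""
    # Hypothetical local operator: a cell contributes half its value to each face.
    return [(0.5 * v, 0.5 * v) for v in cell_values]

def assemble_continuous(broken):
    """Sum the two broken entries that meet at each shared face (assumed: plain sum)."""
    n = len(broken)
    continuous = [0.0] * n
    for face in range(n):
        left_cell = (face - 1) % n   # its right-hand entry sits on this face
        right_cell = face            # its left-hand entry sits on this face
        continuous[face] = broken[left_cell][1] + broken[right_cell][0]
    return continuous

def apply_continuous_directly(cell_values):
    """Reference: accumulate straight into shared faces (neighbouring cells write
    to the same entry, which is what colouring protects against when threaded)."""
    n = len(cell_values)
    continuous = [0.0] * n
    for c, v in enumerate(cell_values):
        continuous[c] += 0.5 * v
        continuous[(c + 1) % n] += 0.5 * v
    return continuous

cells = [1.0, 2.0, 4.0, 8.0]
two_step = assemble_continuous(apply_broken(cells))
direct = apply_continuous_directly(cells)
print(two_step, direct)
```

Both routes give the same field in this toy case; the difference is that the first pass has no write conflicts between neighbouring cells, and only the assembly pass needs data from a neighbour.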

These changes improve performance for two main reasons:

  1. Better use of memory: splitting the single large kernel into two kernels (presumably) gives better cache usage.
  2. A reduction in the number of halo swaps from 3 (for the horizontal wind, vertical wind and pressure) to 1 (for the broken horizontal wind lhs). It appears that one of these saved halo exchanges is then reinstated elsewhere in the code, likely when the pressure lhs or vertical wind lhs is needed in the halo region in a separate kernel.

The C224 & C896 lfric atm tests in the test suite were run with these changes giving the following solver times

Code      C224     C896
Trunk     50.71    114.57
Branch    41.59    98.69
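
For reference, the relative gains implied by these solver times can be worked out directly. The sketch below only redoes the arithmetic on the four quoted numbers; the variable and function names are illustrative.

```python
# Solver times quoted in the PR description (units as reported by the test suite).
solver_times = {
    "C224": {"trunk": 50.71, "branch": 41.59},
    "C896": {"trunk": 114.57, "branch": 98.69},
}

def relative_saving(trunk, branch):
    """Percentage reduction in solver time going from trunk to branch."""
    return 100.0 * (trunk - branch) / trunk

savings = {res: round(relative_saving(**t), 1) for res, t in solver_times.items()}
speedups = {res: round(t["trunk"] / t["branch"], 2) for res, t in solver_times.items()}

print(savings)   # percentage reduction per resolution
print(speedups)  # trunk time divided by branch time
```

This gives a reduction in solver time of roughly 18% at C224 and 14% at C896, i.e. speed-up factors of about 1.22 and 1.16.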

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Core rose-stem suite
  • If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

These results are from before the KGO update. The failure in lfric_inputs does not appear to be caused by this pull request, as none of the changed code should be used by that task; it is likely one of the occasional lfric_inputs failures that we see.

Test Suite Results - lfric_apps - solver_improvements/run6

Suite Information

Item              Value
Suite Name        solver_improvements/run6
Suite User        thomas.melvin
Workflow Start    2026-01-22T11:45:40
Groups Run        all

Dependency           Reference                                  Main Like
casim                MetOffice/[email protected]                 True
jules                MetOffice/[email protected]                 True
lfric_apps           thomasmelvin/lfric_apps@solver_improvements  False
lfric_core           MetOffice/[email protected]                 True
moci                 MetOffice/[email protected]                 True
SimSys_Scripts       MetOffice/[email protected]                 True
socrates             MetOffice/[email protected]                 True
socrates-spectral    MetOffice/[email protected]                 True
ukca                 MetOffice/[email protected]                 True

Task Information

❌ failed tasks - 150
Task State
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt1-C24s_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt1-C24s_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_baroclinic-alt3-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-pert-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-pert-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_azspice_gnu_fast-debug-64bit failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200_realorog-C48_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200_realorog-C48_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip301-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip301-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_deep-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_deep-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_earth-like-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_earth-like-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_held-suarez-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_held-suarez-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_lfric-real-domain-C48_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_lfric-real-domain-C48_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_azspice_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_azspice_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_sbr-alt3-C24_MG_ex1a_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_sbr_lam-n96_MG_lam_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-BiP200x8-500x500_azspice_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-BiP200x8-500x500_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_semi-implicit-for-linear-C12_azspice_gnu_fast-debug-64bit failed
check_gungho_model_semi-implicit-for-linear-C12_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_shallow-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_shallow-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-BiP256x8-200x200_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_tidally-locked-earth-C24_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_full-debug-64bit failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_nwp_gal9-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_nwp_gal9-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_ex1a_cce_fast-debug-64bit failed
check_lfric_atm_aquaplanet-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_aquaplanet-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_camembert_case3_gj1214b-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_camembert_case3_gj1214b-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_chem-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_chem-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_chem_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_chem_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_comp_tran_ref_3d_l120-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_hd209458b-C24_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_hd209458b-C24_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_dev-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_dev-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_tb-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-64bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-64bit-crun1 failed
check_lfric_atm_nwp_gal9-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-pert-C12_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-pert-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_da-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_da-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_debug-C12_azspice_gnu_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C12_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_azspice_gnu_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_eda-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda_jada-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda_jada-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_mol-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_mol-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_noukca_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_short-C12_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_short-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_ral3-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_ens-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_ens-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_mixmol-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_mixmol-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_thai_ben1-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_thai_ben1-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_coupled_nwp_gal9-C48_ex1a_cce_fast-debug-64bit failed
check_linear_model_dcmip301-C24_azspice_gnu_fast-debug-64bit failed
check_linear_model_dcmip301-C24_ex1a_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9-C12_MG_azspice_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9-C12_MG_ex1a_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9_random-C12_MG_azspice_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9_random-C12_MG_ex1a_gnu_fast-debug-64bit failed
check_linear_model_semi-implicit-C12_azspice_gnu_fast-debug-64bit failed
check_linear_model_semi-implicit-C12_ex1a_gnu_fast-debug-64bit failed
rose_ana_lfricinputs_um2lfric-protogal_chem-N48L70_C12L70_azspice_gnu_full-debug-64bit failed
✅ succeeded tasks - 1305
⌛ waiting tasks - 2
Task State
housekeep_azspice waiting
housekeep_ex1a waiting

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@github-actions github-actions bot added the cla-required The CLA has not yet been signed by the author of this PR - added by GA label Jan 22, 2026
@github-actions github-actions bot added cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Jan 22, 2026
@thomasmelvin
Author

A selection of results comparing to stable

Test                     Branch                     Trunk
Baroclinic wave (C48)    branch-baroclinic-C48      trunk-baroclinic-C48
3D Warm Bubble           branch-warm-bubble         trunk-warm-bubble
SBR (C24)                branch-sbr-C24             trunk-sbr-C24
NWP-GAL9 (C48)           branch-nwp-gal9-u_in_w3    trunk-nwp-gal9-u_in_w3
RAL3-SEUK                branch-ral3-seuk-u_in_w3   trunk-ral3-seuk-u_in_w3

@thomasmelvin thomasmelvin added the KGO This PR contains changes to KGO label Jan 23, 2026
@thomasmelvin thomasmelvin added this to the Spring 2026 milestone Jan 23, 2026
Contributor

@tommbendall tommbendall left a comment

This is a really good speed-up. I'm happy that this is scientifically equivalent to main and the KGOs are only changing from bit-wise differences in the kernels.

I am generally happy -- the new mixed solver kernels have been literally taken from the existing kernel and split into two parts, making this easy to review.

My main thoughts that aren't captured through my comments on the code:

  1. Do you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused? Or could it be removed?
  2. For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . I think it could be better to modify the existing kernel to work for both W2H and W2, rather than add a new kernel here? (appreciating that you'd rather not make this a linked ticket, so feel free to refuse)
  3. The timers will conflict with the changes from #68 and #176 -- so I'm suggesting reverting these changes

@@ -0,0 +1,86 @@
!-----------------------------------------------------------------------------
! Copyright (c) 2017, Met Office, on behalf of HMSO and Queen's Printer
Contributor

This looks like the old copyright statement

if ( LPROF ) call start_timing( id, 'mixed_operator' )
type(r_solver_field_type) :: y_uv_broken

if ( subroutine_timers ) call timer('mixed_solver.operator')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -418,7 +414,7 @@ contains
call invoke( inc_X_times_Y(rhs, h_diag) )
end if

if ( LPROF ) call stop_timing( id, 'mixed_schur_rhs' )
if ( subroutine_timers ) call timer('schur_precon.rhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here


if ( LPROF ) call start_timing( id, 'schur_back_substitute' )
type(r_solver_field_type), target :: uvw_norm
if ( subroutine_timers ) call timer('schur_precon.back_sub')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -507,7 +504,7 @@ contains
state_p => state%get_field_from_position(isol_p)
call invoke( setval_X(state_p, exner_inc) )

if ( LPROF ) call stop_timing( id, 'schur_back_substitute' )
if ( subroutine_timers ) call timer('schur_precon.back_sub')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here


if ( LPROF ) call start_timing( id, 'helmholtz_lhs' )
if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -242,7 +241,7 @@ contains
nullify( w3_mask, w2_mask )
end if
nullify( x_vec, y_vec )
if ( LPROF ) call stop_timing( id, 'helmholtz_lhs' )
if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@thomasmelvin
Author

> This is a really good speed-up. I'm happy that this is scientifically equivalent to main and the KGOs are only changing from bit-wise differences in the kernels.
>
> I am generally happy -- the new mixed solver kernels have been literally taken from the existing kernel and split into two parts, making this easy to review.
>
> My main thoughts that aren't captured through my comments on the code:
>
>   1. Do you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused? Or could it be removed?
>   2. For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . I think it could be better to modify the existing kernel to work for both W2H and W2, rather than add a new kernel here? (appreciating that you'd rather not make this a linked ticket, so feel free to refuse)
>   3. The timers will conflict with the changes from Adding Timing Wrapper calls throughout LFRic #68 and Calipers performance25 #176 -- so I'm suggesting reverting these changes

  1. Yes, I think it's best to keep the old one (though I have removed the algorithm-level code for using it). My reasoning is that it is not clear that splitting the kernels is the best solution in the long run (e.g. if architectures/compilers change), so we may wish to go back to the old version.
  2. I did consider that, but two differences made me decide on a new kernel. Firstly, the new kernel doesn't require the vertical loop (this could have been fixed by adding an if test). Secondly, the new kernel doesn't require a multiplicity, and I couldn't think of an easy way of generalising this without passing in a dummy field of ones, which I didn't want to do in a performance-sensitive part of the code.
  3. I've reverted to the old timer names.

Contributor

@tommbendall tommbendall left a comment

Thanks for the reply. I'm happy with the changes in that case, so this passes science review.

This is now ready for code review @mo-rickywong


Labels

cla-signed The CLA has been signed as part of this PR - added by GA KGO This PR contains changes to KGO
