
Conversation

@loumalouomega
Member

@loumalouomega loumalouomega commented Dec 11, 2024

📝 Description

Explicitly making schedule(runtime), with dynamic by default, in OMP loops in ParallelUtils. I still need to add a benchmark and actually compare that it is faster. This also updates the banner with the parallelism information:

 |  /           |                  
 ' /   __| _` | __|  _ \   __|    
 . \  |   (   | |   (   |\__ \  
_|\_\_|  \__,_|\__|\___/ ____/
           Multi-Physics 10.1."0"-core/explicit-schedule-parallel-utili-d7754dadfa-Release-x86_64
           Compiled for GNU/Linux and Python3.10 with Clang-14.0
Compiled with threading and MPI support. Threading support with OpenMP, scheduling dynamic.
Maximum number of threads: 20.
Running without MPI.
  • Add benchmark
  • Compare results

Fixes #12924
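
For context, the essence of the change is switching hard-coded OpenMP schedules to schedule(runtime). A minimal illustrative sketch, not the actual ParallelUtils code:

#include <cstdio>
#include <vector>

// Illustrative sketch only, not the actual Kratos ParallelUtils code.
// Compile with: g++ -fopenmp example.cpp
int main() {
    std::vector<double> data(1000000, 1.0);

    // Before: the schedule was fixed at compile time, e.g.
    //   #pragma omp parallel for schedule(static)
    // After: schedule(runtime) defers the choice to the OMP_SCHEDULE
    // environment variable (or omp_set_schedule), with dynamic intended
    // as the default.
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < static_cast<int>(data.size()); ++i) {
        data[i] *= 2.0;
    }

    std::printf("%f\n", data[0]);
    return 0;
}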

🆕 Changelog

@loumalouomega loumalouomega added the Kratos Core, Performance, and Parallel-SMP (Shared memory parallelism with OpenMP or C++ Threads) labels Dec 11, 2024
@loumalouomega loumalouomega changed the title from "[Core] Making explicitly schedule(dynamic) by default in OMP loops in ParallelUtils" to "[Core][Parallelization] Making explicitly schedule(dynamic) by default in OMP loops in ParallelUtils" Dec 11, 2024
…ith dynamic schedule without conflicting the GIL
@RiccardoRossi
Member

Are you sure this is needed? Because this is C++ code, I don't think the GIL presents a problem here.

@loumalouomega
Member Author

Are you sure this is needed? Because this is C++ code, I don't think the GIL presents a problem here.

Look at https://github.com/KratosMultiphysics/Kratos/actions/runs/12273173829/job/34243450170

@loumalouomega
Member Author

loumalouomega commented Dec 11, 2024

Are you sure this is needed? Because this is C++ code, I don't think the GIL presents a problem here.

Look at KratosMultiphysics/Kratos/actions/runs/12273173829/job/34243450170

And now it is failing when running tests: https://github.com/KratosMultiphysics/Kratos/actions/runs/12275201329/job/34250231555?pr=12923. I will define it in CMake.

@RiccardoRossi
Member

@loumalouomega dynamic scheduling is used today, for example in the builder and solver... without the need of releasing the GIL.

Why is that different?

@loumalouomega
Member Author

@loumalouomega dynamic scheduling is used today, for example in the builder and solver... without the need of releasing the GIL.

Why is that different?

No idea; look at the outcome from the CI. We tested some functions and the improvement is significant. This was added in a recent version of pybind11: pybind/pybind11#4246
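
For reference, the usual pybind11 pattern for releasing the GIL around a C++ parallel region looks like the sketch below; the module and function names here are hypothetical, not the actual Kratos bindings:

#include <pybind11/pybind11.h>
namespace py = pybind11;

// Hypothetical function that runs an OpenMP loop internally.
void RunParallelWork() { /* OpenMP loop would go here */ }

PYBIND11_MODULE(example, m) {
    // Release the GIL for the whole call via a call guard...
    m.def("run_parallel_work", &RunParallelWork,
          py::call_guard<py::gil_scoped_release>());

    // ...or release it manually inside the binding.
    m.def("run_parallel_work_manual", []() {
        py::gil_scoped_release release;  // GIL released here
        RunParallelWork();
    });  // GIL reacquired when 'release' goes out of scope
}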

@loumalouomega
Member Author

Okay, looks like the last change fixed the issue

@loumalouomega loumalouomega marked this pull request as ready for review December 11, 2024 14:53
@loumalouomega loumalouomega requested a review from a team as a code owner December 11, 2024 14:53
@loumalouomega
Member Author

@RiccardoRossi we can set it at runtime with this: https://www.openmp.org/spec-html/5.0/openmpse49.html and keep the current code, setting OMP_SCHEDULE to "dynamic" by default.
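
A small sketch of how that works; the OpenMP calls below are standard, only the wiring into Kratos is assumed:

#include <omp.h>
#include <cstdio>

int main() {
    // With schedule(runtime), the schedule comes from the OMP_SCHEDULE
    // environment variable (e.g. OMP_SCHEDULE="dynamic" ./a.out), or can
    // be set programmatically before the loop; 0 means default chunk size.
    omp_set_schedule(omp_sched_dynamic, 0);

    // Query what will actually be used at runtime.
    omp_sched_t kind;
    int chunk;
    omp_get_schedule(&kind, &chunk);
    std::printf("schedule kind=%d chunk=%d\n", static_cast<int>(kind), chunk);

    double sum = 0.0;
    #pragma omp parallel for schedule(runtime) reduction(+ : sum)
    for (int i = 0; i < 1000; ++i) {
        sum += static_cast<double>(i);
    }
    std::printf("sum=%f\n", sum);
    return 0;
}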

@loumalouomega
Member Author

Modified to be set at runtime, defaulting to dynamic.

@loumalouomega loumalouomega changed the title from "[Core][Parallelization] Making explicitly schedule(dynamic) by default in OMP loops in ParallelUtils" to "[Core][Parallelization] Making explicitly schedule(runtime), with dynamic by default, in OMP loops in ParallelUtils" Dec 12, 2024
@loumalouomega
Member Author

Okay, looks like the runtime schedule works.

@RiccardoRossi
Member

Right now if you have 4 tasks and 1000 items, you will do 250 on each... definitely suboptimal for dynamic scheduling...

@loumalouomega
Member Author

loumalouomega commented Dec 12, 2024

Right now if you have 4 tasks and 1000 items, you will do 250 on each... definitely suboptimal for dynamic scheduling...

The default is dynamic, not dynamic,4; dynamic,4 is just an example, not the actual default. Anyway, I am seeing that it is not picking up the environment variable properly.

@loumalouomega
Member Author

Right now if you have 4 tasks and 1000 items, you will do 250 on each... definitely suboptimal for dynamic scheduling...

The default is dynamic, not dynamic,4; dynamic,4 is just an example, not the actual default. Anyway, I am seeing that it is not picking up the environment variable properly.

Okay, fixed that issue. BTW, now the banner includes the parallelism information:

 |  /           |                  
 ' /   __| _` | __|  _ \   __|    
 . \  |   (   | |   (   |\__ \  
_|\_\_|  \__,_|\__|\___/ ____/
           Multi-Physics 10.1."0"-core/explicit-schedule-parallel-utili-d7754dadfa-Release-x86_64
           Compiled for GNU/Linux and Python3.10 with Clang-14.0
Compiled with threading and MPI support. Threading support with OpenMP, scheduling dynamic.
Maximum number of threads: 20.
Running without MPI.

# Check if the environment variable OMP_SCHEDULE is defined
if(DEFINED ENV{OMP_SCHEDULE})
  # Use the schedule already defined in the environment
  set(KRATOS_OMP_SCHEDULE $ENV{OMP_SCHEDULE})
else()
  # Otherwise fall back to dynamic by default
  set(KRATOS_OMP_SCHEDULE "dynamic")
endif()
Member


OMP_SCHEDULE is a runtime env variable; it is an extremely bad idea to use it as a compilation switch (IMO).

Member Author


I understand, but the logic is the following.

During compilation, OMP_SCHEDULE sets KRATOS_OMP_SCHEDULE, which is used as the default if OMP_SCHEDULE is not defined at runtime; but if OMP_SCHEDULE is defined at runtime, it takes precedence. Do you understand me?
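
A sketch of that precedence; KRATOS_OMP_SCHEDULE is the CMake variable from the diff above, while the C++ consumption shown here is an assumption, not the actual Kratos code:

#include <cstdlib>
#include <string>

// Compile-time default injected from CMake (see the diff above); assumed
// to be passed as a definition such as -DKRATOS_OMP_SCHEDULE="dynamic".
#ifndef KRATOS_OMP_SCHEDULE
#define KRATOS_OMP_SCHEDULE "dynamic"
#endif

std::string GetEffectiveSchedule() {
    // If OMP_SCHEDULE is defined at runtime it takes precedence;
    // otherwise fall back to the compile-time default.
    if (const char* env = std::getenv("OMP_SCHEDULE")) {
        return env;
    }
    return KRATOS_OMP_SCHEDULE;
}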

@pooyan-dadvand
Member

I agree with the chunk size argument by @RiccardoRossi.

My point (in #12924) was first to give a way to define dynamic scheduling in our for-each loop. This would let us fine-tune our parallelization in the many cases where dynamic would be better, or at least not worse.

As for having dynamic as the default, now I understand that it would not work and that chunk size would be an important blocker...

@RiccardoRossi
Member

I agree with the chunk size argument by @RiccardoRossi.

My point (in #12924) was first to give a way to define dynamic scheduling in our for-each loop. This would let us fine-tune our parallelization in the many cases where dynamic would be better, or at least not worse.

As for having dynamic as the default, now I understand that it would not work and that chunk size would be an important blocker...

To clarify, it is NOT difficult to change the chunking algorithm (I guess it will be about 20 lines of code); I am simply saying that it needs to be done separately from the other changes.
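
Something along these lines; a rough sketch of a dynamic-friendly chunk size, not a proposal for the actual algorithm:

#include <algorithm>

// Rough sketch: instead of one block per thread (250 items each for
// 4 threads and 1000 items), use several smaller chunks per thread so
// dynamic scheduling has room to balance the load. The factor of 8 is
// an arbitrary assumption for illustration.
int ComputeChunkSize(int nitems, int nthreads) {
    const int chunks_per_thread = 8;
    return std::max(nitems / (nthreads * chunks_per_thread), 1);
}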

@loumalouomega loumalouomega marked this pull request as draft December 18, 2024 15:23
@philbucher
Member

Can't this be changed with an environment variable?

@loumalouomega
Member Author

Can't this be changed with an environment variable?

I think I added an option for that.

@RiccardoRossi
Member

I have been thinking about this PR for a while, and in my PERSONAL opinion (not yet brought up in @KratosMultiphysics/technical-committee) it is finally time to switch to using std::for_each (and the other facilities described in C++17).

My point is that with the recent upgrade to CentOS Devtoolset 11 and the changes in manylinux2014 (which we use to package our releases), we are effectively setting our compiler baseline to GCC 11.

According to cppreference, the library features we use are

  • fully supported since GCC 9
  • available in MSVC
  • available in modern Clang (or in older ones using libgcc)
  • available on Mac using "-fexperimental" (mac_support)

My point is that we could finally rely on high-quality implementations instead of the ones we can make ourselves...
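
For illustration, the kind of replacement being discussed; a sketch where the container and the operation are placeholders, not Kratos code:

#include <algorithm>
#include <execution>
#include <vector>

// Sketch of replacing a pragma-based OpenMP loop with a C++17 parallel
// algorithm. With GCC's libstdc++ this dispatches to TBB, so compile
// with -std=c++17 and link with -ltbb.
int main() {
    std::vector<double> data(1000000, 1.0);

    std::for_each(std::execution::par, data.begin(), data.end(),
                  [](double& x) { x *= 2.0; });
    return 0;
}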

@loumalouomega
Member Author

I have been thinking about this PR for a while, and in my PERSONAL opinion (not yet brought up in @KratosMultiphysics/technical-committee) it is finally time to switch to using std::for_each (and the other facilities described in C++17).

My point is that with the recent upgrade to CentOS Devtoolset 11 and the changes in manylinux2014 (which we use to package our releases), we are effectively setting our compiler baseline to GCC 11.

According to cppreference, the library features we use are

* fully supported since GCC 9

* available in MSVC

* available in modern Clang (or in older ones using libgcc)

* available on Mac using "-fexperimental" [mac_support](https://stackoverflow.com/questions/78746910/recognize-stdexecutionpar-c-feature-in-macos)

My point is that we could finally rely on high-quality implementations instead of the ones we can make ourselves...

I agree with you. This is the current status of OpenMP when we are limited by MSVC:

[image]

Jokes aside, yes, I think we should move to C++ parallelism. I started a branch long ago, but it didn't work: https://github.com/KratosMultiphysics/Kratos/tree/core/cxx17-parutils

@philbucher
Member

I did lots of work on pushing for C++-based parallelization; IMO only a few things are left. There are a few important parallel loops, like in the B&S, that probably need a manual reimplementation.

And "minor" things like IsInParallel in the OMPUtils. Other functionalities in the OMPUtils have been deprecated for years.

In a nutshell, I believe it is possible to switch if several devs push for it.

@RiccardoRossi
Member

To clarify, I do not think we should remove the parallel utils. They have some unique features that are definitely worth keeping (for example the TLS). We could parallelize them internally using C++ parallelism, but this is a minor point.

My point is that I would not further expand the parallel utilities, but rather use std parallelism for what goes beyond that (for example the assembly phase).

Now this also means, for example, forgetting things like "dynamic scheduling" and trusting that the lib will handle those things for us. (I understand this is a harder call.)

Having said this, we did a very preliminary test with @roigcarlo, and GCC and Clang both compile std parallel examples.
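
For readers outside the discussion: the TLS feature referred to lets each thread carry scratch data through the loop. With std algorithms the closest equivalent is a thread_local workaround, as in this sketch (not the Kratos API):

#include <algorithm>
#include <execution>
#include <vector>

struct Scratch {
    std::vector<double> workspace;  // per-thread scratch space
};

int main() {
    std::vector<double> data(100000, 1.0);

    // std::for_each has no built-in TLS argument, so a thread_local
    // object inside the functor is the usual workaround.
    std::for_each(std::execution::par, data.begin(), data.end(),
                  [](double& x) {
                      thread_local Scratch tls;  // constructed once per thread
                      tls.workspace.assign(8, x);
                      x = tls.workspace.back() * 2.0;
                  });
    return 0;
}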

@loumalouomega
Member Author

To clarify, I do not think we should remove the parallel utils. They have some unique features that are definitely worth keeping (for example the TLS). We could parallelize them internally using C++ parallelism, but this is a minor point.

My point is that I would not further expand the parallel utilities, but rather use std parallelism for what goes beyond that (for example the assembly phase).

Now this also means, for example, forgetting things like "dynamic scheduling" and trusting that the lib will handle those things for us. (I understand this is a harder call.)

Having said this, we did a very preliminary test with @roigcarlo, and GCC and Clang both compile std parallel examples.

I agree with you. IMHO the parallel utilities could be moved to an independent library, as they are quite straightforward to use and could be useful to many projects outside this one. In addition, improving the library as a standalone would be faster and easier IMHO, and also easier to benchmark and test.

I also did some experiments with std parallelism in the utilities; we can try to see if we can reuse some of my work.

@RiccardoRossi
Member

I forgot an important issue that we previously considered a blocker: the parallel STL does not allow controlling the number of threads.

I discussed this with ChatGPT, and I got the following (to be verified):

https://chatgpt.com/c/68369519-1f8c-800b-bef7-d34ccc7faad8

@loumalouomega
Member Author

I forgot an important issue that we previously considered a blocker: the parallel STL does not allow controlling the number of threads.

I discussed this with ChatGPT, and I got the following (to be verified):

chatgpt.com/c/68369519-1f8c-800b-bef7-d34ccc7faad8

"Unable to load conversation 68369519-1f8c-800b-bef7-d34ccc7faad8"

@loumalouomega
Member Author

I forgot an important issue that we previously considered a blocker: the parallel STL does not allow controlling the number of threads.
I discussed this with ChatGPT, and I got the following (to be verified):
chatgpt.com/c/68369519-1f8c-800b-bef7-d34ccc7faad8

"Unable to load conversation 68369519-1f8c-800b-bef7-d34ccc7faad8"

https://g.co/gemini/share/718135e95327

Gemini also says that it is not possible with the STL, but that it is possible with others like TBB, which is very similar to the STL. Maybe we should think about this.

@roigcarlo
Member

This:

You're absolutely right to seek control over thread usage when using std::execution::par (not par_seq, which doesn't exist — maybe you meant std::execution::par or std::execution::seq). However, C++17's Parallel STL (PSTL) provides no standard mechanism to control the number of threads, and this is implementation-defined — not controlled like OpenMP via OMP_NUM_THREADS.

But you're not out of luck.
🔧 Solution: Use an implementation that allows customization (like Intel TBB)

If you're using Intel's PSTL or oneTBB (which many implementations use under the hood), you can set the number of threads used by TBB like this:
✅ Step-by-step (Intel TBB example):

Include TBB headers (not part of standard C++17, but needed to configure threads):

#include <tbb/global_control.h>

Set thread count before calling the algorithm:

#include <execution>
#include <vector>
#include <algorithm>
#include <iostream>
#include <tbb/global_control.h>

int main() {
    const int num_threads = 4;

    tbb::global_control c(tbb::global_control::max_allowed_parallelism, num_threads);

    std::vector<int> v(1'000'000, 1);
    std::for_each(std::execution::par, v.begin(), v.end(), [](int& x) {
        x += 1;
    });

    std::cout << v[0] << std::endl;
}

✅ This sets the number of threads to 4 globally, just like OMP_NUM_THREADS.

🚧 Important Notes:

tbb::global_control works from TBB 2020 onward. If you're using an older version, upgrade.

You must ensure your C++17 standard library implementation is using TBB underneath (Intel PSTL, LLVM libc++, or similar). libstdc++ (GCC) does not use TBB by default.

🛠 How to Check Your Implementation:
GCC (libstdc++)

Uses a custom backend (not TBB).

No official way to set thread count. You’d have to hack the backend or recompile with a TBB-backed PSTL.

LLVM libc++

May use TBB if configured accordingly.

🔄 Alternative: Roll Your Own Thread Pool

If none of the above satisfies you, consider using a real task scheduler (e.g. TBB directly, std::async, std::jthread, or custom pool) — std::execution::par is too limited for fine-grained control.

Would you like me to help you determine which backend your system uses or show how to switch to Intel’s PSTL with TBB for full control?
You said:
what about MSVC?
ChatGPT said:

Excellent follow-up. Under MSVC (Microsoft Visual C++ STL), things are slightly different — and still a bit limited — but you can control the thread pool to a degree.

Here’s the situation:
✅ TL;DR: MSVC uses its own thread pool, and you can set the number of threads using a non-standard API:
Concurrency::Scheduler::SetDefaultSchedulerPolicy(...) (Windows-only).
✅ MSVC STL Implementation Details:

MSVC implements std::execution::par using its own internal Thread Pool Scheduler, not TBB.

There is no official C++17 standard API to set the number of threads.

However, MSVC's backend uses the PPL (Parallel Patterns Library) and Concurrency Runtime, which allows configuring via a scheduler policy.

🔧 How to Set the Number of Threads in MSVC STL Parallel Algorithms

Here’s a working example using the Concurrency Runtime:
⚙️ Example: Force 4 threads in MSVC STL parallel algorithms

#include <execution>
#include <vector>
#include <algorithm>
#include <iostream>
#include <ppl.h> // Needed for Concurrency::Scheduler

int main() {
    // Define scheduler policy to limit threads
    Concurrency::SchedulerPolicy policy(2,  // policy count
        Concurrency::MaxConcurrency, 4,     // limit threads to 4
        Concurrency::MinConcurrency, 4);    // and exactly 4

    // Set as default scheduler (this is global)
    Concurrency::Scheduler::SetDefaultSchedulerPolicy(policy);

    std::vector<int> data(1000000, 1);

    std::for_each(std::execution::par, data.begin(), data.end(), [](int& x) {
        x += 1;
    });

    std::cout << "First element: " << data[0] << std::endl;

    return 0;
}

🧱 Compile with:

Make sure you compile with /std:c++17 and link to the concrt.lib if needed (though it's usually automatic).
⚠️ Notes and Limitations

This only works on Windows.

The configuration affects the Concurrency Runtime thread pool, which backs MSVC’s std::execution::par.

This is not portable and not standard. If you want full control across platforms, TBB or OpenMP is better.

✅ Bottom Line for MSVC

You can control threads.

Use Concurrency::SchedulerPolicy to set the default thread pool behavior.

It’s not standard C++, but it works and is the best you can do on MSVC without replacing the backend.

Would you like me to provide a CMake configuration or Visual Studio project snippet to go along with this?

@loumalouomega
Member Author

This:

You're absolutely right to seek control over thread usage when using std::execution::par (not par_seq, which doesn't exist — maybe you meant std::execution::par or std::execution::seq). However, C++17's Parallel STL (PSTL) provides no standard mechanism to control the number of threads, and this is implementation-defined — not controlled like OpenMP via OMP_NUM_THREADS.

[…]

I don't like the idea of having different things for different OSs. I would rather use TBB instead (especially if it is the de facto standard under the hood).

@RiccardoRossi
Member

I don't dislike the idea of using TBB... but what about ARM machines? Will that work fine there?

@loumalouomega
Member Author

I don't dislike the idea of using TBB... but what about ARM machines? Will that work fine there?

Intel TBB (now part of Intel oneAPI Threading Building Blocks, or oneTBB) is compatible with ARM processors.

While Intel TBB was originally developed by Intel for Intel processors, it has evolved to be highly portable. Modern versions, especially oneTBB, support a variety of architectures, including ARM.
You can find evidence of this in the official documentation and community discussions:

  • Official Documentation: The oneTBB system requirements explicitly list "Microsoft* Windows* on ARM*/ARM64*" and "macOS* on ARM64*" as community-supported platforms, and also state that it "has ports to multiple architectures that include Intel® architectures and ARM."
  • Community Support: There are many discussions and resources online about building and using TBB on ARM-based systems, including Raspberry Pi.

It's worth noting that while TBB generally works on ARM, there might have been some complexities or specific compiler flags needed in earlier versions to ensure optimal performance or address specific architectural nuances. However, with the ongoing development and open-source nature of oneTBB, support for ARM continues to improve.

@philbucher
Member

Did you check that yourself? Sounds like the LLM might be hallucinating 😅

I would double-check; only last week I had issues with Intel (MPI) on a non-Intel CPU (AMD).

@loumalouomega
Member Author

Did you check that yourself? Sounds like the LLM might be hallucinating 😅

I would double-check; only last week I had issues with Intel (MPI) on a non-Intel CPU (AMD).

https://archlinuxarm.org/packages/aarch64/onetbb -> aarch64 is ARM

(I checked it before sending it, but the Gemini text was better structured than anything I could write, so I copy-pasted.)

@RiccardoRossi
Member

Just to report here that apparently in P1000 everyone asked for control over resources. The partial answer to this is P2300, which made it in as one of the major changes in C++26. Unfortunately, for dynamic schedulers we will have to wait for C++29...
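
For the record, P2300-style code gives explicit control over where work runs. A sketch using the stdexec reference implementation (https://github.com/NVIDIA/stdexec); the final C++26 std::execution naming may differ:

#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <cstdio>

int main() {
    exec::static_thread_pool pool(4);  // explicit control over resources
    auto sched = pool.get_scheduler();

    // Describe the work as a sender chain, then run it on the pool.
    auto work = stdexec::schedule(sched)
              | stdexec::then([] { return 21 * 2; });

    auto [result] = stdexec::sync_wait(std::move(work)).value();
    std::printf("%d\n", result);
    return 0;
}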

@loumalouomega
Member Author

Just to report here that apparently in P1000 everyone asked for control over resources. The partial answer to this is P2300, which made it in as one of the major changes in C++26. Unfortunately, for dynamic schedulers we will have to wait for C++29...


Labels

Kratos Core, Parallel-SMP (Shared memory parallelism with OpenMP or C++ Threads), Performance


Development

Successfully merging this pull request may close these issues.

[Core][Parallelization] Shall we change our parallel utils to use dynamic scheduling instead of static?
