Releases: uxlfoundation/oneCCL

Intel(R) oneAPI Collective Communications Library (oneCCL) 2022.0.0

@Maria1Petrova Maria1Petrova released this 13 May 13:25
1318c3a

What's New oneCCL 2022.0

  • Improved NCCL* compatibility by making the NCCL*-like C API the default
  • Intel® Arc™ Pro B-Series support delivers optimized scale-up performance by leveraging a low-latency protocol
  • SPMD support for Allgather, Allreduce, Alltoall, ReduceScatter, Broadcast, pt2pt, and Group API for scale-up on Intel® Arc™ Pro B-Series
  • Added support for user-defined reduction operations for scale-out on Intel® Data Center GPU Max Series
  • Added reduction operations for scale-up on Intel® Arc™ Pro B-Series
  • Improved profiling information so that tracing tools can assess imbalance across communicating processes
  • Added onecclCommWindowRegister, onecclCommWindowDeregister, onecclMemAlloc, and onecclMemFree APIs
  • Introduced SYCL graph support for scale-up

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.9

@Maria1Petrova Maria1Petrova released this 23 Apr 10:14
6cee031

This ccl_2021.15.9-arc branch introduces several bug fixes for Intel ARC A and B Series GPUs.

An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.8

@Maria1Petrova Maria1Petrova released this 26 Mar 18:53
c0bfc31

This ccl_2021.15.8-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:

  • Introduced Alltoall scale-up algorithms leveraging copy engines and scale-out algorithms with GPU RDMA.
  • Added a simple protocol to optimize large-message bandwidth for collective operations, including Allreduce, ReduceScatter, and Allgather.

An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17.2

@Maria1Petrova Maria1Petrova released this 04 Feb 15:27
6649993

What's New 2021.17.2:

  • Added support for single process and multiple threads for allreduce, allgatherv, reduce_scatter on BMG
  • Fixed performance issues for allgatherv
  • Fixed a bug in comm split
  • Fixed a bug in allgatherv with inplace operation

Note: The previous 2021.17.1 release is available only via binary distribution channels and fixes compatibility issues with the manylinux_2_28 platform standard. It contains no code changes.

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.7

@maciekpac maciekpac released this 18 Dec 00:07
c570491

This ccl_2021.15.7-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:

  • allreduce LL chunking
  • fixes for sub-communicators for allreduce and pt2pt
  • CCL benchmark now prints both algorithm (alg) and bus bandwidth
  • fixes for LL flag overflow, which may occur in long-running workloads (stress tests)
  • fixes for a small GPU memory leak
  • applying chunking in Allgather to reduce contention
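
The difference between the two bandwidth figures printed by the benchmark can be illustrated with a short sketch. It assumes the convention used by the NCCL* tests: algorithm bandwidth is the total payload divided by the elapsed time, and bus bandwidth rescales it by a collective-specific factor. Whether the CCL benchmark applies exactly these factors is an assumption, and the function names below are hypothetical.

```python
def algorithm_bandwidth(message_bytes: float, seconds: float) -> float:
    """Bytes per second as seen by the caller: total payload / elapsed time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return message_bytes / seconds


def bus_bandwidth_factor(collective: str, ranks: int) -> float:
    """Correction factor per collective (NCCL*-tests convention, assumed here)."""
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    if collective == "allreduce":
        # Each element is reduced and then redistributed: 2 * (n - 1) / n.
        return 2 * (ranks - 1) / ranks
    if collective in ("allgather", "reduce_scatter"):
        # Each rank moves (n - 1) / n of the total payload.
        return (ranks - 1) / ranks
    raise ValueError(f"no factor defined in this sketch for {collective!r}")


def bus_bandwidth(collective: str, ranks: int,
                  message_bytes: float, seconds: float) -> float:
    """Algorithm bandwidth rescaled to reflect traffic on the interconnect."""
    return (algorithm_bandwidth(message_bytes, seconds)
            * bus_bandwidth_factor(collective, ranks))


# 1e9 bytes reduced across 4 ranks in 0.5 s.
alg = algorithm_bandwidth(1e9, 0.5)
bus = bus_bandwidth("allreduce", 4, 1e9, 0.5)
print(f"alg: {alg / 1e9:.1f} GB/s, bus: {bus / 1e9:.1f} GB/s")
```

Under this convention, bus bandwidth is the figure to compare against the hardware link speed, because it does not depend on the number of ranks, while algorithm bandwidth reflects what the application observes.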

An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Attached binaries:

The 2021.15.7.6 package is built using version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler

The 2021.15.7.8 package is built using version 2025.3.2 of the Intel® oneAPI DPC++/C++ Compiler

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17

@Maria1Petrova Maria1Petrova released this 04 Dec 10:56
93f2621

What's New 2021.17:

  • New API: Technical preview of NCCL*-like API alignment, adding the onecclcommDestroy, onecclGetErrorstring, and onecclGetLastError APIs
  • Support for single process and multiple threads: Currently supports Allgather, Allreduce, Alltoall, ReduceScatter, Broadcast, pt2pt, and Group API for scale-up
  • Added Operations: Added support for user-defined reduction operations for scale-up and extended the Group API to also support pt2pt operations
  • Improved Performance: Allgather optimizations for large messages for scale-out on up to 8 nodes
  • Support for BMG: Added BMG support, currently available only in the open-source release
  • Bug fixes and performance optimizations
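
To illustrate what a user-defined reduction operation means for a collective, the following sketch simulates an allreduce in plain Python. It does not use the oneCCL API; `simulate_allreduce` and `max_abs` are hypothetical names, and the sketch assumes the supplied operation is associative, since a real library may combine contributions in any grouping.

```python
from functools import reduce
from typing import Callable, List, Sequence


def simulate_allreduce(rank_buffers: Sequence[Sequence[float]],
                       op: Callable[[float, float], float]) -> List[List[float]]:
    """Combine the buffers of all ranks element by element with `op`
    and hand every rank a copy of the combined result."""
    if not rank_buffers:
        raise ValueError("at least one rank is required")
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # zip(*...) groups the i-th element of every rank into one column.
    reduced = [reduce(op, column) for column in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]


def max_abs(a: float, b: float) -> float:
    """Example user-defined operation: keep the value with the larger magnitude."""
    return a if abs(a) >= abs(b) else b


# Three simulated ranks, two elements each.
result = simulate_allreduce([[1, -5], [-3, 2], [2, 4]], max_abs)
print(result[0])  # every rank holds the same reduced buffer
```

A built-in operation such as sum or max would be passed the same way; the user-defined case only differs in that the caller supplies the combining function.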

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.6

@nikitaxgusev nikitaxgusev released this 22 Oct 09:23
0a67730

This ccl_2021.15.6-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:

  • Bug Fixes
  • Added an OFI barrier implementation to optimize the CCL barrier in the OFI transport
  • Applying chunking in Allgather scale-up (LL protocol)
  • Code refactoring
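
Chunking, as mentioned above for Allgather, generally means splitting one large transfer into smaller pieces that are issued one after another. The sketch below shows only the splitting step. The chunk size and the way oneCCL schedules the pieces are not described in these notes, so the values and the `chunk_ranges` name are hypothetical.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_bytes: int, chunk_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover a buffer of `total_bytes`
    in pieces of at most `chunk_bytes` each."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        # The last chunk may be shorter than chunk_bytes.
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


# A 10-byte buffer split into chunks of at most 4 bytes.
for offset, length in chunk_ranges(10, 4):
    print(f"transfer bytes [{offset}, {offset + length})")
```

Smaller pieces let transfers from different ranks interleave rather than compete for a link all at once, which is one plausible reading of how chunking reduces contention.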

An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Attached binaries:

The 2021.15.6.2 package is built using version 2025.0.0 of the Intel® oneAPI DPC++/C++ Compiler

The 2021.15.6.9 package is built using version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.2

@maciekpac maciekpac released this 24 Sep 11:29
4f1449d

What's new:

  • Bug fixes

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.5

@nikitaxgusev nikitaxgusev released this 23 Sep 09:08
52eee8d

This ccl_2021.15.5-arc branch introduces bug fixes and refactoring for Intel ARC A and B Series GPUs, along with new implementations of Alltoall LL and one-way RDMA send-receive functionality.

The cmake command is the same as before:

An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.1

@maciekpac maciekpac released this 01 Sep 11:53
f588098

What's new:

  • Bug fixes