Releases: uxlfoundation/oneCCL
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.4
The ccl_2021.15.4-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:
- Support for Reduce-Scatter and Point-To-Point in addition to previously enabled Allreduce and Allgather
- Support for 8 bit datatypes (int8, uint8)
- Bug fixes. Setting IGC_VISAOptions=-activeThreadsOnlyBarrier, which was previously required, is no longer needed.
The cmake command is the same as before. An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16
What's New 2021.16:
- Added SYCL graph support for Record and Replay for Allgather, Allreduce, Alltoall, ReduceScatter and Broadcast
- Added SYCL-based implementation of ring algorithm for Allgather
- Added SYCL-based implementation for Broadcast
- Added multithread support for Allgather and ReduceScatter scale up implementation
- Added attribute in the communicator to specify blocking operations for CPU
- Bug fixes
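For readers unfamiliar with the ring algorithm mentioned above for Allgather, the following is a minimal sketch of the textbook ring Allgather, simulated in plain Python. It is not oneCCL's SYCL implementation, and the function name ring_allgather is hypothetical. It only illustrates the communication pattern: with N ranks, each rank forwards one chunk to its right-hand neighbor per step, and after N - 1 steps every rank holds all N chunks.

```python
def ring_allgather(chunks):
    """Simulate a textbook ring Allgather (illustrative only, not oneCCL code).

    chunks[r] is the data initially owned by rank r.
    Returns a list where entry r is rank r's receive buffer after the collective.
    """
    n = len(chunks)
    # recv[r][i] holds the chunk that originated on rank i, or None if not yet received.
    recv = [[None] * n for _ in range(n)]
    for r in range(n):
        recv[r][r] = chunks[r]

    for step in range(n - 1):
        # All ranks send at the same time, so collect the outgoing chunks first.
        outgoing = []
        for r in range(n):
            idx = (r - step) % n  # chunk that rank r forwards in this step
            outgoing.append((idx, recv[r][idx]))
        # Deliver each chunk to the right-hand neighbor in the ring.
        for r in range(n):
            idx, data = outgoing[r]
            recv[(r + 1) % n][idx] = data
    return recv


if __name__ == "__main__":
    for rank, buf in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, buf)
```

Each rank sends and receives exactly one chunk per step, which is why the ring pattern keeps per-link traffic constant regardless of the number of ranks.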
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.3
The ccl_2021.15.3-arc branch adds support for Intel ARC A and B Series GPUs and includes some bug fixes.
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
If the system does not have GPU Peer-to-Peer (P2P) support, set the compiler environment flag (export IGC_VISAOptions=-activeThreadsOnlyBarrier) before compiling. On such a system, also run export IGC_VISAOptions=-activeThreadsOnlyBarrier before running the application.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.2
What's new:
- Bug fix: improved user experience when setting environment variables.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.1
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15
What's new:
- Support for Average for Allreduce and Reduce-Scatter
- Extended the Group API to also support collective operations.
- New split_communicator API with updated parameters.
- Performance optimizations for scaleup for Alltoall
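To make the Average reduction listed above concrete, here is a small plain-Python simulation. It assumes the usual meaning of an averaging reduction (the element-wise sum across ranks divided by the number of ranks); the release notes do not spell this out, and the function names allreduce_avg and reduce_scatter_avg are hypothetical helpers, not oneCCL APIs.

```python
def allreduce_avg(rank_buffers):
    """Every rank receives the element-wise average of all send buffers.

    Assumption: average = sum across ranks / number of ranks.
    rank_buffers[r] is the send buffer of rank r; all have equal length.
    """
    n = len(rank_buffers)
    length = len(rank_buffers[0])
    avg = [sum(buf[i] for buf in rank_buffers) / n for i in range(length)]
    return [list(avg) for _ in range(n)]


def reduce_scatter_avg(rank_buffers):
    """Rank r receives only the r-th block of the averaged vector.

    Assumption: the send buffer length is divisible by the number of ranks.
    """
    n = len(rank_buffers)
    recv_count = len(rank_buffers[0]) // n
    avg = allreduce_avg(rank_buffers)[0]
    return [avg[r * recv_count:(r + 1) * recv_count] for r in range(n)]


if __name__ == "__main__":
    buffers = [[1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 5.0, 8.0]]
    print(allreduce_avg(buffers))
    print(reduce_scatter_avg(buffers))
```

With two ranks sending [1, 2, 3, 4] and [3, 6, 5, 8], the averaged vector is [2, 4, 4, 6]; Allreduce gives that full vector to both ranks, while Reduce-Scatter gives [2, 4] to rank 0 and [4, 6] to rank 1.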
Removals:
- split_communicators is deprecated in 2021.15.0 and will be removed in a later release
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.14
What's New:
- Optimizations on key-value store support to scale up to 3000 nodes
- New APIs for Allgather, Broadcast and group API calls
- Performance optimizations for Allgather, Allreduce, and Reduce-scatter for scaleup and scaleout
- Performance Optimizations for CPU single node
- Optimizations to reuse Level Zero events.
- Change of the default mechanism for IPC exchange to pidfd
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13.1
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13
What's New:
- Optimizations to limit the memory consumed by oneCCL
- Optimizations to limit the number of file descriptors kept open by oneCCL.
- Align the support for in-place for the Allgatherv and Reduce-scatter collectives to follow the same behavior as NCCL. In particular:
  - Allgatherv is in-place when send_buff == recv_buff + rank_offset, where rank_offset = sum(recv_counts[i]) for all i < rank.
  - Reduce-scatter is in-place when recv_buff == send_buff + rank * recv_count.
- When using the environment variable CCL_WORKER_AFFINITY, oneCCL enforces the requirement that the length of the list should be equal to the number of workers.
- Bug fixes.
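The in-place conditions for Allgatherv and Reduce-scatter given above reduce to simple offset arithmetic. The sketch below expresses them in plain Python, using element offsets from a common base allocation in place of raw pointers. The function names are hypothetical and this is not oneCCL code; it only restates the two conditions from the notes.

```python
def allgatherv_rank_offset(recv_counts, rank):
    """rank_offset = sum(recv_counts[i]) for all i < rank."""
    return sum(recv_counts[:rank])


def allgatherv_is_in_place(send_offset, recv_offset, recv_counts, rank):
    """True when send_buff == recv_buff + rank_offset.

    send_offset and recv_offset are element offsets of the two buffers
    measured from the same base allocation (a stand-in for pointers).
    """
    return send_offset == recv_offset + allgatherv_rank_offset(recv_counts, rank)


def reduce_scatter_is_in_place(send_offset, recv_offset, recv_count, rank):
    """True when recv_buff == send_buff + rank * recv_count."""
    return recv_offset == send_offset + rank * recv_count


if __name__ == "__main__":
    recv_counts = [2, 3, 4]  # elements contributed by ranks 0, 1, 2
    for rank in range(len(recv_counts)):
        print(rank, allgatherv_rank_offset(recv_counts, rank))
```

For example, with recv_counts = [2, 3, 4], rank 2 is in-place for Allgatherv only if its send buffer starts 5 elements into the receive buffer, which is exactly where its own block will land.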
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.12
What's New
- Performance improvements for scaleup for all message sizes for Allreduce, Allgather, and Reduce_Scatter, including the small message sizes that appear in inference apps.
- Performance improvements for scaleout for Allreduce, Reduce, Allgather, and Reduce_Scatter.
- Optimized memory usage of oneCCL.
- Support for PMIx 4.2.6.
- Bug fixes.
Removals
- oneCCL 2021.12 removes support for PMIx 4.2.2