All notable changes to this project will be documented in this file. The format is based on Keep a Changelog.
- Added support for `manylinux_2_28` wheels built with CUDA in nightly releases (#496)
- Added support for `manylinux_2_28` wheels in nightly releases (#480)
- Added CUDA 13.0 support (#509, #511)
- Added CUDA 12.9 support (#494)
- Added PyTorch 2.8 support (#494)
- Added PyTorch 2.7 support (#442)
- Added CUDA >= 12.6 support (#431)
- Added PyTorch 2.5 support (#360)
- Added PyTorch 2.4 support (#338)
- Added PyTorch 2.3 support (#322)
- Added Windows support (#315)
- Added macOS Apple Silicon support (#310)
- Added ROCm 6.4 support (#507)
- Removed support for PyTorch 1.13-2.5 (#532)
- Dropped Python 3.9 support (#525)
- Dropped Python 3.8 support (#356)
- Removed linking to Python (#462)
- Added PyTorch 2.2 support (#294)
- Added `softmax_csr` implementation (#264, #282)
- Added support for edge-level sampling (#280)
- Added support for `bfloat16` data type in `segment_matmul` and `grouped_matmul` (CPU only) (#272)
- Dropped the MKL code path when sampling neighbors with `replace=False` since it does not correctly prevent duplicates (#275)
- Added `--biased` parameter to run benchmarks for biased sampling (#267)
- Improved speed of biased sampling (#270)
- Fixed `grouped_matmul` when tensors are not contiguous (#290)
- Added PyTorch 2.1 support (#256)
- Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
- Added support for homogeneous and heterogeneous biased neighborhood sampling (#247, #251)
- Added dispatch for XPU device in `index_sort` (#243)
- Added `metis` partitioning (#229)
- Enabled `hetero_neighbor_sample` to work in parallel (#211)
- Fixed vector-based mapping issue in `Mapping` (#244)
- Fixed performance issues reported by the Coverity tool (#240)
- Updated `cutlass` version for speed boosts in `segment_matmul` and `grouped_matmul` (#235)
- Dropped nested tensor wrapper for `grouped_matmul` implementation (#226)
- Added `generate_range_of_ints` function (which uses the MKL library to generate ints) to the `RandintEngine` class (#222)
- Fixed TorchScript support in `grouped_matmul` (#220)
- Added PyTorch 2.0 support (#214)
- `neighbor_sample` routines now also return information about the number of sampled nodes/edges per layer (#197)
- Added `index_sort` implementation (#181, #192)
- Added `triton>=2.0` support (#171)
- Added `bias` term to `grouped_matmul` and `segment_matmul` (#161)
- Added `sampled_op` implementation (#156, #159, #160)
- Improved `[segment|grouped]_matmul` GPU implementation by reducing launch overheads (#213)
- Sample the nodes with the same timestamp as seed nodes (#187)
- Added `write-csv` (saves benchmark results as a CSV file) and `libraries` (determines which libraries will be used in the benchmark) parameters (#167)
- Enabled benchmarking of the neighbor sampler on temporal graphs (#165)
- Improved `[segment|grouped]_matmul` CPU implementation via `at::matmul_out` and MKL BLAS `gemm_batch` (#146, #172)
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for `grouped_matmul` (#137)
- Added `fused_scatter_reduce` operation for multiple reductions (#141, #142)
- Added `triton` dependency (#133, #134)
- Enabled `pytest` testing (#132)
- Added C++-based autograd and TorchScript support for `segment_matmul` (#120, #122)
- Allow overriding `time` for seed nodes via `seed_time` in `neighbor_sample` (#118)
- Added `[segment|grouped]_matmul` CPU implementation (#111)
- Added `temporal_strategy` option to `neighbor_sample` (#114)
- Added benchmarking tool (Google Benchmark) along with a `pyg::sampler::Mapper` benchmark example (#101)
- Added CSC mode to `pyg::sampler::neighbor_sample` and `pyg::sampler::hetero_neighbor_sample` (#95, #96)
- Sped up `pyg::sampler::neighbor_sample` via `IndexTracker` implementation (#84)
- Added `pyg::sampler::hetero_neighbor_sample` implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added `pyg::utils::to_vector` implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added `grouped_matmul` and `segment_matmul` CUDA implementations via `cutlass` (#51, #56, #61, #64, #69, #73, #123)
- Added `pyg::sampler::neighbor_sample` implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added `pyg::sampler::Mapper` utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added `biased sampling` utils (#38)
- Added `CHANGELOG.md` (#39)
- Added `pyg.subgraph()` (#31)
- Added nightly builds (#28, #36)
- Added `rand` CPU engine (#26, #29, #32, #33)
- Added `pyg.random_walk()` (#21, #24, #25)
- Added documentation via `readthedocs` (#19, #20)
- Added code coverage report (#15, #16, #17, #18)
- Added `CMakeExtension` support (#14)
- Added test suite via `gtest` (#13)
- Added `clang-format` linting via `pre-commit` (#12)
- Added `CMake` support (#5)
- Added `pyg.cuda_version()` (#4)
- Allow different types for graph and timestamp data (#143)
- Fixed dispatcher in `hetero_neighbor_sample` (#125)
- Require sorted neighborhoods according to time in temporal sampling (#108)
- Only sample neighbors with a strictly earlier timestamp than the seed node (#104)
- Prevent absolute paths in wheel (#75)
- Improved installation instructions (#68)
- Replaced `std::unordered_map` with a faster `phmap::flat_hash_map` (#65)
- Fixed versions of `checkout` and `setup-python` in CI (#52)
- Make use of the `pyg_sphinx_theme` documentation template (#47)
- Auto-compute number of threads and blocks in CUDA kernels (#41)
- Optional return types in `pyg.subgraph()` (#40)
- Absolute headers (#30)
- Use `at::equal` rather than `at::all` in tests (#37)
- Build `*.so` extension on Mac instead of `*.dylib` (#107)