
Comparative Tests on H2 matvec and H2 matmul

Edmond Chow edited this page Oct 18, 2020 · 7 revisions

In practice, multiplying several vectors in a single H2-matmul call is much more efficient than calling H2-matvec once per vector. Below, we compare the performance of H2-matmul and H2-matvec for various numbers of vectors.
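One way to see why a batched call helps is to count how often the matrix data is read. The sketch below is only an illustrative dense-matrix analogy in plain Python, not the H2 algorithm or any library API: the function names `matvec`, `matmul`, and `compare` are hypothetical. It assumes the cost is dominated by reading matrix entries, so a batched product that reuses each entry for all vectors reads the matrix once, while repeated matvec calls read it once per vector.

```python
def matvec(A, x, counter):
    """Compute y = A x for a dense matrix A (list of rows).

    counter[0] tallies how many matrix entries are read.
    """
    y = []
    for row in A:
        acc = 0.0
        for a_ij, x_j in zip(row, x):
            counter[0] += 1          # one read of a matrix entry
            acc += a_ij * x_j
        y.append(acc)
    return y


def matmul(A, X, counter):
    """Compute Y = A X, where X is an n-by-k list of rows (k vectors as columns).

    Each matrix entry is read once and reused for all k columns.
    """
    k = len(X[0])
    Y = []
    for row in A:
        acc = [0.0] * k
        for a_ij, x_row in zip(row, X):
            counter[0] += 1          # one read, shared by all k columns
            for c in range(k):
                acc[c] += a_ij * x_row[c]
        Y.append(acc)
    return Y


def compare(A, X):
    """Return (Y_loop, reads_loop, Y_batch, reads_batch) for the two strategies."""
    n, k = len(X), len(X[0])

    # Strategy 1: one matvec call per column of X.
    reads_loop = [0]
    columns = [[X[i][c] for i in range(n)] for c in range(k)]
    results = [matvec(A, col, reads_loop) for col in columns]
    # Reassemble the per-column results into row-major layout.
    Y_loop = [[results[c][i] for c in range(k)] for i in range(len(A))]

    # Strategy 2: a single batched call.
    reads_batch = [0]
    Y_batch = matmul(A, X, reads_batch)

    return Y_loop, reads_loop[0], Y_batch, reads_batch[0]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # three vectors stored as columns
    Y_loop, reads_loop, Y_batch, reads_batch = compare(A, X)
    print(Y_loop == Y_batch, reads_loop, reads_batch)
```

Both strategies produce the same product; only the number of matrix reads differs, by a factor equal to the number of vectors. In a real implementation the gain also depends on cache behavior and vectorization, which this count does not model.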

Hardware and software configuration

  • 2 * Intel Xeon Gold 6226 CPU @ 2.7GHz (2 * 12 cores, 2 * 12 * 2 threads, hyperthreading disabled)
  • 6 * 32 GB DDR4 memory
  • Red Hat Enterprise Linux 7.6 (kernel 3.10.0-957.12.1.el7)
  • Intel Parallel Studio Cluster version 2019.5
  • ICC optimization flags: -O3 -xHost
  • OpenMP environment variables
    • OMP_NUM_THREADS=24
    • OMP_PLACES=cores
    • OMP_PROC_BIND=close

Test settings

  • Point sets: 400,000 uniformly and randomly distributed points in a 3D scaled cube
  • Relative error threshold: 1e-6
  • Running mode: JIT and AOT
  • H2-construction and H2-matvec timings are in seconds
  • Kernel: 3D Gaussian with
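As a rough illustration of the point-set setting above, the following sketch draws uniformly random points in a 3D cube using only the Python standard library. The function name `uniform_points_in_cube` and the `edge_length` parameter are hypothetical: this page does not state how the cube is scaled, so the edge length is left as an input rather than assumed.

```python
import random


def uniform_points_in_cube(n_points, edge_length, seed=0):
    """Return n_points (x, y, z) tuples drawn uniformly from [0, edge_length]^3.

    edge_length is a placeholder: the scaling used in the tests is not given here.
    """
    rng = random.Random(seed)        # seeded generator for reproducible point sets
    return [
        tuple(rng.uniform(0.0, edge_length) for _ in range(3))
        for _ in range(n_points)
    ]


if __name__ == "__main__":
    # A small demo size; the tests on this page use 400,000 points.
    points = uniform_points_in_cube(1000, edge_length=10.0)
    print(len(points), points[0])
```

Fixing the seed keeps the point set identical across runs, which makes timings from different runs easier to compare.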

Numerical Results
