[DRAFT] Use object files in ROCm pipeline #12
mrowan137 wants to merge 2 commits into CEMeNT-PSAAP:main
|
Hey @mrowan137, I finally had some time to test this out and am getting this linker error on Tuo. My build script for a Python env is attached, and I also have a pip and module list below. Let me know your thoughts. Thanks!
pip list
module list
|
@jpmorgan98, thank you for testing it out! Comparing with your packages, I found that my testing uses some older ones; these in particular:
When I upgraded to 0.61.2, 0.1.6, 0.44.0 as in your setup, I could reproduce the linker error you shared. I pushed one more change that will use
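The package comparison described above can be sketched in shell. This is a minimal sketch, assuming each environment's pins were saved with `pip freeze` (one `name==version` per line); the helper name `diff_pins` and the file names are hypothetical, not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper: report packages pinned at different versions in two
# `pip freeze` dumps. Prints one "name old_version new_version" line each.
diff_pins() (
    old=$1
    new=$2
    awk -F'==' '
        NF != 2 { next }                    # skip editable/VCS lines
        NR == FNR { v[$1] = $2; next }      # first file: remember name -> version
        ($1 in v) && v[$1] != $2 { print $1, v[$1], $2 }
    ' "$old" "$new"
)
```

Usage would look like `pip freeze > mine.txt` in one environment, `pip freeze > theirs.txt` in the other, then `diff_pins mine.txt theirs.txt`. Packages present in only one file are not reported.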
|
The plot thickens. With your most recent updates and I am unable to install Thanks!
(venv-tuo) [morgan83@tuolumne2150:slab_absorbium]$ HSA_XNACK=1 ROCR_VISIBLE_DEVICES=0 flux run -N 1 -n 1 -g 1 -q pdebug -B cbronze python input.py --mode=numba --target=gpu
flux-job: f2tHcJpQxfRH started 00:00:04
warning: overriding the module target triple with x86_64-unknown-linux-gnu [-Woverride-module]
1 warning generated.
lld: warning: WS1/morgan83/tuo_dep/harmonize/harmonize/python/../cpp/program/mod.h:66:0: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_Z9_dev_execI12EventProgramI14mcdc_precursorEEvNT_13DeviceContextENS3_11DeviceStateEm': desired occupancy was 4, final occupancy is 3
lld: warning: WS1/morgan83/tuo_dep/harmonize/harmonize/python/../cpp/program/mod.h:66:0: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in '_Z9_dev_execI12EventProgramI11mcdc_sourceEEvNT_13DeviceContextENS3_11DeviceStateEm': desired occupancy was 4, final occupancy is 3
:0:/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/hipamd/src/hip_global.cpp:158 : 364374250279 us: Module not initialized
Fatal Python error: Aborted
Current thread 0x000015555554c080 (most recent call first):
File "/usr/WS1/morgan83/tuo_dep/MCDC/mcdc/main.py", line 1122 in prepare
File "/usr/WS1/morgan83/tuo_dep/MCDC/mcdc/main.py", line 67 in run
File "/usr/WS1/morgan83/tuo_dep/MCDC/examples/fixed_source/slab_absorbium/input.py", line 48 in <module>
Extension modules: mkl._mklinit, mkl._py_mkl_service, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, yaml._yaml, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5o, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5l, h5py._selector, gmpy2.gmpy2, mpi4py.MPI, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image, numba.mviewbuf, hip._util.types, hip._hip_helpers, hip._util.posixloader, hip.chip, hip.hip, hip._hiprtc_helpers, hip.chiprtc, hip.hiprtc, hip.chipblas, hip.hipblas, hip.crccl, hip.rccl, hip.chiprand, hip.hiprand, hip.chipfft, hip.hipfft, hip.chipsparse, hip.hipsparse, hip.croctx, hip.roctx, hip.chipsolver, hip.hipsolver, cuda.nvrtc, cuda.cuda, cuda.cudart, rocm.llvm._util.types, rocm.llvm.c.types, rocm.llvm._util.posixloader, rocm.llvm.c.canalysis, rocm.llvm.c.analysis, rocm.llvm.c.cbitreader, rocm.llvm.c.bitreader, rocm.llvm.c.cbitwriter, rocm.llvm.c.bitwriter, rocm.llvm.c.cblake3, rocm.llvm.c.blake3, rocm.llvm.c.ccomdat, rocm.llvm.c.comdat, rocm.llvm.c.cerrorhandling, rocm.llvm.c.errorhandling, rocm.llvm.c.ccore, rocm.llvm.c.core, rocm.llvm.c.datatypes, rocm.llvm.c.cdebuginfo, rocm.llvm.c.debuginfo, rocm.llvm.c.deprecated, rocm.llvm.c.disassemblertypes, rocm.llvm.c.cdisassembler, rocm.llvm.c.disassembler, rocm.llvm.c.cerror, 
rocm.llvm.c.error, rocm.llvm.config.cllvm_config, rocm.llvm.config.llvm_config, rocm.llvm.c.ctarget, rocm.llvm.c.target, rocm.llvm.c.ctargetmachine, rocm.llvm.c.targetmachine, rocm.llvm.c.cexecutionengine, rocm.llvm.c.executionengine, rocm.llvm.c.cirreader, rocm.llvm.c.irreader, rocm.llvm.c.clinker, rocm.llvm.c.linker, rocm.llvm.c.corc, rocm.llvm.c.orc, rocm.llvm.c.clljit, rocm.llvm.c.lljit, rocm.llvm.c.clljitutils, rocm.llvm.c.lljitutils, rocm.llvm.c.clto, rocm.llvm.c.lto, rocm.llvm.c.cobject, rocm.llvm.c.object, rocm.llvm.c.corcee, rocm.llvm.c.orcee, rocm.llvm.c.cremarks, rocm.llvm.c.remarks, rocm.llvm.c.csupport, rocm.llvm.c.support, rocm.llvm.c.transforms.cpassbuilder, rocm.llvm.c.transforms.passbuilder, rocm.amd_comgr.camd_comgr, rocm.amd_comgr.amd_comgr, numba.cpython.mathimpl.math, numba.cpython.mathimpl.sys, numba.core.typing.cmathdecl.cmath, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.linalg._flinalg, numba.types.itertools, numba.cpython.hashing.math, numba.cpython.hashing.sys, numba.cpython.numbers.math, numba.core.typing.mathdecl.math, numba.np.arraymath.math (total: 175)
flux-job: task(s) Aborted
|
Many thanks for the contributions, both past and present! @mrowan137 and @jpmorgan98, I think I have these changes working on Tuolumne, though not with the exact same configuration. Here is the script I used to set up the initial environment:
#!/bin/bash -x
cd
# =============================================================================
# Setups
# =============================================================================
# MC/DC and harmonize branches
MCDC_BRANCH="main"
HARMONIZE_BRANCH="global_array_fields"
# Name for the virtual environment
VENV_NAME="mcdc"
# ROCm and Python versions
ROCM_VERSION="6.0.0"
PYTHON_VERSION="3.11.5"
# MPI module
MPI_MODULE="cray-mpich/9.0.1"
# Paths
WORKSPACE="$HOME"
VENV_PATH="$WORKSPACE/new-tuo-venv"
ROCM_LLVM_PY_DIR="$WORKSPACE/rocm_llvm_py-new"
HARMONIZE_DIR="$WORKSPACE/harmonize"
MCDC_DIR="$WORKSPACE/MCDC"
# =============================================================================
# Preparation
# =============================================================================
# Load necessary modules
module load "python/$PYTHON_VERSION"
module load "rocm/$ROCM_VERSION"
module load "$MPI_MODULE"
# =============================================================================
# Create Python environment
# =============================================================================
# Remove any pre-existing instance of the environment
rm -rf "$VENV_PATH"
# Create the environment
"/usr/tce/packages/python/python-$PYTHON_VERSION/bin/virtualenv" "$VENV_PATH"
# Add ROCm paths to the environment (to help hip-numba later).
# A heredoc keeps the inner double quotes in the lines appended to activate.
cat >> "$VENV_PATH/bin/activate" <<EOF

export ROCM_PATH="/opt/rocm-$ROCM_VERSION"
export ROCM_HOME="/opt/rocm-$ROCM_VERSION"
EOF
# Activate the venv
source "$VENV_PATH/bin/activate"
# Make sure we are working with a recent version of pip and setuptools
pip install --upgrade pip
pip install --upgrade setuptools
# =============================================================================
# Install ROCm-LLVM-Python
# =============================================================================
# Remove any pre-existing install
rm -rf "$ROCM_LLVM_PY_DIR"
# Clone in the repo
git clone https://github.com/ROCm/rocm-llvm-python "$ROCM_LLVM_PY_DIR"
# Enter the repo
cd "$ROCM_LLVM_PY_DIR"
# Get the branch for our preferred version of ROCM
git checkout "release/rocm-rel-$ROCM_VERSION"
# Build the package
./init.sh
sed -i "s/cimport *cpython.string/#cimport cpython.string/g" "$ROCM_LLVM_PY_DIR/rocm-llvm-python/rocm/llvm/_util/types.pyx"
./build_pkg.sh --post-clean -j 16
# Select a wheel with the preferred rocm version.
LATEST=$( ls -1 rocm-llvm-python/dist/rocm_llvm_python-${ROCM_VERSION}*.whl | tail -n 1 )
pip install --force-reinstall "$LATEST"
unset LATEST
###############################################################################
# Install HIP-Python
###############################################################################
pip install -i https://test.pypi.org/simple "hip-python~=$ROCM_VERSION"
pip install -i https://test.pypi.org/simple "hip-python-as-cuda~=$ROCM_VERSION"
###############################################################################
# Install HIP-Numba
###############################################################################
pip install numba==0.61.0
pip config set global.extra-index-url https://test.pypi.org/simple
#pip install --no-deps "git+https://github.com/ROCm/numba-hip.git@8098162162fb0babd77b56583b289d6dd6226151"
pip install --no-deps "git+https://github.com/ROCm/numba-hip.git"
###############################################################################
# Install Harmonize and MC/DC
###############################################################################
# Harmonize
cd "$HARMONIZE_DIR"
git checkout "$HARMONIZE_BRANCH"
pip install -e .
# MC/DC
cd "$MCDC_DIR"
git checkout "$MCDC_BRANCH"
pip install -e ".[dev]"
# Install MPI4Py
CC=cc MPICC=cc pip install --no-binary=mpi4py mpi4py==4.0.0
# Install supported library versions
pip install cvxpy==1.7.0
pip install scipy==1.12
pip install numpy==1.24.3
This script switches the There are some changes in
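One step in the script above that may be worth hardening is the wheel selection: `ls -1 ... | tail -n 1` picks the last name in lexical order, which can differ from the highest version once a component reaches two digits. A minimal sketch of a version-aware alternative follows; the helper name `latest_wheel` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: print the highest-versioned wheel in a directory whose
# file name starts with the given prefix. Returns non-zero if none match.
latest_wheel() (
    dir=$1
    prefix=$2
    # Expand the glob into the positional parameters instead of parsing `ls`.
    set -- "$dir"/"$prefix"*.whl
    # If nothing matched, the pattern is left unexpanded and does not exist.
    [ -e "$1" ] || return 1
    # Version sort ranks ...0.10 above ...0.9, unlike a plain lexical sort.
    printf '%s\n' "$@" | sort -V | tail -n 1
)
```

In the script this could replace the `ls` pipeline, for example `LATEST=$(latest_wheel rocm-llvm-python/dist "rocm_llvm_python-${ROCM_VERSION}")`, with the added benefit that a missing wheel is reported as a failure rather than as an empty `$LATEST`.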
|
@jpmorgan98, the error is good news in a way: we've moved past the linker error to a runtime error. Initially, it seemed this could be due to some difference in our Python packages, so I iteratively aligned package versions to the ones you shared. Surprisingly, even with identical package versions, the runtime error did not appear. After that, it seemed a remaining major difference was how we were running the reproducer: interactive mode ( Upon further inspection of environment differences between those contexts, the runtime error appears to be due to an unintended interaction with Spindle (which is on by default); disabling that, e.g. with:
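For the environment comparison mentioned above (this is not the Spindle setting itself), a minimal sketch: it assumes the output of `env` was saved to a file in each context, and the helper name `env_diff` and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show environment entries that differ between two
# saved `env` dumps. Lines only in the first file are printed flush left;
# lines only in the second file are prefixed with a tab.
env_diff() (
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # comm requires sorted input; the C locale keeps sort and comm consistent.
    LC_ALL=C sort "$1" > "$tmp_a"
    LC_ALL=C sort "$2" > "$tmp_b"
    # -3 suppresses lines common to both files.
    LC_ALL=C comm -3 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
)
```

For example, run `env > interactive.env` in the interactive session, capture `env` the same way through the batch launch, then run `env_diff interactive.env batch.env`. Variables with multi-line values will be split across lines, so treat the output as a starting point for inspection.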
|
This PR modifies the ROCm compilation pipeline for harmonize.so to link using object files instead of bitcode files.
The change is aimed at addressing a linker error (described here: #7).