Merged (24 commits)
12f4abe [SYCL][LIBCLC] Additional libclc builtins to support SYCL work (Feb 6, 2020)
979b448 [SYCL] CMake and lit support for SYCL CUDA backend (Feb 6, 2020)
b937adf [SYCL][CUDA] Add SYCL CUDA support to clang driver (Feb 18, 2020)
4f2e019 [SYCL] Local Accessor Support for CUDA (Feb 7, 2020)
b63f78a [SYCL][CUDA] Change __spirv_BuiltIn.. to functions (Feb 7, 2020)
fc60859 [SYCL][CUDA] Initial Implementation of the CUDA backend (Feb 24, 2020)
17c8ccf [SYCL] Update libclc install rules (Feb 3, 2020)
680f890 [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage (fwyzard, Feb 3, 2020)
5e71823 Added missing flags for device-side builtins (Feb 10, 2020)
b01ff28 [SYCL][CUDA] Removing unnecessary tool from the tree (Feb 10, 2020)
abee4f9 [SYCL][PI] Fix kernel group info parameter conversion (Feb 12, 2020)
cfd1266 [SYCL] Changed CUDA unit tests to call through plugin (Feb 18, 2020)
61a206b [SYCL] Have default_selector consider SYCL_BE (Feb 14, 2020)
c2168af [SYCL] Select GlobalPlugin based on SYCL_BE (Feb 17, 2020)
c7e2846 [SYCL] Improve default device selection checks (Feb 17, 2020)
23b179e [SYCL] Formatting update for device_selector.cpp (Feb 18, 2020)
52736fd [SYCL][CUDA] Refactor __SYCL_INLINE macro (fwyzard, Feb 13, 2020)
62afe84 [SYCL][CUDA] Code style and cleanup to CUDA support (Feb 21, 2020)
5f5e017 [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl (Feb 20, 2020)
54678ab [SYCL][CUDA] Remove PI_CUDA specific details from clang (Feb 20, 2020)
fb4521e [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda (Feb 20, 2020)
ab9f4be [SYCL][CUDA] Further fixes to CUDA device selection (Feb 20, 2020)
cdab838 [SYCL] Enable asserts in all buildbot builds (Feb 21, 2020)
5b1ff35 [SYCL][CUDA] Minor test and build configuration (Feb 24, 2020)
4 changes: 4 additions & 0 deletions clang/CMakeLists.txt
@@ -219,6 +219,10 @@ endif()
include(CheckIncludeFile)
check_include_file(sys/resource.h CLANG_HAVE_RLIMITS)

if(SYCL_BUILD_PI_CUDA)
set(SYCL_HAVE_PI_CUDA 1)
endif()

set(CLANG_RESOURCE_DIR "" CACHE STRING
"Relative directory from the Clang binary to its resource files.")

3 changes: 3 additions & 0 deletions clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -64,6 +64,9 @@ def warn_drv_unknown_cuda_version: Warning<
"Unknown CUDA version %0. Assuming the latest supported version %1">,
InGroup<CudaUnknownVersion>;
def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
def err_drv_no_sycl_libspirv : Error<
"cannot find `libspirv-nvptx64--nvidiacl.bc`. Provide path to libspirv library via "
"-fsycl-libspirv-path, or pass -fno-sycl-libspirv to build without linking with libspirv.">;
def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
def err_drv_invalid_thread_model_for_target : Error<
"invalid thread model '%0' in '%1' for this target">;
2 changes: 1 addition & 1 deletion clang/include/clang/Basic/DiagnosticIDs.h
@@ -28,7 +28,7 @@ namespace clang {
// Size of each of the diagnostic categories.
enum {
DIAG_SIZE_COMMON = 300,
DIAG_SIZE_DRIVER = 250, // 200 -> 250 for SYCL related diagnostics
DIAG_SIZE_DRIVER = 210,
DIAG_SIZE_FRONTEND = 150,
DIAG_SIZE_SERIALIZATION = 120,
DIAG_SIZE_LEX = 400,
3 changes: 3 additions & 0 deletions clang/include/clang/Config/config.h.cmake
@@ -80,6 +80,9 @@
#cmakedefine01 CLANG_ENABLE_OBJC_REWRITER
#cmakedefine01 CLANG_ENABLE_STATIC_ANALYZER

/* Define if we have SYCL PI CUDA support */
#cmakedefine SYCL_HAVE_PI_CUDA ${SYCL_HAVE_PI_CUDA}
Contributor:
Suggested change:
-#cmakedefine SYCL_HAVE_PI_CUDA ${SYCL_HAVE_PI_CUDA}
+#cmakedefine01 SYCL_HAVE_PI_CUDA

According to the docs, this should do the same thing.

Contributor:
Do we really need this define? Can we have "SYCL PI CUDA support" unconditionally?

Contributor (Author):
If we had PI CUDA support unconditionally, the CUDA toolchain would always be required to compile the project. We made it optional so that people who only use the OpenCL plugin can build the project without a CUDA toolchain on their system.

Contributor:
According to my understanding, we need the CUDA toolchain only to build the CUDA plugin. Could you clarify why we should require the CUDA toolchain to build the driver? https://llvm.org/docs/CompileCudaWithLLVM.html does not seem to require a custom driver.

Contributor (Author):
You are correct, we only need the CUDA toolchain to build the plugin, but we limit the valid SYCL triples in the clang driver based on whether PI CUDA support is available:

https://github.com/intel/llvm/pull/1091/files#diff-beaf25b0cdf8830dd4ea165404b00671R618

static bool isValidSYCLTriple(llvm::Triple T) {
#ifdef SYCL_HAVE_PI_CUDA
  // NVPTX is valid for SYCL.
  if (T.isNVPTX())
    return true;
#endif
  // Check for invalid SYCL device triple values.
  // Non-SPIR arch.
  if (!T.isSPIR())
    return false;
  // SPIR arch, but has invalid SubArch for AOT.
  StringRef A(T.getArchName());
  if (T.getSubArch() == llvm::Triple::NoSubArch &&
      ((T.getArch() == llvm::Triple::spir && !A.equals("spir")) ||
       (T.getArch() == llvm::Triple::spir64 && !A.equals("spir64"))))
    return false;
  return true;
}

We could remove this limitation, though, and always allow NVPTX triples for compilation, regardless of whether the CUDA plugin is available.

Contributor:
+1 for removing.


/* Spawn a new process clang.exe for the CC1 tool invocation, when necessary */
#cmakedefine01 CLANG_SPAWN_CC1

3 changes: 3 additions & 0 deletions clang/include/clang/Driver/Options.td
@@ -1872,6 +1872,9 @@ def fsycl_help_EQ : Joined<["-"], "fsycl-help=">,
def fsycl_help : Flag<["-"], "fsycl-help">, Alias<fsycl_help_EQ>,
Flags<[DriverOption, CoreOption]>, AliasArgs<["all"]>, HelpText<"Emit help information "
"from all of the offline compilation tools">;
def fsycl_libspirv_path_EQ : Joined<["-"], "fsycl-libspirv-path=">,
Flags<[CC1Option, CoreOption]>, HelpText<"Path to libspirv library">;
def fno_sycl_libspirv : Flag<["-"], "fno-sycl-libspirv">, HelpText<"Disable check for libspirv">;
def fsyntax_only : Flag<["-"], "fsyntax-only">,
Flags<[DriverOption,CoreOption,CC1Option]>, Group<Action_Group>;
def ftabstop_EQ : Joined<["-"], "ftabstop=">, Group<f_Group>;
3 changes: 2 additions & 1 deletion clang/lib/Basic/Targets/NVPTX.cpp
@@ -57,7 +57,8 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
.Default(32);
}

TLSSupported = false;
// FIXME: Needed for compiling SYCL to PTX.
TLSSupported = Triple.getEnvironment() == llvm::Triple::SYCLDevice;
VLASupported = false;
AddrSpaceMap = &NVPTXAddrSpaceMap;
UseAddrSpaceMapMangling = true;
6 changes: 6 additions & 0 deletions clang/lib/Basic/Targets/NVPTX.h
@@ -141,6 +141,12 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
Opts.support("cl_khr_global_int32_extended_atomics");
Opts.support("cl_khr_local_int32_base_atomics");
Opts.support("cl_khr_local_int32_extended_atomics");
// PTX actually supports 64 bits operations even if the Nvidia OpenCL
// runtime does not report support for it.
// This is required for libclc to compile 64 bits atomic functions.
// FIXME: maybe we should have a way to control this ?
Opts.support("cl_khr_int64_base_atomics");
Opts.support("cl_khr_int64_extended_atomics");
}

/// \returns If a target requires an address within a target specific address
6 changes: 6 additions & 0 deletions clang/lib/CodeGen/CGCall.cpp
@@ -755,6 +755,12 @@ CodeGenTypes::arrangeLLVMFunctionInfo(CanQualType resultType,
return *FI;

unsigned CC = ClangCallConvToLLVMCallConv(info.getCC());
// This is required so SYCL kernels are successfully processed by tools from CUDA. Kernels
// with a `spir_kernel` calling convention are ignored otherwise.
if (CC == llvm::CallingConv::SPIR_KERNEL && CGM.getTriple().isNVPTX() &&
getContext().getLangOpts().SYCLIsDevice) {
CC = llvm::CallingConv::C;
}

// Construct the function info. We co-allocate the ArgInfos.
FI = CGFunctionInfo::create(CC, instanceMethod, chainCall, info,
2 changes: 2 additions & 0 deletions clang/lib/CodeGen/CodeGenModule.cpp
@@ -240,6 +240,8 @@ void CodeGenModule::createSYCLRuntime() {
switch (getTriple().getArch()) {
case llvm::Triple::spir:
case llvm::Triple::spir64:
case llvm::Triple::nvptx:
case llvm::Triple::nvptx64:
SYCLRuntime.reset(new CGSYCLRuntime(*this));
break;
default:
2 changes: 1 addition & 1 deletion clang/lib/CodeGen/TargetInfo.cpp
@@ -6546,7 +6546,7 @@ void NVPTXTargetCodeGenInfo::setTargetAttributes(
llvm::Function *F = cast<llvm::Function>(GV);

// Perform special handling in OpenCL mode
if (M.getLangOpts().OpenCL) {
if (M.getLangOpts().OpenCL || M.getLangOpts().SYCLIsDevice) {
// Use OpenCL function attributes to check for kernel functions
// By default, all functions are device functions
if (FD->hasAttr<OpenCLKernelAttr>()) {
118 changes: 105 additions & 13 deletions clang/lib/Driver/Driver.cpp
@@ -615,6 +615,11 @@ Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {
}

static bool isValidSYCLTriple(llvm::Triple T) {
#ifdef SYCL_HAVE_PI_CUDA
// NVPTX is valid for SYCL.
if (T.isNVPTX())
return true;
#endif
// Check for invalid SYCL device triple values.
// Non-SPIR arch.
if (!T.isSPIR())
@@ -3250,11 +3255,37 @@ class OffloadingActionBuilder final {
/// Type of output file for FPGA device compilation.
types::ID FPGAOutType = types::TY_FPGA_AOCX;

/// List of CUDA architectures to use in this compilation with NVPTX targets.
SmallVector<CudaArch, 8> GpuArchList;

/// Build the last steps for CUDA after all BC files have been linked.
Action *finalizeNVPTXDependences(Action *Input, const llvm::Triple &TT) {
auto *BA = C.getDriver().ConstructPhaseAction(
C, Args, phases::Backend, Input, AssociatedOffloadKind);
if (TT.getOS() != llvm::Triple::NVCL) {
auto *AA = C.getDriver().ConstructPhaseAction(
C, Args, phases::Assemble, BA, AssociatedOffloadKind);
ActionList DeviceActions = {BA, AA};
return C.MakeAction<LinkJobAction>(DeviceActions,
types::TY_CUDA_FATBIN);
}
return BA;
}

public:
SYCLActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)
: DeviceActionBuilder(C, Args, Inputs, Action::OFK_SYCL) {}

void withBoundArchForToolChain(const ToolChain* TC,
llvm::function_ref<void(const char *)> Op) {
if (TC->getTriple().isNVPTX())
for (CudaArch A : GpuArchList)
Op(CudaArchToString(A));
else
Op(nullptr);
}

ActionBuilderReturnCode
getDeviceDependences(OffloadAction::DeviceDependences &DA,
phases::ID CurPhase, phases::ID FinalPhase,
@@ -3272,8 +3303,11 @@
C.MakeAction<CompileJobAction>(A, types::TY_SYCL_Header);
A = C.MakeAction<CompileJobAction>(A, types::TY_LLVM_BC);
}
DA.add(*DeviceCompilerInput, *ToolChains.front(), /*BoundArch=*/nullptr,
Action::OFK_SYCL);
const auto *TC = ToolChains.front();
const char *BoundArch = nullptr;
if (TC->getTriple().isNVPTX())
BoundArch = CudaArchToString(GpuArchList.front());
DA.add(*DeviceCompilerInput, *TC, BoundArch, Action::OFK_SYCL);
// Clear the input file, it is already a dependence to a host
// action.
DeviceCompilerInput = nullptr;
@@ -3329,9 +3363,17 @@
}

// By default, we produce an action for each device arch.
auto TC = ToolChains.begin();
for (Action *&A : SYCLDeviceActions) {
if ((*TC)->getTriple().isNVPTX() && CurPhase >= phases::Backend) {
// For CUDA, stop to emit LLVM IR so it can be linked later on.
++TC;
continue;
}

A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A,
AssociatedOffloadKind);
++TC;
}

return ABRT_Success;
@@ -3430,7 +3472,9 @@
auto TI = ToolChains.begin();
for (auto *A : SYCLDeviceActions) {
OffloadAction::DeviceDependences Dep;
Dep.add(*A, **TI, /*BoundArch=*/nullptr, Action::OFK_SYCL);
withBoundArchForToolChain(*TI, [&](const char *BoundArch) {
Dep.add(*A, **TI, BoundArch, Action::OFK_SYCL);
});
AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));
++TI;
}
@@ -3514,22 +3558,27 @@
else
LinkObjects.push_back(Input);
}
auto *DeviceLinkAction =
Action *DeviceLinkAction =
C.MakeAction<LinkJobAction>(LinkObjects, types::TY_LLVM_BC);
ActionList WrapperInputs;
Action *SPIRVInput = DeviceLinkAction;
types::ID OutType = types::TY_SPIRV;
if (DeviceCodeSplit) {
auto *SplitAction = C.MakeAction<SYCLPostLinkJobAction>(
DeviceLinkAction, types::TY_Tempfilelist);
auto *EntryGenAction = C.MakeAction<SYCLPostLinkJobAction>(
DeviceLinkAction, types::TY_TempEntriesfilelist);
SPIRVInput = SplitAction;
DeviceLinkAction = SplitAction;
WrapperInputs.push_back(EntryGenAction);
OutType = types::TY_Tempfilelist;
}
auto *SPIRVTranslateAction =
C.MakeAction<SPIRVTranslatorJobAction>(SPIRVInput, OutType);
auto isNVPTX = (*TC)->getTriple().isNVPTX();
if (isNVPTX) {
DeviceLinkAction =
finalizeNVPTXDependences(DeviceLinkAction, (*TC)->getTriple());
}
else
DeviceLinkAction =
C.MakeAction<SPIRVTranslatorJobAction>(DeviceLinkAction, OutType);

auto TT = SYCLTripleList[I];
bool SYCLAOTCompile =
@@ -3550,7 +3599,7 @@
// triple calls for it (provided a valid subarch).
Action *DeviceBECompileAction;
ActionList BEActionList;
BEActionList.push_back(SPIRVTranslateAction);
BEActionList.push_back(DeviceLinkAction);
for (const auto &A : DeviceLibObjects)
BEActionList.push_back(A);
DeviceBECompileAction =
@@ -3561,11 +3610,12 @@
DA.add(*DeviceWrappingAction, **TC, /*BoundArch=*/nullptr,
Action::OFK_SYCL);
} else {
WrapperInputs.push_back(SPIRVTranslateAction);
WrapperInputs.push_back(DeviceLinkAction);
auto *DeviceWrappingAction = C.MakeAction<OffloadWrapperJobAction>(
WrapperInputs, types::TY_Object);
DA.add(*DeviceWrappingAction, **TC, /*BoundArch=*/nullptr,
Action::OFK_SYCL);
withBoundArchForToolChain(*TC, [&](const char *BoundArch) {
DA.add(*DeviceWrappingAction, **TC, BoundArch, Action::OFK_SYCL);
});
}
++TC;
++I;
@@ -3596,6 +3646,43 @@
}
}

/// Initialize the GPU architecture list from arguments - this populates `GpuArchList` from
/// `--cuda-gpu-arch` flags. Only relevant if compiling to CUDA. Return true if any
/// initialization errors are found.
bool initializeGpuArchMap() {
const OptTable &Opts = C.getDriver().getOpts();
for (auto *A : Args) {
unsigned Index;

if (A->getOption().matches(options::OPT_Xsycl_backend_EQ))
// Passing device args: -Xsycl-target-backend=<triple> -opt=val.
if (llvm::Triple(A->getValue(0)).isNVPTX())
Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
else
continue;
else if (A->getOption().matches(options::OPT_Xsycl_backend))
// Passing device args: -Xsycl-target-backend -opt=val.
Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
else
continue;

A->claim();
auto ParsedArg = Opts.ParseOneArg(Args, Index);
// TODO: Support --no-cuda-gpu-arch, --{,no-}cuda-gpu-arch=all.
if (ParsedArg->getOption().matches(options::OPT_cuda_gpu_arch_EQ)) {
ParsedArg->claim();
GpuArchList.push_back(StringToCudaArch(ParsedArg->getValue(0)));
}
}

// If there are no CUDA architectures provided then default to SM_30.
if (GpuArchList.empty()) {
GpuArchList.push_back(CudaArch::SM_30);
}

return false;
}

bool initialize() override {
// Get the SYCL toolchains. If we don't get any, the action builder will
// know there is nothing to do related to SYCL offloading.
@@ -3671,7 +3758,7 @@
? types::TY_FPGA_AOCR : types::TY_FPGA_AOCX;

DeviceLinkerInputs.resize(ToolChains.size());
return false;
return initializeGpuArchMap();
}

bool canUseBundlerUnbundler() const override {
@@ -6055,6 +6142,11 @@ const ToolChain &Driver::getOffloadingDeviceToolChain(const ArgList &Args,
TC = std::make_unique<toolchains::SYCLToolChain>(
*this, Target, HostTC, Args);
break;
case llvm::Triple::nvptx:
case llvm::Triple::nvptx64:
TC = std::make_unique<toolchains::CudaToolChain>(
*this, Target, HostTC, Args, TargetDeviceOffloadKind);
break;
default:
break;
}
7 changes: 5 additions & 2 deletions clang/lib/Driver/ToolChains/Clang.cpp
@@ -3998,7 +3998,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
}
}

const llvm::Triple *AuxTriple = IsCuda ? TC.getAuxTriple() : nullptr;
const llvm::Triple *AuxTriple = (IsSYCL || IsCuda) ? TC.getAuxTriple() : nullptr;
bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment();
bool IsIAMCU = RawTriple.isOSIAMCU();
bool IsSYCLDevice = (RawTriple.getEnvironment() == llvm::Triple::SYCLDevice);
@@ -4106,7 +4106,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
}
}

CmdArgs.push_back("-disable-llvm-passes");
if (Triple.isSPIR()) {
CmdArgs.push_back("-disable-llvm-passes");
}

if (Args.hasFlag(options::OPT_fsycl_allow_func_ptr,
options::OPT_fno_sycl_allow_func_ptr, false)) {
CmdArgs.push_back("-fsycl-allow-func-ptr");