Merged
9 changes: 9 additions & 0 deletions sycl/doc/SYCLInstrumentationUsingXPTI.md
@@ -256,3 +256,12 @@ All trace point types in bold provide semantic information about the graph, node
| `wait_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::wait_end` that marks the end of the wait on an `event`</li> <li> **parent**: `nullptr`</li> <li> **event**: The event ID will reflect the ID of the command group object submission that created this event or a new event based on the combination of the string "queue.wait" and the address of the event. </li> <li> **instance**: Unique ID to allow the correlation of the `wait_begin` event with the `wait_end` event. </li> <li> **user_data**: String indicating `queue.wait` and the address of the event as `const char *` </li></div> | **`sycl_device`**, `sym_function_name`, `sym_source_file_name`, `sym_line_no` |
| `barrier_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_begin` that marks the beginning of a barrier while enqueuing a command group object</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
| `barrier_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_end` that marks the end of the barrier that is encountered during enqueue.</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |

## Level Zero Plugin Stream `"oneapi.level_zero.experimental.mem_alloc"` Notification Signatures

| Trace Point Type | Parameter Description | Metadata |
| :------------------------: | :-------------------- | :------- |
| `mem_alloc_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_begin` that marks the beginning of the memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. </li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any), allocation size, and guard zone size (if any). </li></div> | None |
| `mem_alloc_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_end` that marks the end of the memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_alloc_begin`.</li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any), allocated pointer, allocation size, and guard zone size (if any). </li></div> | None |
| `mem_release_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_begin` that marks the beginning of the memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. </li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any) and the released pointer. </li></div> | None |
| `mem_release_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_end` that marks the end of the memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_release_begin`.</li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any) and the released pointer. </li></div> | None |
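
The table above says that the **instance** value of an `mem_alloc_end` notification matches the one received for the corresponding `mem_alloc_begin`. A subscriber could use that to pair the two notifications. The following is a minimal sketch of that pairing only, not a real XPTI subscriber: `MemAllocData`, `CompletedAlloc`, and `AllocTracker` are hypothetical names, and the field names of `MemAllocData` are assumptions (only the order of the four values follows the way the runtime constructs `mem_alloc_data_t` in this change).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for xpti::mem_alloc_data_t. The field names are
// assumptions; only the order of the four values mirrors the runtime code.
struct MemAllocData {
  std::uintptr_t MemObjHandle; // memory object ID, 0 if none
  std::uintptr_t AllocPtr;     // allocated or released pointer, 0 if unknown
  std::size_t AllocSize;       // allocation size in bytes
  std::size_t GuardZoneSize;   // guard zone size in bytes, 0 if none
};

// One begin/end pair as recorded by the hypothetical collector.
struct CompletedAlloc {
  std::uint64_t Instance;
  std::uintptr_t AllocPtr;
  std::size_t RequestedSize;
};

class AllocTracker {
public:
  // Call on mem_alloc_begin: remember the request under its instance ID.
  void onAllocBegin(std::uint64_t Instance, const MemAllocData &Data) {
    Pending[Instance] = Data.AllocSize;
  }

  // Call on mem_alloc_end: the same instance ID identifies the request.
  // Returns false when no matching begin notification was seen.
  bool onAllocEnd(std::uint64_t Instance, const MemAllocData &Data) {
    auto It = Pending.find(Instance);
    if (It == Pending.end())
      return false;
    Completed.push_back({Instance, Data.AllocPtr, It->second});
    Pending.erase(It);
    return true;
  }

  const std::vector<CompletedAlloc> &completed() const { return Completed; }
  std::size_t pendingCount() const { return Pending.size(); }

private:
  std::unordered_map<std::uint64_t, std::size_t> Pending;
  std::vector<CompletedAlloc> Completed;
};
```

The same scheme would apply to `mem_release_begin` and `mem_release_end`. A real tool would also need to guard the maps with a mutex if notifications can arrive from several threads.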
11 changes: 6 additions & 5 deletions sycl/source/detail/device_image_impl.hpp
@@ -17,6 +17,7 @@
#include <detail/context_impl.hpp>
#include <detail/device_impl.hpp>
#include <detail/kernel_id_impl.hpp>
#include <detail/mem_alloc_helper.hpp>
#include <detail/plugin.hpp>
#include <detail/program_manager/program_manager.hpp>

@@ -185,11 +186,11 @@ class device_image_impl {
std::lock_guard<std::mutex> Lock{MSpecConstAccessMtx};
if (nullptr == MSpecConstsBuffer && !MSpecConstsBlob.empty()) {
const detail::plugin &Plugin = getSyclObjImpl(MContext)->getPlugin();
-Plugin.call<PiApiKind::piMemBufferCreate>(
-detail::getSyclObjImpl(MContext)->getHandleRef(),
-PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
-MSpecConstsBlob.size(), MSpecConstsBlob.data(), &MSpecConstsBuffer,
-nullptr);
+memBufferCreateHelper(Plugin,
+detail::getSyclObjImpl(MContext)->getHandleRef(),
+PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
+MSpecConstsBlob.size(), MSpecConstsBlob.data(),
+&MSpecConstsBuffer, nullptr);
}
return MSpecConstsBuffer;
}
32 changes: 32 additions & 0 deletions sycl/source/detail/mem_alloc_helper.hpp
@@ -0,0 +1,32 @@
//==-------- mem_alloc_helper.hpp - SYCL mem alloc helper ------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include <CL/sycl/detail/pi.h>

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {
// Forward declaration so this header does not depend on include order; the
// class itself is defined in <detail/plugin.hpp>.
class plugin;
void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
pi_mem_flags Flags, size_t Size, void *HostPtr,
pi_mem *RetMem,
const pi_mem_properties *Props = nullptr);
void memReleaseHelper(const plugin &Plugin, pi_mem Mem);
void memBufferMapHelper(const plugin &Plugin, pi_queue command_queue,
pi_mem buffer, pi_bool blocking_map,
pi_map_flags map_flags, size_t offset, size_t size,
pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event,
void **ret_map);
void memUnmapHelper(const plugin &Plugin, pi_queue command_queue, pi_mem memobj,
void *mapped_ptr, pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event);
} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
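
Each helper declared in this header brackets one PI call with a begin notification and an end notification, and the end notification is emitted from a scope guard so that it fires even if the PI call throws. The following is a minimal, self-contained sketch of that pattern under stated assumptions: `ScopeExit` is a hypothetical stand-in for `xpti::utils::finally` (whose definition is not part of this change), and `tracedCall` with its string log is purely illustrative.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard standing in for xpti::utils::finally.
class ScopeExit {
public:
  explicit ScopeExit(std::function<void()> Callback)
      : Fn(std::move(Callback)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  // The callback must not throw: destructors are noexcept by default, so an
  // escaping exception would terminate the program.
  ~ScopeExit() { Fn(); }

private:
  std::function<void()> Fn;
};

// Runs Call between a "begin" and an "end" record. The "end" record is
// appended even when Call throws, because the guard fires during unwinding.
inline void tracedCall(std::vector<std::string> &Log,
                       const std::function<void()> &Call) {
  Log.push_back("begin");
  ScopeExit Guard{[&Log] { Log.push_back("end"); }};
  Call();
}
```

This is also why the end-of-scope lambdas in the runtime use `call_nocheck` rather than `call`: whatever runs inside the guard has to be non-throwing.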
199 changes: 188 additions & 11 deletions sycl/source/detail/memory_manager.cpp
@@ -9,17 +9,106 @@
#include <CL/sycl/detail/memory_manager.hpp>
#include <detail/context_impl.hpp>
#include <detail/event_impl.hpp>
#include <detail/mem_alloc_helper.hpp>
#include <detail/queue_impl.hpp>

#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

#ifdef XPTI_ENABLE_INSTRUMENTATION
#include <xpti/xpti_data_types.h>
#include <xpti/xpti_trace_framework.hpp>
#endif

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {

#ifdef XPTI_ENABLE_INSTRUMENTATION
uint8_t GMemAllocStreamID;
xpti::trace_event_data_t *GMemAllocEvent;
#endif

uint64_t emitMemAllocBeginTrace(uintptr_t ObjHandle, size_t AllocSize,
size_t GuardZone) {
(void)ObjHandle;
(void)AllocSize;
(void)GuardZone;
uint64_t CorrelationID = 0;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, 0 /* alloc ptr */, AllocSize,
GuardZone};

CorrelationID = xptiGetUniqueId();
xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_begin),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
return CorrelationID;
}

void emitMemAllocEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
size_t AllocSize, size_t GuardZone,
uint64_t CorrelationID) {
(void)ObjHandle;
(void)AllocPtr;
(void)AllocSize;
(void)GuardZone;
(void)CorrelationID;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, AllocSize, GuardZone};

xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_end),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
}

uint64_t emitMemReleaseBeginTrace(uintptr_t ObjHandle, uintptr_t AllocPtr) {
(void)ObjHandle;
(void)AllocPtr;
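// Declared outside the guard: the return statement below is compiled even
// when XPTI_ENABLE_INSTRUMENTATION is not defined.
uint64_t CorrelationID = 0;
#ifdef XPTI_ENABLE_INSTRUMENTATION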
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
0 /* guard zone */};

CorrelationID = xptiGetUniqueId();
xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_begin),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
return CorrelationID;
}

void emitMemReleaseEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
uint64_t CorrelationID) {
(void)ObjHandle;
(void)AllocPtr;
(void)CorrelationID;
#ifdef XPTI_ENABLE_INSTRUMENTATION
if (xptiTraceEnabled()) {
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
0 /* guard zone */};

xptiNotifySubscribers(
GMemAllocStreamID,
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_end),
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
}
#endif
}
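
The four `emit*Trace` functions above share one shape: every parameter is cast to `void` so the function compiles without warnings when instrumentation is off, and any value that is returned lives outside the `#ifdef` block. A minimal sketch of that shape follows; `TRACE_SKETCH_ENABLED` is a hypothetical macro standing in for `XPTI_ENABLE_INSTRUMENTATION`, and `nextUniqueId` is a hypothetical stand-in for `xptiGetUniqueId()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// TRACE_SKETCH_ENABLED is a hypothetical switch for this sketch only.
#ifdef TRACE_SKETCH_ENABLED
// Hypothetical stand-in for xptiGetUniqueId().
inline std::uint64_t nextUniqueId() {
  static std::uint64_t Counter = 0;
  return ++Counter;
}
#endif

inline std::uint64_t emitBeginTrace(std::size_t AllocSize) {
  (void)AllocSize; // avoids an unused-parameter warning when tracing is off
  // Declared outside the guard so the return statement always compiles.
  std::uint64_t CorrelationID = 0;
#ifdef TRACE_SKETCH_ENABLED
  CorrelationID = nextUniqueId();
#endif
  return CorrelationID;
}
```

With the macro undefined, the function collapses to returning 0, which callers can pass on to the matching end function unchanged.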

static void waitForEvents(const std::vector<EventImplPtr> &Events) {
// Assuming all events will be on the same device or
// devices associated with the same Backend.
@@ -34,6 +123,97 @@ static void waitForEvents(const std::vector<EventImplPtr> &Events) {
}
}

void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
pi_mem_flags Flags, size_t Size, void *HostPtr,
pi_mem *RetMem, const pi_mem_properties *Props) {
uint64_t CorrID = 0;
// We only want to instrument piMemBufferCreate
{
CorrID =
emitMemAllocBeginTrace(0 /* mem object */, Size, 0 /* guard zone */);
xpti::utils::finally _{[&] {
// C-style cast is required for MSVC
uintptr_t MemObjID = (uintptr_t)(*RetMem);
pi_native_handle Ptr = 0;
// Always use call_nocheck here, because call may throw an exception,
// and this lambda will be called from destructor, which in combination
// rewards us with UB.
Plugin.call_nocheck<PiApiKind::piextMemGetNativeHandle>(*RetMem, &Ptr);
emitMemAllocEndTrace(MemObjID, (uintptr_t)(Ptr), Size, 0 /* guard zone */,
CorrID);
}};
Plugin.call<PiApiKind::piMemBufferCreate>(Ctx, Flags, Size, HostPtr, RetMem,
Props);
}
}

void memReleaseHelper(const plugin &Plugin, pi_mem Mem) {
// FIXME piMemRelease does not guarantee memory release. Memory is released
// only if the reference counter is 1. However, the SYCL runtime currently
// calls piMemRetain only for OpenCL interop.
uint64_t CorrID = 0;
// C-style cast is required for MSVC
uintptr_t MemObjID = (uintptr_t)(Mem);
uintptr_t Ptr = 0;
// Do not make unnecessary PI calls without instrumentation enabled
if (xptiTraceEnabled()) {
pi_native_handle PtrHandle = 0;
Plugin.call<PiApiKind::piextMemGetNativeHandle>(Mem, &PtrHandle);
Ptr = (uintptr_t)(PtrHandle);
}
// We only want to instrument piMemRelease
{
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
xpti::utils::finally _{
[&] { emitMemReleaseEndTrace(MemObjID, Ptr, CorrID); }};
Plugin.call<PiApiKind::piMemRelease>(Mem);
}
}

void memBufferMapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Buffer,
pi_bool Blocking, pi_map_flags Flags, size_t Offset,
size_t Size, pi_uint32 NumEvents,
const pi_event *WaitList, pi_event *Event,
void **RetMap) {
uint64_t CorrID = 0;
uintptr_t MemObjID = (uintptr_t)(Buffer);
// We only want to instrument piEnqueueMemBufferMap
{
CorrID = emitMemAllocBeginTrace(MemObjID, Size, 0 /* guard zone */);
xpti::utils::finally _{[&] {
emitMemAllocEndTrace(MemObjID, (uintptr_t)(*RetMap), Size,
0 /* guard zone */, CorrID);
}};
Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
Queue, Buffer, Blocking, Flags, Offset, Size, NumEvents, WaitList,
Event, RetMap);
}
}

void memUnmapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Mem,
void *MappedPtr, pi_uint32 NumEvents,
const pi_event *WaitList, pi_event *Event) {
uint64_t CorrID = 0;
uintptr_t MemObjID = (uintptr_t)(Mem);
uintptr_t Ptr = (uintptr_t)(MappedPtr);
// We only want to instrument piEnqueueMemUnmap
{
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
xpti::utils::finally _{[&] {
// There's no way for SYCL to know when the pointer is freed, so we have
// to explicitly wait for the end of data transfers here in order to
// report correct events.
// Always use call_nocheck here, because call may throw an exception,
// and this lambda will be called from destructor, which in combination
// rewards us with UB.
Plugin.call_nocheck<PiApiKind::piEventsWait>(1, Event);
emitMemReleaseEndTrace(MemObjID, Ptr, CorrID);
}};
Plugin.call<PiApiKind::piEnqueueMemUnmap>(Queue, Mem, MappedPtr, NumEvents,
WaitList, Event);
}
}

void MemoryManager::release(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
void *MemAllocation,
std::vector<EventImplPtr> DepEvents,
@@ -67,7 +247,7 @@ void MemoryManager::releaseMemObj(ContextImplPtr TargetContext,
}

const detail::plugin &Plugin = TargetContext->getPlugin();
-Plugin.call<PiApiKind::piMemRelease>(pi::cast<RT::PiMem>(MemAllocation));
+memReleaseHelper(Plugin, pi::cast<RT::PiMem>(MemAllocation));
}

void *MemoryManager::allocate(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
@@ -165,9 +345,8 @@ MemoryManager::allocateBufferObject(ContextImplPtr TargetContext, void *UserPtr,

RT::PiMem NewMem = nullptr;
const detail::plugin &Plugin = TargetContext->getPlugin();
-Plugin.call<PiApiKind::piMemBufferCreate>(TargetContext->getHandleRef(),
-CreationFlags, Size, UserPtr,
-&NewMem, nullptr);
+memBufferCreateHelper(Plugin, TargetContext->getHandleRef(), CreationFlags,
+Size, UserPtr, &NewMem, nullptr);
return NewMem;
}

@@ -623,10 +802,9 @@ void *MemoryManager::map(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
void *MappedPtr = nullptr;
const size_t BytesToMap = AccessRange[0] * AccessRange[1] * AccessRange[2];
const detail::plugin &Plugin = Queue->getPlugin();
-Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
-Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), CL_FALSE, Flags,
-AccessOffset[0], BytesToMap, DepEvents.size(), DepEvents.data(),
-&OutEvent, &MappedPtr);
+memBufferMapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
+CL_FALSE, Flags, AccessOffset[0], BytesToMap,
+DepEvents.size(), DepEvents.data(), &OutEvent, &MappedPtr);
return MappedPtr;
}

@@ -639,9 +817,8 @@ void MemoryManager::unmap(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
// Using the plugin of the Queue.

const detail::plugin &Plugin = Queue->getPlugin();
-Plugin.call<PiApiKind::piEnqueueMemUnmap>(
-Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), MappedPtr,
-DepEvents.size(), DepEvents.data(), &OutEvent);
+memUnmapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
+MappedPtr, DepEvents.size(), DepEvents.data(), &OutEvent);
}

void MemoryManager::copy_usm(const void *SrcMem, QueueImplPtr SrcQueue,
14 changes: 13 additions & 1 deletion sycl/source/detail/xpti_registry.hpp
@@ -29,12 +29,24 @@ inline constexpr const char *SYCL_PICALL_STREAM_NAME = "sycl.pi";
// Stream name being used for traces generated from PI calls. This stream
// contains information about function arguments.
inline constexpr const char *SYCL_PIDEBUGCALL_STREAM_NAME = "sycl.pi.debug";
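// Stream name being used for traces generated from memory allocation and
// release events.
inline constexpr const char *SYCL_MEM_ALLOC_STREAM_NAME =
"sycl.experimental.mem_alloc";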

#ifdef XPTI_ENABLE_INSTRUMENTATION
extern uint8_t GMemAllocStreamID;
extern xpti::trace_event_data_t *GMemAllocEvent;
#endif

class XPTIRegistry {
public:
void initializeFrameworkOnce() {
#ifdef XPTI_ENABLE_INSTRUMENTATION
-std::call_once(MInitialized, [] { xptiFrameworkInitialize(); });
+std::call_once(MInitialized, [this] {
+xptiFrameworkInitialize();
+
+// Memory allocation events
+initializeStream(SYCL_MEM_ALLOC_STREAM_NAME, 0, 1, "0.1");
+});
#endif
}
