Skip to content

DB-7: Add support for event-based preprocessing#71

Open
nmy2103 wants to merge 18 commits intodevfrom
feature/DB-7
Open

DB-7: Add support for event-based preprocessing#71
nmy2103 wants to merge 18 commits intodevfrom
feature/DB-7

Conversation

@nmy2103
Copy link
Collaborator

@nmy2103 nmy2103 commented Mar 6, 2026

Link to Issue

Summary

Previously, the PhysioView Dashboard supported preprocessing with only time-based data segmentation. This PR adds support for event-based segmentation and preprocessing. It also includes a significant refactor of the original dashboard.utils.py module. All utility functions are now modularized within the dashboard._core subpackage.

@nmy2103 nmy2103 requested a review from ynwtnb March 6, 2026 19:22
@nmy2103 nmy2103 self-assigned this Mar 6, 2026
@nmy2103 nmy2103 added the enhancement New feature or request label Mar 6, 2026
@ynwtnb
Copy link
Collaborator

ynwtnb commented Mar 11, 2026

Do we want to prevent users from using duplicate task names? For example, when I loaded an event file like this:

event,start,end
Rest,1763574922.999363,1763575527.589143
Mental (sedentary),1763575588.22549,1763575891.841346
Rest,1763575908.596285,1763576216.07174
:

it combined all "Rest" tasks, and the dashboard shows a segment like this around the time between the end of the previous Rest task and the start of the next Rest task.

Screenshot 2026-03-11 at 11 55 31 AM

I think we should either:

  • Prevent users from entering duplicate event names
  • Treat each row in the event file as a single event and do not combine them even if there are duplicate event names (In my opinion this is a better option)

@ynwtnb
Copy link
Collaborator

ynwtnb commented Mar 11, 2026

Also, the dashboard shows unexpected behavior when such a task with a duplicate name is loaded. Once the rest task is loaded, when I open another task, it shows a blank segment, although it shows valid segments again when I re-select segments.

screen_recording_trimmed.mov

@ynwtnb
Copy link
Collaborator

ynwtnb commented Mar 11, 2026

Maybe we should revise this description so users can understand that "batch" mode is not just for exporting all participants but also exporting all tasks if files are separated by tasks?

Screenshot 2026-03-11 at 12 23 20 PM

Also, when I use the "single" export mode, it currently saves a zip file whose name is something like <filename>_<date>_<timestamp>.zip. Each csv file inside the zipped folder does have event names, but I wonder if it'd be more helpful to include the event name in the zip file as well. (Same for SQA summary)

@ynwtnb
Copy link
Collaborator

ynwtnb commented Mar 11, 2026

I tried using two EDA files, but both files showed errors like this:

Traceback (most recent call last):
  File "/Users/yuna.w/Research/CBSL/heartview/physioview/dashboard/callbacks.py", line 1434, in run_pipeline
    metrics_by_event = preprocessor.preprocess_event(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/physioview/dashboard/_core/preprocessing.py", line 306, in preprocess_event
    for event_label, event_data in preprocessed.groupby('Event'):
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/.venv/lib/python3.12/site-packages/pandas/core/frame.py", line 9190, in groupby
    return DataFrameGroupBy(
           ^^^^^^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/.venv/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 1330, in __init__
    grouper, exclusions, obj = get_grouper(
                               ^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/.venv/lib/python3.12/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'Event'

There seems to be something wrong with the EDA event processing.

@ynwtnb
Copy link
Collaborator

ynwtnb commented Mar 11, 2026

I personally feel that this part is a little too cluttered, and if the filename itself is very long, task names may be harder to distinguish.

Screenshot 2026-03-11 at 2 16 51 PM

Because currently, the batch mode does not support event-based processing, I feel like it might be better to just have task names instead of filename + task name.

Comment on lines +1093 to +1095
preprocessor = _core.Preprocessor(
dtype, fs, filt_on, peak_detector, event_df, seg_size,
filter_kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this error when I tried to run the pipeline for E4 (both for PPG and EDA).

dash.exceptions.LongCallbackError: An error occurred inside a long callback: dtype must be one of 'ECG', 'EDA', 'PPG'.
Traceback (most recent call last):
  File "/Users/yuna.w/Research/CBSL/heartview/.venv/lib/python3.12/site-packages/dash/long_callback/managers/diskcache_manager.py", line 194, in run
    user_callback_output = fn(*maybe_progress, *user_callback_args)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/physioview/dashboard/callbacks.py", line 1093, in run_pipeline
    preprocessor = _core.Preprocessor(
                   ^^^^^^^^^^^^^^^^^^^
  File "/Users/yuna.w/Research/CBSL/heartview/physioview/dashboard/_core/preprocessing.py", line 100, in __init__
    raise ValueError(f"dtype must be one of 'ECG', 'EDA', 'PPG'.")
ValueError: dtype must be one of 'ECG', 'EDA', 'PPG'.

The reason is probably because you're passing dtype, not e4_dtype but dtype is None if the data is from E4?

Comment on lines +284 to +297
# === Toggle segmentation settings ========================================
@app.callback(
[Output('segment-data-by-event', 'hidden'),
Output('segment-data-by-time', 'hidden')],
Input('segment-by', 'value'),
prevent_initial_call = True
)
def toggle_data_segmentation(segment_by):
"""Toggle the data segmentation settings by time or event."""
if segment_by == 'time':
return True, False
elif segment_by == 'event':
return False, False

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it a little confusing to let users choose between "segment by time" and "segment by event", and yet you still show segment size even when they select "segment by event"?
I understand that this is because you still segment the signal into smaller chunks even when you split the data into tasks, but if that's the case, I thought it might be more intuitive to have a separate checkbox to toggle on/off event-based signal splitting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I'll change the event-based segmentation component(s) to a toggle instead.

])
]),
html.Div(id = 'segment-data', children = [
html.Div(id = 'segment-data-overview', children = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure which code is the culprit, but it seems like the segmenting option is completely hidden when E4 or Actigraph files are imported. Is it intentional?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants