
Restrict dataVals vowel onset to times that can have formant values #165

Open
cwnaber wants to merge 2 commits into master from dataVals-vowel-timing-fix

@cwnaber (Collaborator) commented Mar 10, 2026

Background about time sampling

When generating dataVals.mat with gen_dataVals_from_wave_viewer, we use two related time scales and sets of data:

  1. Those related to the raw audio signal, which normally have a sampling rate of 16 kHz; for example, sigmat.ampl and sigmat.ampl_taxis.
  2. Those related to formant tracking, which normally have a sampling rate of 333.3 Hz; for example, sigmat.ftrack_taxis or, eventually, dataVals.f1.

Apart from the difference in sampling rate, the two time scales also differ in their first and last sample time points. sigmat.ampl_taxis starts at zero, but sigmat.ftrack_taxis starts at around 72 milliseconds, because the formant tracker needs to collect a certain amount of signal before it can determine formant values.
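A small illustration of the two time axes, built from assumed values rather than real sigmat output: a 16 kHz audio axis, a formant axis with a 3 ms frame step (1/0.003 s ≈ 333.3 Hz), a first formant frame at 72 ms, and a hypothetical 0.5 s trial. It only shows how the axes line up; the real axes come from sigmat.

```matlab
% Illustrative values only (assumptions): 16 kHz audio rate, 3 ms formant frame
% step (1/0.003 s = 333.3 Hz), first formant frame at 72 ms, 0.5 s trial.
fs_ampl = 16000;        % audio-rate samples per second
ftrack_step = 0.003;    % time between formant frames, in seconds
ftrack_start = 0.072;   % time of the first formant frame, in seconds
trial_dur = 0.5;        % trial duration, in seconds

% Audio-rate time axis: one entry per audio sample, starting at 0
sigmat.ampl_taxis = (0:round(trial_dur*fs_ampl)-1) / fs_ampl;
% Formant-rate time axis: starts only once enough signal has been collected
sigmat.ftrack_taxis = ftrack_start:ftrack_step:trial_dur;

fprintf('ampl_taxis:   %d samples, first at %.3f s\n', ...
    numel(sigmat.ampl_taxis), sigmat.ampl_taxis(1));
fprintf('ftrack_taxis: %d samples, first at %.3f s\n', ...
    numel(sigmat.ftrack_taxis), sigmat.ftrack_taxis(1));
```

Any audio sample earlier than sigmat.ftrack_taxis(1) has no corresponding formant frame.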

Background about determining onset and offset

To determine the vowel onset and offset in a trial that has no user events, gen_dataVals_from_wave_viewer follows this process:

  1. To find the onset, find the first sample in sigmat.ampl where the amplitude is above the amplitude threshold. This happens at old line 306, in the embedded function get_onset_from_ampl.
  2. Finding the offset is more complicated, but in short: look for the first time after the onset that the amplitude drops below the amplitude threshold, and only pick a value that could exist in sigmat.ftrack_taxis.
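A simplified sketch of the two steps above on synthetic data. The amplitude envelope, the threshold, and the rule used to snap the offset onto the formant axis (first formant frame at or after the amplitude drop) are assumptions for illustration; the actual offset logic in gen_dataVals_from_wave_viewer is more involved.

```matlab
% Synthetic trial (all values hypothetical): silence, a "vowel" from 100 to 300 ms, silence.
fs = 16000;
sigmat.ampl_taxis = (0:fs*0.5-1) / fs;
sigmat.ftrack_taxis = 0.072:0.003:0.5;
sigmat.ampl = zeros(size(sigmat.ampl_taxis));
sigmat.ampl(sigmat.ampl_taxis >= 0.10 & sigmat.ampl_taxis < 0.30) = 1;
ampl_thresh = 0.5;   % hypothetical amplitude threshold

% Step 1: onset = first audio-rate sample whose amplitude is above threshold
onsetIndAmp = find(sigmat.ampl > ampl_thresh, 1);

% Step 2 (simplified): first audio-rate sample after the onset that drops below threshold...
dropIndAmp = onsetIndAmp - 1 + find(sigmat.ampl(onsetIndAmp:end) < ampl_thresh, 1);
% ...then keep only a time that exists on the formant axis: first frame at or after the drop
offsetIndFtrack = find(sigmat.ftrack_taxis >= sigmat.ampl_taxis(dropIndAmp), 1);

fprintf('onset %.4f s, amplitude drop %.4f s, offset frame %.4f s\n', ...
    sigmat.ampl_taxis(onsetIndAmp), sigmat.ampl_taxis(dropIndAmp), ...
    sigmat.ftrack_taxis(offsetIndFtrack));
```

Note the asymmetry this sketch shares with the description: the onset can be any audio-rate sample, while the offset is forced onto the formant axis.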

The problem

I encountered the edge case pictured below, in which the amplitude started above the threshold at the very beginning of the trial and then dropped below it around 25 milliseconds in, before the formant tracking time scale starts. With the current code, the onset was set at time point 0 and the offset at time point 0.072, the first allowable value on the formant time axis. The code that determines the offset can't handle an offset on the first sample of sigmat.ftrack_taxis, so it errored out.
[Image: the edge-case trial described above]

The resulting code error

At old L348, onsetIndFtrack was set to 1. Then at L355, offsetIndFtrack was set to 0, and finally at L359 offsetIndAmp couldn't be computed, because sigmat.ftrack_taxis(offsetIndFtrack) tried to index with 0, which MATLAB's 1-based indexing doesn't allow.
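A minimal reproduction of the failure mode, assuming a formant axis like the one in the background section. The way offsetIndFtrack reaches 0 here is a hypothetical stand-in for the logic at old L355; the point is only that indexing with 0 throws in MATLAB.

```matlab
% Hypothetical reproduction; variable names follow the PR description, but the way
% offsetIndFtrack reaches 0 here is only a stand-in for the old L355 logic.
sigmat.ftrack_taxis = 0.072:0.003:0.5;
onsetIndFtrack = 1;                     % onset mapped to the first formant frame (old L348)
offsetIndFtrack = onsetIndFtrack - 1;   % stand-in: offset lands one frame before the onset -> 0

indexingFailed = false;
try
    % MATLAB arrays are 1-based, so index 0 is invalid and this line throws
    tOffset = sigmat.ftrack_taxis(offsetIndFtrack);
catch err
    indexingFailed = true;
    fprintf('Indexing with offsetIndFtrack = %d failed: %s\n', offsetIndFtrack, err.message);
end
```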

Proposed solution

I think the root issue is that we restrict the offset to values allowable on ftrack_taxis, but we don't do the same for the onset. This pull request changes onset detection so that the onset can only take values allowable on ftrack_taxis. The onset is still determined from the high-sampling-rate sigmat.ampl, but only times that also exist in ftrack_taxis are considered.
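One possible reading of the proposed change, sketched on a synthetic version of the edge case (above threshold for the first 25 ms, then silence, then a vowel from 100 to 300 ms; all values hypothetical). Here "only consider values that also exist in ftrack_taxis" is approximated by restricting the onset search to audio samples inside the time range covered by ftrack_taxis; this is an illustration, not the PR's actual diff.

```matlab
% Hypothetical edge-case trial: above threshold for the first 25 ms,
% then silence, then a vowel from 100 ms to 300 ms.
fs = 16000;
sigmat.ampl_taxis = (0:fs*0.5-1) / fs;
sigmat.ftrack_taxis = 0.072:0.003:0.5;
sigmat.ampl = zeros(size(sigmat.ampl_taxis));
sigmat.ampl(sigmat.ampl_taxis < 0.025) = 1;
sigmat.ampl(sigmat.ampl_taxis >= 0.10 & sigmat.ampl_taxis < 0.30) = 1;
ampl_thresh = 0.5;   % hypothetical amplitude threshold

% Old behavior: any audio-rate sample can be the onset
onsetIndAmpOld = find(sigmat.ampl > ampl_thresh, 1);

% Sketched new behavior: only audio samples inside the formant time range are candidates
inFtrackRange = sigmat.ampl_taxis >= sigmat.ftrack_taxis(1) & ...
                sigmat.ampl_taxis <= sigmat.ftrack_taxis(end);
onsetIndAmpNew = find(sigmat.ampl > ampl_thresh & inFtrackRange, 1);

fprintf('old onset: %.4f s, new onset: %.4f s\n', ...
    sigmat.ampl_taxis(onsetIndAmpOld), sigmat.ampl_taxis(onsetIndAmpNew));
```

The onset is still found at the audio sampling rate; the mask only removes candidates that could never map onto a formant frame, so the initial 25 ms burst is skipped.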

Alternate solution

If we want to keep the onset-determining code the same, we could instead add a check to the offset-finding code that detects this edge case and raises a helpful error, for example telling the user they need to add user events.
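A sketch of what the alternate check could look like. The error identifier dataVals:noValidOffset, the message wording, and trialNum are hypothetical; the try/catch is there only so the example runs to completion instead of stopping at the error.

```matlab
% Hypothetical values standing in for the state of the offset search in the edge case
trialNum = 12;          % hypothetical trial number
offsetIndFtrack = 0;    % offset search found no usable formant frame

caughtId = '';
try
    if offsetIndFtrack < 1
        % Fail with an actionable message instead of an indexing error later on
        error('dataVals:noValidOffset', ...
            ['Trial %d: could not place the vowel offset on the formant time axis. ' ...
             'Please add user events for this trial.'], trialNum);
    end
catch err
    caughtId = err.identifier;   % caught only so this example runs to completion
    disp(err.message);
end
```

In the real function the check would simply let the error propagate; giving it an identifier lets calling scripts catch this specific case if needed.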

Testing done

Since this is important code that we shouldn't change lightly, I wrote a script to compare dataVals.mat files created before and after this change. Ideally, this change should fix only edge cases and not affect how the onset and offset are determined in "normal" trials. As a simple measure, I checked whether the duration of each trial in dataVals.mat changed between the old and new gen_dataVals_from_wave_viewer. I ran the script on 5 sample participants: the change did not affect the duration of any trial except the one edge case that alerted me to the problem in the first place (pictured in The Problem).
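A sketch of the duration comparison described above, assuming each dataVals entry has a duration field named dur (an assumption; use whatever field actually holds the duration). The two structs are hypothetical in-memory stand-ins with made-up numbers; in practice they would come from calling load on the old and new dataVals.mat files.

```matlab
% Hypothetical stand-ins for dataVals from the old and new dataVals.mat files.
% The field name 'dur' and all numbers are assumptions for illustration.
oldDV = struct('dur', {0.210, 0.150, 0.305});
newDV = struct('dur', {0.210, 0.188, 0.305});

tol = 1e-9;                   % ignore floating-point noise when comparing durations
oldDur = [oldDV.dur];
newDur = [newDV.dur];
changed = find(abs(oldDur - newDur) > tol);

fprintf('%d of %d trials changed duration\n', numel(changed), numel(oldDur));
for i = changed
    fprintf('  trial %d: %.3f s -> %.3f s\n', i, oldDur(i), newDur(i));
end
```

Comparing durations only catches changes that move the onset and offset by different amounts; comparing the onset and offset times directly would be a stricter check.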

cwnaber requested a review from carrien on March 10, 2026 at 16:48
cwnaber marked this pull request as ready for review on March 10, 2026 at 16:48

% find the index of ftrack_taxis that's closest to and greater than ampl_taxis's onset index
[~, onsetIndFtrack] = find(sigmat.ftrack_taxis - sigmat.ampl_taxis(onsetIndAmp)>0, 1);
if ~isempty(onsetIndFtrack)

@cwnaber (Collaborator Author) commented Mar 10, 2026

This change is already on the master branch, from when I used this dataVals-vowel-timing-fix branch earlier. Sorry for the confusion.
