Restrict dataVals vowel onset to times that can have formant values #165
Open
cwnaber commented on Mar 10, 2026
% find the index of ftrack_taxis that's closest to and greater than ampl_taxis's onset index
[~, onsetIndFtrack] = find(sigmat.ftrack_taxis - sigmat.ampl_taxis(onsetIndAmp)>0, 1);
if ~isempty(onsetIndFtrack)
This change is already on the master branch, from when I used this dataVals-vowel-timing-fix branch before. Kind of confusing, sorry.
Background about time sampling
When generating dataVals.mat with gen_dataVals_from_wave_viewer, we use two related time scales and sets of data: the amplitude signal on sigmat.ampl_taxis and the formant tracks on sigmat.ftrack_taxis.
One difference between the two time scales is the first and last sample timepoint, even ignoring sampling rate differences. sigmat.ampl_taxis starts at zero, but sigmat.ftrack_taxis starts at around 72 milliseconds, because a certain amount of signal data must be collected before formant values can be determined.
Background about determining onset and offset
To determine the vowel onset and offset in a trial that has no user events, gen_dataVals_from_wave_viewer follows a process that involves get_onset_from_ampl.
The problem
I encountered an edge case pictured below, where the amplitude started above the threshold at the very beginning of the trial and then went below the threshold around 25 milliseconds in, before the formant tracking time scale started counting. Given our current code, the onset was set at time point 0, and the offset was set at time point 0.072, the first allowable value for the formant time axis. The code to determine the offset couldn't accommodate the offset being on the first sample of sigmat.ftrack_taxis and errored out.

Code error of the problem
In old L348, onsetIndFtrack was set to 1. Then in L355, offsetIndFtrack was set to 0, and finally in L359 offsetIndAmp couldn't be computed because sigmat.ftrack_taxis(offsetIndFtrack) tried to index on a value of 0.
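The final step fails because MATLAB indices start at 1. A minimal sketch, using a made-up formant time axis rather than real sigmat data, shows that indexing with the edge-case value of 0 throws an error:

```matlab
% Hypothetical formant time axis in seconds, starting at 72 ms (illustrative values)
ftrack_taxis = (72:3:500) / 1000;

% Index value reported for the edge case
offsetIndFtrack = 0;

indexingFailed = false;
try
    % MATLAB arrays are 1-based, so an index of 0 is invalid and this line throws
    offsetTime = ftrack_taxis(offsetIndFtrack);
catch
    indexingFailed = true;
end
disp(indexingFailed)
```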
Proposed solution
I think the root issue here is that we restrict the offset to allowable values for ftrack_taxis, but we don't do that for the onset. This pull request changes the onset so that it can only take values that are allowed on ftrack_taxis. The onset will still be determined with the high sampling rate of sigmat.ampl, but will only consider values that also exist in ftrack_taxis.
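The idea can be sketched on synthetic data. This is not the code in the pull request: the time axes, the amplitude trace, the variable names `ampl` and `thresh`, and the threshold value are all assumptions for illustration. It also reads "values that also exist in ftrack_taxis" as "times within the span covered by ftrack_taxis", which is only one possible interpretation.

```matlab
% Hypothetical time axes in seconds (illustrative values only)
ampl_taxis   = (0:500) / 1000;      % amplitude axis starts at 0 s
ftrack_taxis = (72:3:500) / 1000;   % formant axis starts at 72 ms

% Hypothetical amplitude trace: loud for the first 25 ms, quiet until 150 ms,
% then loud again (the actual vowel) until 400 ms
ampl = zeros(size(ampl_taxis));
ampl(ampl_taxis < 0.025) = 1;
ampl(ampl_taxis >= 0.150 & ampl_taxis < 0.400) = 1;
thresh = 0.5;                       % assumed amplitude threshold

% Unrestricted search: picks the spurious burst at the very start of the trial
onsetIndAmpOld = find(ampl > thresh, 1);

% Restricted search: only amplitude samples inside the span of ftrack_taxis
inRange = ampl_taxis >= ftrack_taxis(1) & ampl_taxis <= ftrack_taxis(end);
onsetIndAmpNew = find(ampl > thresh & inRange, 1);

% First formant sample at or after the restricted onset time
onsetIndFtrack = find(ftrack_taxis >= ampl_taxis(onsetIndAmpNew), 1);

fprintf('old onset: %.3f s, new onset: %.3f s\n', ...
    ampl_taxis(onsetIndAmpOld), ampl_taxis(onsetIndAmpNew));
```

In this sketch the early burst is skipped because it ends before the formant axis begins, so the onset lands on the later above-threshold stretch, where formant values exist.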
Alternate solution
If we want to keep the onset-determining code the same, we could add a check to the offset-finding code to look for this edge case and error out helpfully, such as telling the user they need to put in user events.
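A rough sketch of such a check follows. The error identifier, the message wording, and the `trialNum` variable are hypothetical; the try/catch wrapper is there only so the snippet runs on its own.

```matlab
trialNum = 12;          % hypothetical trial number
offsetIndFtrack = 0;    % edge-case value described in this pull request

caughtErr = [];
try
    % Guard: the offset index must be a valid 1-based index into ftrack_taxis
    if isempty(offsetIndFtrack) || offsetIndFtrack < 1
        error('genDataVals:noValidOffset', ...
            ['Trial %d: vowel offset falls before the first formant sample. ' ...
             'Add user events to mark the vowel onset and offset.'], trialNum);
    end
catch caughtErr
end

if ~isempty(caughtErr)
    disp(caughtErr.message)
end
```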
Testing done
Since this is important code that we shouldn't change lightly, I made a script to compare dataVals.mat files created before and after this change. Ideally the change should only fix edge cases and not affect how the onset and offset are determined in "normal" trials. My simple measure was whether the duration of a trial in dataVals.mat changed between the old and new gen_dataVals_from_wave_viewer. I ran the script on 5 sample participants; the change did not affect the duration of any trial except the one edge case that alerted me to the problem in the first place (pictured in The problem).
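A comparison along these lines might look like the sketch below. It is not the actual script: the struct arrays stand in for dataVals loaded from the old and new files, the field name `dur` is an assumption, and the duration values are invented placeholders rather than real results.

```matlab
% Stand-ins for dataVals loaded from the old and new dataVals.mat files
% (field name 'dur' is assumed; values are invented for illustration)
oldVals = struct('dur', {0.210, 0.072, 0.305});
newVals = struct('dur', {0.210, 0.250, 0.305});

oldDur = [oldVals.dur];
newDur = [newVals.dur];

% Trials whose duration differs by more than a small tolerance
tol = 1e-9;
changedTrials = find(abs(oldDur - newDur) > tol);

fprintf('%d of %d trials changed duration\n', numel(changedTrials), numel(oldDur));
disp(changedTrials)
```

With real data, the two arrays would come from loading each participant's old and new dataVals.mat, and an empty `changedTrials` would indicate that the change left that participant's trial durations untouched.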