Skip to content

Conversation

@jspaezp
Copy link
Contributor

@jspaezp jspaezp commented Apr 8, 2025

#166 (comment)

This PR removes the MS1Spectra and the generic peak/imspeak in ProcessedSpectrum

LMK if this is what you had in mind ... RN I am not the biggest fan of a couple of places in the processing where I am zipping->unzipping to sort the peaks (which is why I am leaving it as a draft for now).

In my benchmarking it is almost as fast as the other implementation (runtime is maybe 2% slower in my computer and that matches the number of instructions retired on some random data)

LMK what you think! (or feel free to take it as a base to go off ...)

@lazear
Copy link
Owner

lazear commented Apr 9, 2025

Thanks for the draft - while it might not be faster yet, it should unlock some better performance both in theory (cache locality, esp for binary search) and enable some further downstream optimizations.

Some ideas (not necessarily for us to implement now):

  • Reading from columnar data (naturally SOA), for example mzparquet
  • SOA could enable something like inline compression of data to further reduce RAM consumption
  • There are some optimizations around binary search (e.g. select_most_intense_peak) that could leverage the fact that we know the mz array is sorted

@jspaezp
Copy link
Contributor Author

jspaezp commented Apr 21, 2025

LMK if you want me to polish it to be a "real PR" (I can also work with the authors to rebase #173 and #181 ... since I would be adding again merge conflicts to those ... I think ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants