
Detect negative seqlets #24

Open
LukasMahieu wants to merge 5 commits into main from fix_negative_seqlets

@LukasMahieu
Collaborator

This is an attempt to update tangermeme's recursive seqlet calling algorithm so that it correctly detects negative seqlets.
This fix implements two things:

  1. It now calculates a separate null distribution for testing the significance of negative seqlets, instead of using a single global positive distribution.
  2. It clamps outliers in the contribution scores, since the binning logic was dominated by single large outlier values.

The intended effect is that negative seqlets are now detected, and that small positive seqlets are now detected as well (previously the negative seqlets were still included in the left tail of the null distribution).
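
To make the two changes above concrete, here is a minimal NumPy sketch. It is not the code in this PR and not tangermeme's API. The function names `clamp_outliers` and `sign_specific_pvalues`, the percentile cutoffs, and the use of per-position scores (rather than the span sums and binning the recursive algorithm works with) are all assumptions made for illustration.

```python
import numpy as np


def clamp_outliers(scores, lower_pct=1.0, upper_pct=99.0):
    """Clip scores to a percentile range (hypothetical cutoffs).

    This keeps a few extreme values from stretching the score range
    that any later binning step would have to cover.
    """
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.percentile(scores, [lower_pct, upper_pct])
    return np.clip(scores, lo, hi)


def sign_specific_pvalues(scores):
    """Empirical p-values using one null per sign (illustrative only).

    A positive score is compared against the positive scores only,
    and a negative score against the magnitudes of the negative
    scores only. Zeros are left at p = 1.
    """
    scores = np.asarray(scores, dtype=float)
    pos = scores > 0
    neg = scores < 0

    pos_null = np.sort(scores[pos])   # null for positive scores
    neg_null = np.sort(-scores[neg])  # null for negative scores, as magnitudes

    pvals = np.ones_like(scores)
    if pos_null.size:
        # Fraction of positive null values >= this score.
        below = np.searchsorted(pos_null, scores[pos], side="left")
        pvals[pos] = 1.0 - below / pos_null.size
    if neg_null.size:
        # Fraction of negative null magnitudes >= this magnitude.
        below = np.searchsorted(neg_null, -scores[neg], side="left")
        pvals[neg] = 1.0 - below / neg_null.size
    return pvals


# Toy usage: one large outlier, then a mix of positive and negative scores.
raw = np.array([0.0] * 98 + [1.0, 1000.0])
clamped = clamp_outliers(raw)

toy = np.array([1.0, 2.0, 3.0, 4.0, -1.0, -2.0])
toy_pvals = sign_specific_pvalues(toy)
```

In the toy example, the most negative score is ranked against the other negative scores, so it can reach a small p-value. With a single right-tailed null over all scores it would sit in the left tail and never look significant.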

The speed difference is actually negligible when you're testing with many examples, since the main extra overhead comes from calculating an additional initial null distribution.

[Figure: seqlet_performance_benchmark]

This fix will need more testing. It's difficult to benchmark this apart from just going through the examples one by one and comparing tangermeme with this new fix.

There are still improvements to be made. In particular, detecting a long seqlet as a single seqlet rather than as two overlapping seqlets still seems difficult.

Here are the first 6 examples from the test data in the unit tests to give an idea (in each pair, the first logo is from this fix and the second is from tangermeme).

[Figures: example_seqlet_logo_0, example_tangermeme_seqlet_logo_0]
[Figures: example_seqlet_logo_1, example_tangermeme_seqlet_logo_1]
[Figures: example_seqlet_logo_2, example_tangermeme_seqlet_logo_2]
[Figures: example_seqlet_logo_3, example_tangermeme_seqlet_logo_3]
[Figures: example_seqlet_logo_4, example_tangermeme_seqlet_logo_4]
[Figures: example_seqlet_logo_5, example_tangermeme_seqlet_logo_5]
