Sassy is a library and tool for searching short strings in texts, a problem that goes by many names:
- approximate string matching,
- pattern matching,
- fuzzy searching.
The motivating application is searching short (length 20 to 100) DNA sequences in a human genome or e.g. in a set of reads. Sassy generally works well for patterns/queries up to length 1000, and supports both ASCII and DNA.
Highlights:
- Sassy uses bitpacking and SIMD (both AVX2 and NEON supported). Its main novelty is tiling these in the text direction.
- Support for overhang alignments where the pattern extends beyond the text.
- Support for (case-insensitive) ASCII, DNA (
ACGT), and IUPAC (=ACGT+NYR...) alphabets. - Rust library (
cargo add sassy), binary (cargo install sassy), Python bindings (pip install sassy-rs), and C bindings (see below).
See the paper below, and corresponding evals in evals/.
Rick Beeloo and Ragnar Groot Koerkamp.
Sassy: Searching Short DNA Strings in the 2020s.
bioRxiv, July 2025. https://doi.org/10.1101/2025.07.22.666207.
Sassy can be used via the CLI, or as Rust, Python, or C library.
The library can be used to search for ASCII or DNA strings.
A larger example can be found in src/lib.rs.
// cargo add sassy
use sassy::{Searcher, Match, profiles::{Dna}, Strand};
let pattern = b"ATCG";
let text = b"AAAATTGAAA";
let k = 1;
let mut searcher = Searcher::<Dna>::new_fwd();
let matches = searcher.search(pattern, &text, k);
assert_eq!(matches.len(), 1);
assert_eq!(matches[0].text_start, 3);
assert_eq!(matches[0].text_end, 7);
assert_eq!(matches[0].cost, 1);
assert_eq!(matches[0].strand, Strand::Fwd);
assert_eq!(matches[0].cigar.to_string(), "2=1X1=");Build and install using cargo
cargo install sassyThe CLI can be used via:
sassy grep: to show nicely coloured output.sassy search: to write a.tsvof matching locations.sassy filter: to write a.fasta/.fastqof (non)-matching records.sassy crispr: to search for CRISPR guides.
grep, search, and filter all take the same arguments, and are implemented
by forwarding to grep. Thus, they can all be combined via e.g.
sassy grep -p ACGTCAAACCTA -k 3 --matches matches.tsv --output filtered.fastq reads.fastq.gzSearch a pattern ATGAGCA in text.fasta with ≤1 edit:
sassy search --pattern ATGAGCA -k 1 text.fastaor search all records of a fasta file with --pattern-fasta <fasta-file> instead of --pattern.
The grep output is coloured:
- green shows matching characters,
- orange shows mismatches,
- red shows deleted characters (in pattern but not in text),
- blue shows inserted characters (in text but not in pattern).

sassy search -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa > matches.tsv
# or
sassy search -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa --matches matches.tsv
# or
sassy grep -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa --matches matches.tsvgives .tsv output like this:
pat_id text_id cost strand start end match_region cigar
pattern AC_000001.1__1_1 0 + 6 48 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_35 0 + 897 939 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_49 1 + 866 908 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCGCGCG 37=1X4=
pattern AC_000001.1__1_64 0 - 1267 1309 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_67 0 + 600 642 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_68 0 - 1826 1868 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_78 3 - 4381 4425 GTACAGAAACGAGCGGATGGAAAATAGTAGTGAGCGGCCTCGCG 23=1X1I10=1I8=
pattern AC_000001.1__1_92 0 - 6554 6596 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_94 0 - 6413 6455 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_115 2 + 2091 2131 GTACAGAAACGAGCATGGAAAGAGTAGTGAGCGCCTCGCG 14=2D26=
pattern AC_000001.1__1_118 0 - 3062 3104 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_123 0 + 1416 1458 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_127 0 + 27 69 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=sassy filter -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq > filtered.fq
# or
sassy filter -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq -o filtered.fq
# or
sassy grep -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq -o filtered.fqWrites a file containing only matching records. Use --invert to only
write non-matching records.
Search for one or more guides in guides.txt:
sassy crispr --threads 8 --guide guides.txt --k 5 --max-n-frac 0.1 --output hits.tsv hg38.fastaAllows <= k edits in the sgRNA, and the PAM (the last 3 characters of each guide) has to match exactly, unless --allow-pam-edits is given.
Output of the crispr command is a tab-delimited file with one row per hit, e.g.:
guide text_id cost strand start end match_region cigar
GAGTCCGAGCAGAAGAAGAANGG chr21 5 + 5024135 5024154 GAGGCCACAGAGAAGAGGG 3=1X2=1D1=1D3=1D5=1D4=
GAGTCCGAGCAGAAGAAGAANGG chr21 3 + 21087337 21087359 gagaccgaggagaagaaaaagg 3=1X5=1X7=1D5=
GAGTCCGAGCAGAAGAAGAANGG chr21 3 - 9701297 9701320 GACTCGAGCATGAAGAAGAAAGG 2=1X1=1D6=1I12=
GAGTCCGAGCAGAAGAAGAANGG chr21 5 - 46396975 46396998 CAGTCCCAGCAGACGACGGACGG 1X5=1X6=1X2=1X1=1X4=
The start and end are 0-based open-ended (i.e. 0-based inclusive of the
start, but exclusive of the end), and start is always less than end
(regardless of the strand). The
match_region reported will be the sequence from the target file when strand is +, or the reverse complement
of the sequence from the target file when strand is -, so that it matches the guide sequence.
The cigar is always oriented to read left-to-right with the provided guide and match_region sequences.
Note that this searches for approximate occurrences of the guide sequence itself, and not for reverse-complement binding sites. If binding sites are to be found, please reverse-complement the input or output manually.
PyPI wheels can be installed with:
pip install sassy-rs import sassy
pattern = b"ACTG"
text = b"ACGGCTACGCAGCATCATCAGCAT"
searcher = sassy.Searcher("dna") # ascii / dna / iupac
matches = searcher.search(pattern, text, k=1)
for m in matches:
print(m)See python/README.md for more details.
See c/README.md for details. Quick example:
#include "sassy.h"
int main() {
const char* pattern = "ACTG";
const char* text = "ACGGCTACGCAGCATCATCAGCAT";
// DNA alphabet, with reverse complement, without overhang.
sassy_SearcherType* searcher = sassy_searcher("dna", true, NAN);
sassy_Match* out_matches = NULL;
size_t n_matches = search(searcher,
pattern, strlen(pattern),
text, strlen(text),
1, // k=1
&out_matches);
sassy_matches_free(out_matches, n_matches);
sassy_searcher_free(searcher);
}