Deduplicate files at high speed! Written in Rust.
- Well, Rust.
- Input lines are streamed directly to the processing threads, without collecting them all first.
- Partitions the hash space to reduce lock contention.
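To illustrate the hash-space partitioning, here is a minimal sketch in Rust. It is not the tool's actual implementation: the names `SHARDS`, `THREADS`, `shard_for`, and `dedup` are illustrative, and for brevity it takes a `Vec` of lines instead of streaming them. The idea is the same, though: the seen-set is split into independently locked shards, so two threads only contend when their lines hash to the same shard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

// Illustrative constants, not the tool's real configuration.
const SHARDS: usize = 16;
const THREADS: usize = 4;

// Map a line to the shard that owns its hash.
fn shard_for(line: &str) -> usize {
    let mut h = DefaultHasher::new();
    line.hash(&mut h);
    (h.finish() as usize) % SHARDS
}

/// Deduplicate `lines`, keeping one copy of each distinct line.
/// Each shard has its own Mutex, so threads working on lines that
/// hash to different shards never block each other.
fn dedup(lines: Vec<String>) -> Vec<String> {
    let shards: Vec<Mutex<HashSet<String>>> =
        (0..SHARDS).map(|_| Mutex::new(HashSet::new())).collect();
    let out: Mutex<Vec<String>> = Mutex::new(Vec::new());
    let chunk_size = (lines.len() / THREADS).max(1);
    thread::scope(|s| {
        let shards = &shards;
        let out = &out;
        for chunk in lines.chunks(chunk_size) {
            s.spawn(move || {
                for line in chunk {
                    // Lock only the shard that owns this line's hash.
                    let mut seen = shards[shard_for(line)].lock().unwrap();
                    if seen.insert(line.clone()) {
                        drop(seen); // release the shard lock before touching `out`
                        out.lock().unwrap().push(line.clone());
                    }
                }
            });
        }
    });
    out.into_inner().unwrap()
}

fn main() {
    let input: Vec<String> =
        ["a", "b", "a", "c", "b"].iter().map(|s| s.to_string()).collect();
    let mut unique = dedup(input);
    unique.sort(); // output order across threads is nondeterministic
    println!("{:?}", unique);
}
```

With a single global `Mutex<HashSet>` every thread would serialize on one lock; splitting into shards keyed by the line's hash keeps correctness (duplicates always land in the same shard) while spreading contention across many locks.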
In the test below we use a small 75 MB file with 1,595,966 lines of data (anything larger makes the hyperfine runs take too long).

When we up the ante a little and move to a large 2.3 GB file, we see some improvements.

When we compare with the likes of duplicut (https://github.com/nil0x42/duplicut), some significant improvements can be seen; however, I'm not sure whether this boils down to the use of Rust over C rather than the approach itself.

```
cat file.txt | rustdedup
rustdedup -i /diska9.txtextra.csvmodded.csv -o output2.txt
```