Deduplicate files at high speed! Written in Rust.
- Well, Rust.
- Input lines are streamed directly to the processing threads, without collecting them all first.
- Partitions the hash space to reduce lock contention.
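To illustrate the hash-space partitioning, here is a minimal sketch in Rust. It is not the tool's actual implementation: the names `SHARDS`, `THREADS`, `shard_for`, and `dedup` are illustrative, and for brevity it takes a `Vec` of lines instead of streaming them. The idea is the same, though: the seen-set is split into independently locked shards, so two threads only contend when their lines hash to the same shard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;
use std::thread;

// Illustrative constants, not the tool's real configuration.
const SHARDS: usize = 16;
const THREADS: usize = 4;

// Map a line to the shard that owns its hash.
fn shard_for(line: &str) -> usize {
    let mut h = DefaultHasher::new();
    line.hash(&mut h);
    (h.finish() as usize) % SHARDS
}

/// Deduplicate `lines`, keeping one copy of each distinct line.
/// Each shard has its own Mutex, so threads working on lines that
/// hash to different shards never block each other.
fn dedup(lines: Vec<String>) -> Vec<String> {
    let shards: Vec<Mutex<HashSet<String>>> =
        (0..SHARDS).map(|_| Mutex::new(HashSet::new())).collect();
    let out: Mutex<Vec<String>> = Mutex::new(Vec::new());
    let chunk_size = (lines.len() / THREADS).max(1);
    thread::scope(|s| {
        let shards = &shards;
        let out = &out;
        for chunk in lines.chunks(chunk_size) {
            s.spawn(move || {
                for line in chunk {
                    // Lock only the shard that owns this line's hash.
                    let mut seen = shards[shard_for(line)].lock().unwrap();
                    if seen.insert(line.clone()) {
                        drop(seen); // release the shard lock before touching `out`
                        out.lock().unwrap().push(line.clone());
                    }
                }
            });
        }
    });
    out.into_inner().unwrap()
}

fn main() {
    let input: Vec<String> =
        ["a", "b", "a", "c", "b"].iter().map(|s| s.to_string()).collect();
    let mut unique = dedup(input);
    unique.sort(); // output order across threads is nondeterministic
    println!("{:?}", unique);
}
```

With a single global `Mutex<HashSet>` every thread would serialize on one lock; splitting into shards keyed by the line's hash keeps correctness (duplicates always land in the same shard) while spreading contention across many locks.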
In the test below we use a small 75 MB file with 1,595,966 lines of data (anything larger makes the hyperfine runs take too long).

When we up the ante a little and move to a large 2.3 GB file, we see some improvements.

When we compare with the likes of duplicut (https://github.com/nil0x42/duplicut), some significant improvements can be seen; however, I'm not sure whether this boils down to the use of Rust over C rather than the approach itself.

```
cat file.txt | rustdedup
rustdedup -i /diska9.txtextra.csvmodded.csv -o output2.txt
```