I’m in the process of starting a proper backup solution. However, over the years I’ve copy-pasted home directories from different systems as a quick and dirty solution. Now I have to pay my technical debt and remove the duplicates. I’m looking for a deduplication tool that meets these requirements:
- accept a destination directory
- source locations should be deleted after the operation
- if the files’ content is the same, delete the redundant copy
- if the files’ content is different, move the file and change its name to avoid a name collision
I tried doing it in Nautilus, but it does not look at the files’ content, only the file name. E.g. if two photos have the same content but different names, it will still create a redundant copy.
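For illustration only, the requirements above can be sketched in plain shell. This is a minimal sketch under some assumptions, not a tested tool: the function names `has_copy` and `merge_into` are made up for this example, the destination is flattened into a single directory, file names are assumed not to contain newlines, and the source and destination must not be nested inside each other. Try it on a throwaway copy first.

```shell
#!/bin/sh
# has_copy FILE DIR: succeed if DIR already holds a file with identical content.
# Only files with the same byte size are compared, which keeps cmp calls cheap.
has_copy() {
    size=$(wc -c < "$1" | tr -d ' ')
    find "$2" -type f -size "${size}c" -exec cmp -s "$1" {} \; -print | grep -q .
}

# merge_into SRC DEST: move every file from SRC into DEST (hypothetical helper).
# Identical content is deleted from SRC; name clashes get a numeric prefix.
merge_into() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -type f | while IFS= read -r f; do
        if has_copy "$f" "$dest"; then
            rm "$f"                      # same content already in DEST
            continue
        fi
        name=$(basename "$f")
        target=$dest/$name
        n=1
        while [ -e "$target" ]; do       # different content, same name
            target=$dest/$n.$name
            n=$((n + 1))
        done
        mv "$f" "$target"
    done
    # Remove the emptied source directories, deepest first.
    find "$src" -depth -type d -exec rmdir {} \; 2>/dev/null || true
}
```

A call would look like `merge_into ~/old-home-copy ~/merged`. Comparing with `cmp` rather than by name is what catches two photos with the same content but different names.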
I don’t actually know, but I bet that’s relatively costly, so I would at least try to be mindful of efficiency, e.g. use `find` to start only with large files, e.g. > 1 GB (depends on your own threshold), then after trying a couple of times … and possibly heuristics, e.g. …
Why do I suggest all this rather than a tool? Because I bet a lot of decisions have to be made manually.
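As a rough illustration of the size-first idea with `find`: the sketch below lists files above a threshold, largest first, and then keeps only those that share their exact size with another file, since only those can be duplicates. It assumes GNU `find` (for `-printf`) and file names without tabs or newlines; `large_files` and `same_size` are names made up for this example.

```shell
#!/bin/sh
# large_files DIR SIZE: print "bytes<TAB>path" for files matching find's -size
# test (e.g. +1G), largest first. Assumes GNU find for -printf.
large_files() {
    find "$1" -type f -size "$2" -printf '%s\t%p\n' | sort -rn
}

# same_size: filter "bytes<TAB>path" lines on stdin, keeping only files whose
# byte size occurs more than once. These are duplicate candidates, not proof.
same_size() {
    awk -F'\t' '
        { count[$1]++; line[NR] = $0; key[NR] = $1 }
        END { for (i = 1; i <= NR; i++) if (count[key[i]] > 1) print line[i] }
    '
}
```

For example, `large_files ~/old-home-copy +1G | same_size` gives a short list to inspect by hand or to confirm with `cmp` before deleting anything; lowering the threshold on later passes widens the net.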
fclones (https://github.com/pkolaczk/fclones) looks great, but I haven’t used it, so I can’t vouch for it.
I was using Radarr/Sonarr to download files via qBittorrent and then hardlink them to an organized directory for Jellyfin, but I set up my container volume mappings incorrectly and it was only copying the files over, not hardlinking them. When I realized this, I fixed the volume mappings and ended up using fclones to deduplicate the existing files and it was amazing. It did exactly what I needed it to and it did it fast. Highly recommend fclones.
I’ve used it on Windows as well, but I’ve had much more trouble there, since I like to write the output to a file first to double-check it before `cat`ting the information back into fclones to actually deduplicate the files it found. I think running everything as admin works, but I don’t remember.
If you use `rmlint`, as others suggested, here is how to check for the paths of the dupes:
`jq -c '.[] | select(.type == "duplicate_file").path' rmlint.json`