Script to delete duplicate files - Programming On Unix

z3bra
Grey Hair Nixers
Considering you learnt scripting on your own, it's rather good, so kudos for that!
Over the years, I've written a bunch of scripts, and I still use most of them on a day-to-day basis, either interactively, in pipes or through cronjobs.

Here are a few tips I can give you, based on my experience:

0. Think small. Scripts are supposed to glue programs together, not be programs on their own. Design your output to be useful to other programs.
1. Be quiet. Logging looks cool, especially with "...." and a green "OK" or red "FAIL", but it definitely doesn't help in pipelines.
2. Avoid using 'rm' in shell scripts. I tend to write my scripts as selectors or filters, so that I can inspect the output beforehand, and append "xargs rm" after that.
3. Avoid interactivity. The best tools are the ones that are automated and can "think" on their own.
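
To make tips 1 and 2 a bit more concrete, here is a minimal sketch of what a "selector" could look like. The function name `select_empty` and the criterion (empty regular files) are made up for illustration and are not part of the script discussed in this thread. The only point is that it prints paths on stdout, logs nothing, and deletes nothing.

```shell
#!/bin/sh
# select_empty: hypothetical selector, used here only as an illustration.
# Reads one path per line on stdin and prints the paths that are
# empty regular files. It never touches the filesystem.
select_empty() {
    while IFS= read -r path; do
        if [ -f "$path" ] && [ ! -s "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Assuming the function is loaded in the current shell, one would first run "find . -type f | select_empty" to inspect the list, and only append "| xargs rm" once the output looks right.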

Note that these tips are totally subjective. They're based on my own experience, so things might differ for other people.

Now, I know that these are rather abstract, so here is an attempt at doing the same, "my way" ;) (written directly from memory, without testing of course :P)

Code:
#!/bin/sh
# read file list from input, output all duplicates on a single line, separated by tabs
# eg: find duplicates and only keep one:
# find . -type f | ./getdup | cut -f2- | xargs rm

# write hash + filename in a temp file, sorted by hash
TMP=$(mktemp)
xargs -n1 sha1sum | sort > "$TMP"

# find duplicate hash, and match them in the list
for SHA1 in $(cut -d' ' -f1 < "$TMP" | uniq -d); do
    # print all files for each hash separated by tabs
    # assume files don't include spaces, of course...
    # (anchor the hash at line start so it cannot match inside a filename)
    grep "^$SHA1" < "$TMP" | cut -d' ' -f2- | xargs echo | tr ' ' '\t'
done

rm "$TMP"

This basically only transforms the input and leaves the filesystem untouched, so even if the script is messed up, it cannot delete any file, lose data or corrupt it (because I know how bad I can be at coding :P)
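
As a sketch of the "inspect before deleting" step, the `cut -f2-` part of the usage example can be wrapped in a small filter that lists, one per line, exactly the files that would be handed to rm. The name `to_remove` is hypothetical, and it assumes the tab-separated, one-group-per-line format described in the script's header comment.

```shell
#!/bin/sh
# to_remove: hypothetical helper, not part of the original script.
# Reads getdup-style lines on stdin (duplicates separated by tabs),
# drops the first file of each line (the copy to keep) and prints
# the remaining ones, one per line.
# -s skips any line without a tab, i.e. a file with no duplicate.
to_remove() {
    cut -s -f2- | tr '\t' '\n'
}
```

With that loaded, "find . -type f | ./getdup | to_remove" shows the candidates, and "| xargs rm" can be appended once the list has been checked. The same caveat applies as in the script: filenames containing spaces are not handled.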

Anyway, keep scripting!

EDIT: OMG, it works!

