A Bash script for finding duplicate files

Tue, 16 May 2017 23:59:39 +0000

I wrote a Bash script that finds duplicate files. It compares file sizes and checksums to decide whether files are (probably) duplicates:

Script

#!/usr/bin/env bash

## Summary: find duplicate files
## Meng Lu <lumeng.dev@gmail.com>

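DIR=${1:-$(pwd)} ## use provided path if available, otherwise the current path
cd "$DIR" || exit 1 ## the pipeline below searches ".", so switch to the requested path first

FILENAME=$(basename "$0")

TMPFILE=$(mktemp "/tmp/${FILENAME}.XXXXXX") || exit 1
trap 'rm -f "$TMPFILE"' EXIT ## remove the temporary file when the script exits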

## one-line version
#find -P . -type f -exec cksum '{}' \; | sort | tee "$TMPFILE" | cut -f 1-2 -d ' ' | uniq -d | grep -if - "$TMPFILE" | sort -nr -t' ' -k2,2 | cut -f 3- -d ' ' | while IFS= read -r line; do ls -lhta "$line"; done

## multi-line version with comments
find -P . -type f -exec cksum '{}' \; | # find regular files and compute their checksums; -P: never follow symbolic links
sort | # sort by {checksum, file size, file name} so that identical {checksum, file size} pairs become adjacent
tee "$TMPFILE" | # save a copy in a temporary file and pass the lines along
cut -f 1-2 -d ' ' | # keep only the checksum and file size
uniq -d | # print one copy of each repeated {checksum, file size} pair and drop the unique ones
grep -if - "$TMPFILE" | # from the saved list, select the lines of files sharing a repeated {checksum, file size}; "-f -" reads the patterns from stdin
sort -nr -t' ' -k2,2 | # sort by descending file size
cut -f 3- -d ' ' | # keep only the file name
while IFS= read -r line; do ls -lhta "$line"; done # run an informative ls on every duplicate file found

Archived on GitHub.
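To see what the pipeline produces, it can be tried on a scratch directory. The following is only a minimal sketch: the file names (a.txt, b.txt, c.txt), their contents, and the variables workdir, listfile, and dups are made up for the illustration, and the final ls step is left out so that only the file names are collected.

```bash
#!/usr/bin/env bash
# Sketch: run the duplicate-finding pipeline on a throwaway directory.
# All file names and contents below are hypothetical test data.

workdir=$(mktemp -d) || exit 1
listfile=$(mktemp) || exit 1
trap 'rm -rf "$workdir" "$listfile"' EXIT   # clean up on exit

printf 'same content\n' > "$workdir/a.txt"
printf 'same content\n' > "$workdir/b.txt"   # duplicate of a.txt
printf 'different\n'    > "$workdir/c.txt"   # unique file

cd "$workdir" || exit 1

# Same stages as in the script, without the final "ls" loop.
dups=$(
  find -P . -type f -exec cksum '{}' \; |
  sort |
  tee "$listfile" |
  cut -f 1-2 -d ' ' |
  uniq -d |
  grep -if - "$listfile" |
  sort -nr -t' ' -k2,2 |
  cut -f 3- -d ' '
)

# Print the names of the files that share a {checksum, file size} pair.
printf '%s\n' "$dups"
```

With this test data only a.txt and b.txt should be reported, because c.txt has a different size and checksum.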

Notes

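find -P . -type f -exec cksum '{}' \;
- -P never follows symbolic links; together with -type f, symbolic links themselves are skipped as well;
- -type f matches regular files, not directories;
- -exec cksum '{}' \; computes the checksum of every file found ('{}'); cksum prints the checksum, the file size, and the file name, where the file size is the number of octets (bytes);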
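sort sorts the lines in preparation for uniq, which only detects adjacent repeats;
tee "$TMPFILE" saves the contents of the stdout stream to the temporary file and also passes them down the pipe;
cut -f 1-2 -d ' ' keeps only fields 1 and 2, with a space as the field delimiter;
uniq -d prints one copy of each repeated line and discards the lines that occur only once;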
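grep -if - "$TMPFILE" reads its patterns from stdin (-f -), that is, the repeated {checksum, file size} pairs, and selects the matching lines from the saved file list; note that these lines include the file names; -i (ignore case) has no effect on the numeric patterns, and because the patterns are not anchored they could in principle also match elsewhere in a line;
sort -nr -t' ' -k2,2 sorts the duplicate files found by size in descending order;
cut -f 3- -d ' ' keeps only the file name;
while IFS= read -r line; do ls -lhta "$line"; done prints detailed information for each file.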
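Since an equal size and checksum only make two files probable duplicates, a follow-up check can compare candidates byte by byte. The sketch below uses cmp -s for this; the helper name same_content and the test files are hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch: confirm that two candidate files really have identical contents.
# same_content is a hypothetical helper, not part of the original script.

same_content() {
  # cmp -s prints nothing; its exit status is 0 only if the two files
  # are identical byte for byte.
  cmp -s "$1" "$2"
}

# Throwaway test data (hypothetical file names).
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT   # clean up on exit

printf 'same content\n' > "$tmpdir/a.txt"
printf 'same content\n' > "$tmpdir/b.txt"
printf 'different\n'    > "$tmpdir/c.txt"

if same_content "$tmpdir/a.txt" "$tmpdir/b.txt"; then
  echo "a.txt and b.txt are identical"
fi

if ! same_content "$tmpdir/a.txt" "$tmpdir/c.txt"; then
  echo "a.txt and c.txt differ"
fi
```

Such a check would typically be run only on the files the pipeline reports, so the expensive byte-by-byte comparison is limited to a few candidates.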

燕南的網絡日誌
