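When very large or very important files are transferred over an unstable network, some of their bytes may get corrupted. Computing the MD5 checksum of a file lets you verify its integrity. For example, with this OpenStreetMap export package, a junior schoolmate of mine reported that a friend who downloaded it from a cloud drive could not extract it, and suggested that I add an MD5 checksum for every file.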

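When there are many files, the checksums need to be computed in batch. Split archive volumes totaling more than 200 GB are especially likely to be damaged in transit. The simplest approach is to use the md5sum that ships with Git (Git Bash) to compute the checksums, see which file is wrong, and download only that file again.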
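To illustrate why a checksum catches this kind of damage, here is a minimal sketch that can be run once a Bash shell with md5sum is available. It overwrites a single byte in a copy of a file and compares the MD5 before and after. The file names sample.bin and damaged.bin and the temporary directory are hypothetical, chosen only for this illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file: 1 MiB of zero bytes.
head -c 1048576 /dev/zero > sample.bin

# Copy it and overwrite exactly one byte in the middle,
# imitating a transfer error. conv=notrunc keeps the file size.
cp sample.bin damaged.bin
printf 'X' | dd of=damaged.bin bs=1 seek=524288 conv=notrunc 2>/dev/null

# md5sum prints "<checksum> <mode><name>"; keep only the checksum.
sum_ok=$(md5sum sample.bin | cut -d' ' -f1)
sum_bad=$(md5sum damaged.bin | cut -d' ' -f1)

echo "original: $sum_ok"
echo "damaged:  $sum_bad"
```

Both files have the same size, yet the two printed checksums differ, which is exactly the signal used in the rest of this post.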

1. Install Git and open Bash

Download Git from https://git-scm.com/ and install it.

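After installation, right-click the folder that was downloaded from the cloud drive and choose "Git Bash" to open Bash.

You can then view the md5sum help: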

$ md5sum --help
Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.

With no FILE, or when FILE is -, read standard input.

  -b, --binary         read in binary mode (default unless reading tty stdin)
  -c, --check          read MD5 sums from the FILEs and check them
      --tag            create a BSD-style checksum
  -t, --text           read in text mode (default if reading tty stdin)
  -z, --zero           end each output line with NUL, not newline,
                       and disable file name escaping

The following five options are useful only when verifying checksums:
      --ignore-missing  don't fail or report status for missing files
      --quiet          don't print OK for each successfully verified file
      --status         don't output anything, status code shows success
      --strict         exit non-zero for improperly formatted checksum lines
  -w, --warn           warn about improperly formatted checksum lines

      --help     display this help and exit
      --version  output version information and exit

The sums are computed as described in RFC 1321.  When checking, the input
should be a former output of this program.  The default mode is to print a
line with checksum, a space, a character indicating input mode ('*' for binary,
' ' for text or where binary is insignificant), and name for each FILE.

Note: There is no difference between binary mode and text mode on GNU systems.

GNU coreutils online help:
Report any translation bugs to
Full documentation
or available locally via: info '(coreutils) md5sum invocation'
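The -c/--check and --quiet options listed in the help mean that md5sum can also perform the verification itself when a checksum list in its own output format is at hand. The following is only a sketch under that assumption. The file names part.001, part.002 and list.md5 are hypothetical, and LC_ALL=C is set only so that the status messages stay in English.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical "volumes" standing in for real downloads.
printf 'first volume\n'  > part.001
printf 'second volume\n' > part.002

# Record the reference checksums (-b marks the entries as binary mode).
md5sum -b part.001 part.002 > list.md5

# Simulate a damaged download of the second volume.
printf 'second volumE\n' > part.002

# --quiet suppresses the OK lines, so only failures are listed.
# md5sum -c exits non-zero if any file fails verification.
report=$(LC_ALL=C md5sum -c --quiet list.md5 2>/dev/null)
status=$?

echo "$report"
echo "exit status: $status"
```

Here the report should contain a single FAILED line for part.002, and the non-zero exit status makes the check easy to use in a script.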

2. Compute MD5 checksums in batch

The common Linux command find can enumerate files and run a command on each of them.

Run the following command to print the MD5 of each file to the screen:

$ find ./Arch*.* -exec md5sum {} \;
d060dd81785d957ae4e2bbd4f9ebeb4e *./ArchOSManjaro.7z.001
b7326e73452d3fbbc56a889f55aa9a14 *./ArchOSManjaro.7z.002
805c9ef68887953554c6c160c2a72eeb *./ArchOSManjaro.7z.003
#...
2cc5ab567abba1d7e3a284ec5c383d84 *./ArchOSManjaro.7z.059
$

Run the following command to write the MD5 of each file to a text file instead:

$ find ./Arch*.* -exec md5sum {} \; >> md5.txt
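Since two checksum lists are later compared line by line, both lists need the same order and the same format. The variant below is a sketch under a few assumptions: the Demo.7z.* files are hypothetical stand-ins for the real volumes, -b is passed explicitly so that entries carry the `*` marker even on systems where md5sum defaults to text mode, and the output is sorted by file name.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the split archive volumes,
# created out of order on purpose.
for n in 003 001 002; do
    printf 'volume %s\n' "$n" > "Demo.7z.$n"
done

# -type f skips directories; "{} +" passes many files to one md5sum call
# instead of starting one process per file as "{} \;" does.
# ">" (not ">>") overwrites, so re-running does not duplicate lines.
find . -maxdepth 1 -type f -name 'Demo.7z.*' -exec md5sum -b {} + \
    | sort -k2 > check.txt

cat check.txt
```

Sorting on the second field (the file name) keeps the list stable no matter in which order find happens to return the files.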

3. Compare the two checksum files

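Assume the local checksum results are stored in check.txt and the reference checksums in md5.txt. The two files can then be compared with diff, whose help text reads: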

$ diff --help
Usage: diff [OPTION]... FILES
Compare FILES line by line.

Mandatory arguments to long options are mandatory for short options too.
      --normal                  output a normal diff (the default)
  -q, --brief                   report only when files differ
  -s, --report-identical-files  report when two files are the same
  -c, -C NUM, --context[=NUM]   output NUM (default 3) lines of copied context
  -u, -U NUM, --unified[=NUM]   output NUM (default 3) lines of unified context
  -e, --ed                      output an ed script
  -n, --rcs                     output an RCS format diff
  -y, --side-by-side            output in two columns
  -W, --width=NUM               output at most NUM (default 130) print columns
      --left-column             output only the left column of common lines
      --suppress-common-lines   do not output common lines

  -p, --show-c-function         show which C function each change is in
  -F, --show-function-line=RE   show the most recent line matching RE
      --label LABEL             use LABEL instead of file name and timestamp
                                  (can be repeated)

  -t, --expand-tabs             expand tabs to spaces in output
  -T, --initial-tab             make tabs line up by prepending a tab
      --tabsize=NUM             tab stops every NUM (default 8) print columns
      --suppress-blank-empty    suppress space or tab before empty output lines
  -l, --paginate                pass output through 'pr' to paginate it

  -r, --recursive                 recursively compare any subdirectories found
      --no-dereference            don't follow symbolic links
  -N, --new-file                  treat absent files as empty
      --unidirectional-new-file   treat absent first files as empty
      --ignore-file-name-case     ignore case when comparing file names
      --no-ignore-file-name-case  consider case when comparing file names
  -x, --exclude=PAT               exclude files that match PAT
  -X, --exclude-from=FILE         exclude files that match any pattern in FILE
  -S, --starting-file=FILE        start with FILE when comparing directories
      --from-file=FILE1           compare FILE1 to all operands;
                                    FILE1 can be a directory
      --to-file=FILE2             compare all operands to FILE2;
                                    FILE2 can be a directory

  -i, --ignore-case               ignore case differences in file contents
  -E, --ignore-tab-expansion      ignore changes due to tab expansion
  -Z, --ignore-trailing-space     ignore white space at line end
  -b, --ignore-space-change       ignore changes in the amount of white space
  -w, --ignore-all-space          ignore all white space
  -B, --ignore-blank-lines        ignore changes where lines are all blank
  -I, --ignore-matching-lines=RE  ignore changes where all lines match RE

  -a, --text                      treat all files as text
      --strip-trailing-cr         strip trailing carriage return on input
      --binary                    read and write data in binary mode

  -D, --ifdef=NAME                output merged file with '#ifdef NAME' diffs
      --GTYPE-group-format=GFMT   format GTYPE input groups with GFMT
      --line-format=LFMT          format all input lines with LFMT
      --LTYPE-line-format=LFMT    format LTYPE input lines with LFMT
    These format options provide fine-grained control over the output
      of diff, generalizing -D/--ifdef.
    LTYPE is 'old', 'new', or 'unchanged'.  GTYPE is LTYPE or 'changed'.
    GFMT (only) may contain:
      %<  lines from FILE1
      %>  lines from FILE2
      %=  lines common to FILE1 and FILE2
      %[-][WIDTH][.[PREC]]{doxX}LETTER  printf-style spec for LETTER
        LETTERs are as follows for new group, lower case for old group:
          F  first line number
          L  last line number
          N  number of lines = L-F+1
          E  F-1
          M  L+1
      %(A=B?T:E)  if A equals B then T else E
    LFMT (only) may contain:
      %L  contents of line
      %l  contents of line, excluding any trailing newline
      %[-][WIDTH][.[PREC]]{doxX}n  printf-style spec for input line number
    Both GFMT and LFMT may contain:
      %%  %
      %c'C'  the single character C
      %c'\OOO'  the character with octal code OOO
      C    the character C (other characters represent themselves)

  -d, --minimal            try hard to find a smaller set of changes
      --horizon-lines=NUM  keep NUM lines of the common prefix and suffix
      --speed-large-files  assume large files and many scattered small changes
      --color[=WHEN]       color output; WHEN is 'never', 'always', or 'auto';
                             plain --color means --color='auto'
      --palette=PALETTE    the colors to use when --color is active; PALETTE is
                             a colon-separated list of terminfo capabilities

      --help               display this help and exit
  -v, --version            output version information and exit

FILES are 'FILE1 FILE2' or 'DIR1 DIR2' or 'DIR FILE' or 'FILE DIR'.
If --from-file or --to-file is given, there are no restrictions on FILE(s).
If a FILE is '-', read standard input.
Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

Report bugs to: bug-diffutils@gnu.org
GNU diffutils home page: <https://www.gnu.org/software/diffutils/>
General help using GNU software: <https://www.gnu.org/gethelp/>

Run the comparison:

$ diff check.txt md5.txt
36c36
< 23dfa036cd5d772f01173da67ebf2634 *./ArchOSManjaro.7z.029
---
> db2ce0bc5c39fd5d8f672f478788095c *./ArchOSManjaro.7z.029

The output shows that volume 029 does not match the reference checksum.
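With dozens of volumes it can help to reduce the diff output to just the file names that need to be downloaded again. The following is a sketch, assuming both lists use the `<checksum> *<name>` layout of md5sum and that file names contain no spaces. The two lists are hypothetical, shortened to three entries, with checksums copied from the listings in this post.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical shortened reference list.
cat > md5.txt <<'EOF'
e155774f7dd158ded02d9a9aae68f5eb *./ArchOSManjaro.7z.028
db2ce0bc5c39fd5d8f672f478788095c *./ArchOSManjaro.7z.029
c8c32363ebd14a7eefce1cadaaa64def *./ArchOSManjaro.7z.030
EOF

# Hypothetical shortened local list; entry 029 differs.
cat > check.txt <<'EOF'
e155774f7dd158ded02d9a9aae68f5eb *./ArchOSManjaro.7z.028
23dfa036cd5d772f01173da67ebf2634 *./ArchOSManjaro.7z.029
c8c32363ebd14a7eefce1cadaaa64def *./ArchOSManjaro.7z.030
EOF

# Lines starting with "<" come from the first file (the local results).
# awk sees "<", the checksum, and "*name" as fields 1 to 3;
# strip the leading "*" from the third field and print the name.
bad_files=$(diff check.txt md5.txt | awk '/^</ { sub(/^\*/, "", $3); print $3 }')

echo "$bad_files"
```

For these two lists the only name printed is ./ArchOSManjaro.7z.029.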

With the -y option, diff prints the complete output in two side-by-side columns:

$ diff -y check.txt md5.txt
...
e155774f7dd158ded02d9a9aae68f5eb *./ArchOSManjaro.7z.028      e155774f7dd158ded02d9a9aae68f5eb *./ArchOSManjaro.7z.028
23dfa036cd5d772f01173da67ebf2634 *./ArchOSManjaro.7z.029    | db2ce0bc5c39fd5d8f672f478788095c *./ArchOSManjaro.7z.029
c8c32363ebd14a7eefce1cadaaa64def *./ArchOSManjaro.7z.030      c8c32363ebd14a7eefce1cadaaa64def *./ArchOSManjaro.7z.030
...

Lines that differ are marked with a vertical bar "|".
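For long lists, the --suppress-common-lines option from the diff help can be combined with -y so that only the marked rows remain. This is a small sketch. The two lists and their short checksum placeholders (aaa, bbb, ...) are hypothetical and are not real MD5 values.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical lists with placeholder checksums; entry 002 differs.
printf '%s\n' 'aaa *./part.001' 'bbb *./part.002' 'ccc *./part.003' > md5.txt
printf '%s\n' 'aaa *./part.001' 'xxx *./part.002' 'ccc *./part.003' > check.txt

# -y prints two columns; --suppress-common-lines drops identical rows.
# diff exits with 0 if the files are the same and 1 if they differ.
changed=$(diff -y --suppress-common-lines check.txt md5.txt)
status=$?

echo "$changed"
echo "diff exit status: $status"
```

Only the row for part.002 is printed, with the "|" marker between the local and the reference entry.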