Data Integrity Checking

Files on our Digital Archive disks can be checked for data integrity using the md5sum program.

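Comparing a file's current MD5 checksum against a previously recorded one is the best way to determine whether an archived file has become corrupted, especially for large video files. It is faster and more reliable than a visual inspection, because a change to even a single byte produces a different checksum.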

Tools

The attached Bash script (md5walk.sh) walks down a directory tree and uses md5sum to compute an MD5 checksum for every file at each level of the tree.

The checksums are saved in each directory in a file called md5-checksums.txt.

Running md5walk.sh on a Windows machine requires md5sum (free download) and a Linux-like shell environment such as that provided by Cygwin (also a free download). Run "md5walk.sh -?" for a summary of the options.

At the time of this writing, all folders on our Digital Archive disks have been scanned using md5walk.sh and checksum files have been saved in each folder.
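With checksum files already saved in each folder, a later re-check could look something like the sketch below. It assumes the files are in md5sum's standard output format and that GNU md5sum is available (the -c and --quiet options belong to it); the function name verify_checksums is hypothetical and is not part of md5walk.sh.

```shell
# Hypothetical verification sketch; assumes each folder holds an
# md5-checksums.txt in the format written by md5sum.
CHECKSUM_FILE="md5-checksums.txt"

# Re-check every md5-checksums.txt under the given root.
# Reports folders that fail and returns non-zero if any did.
verify_checksums() {
    find "$1" -type f -name "$CHECKSUM_FILE" -exec sh -c '
        status=0
        for list in "$@"; do
            # An empty checksum file means the folder had no files to check.
            [ -s "$list" ] || continue
            dir=$(dirname "$list")
            # md5sum -c re-reads each listed file and compares it to the stored hash.
            if ! ( cd "$dir" && md5sum -c --quiet "$(basename "$list")" ); then
                echo "FAILED: $dir" >&2
                status=1
            fi
        done
        exit "$status"
    ' sh {} +
}

# Example call (the path is a placeholder):
#   verify_checksums /path/to/archive && echo "all checksums match"
```

Running md5sum from inside each folder keeps the relative file names in the checksum file valid, wherever the disk happens to be mounted.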


(Ultimately, it may be better to rewrite md5walk as a Python script. That exercise has been left to the reader.)

Known Issues

  • Adobe Premiere Pro 5.5 is known to modify video files when they are viewed: it appends a 5-kilobyte XML metadata block to the end of the video file. This does not seem to harm the video, but it changes the file's MD5 checksum, so data integrity checks will fail.