IntegrityChecker User Manual

Last updated: 20 Feb 2006
© 2003-2006 diglloyd, Inc.  All Rights Reserved.

Legal and contact

Please read the License Agreement. Distribution or copying without the express written permission of diglloyd, Inc. is prohibited. No warranty, either express or implied, is given, and you use this program entirely at your own risk. For inquiries regarding IntegrityChecker licensing, please see http://www.diglloyd.com/diglloyd/software/index.html.

Email contact: software@diglloyd.com

Please include “IntegrityChecker inquiry” in the subject of your email.  Due to a large volume of spam, messages may be inadvertently ignored without an appropriate subject.

Introduction

First, if you need help understanding how to use Mac OS X Terminal, please see Learning the Mac OS X Terminal.

IntegrityChecker (‘ic’ is the name of the command line tool) detects changes to file contents with an exceedingly low probability of missing a change; roughly speaking, you could run it continuously for a few million years without a change being missed. The slightest change to a file, even a single bit, will be detected using a 160-bit cryptographic hash.

IntegrityChecker is useful for:

  1. verifying that copies of files (e.g. backups) have not changed in any way;
  2. verifying that files can be read reliably (e.g. a check on the reliability of the volume/disk on which the files reside);
  3. determining which files among a set of files have changed, for example to enable incremental backups.

IntegrityChecker creates an “IntegrityChecker.txt” file in each directory (folder) that you have asked it to operate on. It contains the cryptographic hashes for both the data and resource fork(s) of each file in that directory. When this directory is copied, “IntegrityChecker.txt” is copied along with the other files, thus enabling verification of the copies.
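
The idea of keeping hashes per directory can be sketched in a few lines of Python. This is only an illustration, not IntegrityChecker's implementation: the format of “IntegrityChecker.txt” is not described in this manual, so the sketch simply returns the hashes grouped by directory, and it reads the data fork only. The function name hashes_by_directory is hypothetical.

```python
import hashlib
import os


def hashes_by_directory(root):
    """Map each directory under root to {file name: SHA-1 hex digest}.

    Illustrative only: hashes the data fork and ignores resource forks.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        per_dir = {}
        for name in sorted(filenames):
            with open(os.path.join(dirpath, name), "rb") as f:
                per_dir[name] = hashlib.sha1(f.read()).hexdigest()
        result[dirpath] = per_dir
    return result
```

Because each directory has its own set of hashes, a copy of that directory carries everything needed to check the files inside it.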

Setup

IntegrityChecker™ requires the use of the Mac OS X Terminal application, found in your Utilities folder (the Utilities folder is inside the Applications folder).  You may want to drag Terminal into the Dock, for easy access.

NOTE: The setup for IntegrityChecker assumes that your default shell is /bin/bash, the default for new accounts on Mac OS X 10.4. If you aren’t sure which shell you use, or would like to change your default shell to /bin/bash, open Terminal and enter the command “chsh -s bash”:

llcG5:~ lloyd$ chsh -s bash
chsh: netinfo domain "." updated

To install IntegrityChecker, start a Terminal window and execute the setup script:

llcG5:/ lloyd$ ./setup_ic.sh

NOTE: If you received IntegrityChecker via email, its executable bit may not be set properly; Mac OS X (Unix after all) won’t allow it to be started unless this bit is set. After copying "ic" to your Applications/Utilities folder, execute the following commands in Terminal to mark it as an executable file:

[llcG5:~] lloyd% cd /Applications/Utilities
[llcG5:/Applications/Utilities] lloyd% sudo chmod +x ./ic

You will need the administrator password to do this.

Technical details

IntegrityChecker computes the 160-bit SHA-1 hash of the data fork and the resource fork of each file (additional forks are not currently supported). SHA-1 is a cryptographic hash algorithm, and the probability of missing a change in file data is, for all practical purposes, zero. You have a better chance of having a roasted duck fly into your mouth.
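
To see why a single changed bit is detected, here is a small Python sketch using the standard hashlib module. It is a demonstration of SHA-1 itself, not of IntegrityChecker's code; the helper names sha1_hex and flip_bit are made up for this example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the 160-bit SHA-1 digest of data as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(changed)


original = b"The quick brown fox jumps over the lazy dog"
altered = flip_bit(original, 0)  # differs from original in a single bit

print(sha1_hex(original))
print(sha1_hex(altered))
print(sha1_hex(original) != sha1_hex(altered))

# Chance that two unrelated inputs share a digest, assuming an ideal 160-bit hash.
print(2.0 ** -160)
```

The two digests printed bear no visible resemblance to each other, even though the inputs differ by one bit.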

IntegrityChecker detects and uses multiple processors/cores.  It uses sophisticated threading techniques to overlap I/O (disk access of all kinds) and computation. In most systems, the disk speed will be the limiting factor.  For example, on a PowerMac Quad, disk speed of approximately 400 megabytes per second is required to fully utilize all four cores.
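
Overlapping disk access with hashing can be pictured as one thread that reads ahead while another computes. The Python sketch below shows that general producer/consumer technique under stated assumptions; it is not IntegrityChecker's threading code, and the names sha1_overlapped, chunk_size and max_chunks are hypothetical.

```python
import hashlib
import queue
import threading


def sha1_overlapped(path, chunk_size=2 * 1024 * 1024, max_chunks=4):
    """Hash a file while a helper thread reads ahead.

    The bounded queue caps buffered data at roughly max_chunks * chunk_size.
    """
    chunks = queue.Queue(maxsize=max_chunks)
    errors = []

    def reader():
        try:
            with open(path, "rb") as f:
                while True:
                    block = f.read(chunk_size)
                    if not block:
                        break
                    chunks.put(block)  # blocks when the queue is full
        except Exception as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    digest = hashlib.sha1()
    while True:
        block = chunks.get()
        if block is None:
            break
        digest.update(block)  # hashing proceeds while the reader fetches more
    thread.join()
    if errors:
        raise errors[0]
    return digest.hexdigest()
```

If the disk is slower than the hashing, the queue stays nearly empty and the disk sets the pace; if the disk is faster, the queue fills and the reader waits.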

IntegrityChecker is optimized for the PowerPC G5 processor, but runs nearly as fast on the G4 and G3 processors.

Usage & command set

If you have just received IntegrityChecker, start by running it on all your critical files.  Ideally, you’ll keep all your important files in  a single directory (folder), using subfolders as appropriate.  For many users with a single hard drive, this means the home directory.  For users with additional hard drives, it might mean additional directories elsewhere, in addition to the home directory.

In Terminal, the special character “~” means your home directory (this includes the desktop which is a Finder-generated display of the folder “Desktop” within your home directory).  Assuming you have installed IntegrityChecker (ic) in your Applications/Utilities folder, the following IntegrityChecker command creates hashes for all files in your home directory:

./ic update-all ~

And this command verifies the hashes:

./ic verify ~

Suppose you store additional files on two other volumes “Main” and “Backup” each of which has the important files in the folder “MyStuff”. Volumes which are not your boot volume are seen on the command line as /Volumes/name, where name is the volume name.  You can apply the same commands as shown above with additional directories:

./ic update-all ~ /Volumes/Main/MyStuff /Volumes/Backup/MyStuff

Suppose you keep all your data in your home directory on your boot disk, which is typical. (It’s a good idea to keep all your important files within a single folder so that they can all be backed up together, minimizing the risk of forgetting something.) Open a Terminal window, then make sure you are in the same directory as "ic" (this example assumes you have copied "ic" into your Applications/Utilities folder):

[llcG5:/] lloyd% cd /Applications/Utilities

Enter the following command:

[llcG5:/Applications/Utilities] lloyd% ./ic update-all ~

To check on the current status of all files, you would enter:

[llcG5:/Applications/Utilities] lloyd% ./ic verify ~

Suppose you make backups of "MyData" to an external hard drive named "Backup" (or a DVD-R or anything else).  How do you know that your backup is valid?  To verify that the files are identical, you would enter:

[llcG5:/Applications/Utilities] lloyd% ./ic verify /Volumes/Backup/MyData

Here is an overview of the syntax for all the commands:

ic update-all
        [--debug|-d]
        [--verbose|-v]
        --progress-interval|-p <seconds>
        --buffer-size|-b <kilobytes>
        --max-memory-use|-m <megabytes>
        path[ path]*

ic update-new
        [--debug|-d]
        [--verbose|-v]
        --progress-interval|-p <seconds>
        --buffer-size|-b <kilobytes>
        --max-memory-use|-m <megabytes>
        path[ path]*

ic update-changed
        [--debug|-d]
        [--verbose|-v]
        --progress-interval|-p <seconds>
        --buffer-size|-b <kilobytes>
        --max-memory-use|-m <megabytes>
        path[ path]*

ic show-status
        [--debug|-d]
        [--verbose|-v]
        path[ path]*

ic verify
        [--debug|-d]
        [--verbose|-v]
        --iterations|-i <count>
        --progress-interval|-p <seconds>
        --buffer-size|-b <kilobytes>
        --max-memory-use|-m <megabytes>
        path[ path]*

ic clean
        [--debug|-d]
        [--verbose|-v]
        path[ path]*

ic test-speed
        [--debug|-d]
        [--verbose|-v]
        --test-size|-t <size-in-MB>

Command Details

update-all
        Creates verification data for all specified files. One or more files or folders may be specified, and verification data is created for all files found within those items. When a directory is specified, all files within the entire directory tree under that directory are selected.

update-new
        Similar to update-all, except that it creates verification data only for files which do not already have it.

update-changed
        Updates all files that are either new (same as update-new) or that have a size, creation date, or modification date different from what is recorded in IntegrityChecker's data file.

        Note that files which have the same size and dates could still be different, but that situation is detectable only by running verify, since the file hash must be computed over all the file data.

show-status
        Displays information about all files that have different sizes or file dates, as well as new files and files that have been ignored. Does not verify any file data. Use this command for a quick check of what has definitely changed in some way. Note that a file whose size is the same but whose date has changed may still be the same as before; use the verify command to tell.

verify
        For all files which have verification data, recomputes the hashes and compares them to the stored data. The output advises whether a file has changed or is likely corrupted. A file is considered "corrupted" if its file dates and size have not changed, but the verification data does not match. Some programs alter files while keeping file dates the same, which may cause a file to be flagged as corrupted.

        If a disk or disk driver sporadically produces bit errors in file data, you will see corruption warnings. It is particularly important to pay attention to these warnings, as they could indicate impending doom for your data.

        IntegrityChecker reads files with caching disabled, so that data is actually re-read from disk.

clean
        Removes all verification data (all IntegrityChecker.txt files) from the specified directory or directories.

test-speed
        Tests the speed at which the CPU can perform the SHA-1 hash.

version
        Displays the version information.
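
The difference between the quick check done by show-status and the full check done by verify can be summarized as a small decision rule. The Python sketch below follows the descriptions above; the record type FileState and the status strings are hypothetical and do not reflect IntegrityChecker's actual output.

```python
from collections import namedtuple

# Hypothetical record of what was stored for a file and what is seen now.
FileState = namedtuple("FileState", ["size", "dates", "sha1"])


def quick_status(stored, current):
    """Status from size and dates only, as show-status is described."""
    if stored is None:
        return "new"
    if (stored.size, stored.dates) != (current.size, current.dates):
        return "changed"
    return "same size and dates"


def verify_status(stored, current):
    """Status after recomputing the hash, as verify is described."""
    if stored.sha1 == current.sha1:
        return "ok"
    if (stored.size, stored.dates) == (current.size, current.dates):
        return "likely corrupted"  # content differs, yet size and dates match
    return "changed"
```

A file reported as "same size and dates" by the quick check can still turn out to be "likely corrupted" once the hash is recomputed, which is why only verify gives a definite answer.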

Option discussion

Each option has a long name, which must be prefixed by “--”, and a short name, which must be a single “-” followed by a letter. Multiple boolean options may be grouped, e.g. “-dv”. An option in square brackets [] is optional.
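
For readers who find the long/short naming easier to follow in code, the Python sketch below builds a stand-in parser with the standard argparse module, using the option names listed for the verify command. It only mimics the syntax for illustration and is not how ic parses its arguments; apart from the 2048K buffer size, no default values are known from this manual, so they are left unset.

```python
import argparse


def build_verify_parser():
    """Stand-in parser mimicking the documented options of the verify command."""
    parser = argparse.ArgumentParser(prog="ic verify")
    parser.add_argument("--debug", "-d", action="store_true")
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--iterations", "-i", type=int)
    parser.add_argument("--progress-interval", "-p", type=int)
    parser.add_argument("--buffer-size", "-b", type=int, default=2048)
    parser.add_argument("--max-memory-use", "-m", type=int)
    parser.add_argument("paths", nargs="+")
    return parser


# "-dv" groups the two boolean options; "-i 3" is the short form of "--iterations 3".
args = build_verify_parser().parse_args(["-dv", "-i", "3", "/Volumes/Backup/MyData"])
print(args.debug, args.verbose, args.iterations, args.paths)
```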

--debug | -d
        Debugging information may be emitted while processing files.

--verbose | -v
        Additional "wordy" output is emitted while processing files.

--iterations | -i
        [verify command only]. The specified files are checked the specified number of times. This can be useful if you are seeing sporadic errors or suspect them. All files are checked once, then the process repeats for a total of the specified number of iterations.

--progress-interval | -p
        A summary of progress so far is emitted at the interval, in seconds, specified by this option.

--buffer-size | -b
        Controls the size of the chunks in which data is read. Larger chunks stall the CPU when initiating processing of a file; smaller chunks are less efficient. The default size is 2 MB (2048K). Be sure to specify this size in kilobytes.

--max-memory-use | -m
        Controls the maximum amount of memory, in megabytes, to be used while processing files. If the disks are faster than the CPUs, this limit may come into play. If the disks are slower than the CPUs, memory usage will usually not be more than twice the buffer size.

--test-size | -t
        [test-speed command only]. Controls how big a chunk of memory, in megabytes, to perform the test on.
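
The effect of the buffer size and of repeated iterations can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not IntegrityChecker's code: the function names are hypothetical, and unlike IntegrityChecker this sketch does not disable caching, so repeated passes may be served from memory rather than re-read from disk.

```python
import hashlib


def sha1_of_file(path, buffer_kb=2048):
    """Hash a file by reading it in chunks of buffer_kb kilobytes."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            block = f.read(buffer_kb * 1024)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()


def verify_repeatedly(path, expected_sha1, iterations=1):
    """Return how many of the passes produced a hash that did not match."""
    return sum(sha1_of_file(path) != expected_sha1 for _ in range(iterations))
```

The chunk size changes how much data is held in memory at once, not the resulting hash, so any buffer size yields the same digest for the same file.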

Examples

This section shows various examples.