.TH same 1
.SH NAME
same \- find identical files and link them to save disk space
.SH SYNOPSIS
\fBsame\fR [\-d | \-\-debug] [\-hs \fIn\fR | \-\-hashstart \fIn\fR] [\-n | \-\-dryrun] [\-s] [\-t | \-\-timings] [\-v | \-\-verbose] [\-z | \-\-nullfiles]
.SH OPTIONS
.TP
\fB\-d\fR, \fB\-\-debug\fR
Switch on debugging messages from the program.
.TP
\fB\-hs\fR \fIn\fR, \fB\-\-hashstart\fR \fIn\fR
Set the start value of the hash function.
.TP
\fB\-n\fR, \fB\-\-dryrun\fR
Do not modify any file on the disk.
.TP
\fB\-s\fR
Create symbolic links instead of hard links.
.TP
\fB\-t\fR, \fB\-\-timings\fR
At the end of the program, output the time needed.
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Print messages describing what is being done.
.TP
\fB\-z\fR, \fB\-\-nullfiles\fR
Also create links for empty files.
By default, empty files are ignored.
.SH INTRODUCTION
This program takes a list of files (e.g. the output of \fBfind . \-type f\fR) on stdin.
Each of the files is compared against each of the others.
Whenever two files are found that match exactly, the two files are linked together (by a soft or hard link).
.SH GOAL
The goal of this program is to conserve disk space when you have several different trees of a large project on your disk.
By creating hard links or soft links between the files that are the same, you can save a lot of disk space.
.PP
For example, two different versions of the Linux kernel differ in only a small number of files.
After running this program, the contents of the files that are identical in both versions are stored only once.
This is especially useful if you have different versions of complete trees lying around.
.SH IMPLEMENTATION
The size of every file is used as a first indication of whether two files can be the same.
Whenever the sizes match, the hashes of the two files are compared.
Whenever the hashes match as well, the file contents are compared.
For every matching pair, one of the two files is replaced by a hard link to the other file.
With the \fB\-s\fR option, a soft link is used instead.
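.PP
The three comparison stages described above can be sketched in Python.
This is only an illustration under stated assumptions, not the code of \fBsame\fR itself:
the function names are hypothetical, SHA-256 stands in for the program's own hash function (which this page does not describe),
and the sketch always keeps the first file of each group.
.PP
.nf
```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path):
    """Return a SHA-256 digest of the file (stand-in for the real hash)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, include_empty=False):
    """Group identical files: compare size, then hash, then contents."""
    by_size = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if size == 0 and not include_empty:
            continue  # assumption: mirrors the default of ignoring empty files
        by_size[size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        for remaining in by_hash.values():
            # Final byte-for-byte check guards against hash collisions.
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                equal = [q for q in rest
                         if filecmp.cmp(first, q, shallow=False)]
                if equal:
                    groups.append([first] + equal)
                remaining = [q for q in rest if q not in equal]
    return groups


def link_duplicates(groups, dry_run=True):
    """Replace every later file of a group by a hard link to the first."""
    for first, *others in groups:
        for other in others:
            if dry_run or os.path.samefile(first, other):
                continue  # nothing to do, or already the same inode
            tmp = other + ".same-tmp"  # hypothetical temporary name
            os.link(first, tmp)        # create the new link first ...
            os.replace(tmp, other)     # ... then swap it into place
```
.fi
.PP
In this sketch, link_duplicates(find_duplicates(paths), dry_run=False) performs the replacement; with the default dry_run=True no file is modified.
Creating the link under a temporary name before renaming it means the duplicate is never missing from the directory, which is a design choice of the sketch and not a statement about the original program.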
.PP
To allow you to do this incrementally, the "rm" is done on the file with the fewest links.
This allows you to "merge" a new tree with several trees that have already been processed.
The files in the new tree have link count 1, while the files in the old trees that are likely candidates for linkage have a higher link count.
.PP
The current implementation keeps the "first" incarnation of a file and replaces further occurrences of the same file.
This is significant when using soft links.
.SH EXAMPLE
.TP
\fBfind . \-type f | same\fR
This links together all files under the current directory that are the same.
.SH BUGS
.IP \(bu
Make sure that you have all the permissions required for execution of the commands.
.IP \(bu
RCS probably allows you to do similar things.
.IP \(bu
If your editor does not move the original aside before writing a new copy, you will change the file in ALL incarnations when editing a file.
Patch works just fine: it moves the original aside before creating a new copy.
I'm confident that I could teach Emacs to do it this way too.
I'm too lazy to figure it out, so if you happen to know an easy way to do this, please email me at R.E.Wolff@BitWizard.nl
.IP \(bu
There is a 1024 character limit to pathnames when using symlinks.
.SH AUTHOR
This manpage was written by Roland Illig for the pkgsrc distribution.
Some sections are taken from the source code of `same'.