When one wishes to compare binary files, a byte-level comparison is probably best. If one wishes to compare text files or computer programs, however, a side-by-side visual comparison is usually best. This gives the user the chance to decide which file is the preferred one to retain, whether the files should be merged to create one containing all of the differences, or whether to keep them both as-is for later reference through some form of "versioning" control.
File comparison is an important, and most likely integral, part of file synchronization and backup. In backup methodologies, the issue of data corruption is an important one. Corruption usually occurs without warning and goes unnoticed until it is too late to recover the missing parts.
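A byte-level comparison of the kind mentioned for binary files can be sketched in a few lines. This is only an illustration: the function name `byte_diff` is hypothetical, and the sketch assumes both files fit in memory as `bytes` objects.

```python
from itertools import zip_longest


def byte_diff(a: bytes, b: bytes):
    """List every offset at which the two byte strings differ.

    Each entry is (offset, byte_in_a, byte_in_b); None marks a position
    past the end of the shorter input.
    """
    return [
        (offset, x, y)
        for offset, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]
```

For two files on disk, one could read each with `open(path, "rb").read()` and pass the results to `byte_diff`; an empty list means the contents are identical.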
Comparison tools are used for various reasons. File comparison in word processors is typically at the word level, while comparison in most programming tools is at the line level. Byte or character-level comparison is useful in some specialized applications.
Display of file comparison varies, with the main approaches being either showing two files side-by-side, or showing a single file, with markup showing the changes from one file to the other. In either case, and particularly in side-by-side viewing, code folding or text folding may be used to hide unchanged portions of the file, showing only the changed portions.
The rsync protocol uses a rolling hash function to compare two files on two distant computers with low communication overhead.
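The rolling hash idea mentioned for rsync can be illustrated with a small sketch. The code below is a simplified Adler-32-style rolling checksum, not rsync's actual implementation; the names `weak_checksum`, `roll`, and `find_block` and the modulus are assumptions made for the example. The point is that sliding the window by one byte updates the checksum in constant time instead of rescanning the whole window.

```python
MOD = 1 << 16  # assumed modulus for this sketch


def weak_checksum(block: bytes):
    """Compute the checksum pair (a, b) for a block from scratch."""
    size = len(block)
    a = sum(block) % MOD
    # Earlier bytes get larger weights: size, size - 1, ..., 1.
    b = sum((size - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes):
    """Return every offset in data where block occurs.

    The cheap checksum filters candidates; a byte-for-byte check
    confirms each hit, since different windows can share a checksum.
    """
    size = len(block)
    if size == 0 or size > len(data):
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:size])
    hits = []
    for start in range(len(data) - size + 1):
        if (a, b) == target and data[start:start + size] == block:
            hits.append(start)
        if start + size < len(data):
            a, b = roll(a, b, data[start], data[start + size], size)
    return hits
```

In a remote-synchronization setting, one side would send only the checksums of its blocks, and the other side would slide a window over its own copy to find matching blocks, so that only unmatched data has to cross the network.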
Most file comparison tools find the longest common subsequence between two files. Any data not in the longest common subsequence is presented as an insertion or deletion. In 1978, Paul Heckel published an algorithm that identifies most moved blocks of text; this is used in the IBM History Flow tool. Other file comparison programs find block moves. Some specialized file comparison tools find the longest increasing subsequence between two files.
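The longest-common-subsequence approach can be sketched as follows. This is a minimal, quadratic-time illustration at the line level, not the algorithm of any particular tool; the function name `lcs_diff` and the `' '`, `'-'`, `'+'` tags are choices made for this example. Lines that belong to the common subsequence are reported as unchanged, and everything else as a deletion or an insertion.

```python
def lcs_diff(a, b):
    """Compare two lists of lines via their longest common subsequence.

    Returns (tag, line) pairs, where tag is ' ' for a common line,
    '-' for a line only in a, and '+' for a line only in b.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting common lines and edits.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("-", a[i]))
            i += 1
        else:
            result.append(("+", b[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result
```

For example, comparing `["a", "b", "c"]` with `["a", "c", "d"]` yields `a` and `c` as common lines, `b` as a deletion, and `d` as an insertion. Note that a moved block shows up here as one deletion plus one insertion, which is the limitation that block-move detection addresses.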