Copyfind is an open-source, Windows-based program that compares documents and reports similarities in their words and phrases. It is free and available to anyone. It is licensed under the GNU General Public License (GPL), which basically means that you can use, modify, and share it as you like, provided that any version you distribute to others stays under the same license with its source code available.
Unlike most modern software packages, Copyfind is a single executable file. You don’t install it, you just run it. Simply click on the link to download the executable file. If you’re running a 64-bit version of Windows, you can select the 64-bit executable, which runs about 10-20% faster than the 32-bit version. Place that file in a convenient location and execute it from the command line.
I haven’t yet written instructions for using Copyfind. Instead, I have posted an example and a commented script to feed to it at:
The script uses my collection of Shakespeare Sonnets, which you can obtain as a zip file at:
To try out Copyfind, the script, and the Sonnets, open the zip file of Sonnets and copy them into a new folder (Copyfind can’t read the zip file; it needs the Sonnets unpacked). Then put Copyfind.4.1.5.exe (or the 64-bit executable, if that is the one you downloaded) and script.txt in the same folder. Edit script.txt so that all of its folder paths are correct (at present, they include “Louis Bloomfield” in the path names, which obviously won’t work for you).
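If you would rather script the setup than unpack the Sonnets by hand, a minimal Python sketch like the following extracts every file in the zip into an ordinary folder, which is the form Copyfind needs. The names Sonnets.zip and CopyfindTest are hypothetical placeholders; substitute the name of the zip you downloaded and the folder you want to work in.

```python
import zipfile
from pathlib import Path


def unpack_sonnets(zip_path, dest_folder):
    """Extract every file in the zip into dest_folder and return their paths.

    Copyfind cannot read inside a zip archive, so the documents must be
    unpacked into an ordinary folder before Copyfind can compare them.
    """
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Skip directory entries; keep only the extracted files.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return [dest / name for name in names]


if __name__ == "__main__":
    # Hypothetical names: use the zip you downloaded and a folder of your choice.
    zip_file = Path("Sonnets.zip")
    if zip_file.exists():
        for path in unpack_sonnets(zip_file, "CopyfindTest"):
            print(path)
    else:
        print(f"{zip_file} not found; edit the file name above.")
```

After the Sonnets are unpacked, you still copy Copyfind and script.txt into that folder by hand.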
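When you have the script.txt file spruced up, open a Command Prompt window (Windows PowerShell does not support the “<” input redirection shown here; in PowerShell, use Get-Content script.txt | .\Copyfind.4.1.5.exe instead) and “cd” to the folder containing Copyfind and the script. Then execute: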
Copyfind.4.1.5.exe < script.txt
or, for the 64-bit version, the same command with the 64-bit executable’s file name in place of Copyfind.4.1.5.exe.
Copyfind should start, read its commands from script.txt, compare the Sonnets in two different ways, and generate a report.
Once you’ve got it working, you can start tinkering with different scripts. You can load documents individually or in folders into groups 1, 2, 3, … and then compare those groups against one another or internally. When you’re done with a collection of documents, run the “Done” command and you can begin again fresh. Each time you start fresh, you can specify a different reporting folder. You should be able to automate comparisons to run for hours or days without your intervention. You can either use one giant script file and feed it to Copyfind by hand, or you can write a program that calls Copyfind many times and feeds it a different script each time.
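The second approach, a program that calls Copyfind many times and feeds it a different script each time, might look something like this minimal Python sketch. For each script file, it does what Copyfind.4.1.5.exe < script.txt does at the command prompt. The scripts/batch*.txt naming scheme is a hypothetical layout. The sketch also assumes that Copyfind exits when it reaches the end of each script, and it is an assumption that Copyfind reports failures through its exit code, so treat the returned codes as a rough check only.

```python
import subprocess
from pathlib import Path


def run_copyfind_scripts(command, script_paths):
    """Run `command` once per script file, feeding that script on standard input.

    This mirrors typing `Copyfind.4.1.5.exe < script.txt` once for each script.
    Returns the exit code of each run, in the same order as script_paths.
    """
    exit_codes = []
    for script in script_paths:
        # Hand the script to the program as its standard input,
        # just as the "<" redirection does at the command prompt.
        with open(script, "rb") as stdin_file:
            result = subprocess.run(command, stdin=stdin_file)
        exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical layout: one script per batch in a "scripts" subfolder,
    # named batch01.txt, batch02.txt, ... Adjust to suit your own files.
    scripts = sorted(Path("scripts").glob("batch*.txt"))
    # Use an absolute path so Windows finds the executable reliably.
    copyfind = str(Path("Copyfind.4.1.5.exe").resolve())
    codes = run_copyfind_scripts([copyfind], scripts)
    for script, code in zip(scripts, codes):
        print(f"{script.name}: exit code {code}")
```

Running each batch as a separate process keeps the batches independent. One failed run doesn't stop the rest, and each script can end with “Done” and name its own reporting folder.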
Because Copyfind is open source, you’re welcome to tinker with it to add features or make it behave differently.