Software

WCopyfind

WCopyfind is a Windows-based program that examines a collection of document files, looking for similarities. It extracts the text portions of those documents and scours them for matching words in phrases of a specified minimum length. When it finds two files that share enough words in those phrases, WCopyfind generates HTML report files. These reports contain the document text with the matching phrases underlined.
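To make the idea of "matching words in phrases of a specified minimum length" concrete, here is a minimal Python sketch. It is not WCopyfind's actual comparison engine, whose internals are not described on this page. The function names, the tokenization rule, and the `min_words` parameter are assumptions chosen for illustration only.

```python
import re


def words(text):
    """Split text into lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_phrases(text_a, text_b, min_words=6):
    """Return runs of words from text_a that are covered by at least one
    window of min_words consecutive words also found in text_b.

    Illustrative only: a simple sliding-window comparison, not the
    algorithm WCopyfind itself uses.
    """
    a, b = words(text_a), words(text_b)

    # Every window of min_words consecutive words in document B.
    windows_b = {
        tuple(b[i:i + min_words]) for i in range(len(b) - min_words + 1)
    }

    # Mark each word of document A that lies inside a shared window.
    matched = [False] * len(a)
    for i in range(len(a) - min_words + 1):
        if tuple(a[i:i + min_words]) in windows_b:
            for j in range(i, i + min_words):
                matched[j] = True

    # Join consecutive marked words into phrases.
    phrases, run = [], []
    for word, hit in zip(a, matched):
        if hit:
            run.append(word)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


if __name__ == "__main__":
    doc_a = "The quick brown fox jumps over the lazy dog near the river bank today"
    doc_b = "Yesterday a quick brown fox jumps over the lazy cat"
    for phrase in shared_phrases(doc_a, doc_b, min_words=4):
        print(phrase)
```

A report generator in this spirit would underline the marked words in each document. Raising `min_words` makes the comparison ignore short, coincidental overlaps.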

What WCopyfind can do:

It can find documents that share large amounts of text. This result may indicate that one file is a copy or partial copy of the other, or that they are both copies or partial copies of a third document. WCopyfind can presently handle: .docx, .doc, .txt, .htm, .html, and .pdf formats. It will also try to find text in other file formats, but there are no guarantees it will succeed.

What WCopyfind cannot do:

It cannot search the web or internet to find matching documents for you. You must specify which documents it compares. Those documents can be local ones (on your computer or a file server) or web-resident HTML or text documents that are pointed to by local internet shortcuts. If you suspect that a particular web page has been copied, you must create an internet shortcut to that page and include that shortcut in the collection of documents that you give to WCopyfind.
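If you have many suspect pages, creating the shortcuts by hand gets tedious. The sketch below writes a Windows internet shortcut (a `.url` file) for a given address, using the standard `[InternetShortcut]` layout that Windows itself produces. The function name, folder, and example address are hypothetical, and whether WCopyfind accepts shortcuts created this way rather than by dragging a link from a browser is an assumption that has not been verified here.

```python
from pathlib import Path


def write_internet_shortcut(folder, name, url):
    """Write a Windows .url shortcut pointing at url and return its path.

    Hypothetical helper for illustration; the file uses the usual
    INI-style [InternetShortcut] section with a single URL= entry.
    """
    path = Path(folder) / f"{name}.url"
    # Windows shortcut files conventionally use CRLF line endings.
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        handle.write("[InternetShortcut]\n")
        handle.write(f"URL={url}\n")
    return path


if __name__ == "__main__":
    # Example address is a placeholder, not a real suspect page.
    shortcut = write_internet_shortcut(".", "suspect-page", "https://example.com/page.html")
    print(f"Wrote {shortcut}")
```

The resulting file can then be placed alongside the local documents in the collection to be compared.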

Copyfind

Copyfind is a command-line program that examines collections of document files, looking for similarities. Copyfind runs only under Windows (at present) and has the same internal comparison engine as WCopyfind. Unlike WCopyfind, however, Copyfind reads instructions from the command line and is therefore much more flexible and capable of more complicated comparison activities.

10 thoughts on “Software”

  1. How many documents can be compared? I am a high school teacher and I may need to upload and compare 20-30 documents against each other. Can WCopyfind handle that?
    Thanks,
    Susan

  2. The number of documents that WCopyfind can handle is limited only by your computer’s RAM and is probably several tens of thousands of documents. Checking 20 or 30 documents against one another is easy as pie.

    Actually, you don’t “upload” them — you merely drag the documents into WCopyfind’s inbox and let it compare them. The documents never leave your computer, which is important because it means that you retain complete control over them and don’t risk having those documents drift around the internet after you’ve done the comparison.

    Just download WCopyfind onto your computer, create a report folder, drag your documents into its inbox, and run it. With fewer than 30 documents to compare, I expect it to finish its work in about 1 second.

  3. I would just like to mention that I found WCopyfind both very easy to use AND quite elegant in the manner in which it compares documents. I was using it to compare two moderately long texts (35 pages), and in addition to the “metrics” provided, WCF provides a side-by-side HTML output of the documents, linking similar/identical passages that might be near the start of one and towards the middle of the other.

    1. If there are no matching pairs of documents, then WCopyfind doesn’t list any comparison links. I should add a comment that indicates this point so that you’re not left thinking that it didn’t run properly. I’ll put it on my todo list and try to get it done shortly.

      Thanks!

      Lou

  4. This software is good, and I am looking forward to either a command-line Windows version or, even better, a Linux/Unix command-line version, as it would greatly simplify automated tasks.

    Thanks

  5. Thank you for this very useful tool. I have been using it for years. Its speed is amazing.
    If you ever feel inclined to add features to WCopyfind, might I suggest an option to not compare bibliographies? The program could simply ignore everything in a file after a line containing one of a list of common phrases such as “Bibliography”, “Works Cited”, “References”, etc. and no other words. It might be useful to allow the user to add terms to the list (maybe in a separate text file?), since academic citation styles vary. People might even use this feature to exclude the ends of files for other reasons specific to particular assignments. This would make the measures of matches more useful in distinguishing inappropriate matching text from appropriate matching references without having to look at many side-by-side comparisons.
    Thank you!

  6. For what it’s worth, I just ran WCopyfind successfully on my Mac (OS X Mavericks) under Wine (installed via MacPorts). I’ll try it on my home Linux box tonight, but presumably it should work there as well.
