This web site provides software for detecting the reuse of written language across thousands or even millions of documents. In addition to helping to reduce the impact of plagiarism on our world, that software can study the migration of the written word through many interesting processes of civilization and society. As an academic, I am always interested in collaborations that will lead to peer-reviewed publications. If you have an interesting project that involves sifting through vast collections of documents for reused language, please contact me about a joint endeavor.

– Lou Bloomfield, Professor of Physics, University of Virginia

About Plagiarism:

Plagiarism is the misrepresentation of authorship. Typically, words and ideas conceived by one person are attributed to another person. Plagiarism is a form of intellectual theft or fraud, and it undermines the intellectual economy that values ideas, words, and understanding. Even when an act of plagiarism appears superficially to be a victimless crime, it nonetheless devalues the currency of human thought and thereby weakens society.

In the most common form of plagiarism, one author’s words are inserted verbatim in the work of a second author, without quotation, acknowledgement, or attribution. But there are many other forms of plagiarism, including some that are often accepted or even encouraged by society, notably ghostwriting, speech-writing, and paraphrasing.

Plagiarism is not a black-and-white issue because many of our ideas and words derive from those of others, and what constitutes true intellectual theft or fraud often involves some degree of subjectivity. Moreover, each context has its own rules regarding the need for accurate attribution of authorship and those rules are not always obvious to everyone. Reasonable people may even disagree about those rules, so defining them clearly and explicitly is always a good idea.

What this Site Provides:

Software for Detecting Plagiarism

WCopyfind is an open-source, Windows-based program that explores a collection of documents, looking for matching language. If you have a collection of documents that you think might contain plagiarized content, you can check them quickly with this free software.

Thoughts and Ideas about Plagiarism, Education, and Society

Lou Bloomfield’s writings about issues relating to plagiarism, writing, scholarship, authorship, credentialing, integrity, and ethics.

Links to Other Web Sites Dealing with Plagiarism

An assortment of web sites that provide information, software, or services relating to plagiarism.
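
As a rough illustration of what "looking for matching language" across a collection of documents can mean, the Python sketch below reports the word sequences that pairs of documents have in common. It is a minimal sketch under stated assumptions, not WCopyfind's actual algorithm: the function names `word_ngrams` and `shared_phrases`, the six-word phrase length, and the in-memory `docs` dictionary are all hypothetical choices made for this example.

```python
import re
from itertools import combinations


def word_ngrams(text, n=6):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    # \w+ is Unicode-aware for Python 3 strings, so non-English letters count as words.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(docs, n=6):
    """Map each pair of document names to the n-word phrases both documents contain.

    docs is a dictionary of {name: text}; pairs with nothing in common are omitted.
    """
    grams = {name: word_ngrams(text, n) for name, text in docs.items()}
    matches = {}
    for a, b in combinations(sorted(grams), 2):
        common = grams[a] & grams[b]
        if common:
            matches[(a, b)] = common
    return matches
```

Calling `shared_phrases({"essay1": text1, "essay2": text2})` returns a dictionary keyed by document pair. A longer phrase length `n` yields fewer accidental matches on common expressions, while a shorter one catches more lightly edited copying. Every pair is compared, so this sketch suits small collections rather than millions of documents.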

22 comments to Welcome

  • vanya

    Can I have the Programme?

  • José Antonio

    Does this program have databases only in English? Or in which languages does it work?

    Thanks,
    José

    • Lou Bloomfield

      This program has no databases in any language, English or Spanish. It compares documents that you have on your computer. It should be able to compare Spanish documents just as well as it compares English documents. I suggest that you have those documents as .html, .docx, .txt, or text-based .pdf files.

      Lou Bloomfield

  • karaba

    I need software that compares Greek essays. Is this appropriate?

    • Lou Bloomfield

      WCopyfind should work with Greek and many other languages, as long as the files are in .DOCX, .HTML, or .PDF format (and the PDF is a text-based PDF, rather than an image-based one). WCopyfind can read and analyze any of the UTF-8 characters in those file formats, so it shouldn’t care whether the documents are in any particular language. You should select “Greek” as the language, so that WCopyfind distinguishes between letters and punctuation properly. Hopefully, it will do what you want.

      Lou

  • NIW

    Lou,
    If I download and then use your application to compare two documents, do you then have a record of those documents?

    • Lou Bloomfield

      No, the documents remain entirely on your computer. They are never sent anywhere else. In fact, you can run WCopyfind without any internet connection.

      Lou

  • Chris

    I found a reference or two to “Copyfind” for Linux. Is this available? Thanks

    • Lou Bloomfield

      I have written a command-line version of my software, Copyfind, but I haven’t had a chance to post it on this site yet. It’s written in Microsoft C++ and contains a small amount of Windows-specific code. My hope is to remove even that and make the program machine-independent. But right now I have so many things to do that I’m not sure when I’ll get to it.

      Lou

      • Lou,

        I’m eager to get Copyfind working on Linux. If you’ll share your source code I’ll cheerfully scrub out any Windows-isms I find. But I’d rather not duplicate work you’ve already done.

        • CA_Accountant

          Was the Linux-based version ever posted to the site? Is that what all of the source code files are for? Sorry, I’m not well versed in Linux yet.

        • Lou Bloomfield

          I finally posted Copyfind, a version of this software that runs in a console window and can be scripted. It still needs to run under Windows, since I haven’t extracted all the Windows-specific C++ extensions, but it’s much more coder-friendly.

          Lou

  • Mukul

    As a teacher, I have been using WCopyfind for more than five years. Currently, I have a 64-bit Dell system and most of the documents are .docx (Word 2007/2010). The software hangs if I load more files, say, 5 or more.

    If I use version 2.6 compatible on my desktop, the side-by-side file comparison window shows characters other than the report text.

    Please help.

  • pcc

    Do you know if anyone has used Parallels or some other Windows emulation software to run your software on a Mac successfully?

  • ttsteidley

    Thanks for the fantastic program! I am currently using it in some text analysis research and was wondering if there is a simple way to tell the program that I am only interested in comparing “old” documents with “new” and to ignore “new” to “new” comparisons? Again, great program!

  • lsb4u

    Is there a database of papers that can be downloaded to use in your program? I want to compare my students’ work to each other, but I also want to make sure they didn’t plagiarize from internet sources. Any thoughts? Thanks!!

  • matt

    Hi Louis

    Recently, Copyfind has been cutting words out of the reports. When I look at the side-by-side or single-page view, I notice that the sentences are incomplete, almost as if they ended at the word wrap rather than at the end of the sentence. Is this a known bug?

    BTW, I’m the one who had the problem with memory overrun with docx files a few years ago :-) I can send you examples if you like.

    Matt

  • MarthaTH

    Hi Lou,

    This is a great idea for a site. I think there’s not enough “good” info out there on how to properly check for plagiarism.

    I came across this post here, which reviews some of the free tools online:
    http://www.grammarcheck.net/review-10-sites-that-check-for-plagiarism/

    I tried most of them out. The free ones that worked were all delivering the same results, but most of them were not working at all.

    I find it extremely difficult to find a “truly” free site, so I’m sticking to Google for now.

    Do you have any guide on how to use Google for plagiarism checking properly?

    Thanks and please keep me posted!

    Sincerely,
    Martha
    teachermartha82 [at] gmail [dot] com

  • Susan Keller

    I am interested in learning if this program will work on a Mac. Thanks.

  • wwdreambuilder

    Hello,

    I am comparing two docx documents, one of which was converted from a pdf. They both contain addresses but the number part of the address is only occasionally highlighted along with the street name. For instance, when 111 S Main is on both documents, only S Main would be highlighted. Within the same list, 123 N Carson might fully be highlighted when on both documents. Is there something I am doing wrong?

    Thank you for your help!
    Kevin
