Welcome

This web site provides software for detecting the reuse of written language across thousands or even millions of documents. In addition to helping to reduce the impact of plagiarism on our world, that software can study the migration of the written word through many interesting processes of civilization and society. As an academic, I am always interested in collaborations that will lead to peer-reviewed publications. If you have an interesting project that involves sifting through vast collections of documents for reused language, please contact me about a joint endeavor.

— Lou Bloomfield, Professor of Physics, University of Virginia

About Plagiarism:

Plagiarism is the misrepresentation of authorship. Typically, words and ideas conceived by one person are attributed to another person. Plagiarism is a form of intellectual theft or fraud and it undermines the intellectual economy that values ideas, words, and understanding. Even when an act of plagiarism appears superficially a victimless crime, it nonetheless devalues the currency of human thought and thereby weakens society.

In the most common form of plagiarism, one author’s words are inserted verbatim in the work of a second author, without quotation, acknowledgement, or attribution. But there are many other forms of plagiarism, including some that are often accepted or even encouraged by society, notably ghostwriting, speech-writing, and paraphrasing.

Plagiarism is not a black-and-white issue because many of our ideas and words derive from those of others, and what constitutes true intellectual theft or fraud often involves some degree of subjectivity. Moreover, each context has its own rules regarding the need for accurate attribution of authorship and those rules are not always obvious to everyone. Reasonable people may even disagree about those rules, so defining them clearly and explicitly is always a good idea.

What this Site Provides:

Software for Detecting Plagiarism

WCopyfind is an open-source, Windows-based program that explores a collection of documents, looking for matching language. If you have a collection of documents that you think might contain plagiarized content, you can check them quickly with this free software.
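
To give a feel for what "looking for matching language" can mean in practice, here is a minimal Python sketch that reports runs of consecutive words shared by pairs of documents. It is an illustration only, not WCopyfind's actual algorithm. The function names and the six-word phrase length are assumptions made for this example.

```python
import re


def word_ngrams(text, n):
    """Return the set of n-word phrases in text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(doc_a, doc_b, n=6):
    """Return the n-word phrases that appear in both documents."""
    return word_ngrams(doc_a, n) & word_ngrams(doc_b, n)


def compare_collection(docs, n=6):
    """Compare every pair of documents in a {name: text} mapping.

    Returns a list of (name_a, name_b, shared_count), largest overlap first.
    Pairs with no shared phrases are omitted.
    """
    names = sorted(docs)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            count = len(shared_phrases(docs[a], docs[b], n))
            if count:
                results.append((a, b, count))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Calling compare_collection on a dictionary of document texts returns the pairs that deserve a closer look. A longer phrase length yields fewer coincidental matches, while a shorter one catches more lightly edited copying.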

Thoughts and Ideas about Plagiarism, Education, and Society

Lou Bloomfield’s writings about issues relating to plagiarism, writing, scholarship, authorship, credentialing, integrity, and ethics.

38 thoughts on “Welcome”

    1. (José Antonio wrote: This program has databases in English only? Or in which languages does it work?)

      This program has no databases in any language, English or Spanish. It compares documents that you have on your computer. It should be able to compare Spanish documents just as well as it compares English documents. I suggest that you have those documents as .html, .docx, .txt, or text-based .pdf files.

      Lou Bloomfield

    1. WCopyfind should work with Greek and many other languages, as long as the files are in .DOCX, .HTML, or .PDF format (and the PDF is a text-based PDF, rather than an image-based one). WCopyfind can read and analyze any of the UTF-8 characters in those file formats, so it shouldn’t care whether the documents are in any particular language. You should select “Greek” as the language, so that WCopyfind distinguishes between letters and punctuation properly. Hopefully, it will do what you want.

      Lou

    1. No, the documents remain entirely on your computer. They are never sent anywhere else. In fact, you can run WCopyfind without any internet connection.

      Lou

    1. I have written a command-line version of my software, Copyfind, but I haven’t had a chance to post it on this site yet. It’s written in Microsoft C++ and contains a small amount of Windows-specific code. My hope is to remove even that and make the program machine-independent. But right now I have so many things to do that I’m not sure when I’ll get to it.

      Lou

      1. Lou,

        I’m eager to get Copyfind working on Linux. If you’ll share your source code I’ll cheerfully scrub out any Windows-isms I find. But I’d rather not duplicate work you’ve already done.

        1. I finally posted Copyfind, a version of this software that runs in a console window and can be scripted. It still needs to run under Windows, since I haven’t extracted all the Windows-specific C++ extensions, but it’s much more coder-friendly.

          Lou

  1. As a teacher, I have been using WCopyfind for more than five years. I currently have a 64-bit Dell system, and most of the documents are .docx (Word 2007/2010). The software hangs if I load more files, say, 5 or more.

    If I use version 2.6 compatible on my desktop, the file compare side-by-side window gives characters other than the report text.

    Please help.

  2. Thanks for the fantastic program! I am currently using it in some text analysis research and was wondering if there is a simple way to tell the program that I am only interested in comparing “old” documents with “new” and to ignore “new” to “new” comparisons? Again, great program!

  3. Is there a database of papers that can be downloaded to use in your program? I want to compare my students’ work to each other, but I also want to make sure they didn’t plagiarize from internet sources. Any thoughts? Thanks!!

  4. Hi Louis

    Recently Copyfind has been cutting words out of the reports. When I look at the side-by-side or the single-page view, I notice that the sentences are incomplete, almost as if they were finishing at the word wrap rather than at the end of the sentence. Is this a known bug?

    BTW I’m the one who had the problem with memory overrun with docx files a few years ago 🙂 I can post you examples if you like.

    Matt

  5. Hi Lou,

    This is a great idea for a site. I think there’s not enough “good” info out there on how to properly check for plagiarism.

    I came across this post here, which reviews some of the free tools online:
    http://www.grammarcheck.net/review-10-sites-that-check-for-plagiarism/

    I tried most of them out. The free ones that worked all delivered the same result, but most of them were not working at all.

    I find it extremely difficult to find a “truly” free site, so I’m sticking to Google for now.

    Do you have any guide on how to use Google for plagiarism checking properly?

    Thanks and please keep me posted!

    Sincerely,
    Martha
    teachermartha82 [at] gmail [dot] com

  6. Hello,

    I am comparing two docx documents, one of which was converted from a pdf. They both contain addresses but the number part of the address is only occasionally highlighted along with the street name. For instance, when 111 S Main is on both documents, only S Main would be highlighted. Within the same list, 123 N Carson might fully be highlighted when on both documents. Is there something I am doing wrong?

    Thank you for your help!
    Kevin

  7. Hi everyone,

    I am a Mac user and use WCopyfind on my computer. All you have to do is download and install “Wine” for Mac and then you’ll be able to run WCopyfind on your Mac.

    Hope it helps,

    Luc

  8. Hi, everyone.
    I have a “newbie” question, which is simply how do I post a NEW question on the site? I can’t find any info at all on the site that explains how to do this (but maybe I am not looking in the right places).
    Thanks in advance.
    Steve

  9. Is there a way to make this program scriptable?

    I am looking to run it several thousand times via CLI, and I can’t find where in the source I should be modifying it. Can you do this/where should I look?

    1. The Copyfind version of this program is scriptable. Copyfind reads a script file, so you can easily make it do thousands of separate comparisons. At present, it only works under Windows (I wrote it in Visual C++), but I may try to port it to a more generic C++ (probably under Eclipse). Having it work properly on a Linux box would make lots of sense. It’s just a time problem for me: a zillion things to do and endless pressure to publish or perish.

      Lou

    1. I’m afraid that I have no idea. The language selection affects punctuation and accents, but otherwise it should have relatively little effect on the comparison process. I suggest that you find the language that most closely resembles Latvian and see how effective the comparison is. It should work pretty well, even if you select English.

      Lou

  10. I use version 4.1.1 and I would like to update to version 4.1.4. However, when I try to run it, I get a message saying that this is not a valid Win32 application (I took care to download the Win32 version, not the Win64). The same thing happens with versions 4.1.2 and 4.1.3. I downloaded version 4.1.1 again, and it works. Is there something that I can do to run the latest version? (I use Windows XP 32 bit, SP3; I know, I should upgrade…). Otherwise, thanks for this nice application!

    Peji

    1. Maybe I built it incorrectly. I just tried it on my Windows 8 computer and it worked. Perhaps it has trouble under XP? I’m not sure how to enforce compatibility with older operating systems.

      Lou

  11. Version 4.1.4 does not work on my Windows XP SP3 machine either. I get the same error that Peji posted, ‘Not a valid Win32 application’. It is good to hear that 4.1.1 works, so I will give that a try.

    Apart from that, thanks for making the program. I expect it will be very useful.

  12. Wow! Just discovered and downloaded WCopyfind. Tried it out on a small set of test files and it performed flawlessly. As a Code of Conduct officer, I am called on to deal with plagiarism, among many other issues. We have Turnitin available, but this software is far easier to use and allows me to work more independently.

    Thanks a million.

  13. Hello Sir,

    I need your help in plagiarism detection software. Can you please tell me how to run the source code? Also please tell me which algorithm you are using in checking the plagiarism?
    Can I contact you via email?

    1. Actually, I am trying to build the source code in Visual Studio, but it does not compile. I am getting the following errors:

      Error 1 error C1083: Cannot open include file: ‘afxwin.h’: No such file or directory e:\wcopyfind.4.1.4\wcopyfind\stdafx.h 23 1 WCopyfind
      2 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 23 1 WCopyfind
      3 IntelliSense: cannot open source file “afxext.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 24 1 WCopyfind
      4 IntelliSense: cannot open source file “afxdisp.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 27 1 WCopyfind
      5 IntelliSense: cannot open source file “afxdtctl.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 32 1 WCopyfind
      6 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 35 1 WCopyfind
      7 IntelliSense: cannot open source file “afxcontrolbars.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 38 1 WCopyfind
      8 IntelliSense: #error directive: “include ‘stdafx.h’ before including this file for PCH” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 8 3 WCopyfind
      9 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 12 31 WCopyfind
      10 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 13 35 WCopyfind
      11 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 14 36 WCopyfind
      12 IntelliSense: identifier “HWND” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 22 2 WCopyfind
      13 IntelliSense: identifier “CDialog” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 23 2 WCopyfind
      14 IntelliSense: not a class or struct name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 30 WCopyfind
      15 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 35 10 WCopyfind
      16 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 39 2 WCopyfind
      17 IntelliSense: expected a ‘;’ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 40 1 WCopyfind
      18 IntelliSense: identifier “CALLBACK” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 18 WCopyfind
      19 IntelliSense: identifier “CListCtrl” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 59 WCopyfind
      20 IntelliSense: identifier “CString” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 77 WCopyfind
      21 IntelliSense: identifier “UINT” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 93 WCopyfind
      22 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 8 1 WCopyfind
      23 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 9 1 WCopyfind
      24 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 7 WCopyfind
      25 IntelliSense: cannot open source file “afxinet.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 6 1 WCopyfind
      26 IntelliSense: cannot open source file “afxdialogex.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 7 1 WCopyfind
      27 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 8 1 WCopyfind
      28 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 1 WCopyfind
      29 IntelliSense: identifier “CWinApp” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 34 WCopyfind
      30 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 30 2 WCopyfind
      31 IntelliSense: identifier “m_dwRestartManagerSupportFlags” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 2 WCopyfind
      32 IntelliSense: identifier “AFX_RESTART_MANAGER_SUPPORT_RESTART” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 35 WCopyfind
      33 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 53 1 WCopyfind
      34 IntelliSense: identifier “INITCOMMONCONTROLSEX” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 58 2 WCopyfind
      35 IntelliSense: identifier “ICC_WIN95_CLASSES” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 62 20 WCopyfind
      36 IntelliSense: identifier “InitCommonControlsEx” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 63 2 WCopyfind
      37 IntelliSense: name followed by ‘::’ must be a class or namespace name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 65 2 WCopyfind
      38 IntelliSense: identifier “AfxEnableControlContainer” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 68 2 WCopyfind
      39 IntelliSense: identifier “CShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 2 WCopyfind
      40 IntelliSense: identifier “pShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 17 WCopyfind
      41 IntelliSense: expected a type specifier e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 37 WCopyfind
      42 IntelliSense: identifier “SetRegistryKey” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 2 WCopyfind
      43 IntelliSense: identifier “_T” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 17 WCopyfind
      44 IntelliSense: identifier “m_pMainWnd” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 84 2 WCopyfind
      45 IntelliSense: identifier “INT_PTR” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 2 WCopyfind
      46 IntelliSense: class “CWCopyfindDlg” has no member “DoModal” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 26 WCopyfind
      47 IntelliSense: identifier “IDOK” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 86 19 WCopyfind
      48 IntelliSense: identifier “IDCANCEL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 91 24 WCopyfind
      49 IntelliSense: identifier “FALSE” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 105 9 WCopyfind

  14. I have been using WCopyfind quite successfully for my usage. In fact you might say I have been abusing it.
    When I download ebooks in batch mode I find that there are occasions that books are split into chapters and I am getting not only separate chapters but also the whole book. I then use WCopyfind to find the files with the (duplicated) separate chapter(s) and delete them keeping only the whole book.
    It would help me tremendously if there would be a way to sort the percentages, descending/ascending, of the left “Perfect Match” column. This could be in the program window, the html window, or preferably both.

    Is this something you could/would add to WCopyFind?

    Sincerely,
    Juergen
