Welcome

This web site provides software for detecting the reuse of written language across thousands or even millions of documents. In addition to helping to reduce the impact of plagiarism on our world, that software can study the migration of the written word through many interesting processes of civilization and society. As an academic, I am always interested in collaborations that will lead to peer-reviewed publications. If you have an interesting project that involves sifting through vast collections of documents for reused language, please contact me about a joint endeavor.

— Lou Bloomfield, Professor of Physics, University of Virginia


Our tiny company is producing EarJellies Earplugs, made from a shape-memory material (MemorySil®) that I invented at the University of Virginia. You can obtain these earplugs on Amazon or at our online store, EarJellies.com — Lou Bloomfield

About Plagiarism:

Plagiarism is the misrepresentation of authorship. Typically, words and ideas conceived by one person are attributed to another person. Plagiarism is a form of intellectual theft or fraud and it undermines the intellectual economy that values ideas, words, and understanding. Even when an act of plagiarism appears superficially a victimless crime, it nonetheless devalues the currency of human thought and thereby weakens society.

In the most common form of plagiarism, one author’s words are inserted verbatim in the work of a second author, without quotation, acknowledgement, or attribution. But there are many other forms of plagiarism, including some that are often accepted or even encouraged by society, notably ghostwriting, speech-writing, and paraphrasing.

Plagiarism is not a black-and-white issue because many of our ideas and words derive from those of others, and what constitutes true intellectual theft or fraud often involves some degree of subjectivity. Moreover, each context has its own rules regarding the need for accurate attribution of authorship and those rules are not always obvious to everyone. Reasonable people may even disagree about those rules, so defining them clearly and explicitly is always a good idea.

What this Site Provides:

Software for Detecting Plagiarism

WCopyfind is an open-source, Windows-based program that explores a collection of documents, looking for matching language. If you have a collection of documents that you think might contain plagiarized content, you can check them quickly with this free software.

Thoughts and Ideas about Plagiarism, Education, and Society

Lou Bloomfield’s writings about issues relating to plagiarism, writing, scholarship, authorship, credentialing, integrity, and ethics.

49 thoughts to “Welcome”

    1. (José Antonio wrote: This program has databases in English only? Or in which languages does it work?)

      This program has no databases in any language, English or Spanish. It compares documents that you have on your computer. It should be able to compare Spanish documents just as well as it compares English documents. I suggest that you have those documents as .html, .docx, .txt, or text-based .pdf files.

      Lou Bloomfield

    1. WCopyfind should work with Greek and many other languages, as long as the files are in .DOCX, .HTML, or .PDF format (and the PDF is a text-based PDF, rather than an image-based one). WCopyfind can read and analyze any of the UTF-8 characters in those file formats, so it shouldn’t care whether the documents are in any particular language. You should select “Greek” as the language, so that WCopyfind distinguishes between letters and punctuation properly. Hopefully, it will do what you want.

      Lou

    1. No, the documents remain entirely on your computer. They are never sent anywhere else. In fact, you can run WCopyfind without any internet connection.

      Lou

    1. I have written a command-line version of my software, Copyfind, but I haven’t had a chance to post it on this site yet. It’s written in Microsoft C++ and contains a small amount of Windows-specific code. My hope is to remove even that and make the program machine-independent. But right now I have so many things to do that I’m not sure when I’ll get to it.

      Lou

      1. Lou,

        I’m eager to get Copyfind working on Linux. If you’ll share your source code I’ll cheerfully scrub out any Windows-isms I find. But I’d rather not duplicate work you’ve already done.

        1. Was the Linux-based version ever posted to the site? Is that what all of the source code files are for? Sorry, I’m not well versed in Linux yet.

        2. I finally posted Copyfind, a version of this software that runs in a console window and can be scripted. It still needs to run under Windows, since I haven’t extracted all the Windows-specific C++ extensions, but it’s much more coder-friendly.

          Lou

  1. As a teacher, I have been using WCopyfind for more than five years. Currently, I have a 64-bit Dell system and most of the docs are .docx (Word 2007/2010). The software hangs if I load more files, say, 5 or more.

    If I use version 2.6 compatible on my desktop, the side-by-side file comparison window shows characters other than the report text.

    Please help.

    1. I am aware of a bug in the reading of .DOCX files, but haven’t had time to fix it. I’ll try to get to it as soon as I can get my head above water.

      Lou

  2. Thanks for the fantastic program! I am currently using it in some text analysis research and was wondering if there is a simple way to tell the program that I am only interested in comparing “old” documents with “new” and to ignore “new” to “new” comparisons? Again, great program!

  3. Is there a database of papers that can be downloaded to use in your program? I want to compare my students work to each other, but I also want to make sure they didn’t plagiarize from internet sources. Any thoughts? Thanks!!

  4. Hi Louis

    Recently Copyfind has been cutting words out of the reports. When I look at the side-by-side or the single-page view, I notice that the sentences are incomplete, almost as if they were ending at the word wrap rather than at the end of the sentence. Is this a known bug?

    BTW, I’m the one who had the memory overrun problem with docx files a few years ago 🙂 I can send you examples if you like.

    Matt

  5. Hi Lou,

    This is a great idea for a site. I think there’s not enough “good” info out there on how to properly check for plagiarism.

    I came across this post here, which reviews some of the free tools online:
    http://www.grammarcheck.net/review-10-sites-that-check-for-plagiarism/

    I tried most of them out. The free ones did not deliver the same results at all, and most of them were not working at all.

    I find it extremely difficult to find a “truly” free site, so I’m sticking to Google for now.

    Do you have any guide on how to use Google for plagiarism checking properly?

    Thanks and please keep me posted!

    Sincerely,
    Martha
    teachermartha82 [at] gmail [dot] com

  6. Hello,

    I am comparing two docx documents, one of which was converted from a pdf. They both contain addresses but the number part of the address is only occasionally highlighted along with the street name. For instance, when 111 S Main is on both documents, only S Main would be highlighted. Within the same list, 123 N Carson might fully be highlighted when on both documents. Is there something I am doing wrong?

    Thank you for your help!
    Kevin

  7. Hi everyone,

    I am a Mac user and use WCopyfind on my computer. All you have to do is download and install “Wine” for Mac and then you’ll be able to run WCopyfind on your Mac.

    Hope it helps,

    Luc

  8. Hi, everyone.
    I have a “newbie” question, which is simply how do I post a NEW question on the site? I can’t find any info at all on the site that explains how to do this (but maybe I am not looking in the right places).
    Thanks in advance.
    Steve

  9. Is there a way to make this program scriptable?

    I am looking to run it several thousand times via CLI, and I can’t find where in the source I should be modifying it. Can you do this/where should I look?

    1. The Copyfind version of this program is scriptable. Copyfind reads a script file, so you can easily make it do thousands of separate comparisons. At present, it only works under Windows (I wrote it in Visual C++), but I may try to port it to a more generic C++ (probably under Eclipse). Having it work properly on a Linux box would make lots of sense. It’s just a time problem for me: a zillion things to do and endless pressure to publish or perish.

      Lou

    1. I’m afraid that I have no idea. The language selection affects punctuation and accents, but otherwise it should have relatively little effect on the comparison process. I suggest that you find the language that most closely resembles Latvian and see how effective the comparison is. It should work pretty well, even if you select English.

      Lou

  10. I use the 4.1.1 version and I would like to update to the 4.1.4 version. However, when I try to run it, I get a message saying that this is not a valid Win32 application (I took care to download the Win32 version, not the Win64). The same thing happens with versions 4.1.2 and 4.1.3. I downloaded the 4.1.1 version again, and it works. Is there something that I can do to run the latest version? (I use Windows XP 32-bit, SP3; I know, I should upgrade…). Otherwise, thanks for this nice application!

    Peji

    1. Maybe I built it incorrectly. I just tried it on my Windows 8 computer and it worked. Perhaps it has trouble under XP? I’m not sure how to enforce compatibility with older operating systems.

      Lou

  11. Version 4.1.4 does not work on my Windows XP SP3 machine either. I get the same error that Peji posted, ‘Not a valid Win32 application’. It is good to hear that 4.1.1 works, so I will give that a try.

    Apart from that, thanks for making the program. I expect it will be very useful.

  12. Wow! Just discovered and downloaded WCopyfind. Tried it out on a small set of test files and it performed flawlessly. As a Code of Conduct officer, I am called on to deal with plagiarism, among many other issues. We have Turnitin available, but this software is far easier to use and allows me to work more independently.

    Thanks a million.

  13. Hello Sir,

    I need your help in plagiarism detection software. Can you please tell me how to run the source code? Also please tell me which algorithm you are using in checking the plagiarism?
    Can I contact you via email?

    1. Actually, I am trying to compile the source code in Visual Studio, but I am unable to and get the following errors:

      Error 1 error C1083: Cannot open include file: ‘afxwin.h’: No such file or directory e:\wcopyfind.4.1.4\wcopyfind\stdafx.h 23 1 WCopyfind
      2 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 23 1 WCopyfind
      3 IntelliSense: cannot open source file “afxext.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 24 1 WCopyfind
      4 IntelliSense: cannot open source file “afxdisp.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 27 1 WCopyfind
      5 IntelliSense: cannot open source file “afxdtctl.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 32 1 WCopyfind
      6 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 35 1 WCopyfind
      7 IntelliSense: cannot open source file “afxcontrolbars.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 38 1 WCopyfind
      8 IntelliSense: #error directive: “include ‘stdafx.h’ before including this file for PCH” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 8 3 WCopyfind
      9 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 12 31 WCopyfind
      10 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 13 35 WCopyfind
      11 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 14 36 WCopyfind
      12 IntelliSense: identifier “HWND” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 22 2 WCopyfind
      13 IntelliSense: identifier “CDialog” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 23 2 WCopyfind
      14 IntelliSense: not a class or struct name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 30 WCopyfind
      15 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 35 10 WCopyfind
      16 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 39 2 WCopyfind
      17 IntelliSense: expected a ‘;’ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 40 1 WCopyfind
      18 IntelliSense: identifier “CALLBACK” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 18 WCopyfind
      19 IntelliSense: identifier “CListCtrl” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 59 WCopyfind
      20 IntelliSense: identifier “CString” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 77 WCopyfind
      21 IntelliSense: identifier “UINT” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 93 WCopyfind
      22 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 8 1 WCopyfind
      23 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 9 1 WCopyfind
      24 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 7 WCopyfind
      25 IntelliSense: cannot open source file “afxinet.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 6 1 WCopyfind
      26 IntelliSense: cannot open source file “afxdialogex.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 7 1 WCopyfind
      27 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 8 1 WCopyfind
      28 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 1 WCopyfind
      29 IntelliSense: identifier “CWinApp” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 34 WCopyfind
      30 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 30 2 WCopyfind
      31 IntelliSense: identifier “m_dwRestartManagerSupportFlags” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 2 WCopyfind
      32 IntelliSense: identifier “AFX_RESTART_MANAGER_SUPPORT_RESTART” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 35 WCopyfind
      33 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 53 1 WCopyfind
      34 IntelliSense: identifier “INITCOMMONCONTROLSEX” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 58 2 WCopyfind
      35 IntelliSense: identifier “ICC_WIN95_CLASSES” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 62 20 WCopyfind
      36 IntelliSense: identifier “InitCommonControlsEx” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 63 2 WCopyfind
      37 IntelliSense: name followed by ‘::’ must be a class or namespace name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 65 2 WCopyfind
      38 IntelliSense: identifier “AfxEnableControlContainer” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 68 2 WCopyfind
      39 IntelliSense: identifier “CShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 2 WCopyfind
      40 IntelliSense: identifier “pShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 17 WCopyfind
      41 IntelliSense: expected a type specifier e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 37 WCopyfind
      42 IntelliSense: identifier “SetRegistryKey” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 2 WCopyfind
      43 IntelliSense: identifier “_T” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 17 WCopyfind
      44 IntelliSense: identifier “m_pMainWnd” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 84 2 WCopyfind
      45 IntelliSense: identifier “INT_PTR” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 2 WCopyfind
      46 IntelliSense: class “CWCopyfindDlg” has no member “DoModal” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 26 WCopyfind
      47 IntelliSense: identifier “IDOK” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 86 19 WCopyfind
      48 IntelliSense: identifier “IDCANCEL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 91 24 WCopyfind
      49 IntelliSense: identifier “FALSE” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 105 9 WCopyfind

  15. I have been using WCopyfind quite successfully for my usage. In fact you might say I have been abusing it.
    When I download ebooks in batch mode I find that there are occasions that books are split into chapters and I am getting not only separate chapters but also the whole book. I then use WCopyfind to find the files with the (duplicated) separate chapter(s) and delete them keeping only the whole book.
    It would help me tremendously if there would be a way to sort the percentages, descending/ascending, of the left “Perfect Match” column. This could be in the program window, the html window, or preferably both.

    Is this something you could/would add to WCopyFind?

    Sincerely,
    Juergen

  16. I am a fan of your program, very nice solution!!!! In one of my various applications of it I got a message saying: “Error 8 Occurred During Report Initialization.” However, everything seems to be OK, the master report and its details. What does it mean then? Perhaps I am missing something in the “CopyFind” script, which is very simple:

    // Total number of documents
    Documents, 115
    // Reporting folder
    ReportFolder, C:\Users\Miguel\Desktop\Prueba\Details
    // Specify documents and comparison specifications
    Document, 1, .\Corpus_IPBES\key_texts
    // Comparison parameters
    PhraseLength,6 // default 6
    WordThreshold,6 // default 100
    SkipLength,20 // default 20
    MismatchTolerance,2 // default 2
    MismatchPercentage,80 // default 80
    BriefReport,0 // default 0 = No/False
    IgnoreCase,1 // default 0 = No/False
    IgnoreNumbers,0 // default 0 = No/False
    IgnoreOuterPunctuation,1 // default 0 = No/False
    IgnorePunctuation,1 // default 0 = No/False
    SkipLongWords,0 // default 0 = No/False
    SkipNonwords,0 // default 0 = No/False
    // Document comparison language
    Locale, English
    PrepareForComparisons
    Compare,1,1 // Compare group 1 internally
    Done
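A script like the one above can be generated rather than typed by hand, which helps when many runs are needed. The Python sketch below reproduces the layout of that script using only the directive names that appear in it, without judging whether the script itself is correct. The helper name `build_copyfind_script` and the assumption that directives go one per line in this order are illustrative and are not taken from the Copyfind documentation.

```python
# Directive names and values copied from the script quoted above.
# Whether Copyfind accepts other values or orderings is not verified here.
DEFAULT_PARAMETERS = {
    "PhraseLength": 6,
    "WordThreshold": 6,
    "SkipLength": 20,
    "MismatchTolerance": 2,
    "MismatchPercentage": 80,
    "BriefReport": 0,
    "IgnoreCase": 1,
    "IgnoreNumbers": 0,
    "IgnoreOuterPunctuation": 1,
    "IgnorePunctuation": 1,
    "SkipLongWords": 0,
    "SkipNonwords": 0,
}


def build_copyfind_script(document_count, report_folder, document_path,
                          locale="English", parameters=None):
    """Return script text laid out like the example above (hypothetical helper)."""
    settings = dict(DEFAULT_PARAMETERS)
    settings.update(parameters or {})   # caller overrides, e.g. PhraseLength
    lines = [
        f"Documents, {document_count}",
        f"ReportFolder, {report_folder}",
        f"Document, 1, {document_path}",
    ]
    lines += [f"{name},{value}" for name, value in settings.items()]
    lines += [
        f"Locale, {locale}",
        "PrepareForComparisons",
        "Compare,1,1",
        "Done",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script equivalent to the one quoted above.
    print(build_copyfind_script(115,
                                r"C:\Users\Miguel\Desktop\Prueba\Details",
                                r".\Corpus_IPBES\key_texts"))
```

Saving the returned text to a file and passing that file to Copyfind is left to the reader, since the exact way Copyfind is invoked is not described on this page.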

  17. Hi Mr. Bloomfield
    I have a collection of about 1 million files totaling 2 TB.
    I would like to use your software to find plagiarism in them.
    But the software is crashing due to faulty PDFs.
    So I would need a new version in which your software does not crash but instead skips the faulty PDFs or, better, writes their filenames to a log file for later review.
    Is that possible?
    Thanks a lot
    Andreas in Switzerland
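Until the program itself tolerates damaged files, one possible workaround is to screen the collection beforehand and set aside PDFs that look broken. The Python sketch below performs only a crude structural check: the file has the `%PDF-` signature near the start and an `%%EOF` marker near the end. It is an assumption that this catches the files that cause the crash, and the function names and log file name are hypothetical.

```python
from pathlib import Path


def looks_like_valid_pdf(path, tail_bytes=2048):
    """Crude check: '%PDF-' signature near the start and '%%EOF' near the end."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(1024)
            handle.seek(0, 2)                       # jump to the end of the file
            size = handle.tell()
            handle.seek(max(0, size - tail_bytes))  # back up to read the tail
            tail = handle.read()
    except OSError:
        return False                                # unreadable counts as faulty
    return b"%PDF-" in head and b"%%EOF" in tail


def screen_pdfs(folder, log_name="faulty_pdfs.log"):
    """Log suspicious PDF paths to a file and return the list of good ones."""
    folder = Path(folder)
    good, bad = [], []
    for pdf in sorted(folder.rglob("*.pdf")):
        (good if looks_like_valid_pdf(pdf) else bad).append(pdf)
    log_text = "".join(f"{path}\n" for path in bad)
    (folder / log_name).write_text(log_text, encoding="utf-8")
    return good
```

A file can pass this check and still be damaged internally, so the log is a starting point for review rather than a guarantee.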

  18. Hi Dr. Bloomfield,

    I also sent you an email about this; but can you or someone assist me in understanding the metrics readout for the document L and R comparisons, please? I want to make sure I am interpreting these numbers correctly.

  19. Hi, I’m a high school teacher from Argentina. I LOVE this program; it really makes my job faster and it made my 2020 a little easier when I found it. But today it is not working with any .docx file. Do you know how I can fix it? I really need it.

    1. I haven’t changed the program for several years, so I don’t know what has gone wrong for you. Perhaps there has been some change in the .docx file format or on your computer. My code reads the .docx format simplistically and it may not handle some .docx files properly. If I had infinite time and resources, I would find more information about the .docx file structure and be sure that I follow it perfectly.

      Perhaps you can convert the .docx files into another format that my code reads more reliably. A .txt file is ideal.

      Lou
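The conversion suggested above can be done in bulk. A .docx file is a ZIP archive whose main text lives in `word/document.xml`, so a short Python sketch using only the standard library can pull out the paragraph text and save it as a UTF-8 .txt file. This is a minimal sketch, not a full converter: it ignores headers, footnotes, and text boxes, and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        pieces = [node.text or "" for node in paragraph.iter(W + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def convert_folder(folder):
    """Write a .txt file next to every .docx file in the folder."""
    for docx in Path(folder).glob("*.docx"):
        text = docx_to_text(docx)
        docx.with_suffix(".txt").write_text(text, encoding="utf-8")
```

Calling `convert_folder` on a folder of student submissions leaves the originals untouched and adds a .txt copy of each, which can then be loaded into the program instead of the .docx files.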

      1. Thank you for finding this problem and identifying its cause. I haven’t worked on the code for ages and it looks like I’m not handling the character sets properly. I’ve transitioned some of my various web sites to modern character sets (e.g., utf-8 or utf-16), but I think I’ve left WCopyfind behind. I’ve put that change on my to-do list and will try to get to it soon. The usual problem: there are only 24 hours in a day. Plus, I don’t have the crazy energy I had when I was a kid. Ah, the story of life…

        Lou

  20. Hi,
    I have version 4.1.5. I try comparing documents, and I systematically get the following message: “Error: File cannot be opened, perhaps because it is already opened by other software.” Interestingly, this only happens when I compare documents in French. Things seem to work fine when I compare English documents. I don’t know what the problem is. I have already compared documents in French before and I had no issue. Can you help me?
    Thanks,
    Charles

  21. Hi Lou,

    I think I found the problem. It seems to have something to do with the encoding of some “É” and “é” in the file names. Once these “É” and “é” are replaced with “E” or “e” (without the accent), it works fine. Not all the “É” and “é” cause problems, but some of them do. Again, probably something about the encoding.
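One possible explanation, offered as a guess rather than a diagnosis, is that an accented letter can be stored either as a single code point (é, U+00E9) or as a plain letter followed by a combining accent (e + U+0301). The two look identical on screen but are different character sequences, which could explain why only some file names fail. The Python sketch below automates the workaround described above by renaming files so that their names contain no accents; the function names are hypothetical.

```python
import unicodedata
from pathlib import Path


def strip_accents(name):
    """Return name with combining accents removed, whichever form it was stored in."""
    decomposed = unicodedata.normalize("NFD", name)   # é -> e + U+0301
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rename_without_accents(folder):
    """Rename files in folder whose names contain accented characters."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        new_name = strip_accents(path.name)
        target = path.with_name(new_name)
        # Skip directories, unchanged names, and names that would collide.
        if path.is_file() and new_name != path.name and not target.exists():
            path.rename(target)
            renamed.append((path.name, new_name))
    return renamed
```

Because the function renames files in place, it is safer to run it on a copy of the document folder first.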
