This web site provides software for detecting the reuse of written language across thousands or even millions of documents. In addition to helping to reduce the impact of plagiarism on our world, that software can study the migration of the written word through many interesting processes of civilization and society. As an academic, I am always interested in collaborations that will lead to peer-reviewed publications. If you have an interesting project that involves sifting through vast collections of documents for reused language, please contact me about a joint endeavor.
— Lou Bloomfield, Professor of Physics, University of Virginia
About Plagiarism:
Plagiarism is the misrepresentation of authorship. Typically, words and ideas conceived by one person are attributed to another person. Plagiarism is a form of intellectual theft or fraud and it undermines the intellectual economy that values ideas, words, and understanding. Even when an act of plagiarism appears superficially a victimless crime, it nonetheless devalues the currency of human thought and thereby weakens society.
In the most common form of plagiarism, one author’s words are inserted verbatim in the work of a second author, without quotation, acknowledgement, or attribution. But there are many other forms of plagiarism, including some that are often accepted or even encouraged by society, notably ghostwriting, speech-writing, and paraphrasing.
Plagiarism is not a black-and-white issue because many of our ideas and words derive from those of others, and what constitutes true intellectual theft or fraud often involves some degree of subjectivity. Moreover, each context has its own rules regarding the need for accurate attribution of authorship and those rules are not always obvious to everyone. Reasonable people may even disagree about those rules, so defining them clearly and explicitly is always a good idea.
What this Site Provides:
Software for Detecting Plagiarism
WCopyfind is an open-source, Windows-based program that examines a collection of documents, looking for matching language. If you have a collection of documents that you think might contain plagiarized content, you can check them quickly with this free software.
Thoughts and Ideas about Plagiarism, Education, and Society
Lou Bloomfield’s writings about issues relating to plagiarism, writing, scholarship, authorship, credentialing, integrity, and ethics.
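To make "looking for matching language" concrete, here is a minimal sketch of one generic approach: finding word sequences that two texts share. It is not WCopyfind's actual algorithm, which this page does not describe. The function names are hypothetical, and the six-word phrase length is only borrowed from the PhraseLength default quoted in a user's script elsewhere on this page.

```python
import re


def word_ngrams(text, n=6):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(text_a, text_b, n=6):
    """Return the n-word phrases that occur in both texts."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog today"
    b = "Yesterday the quick brown fox jumps over a fence"
    for phrase in shared_phrases(a, b):
        print(" ".join(phrase))
```

A real tool would also need to tolerate small mismatches and merge overlapping phrases into longer matched passages; this sketch does neither.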
Can I have the Programme?
The program is available for download at: WCopyfind
Does this program have databases in English only? Or which languages does it work with?
Thanks,
José
This program has no databases in any language, English or Spanish. It compares documents that you have on your computer. It should be able to compare Spanish documents just as well as it compares English documents. I suggest that you have those documents as .html, .docx, .txt, or text-based .pdf files.
Lou Bloomfield
I need software that compares Greek essays. Is this program appropriate?
WCopyfind should work with Greek and many other languages, as long as the files are in .DOCX, .HTML, or .PDF format (and the PDF is a text-based PDF, rather than an image-based one). WCopyfind can read and analyze any of the UTF-8 characters in those file formats, so it shouldn’t care whether the documents are in any particular language. You should select “Greek” as the language, so that WCopyfind distinguishes between letters and punctuation properly. Hopefully, it will do what you want.
Lou
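As an illustration of why the distinction between letters and punctuation matters for non-English text, the sketch below splits Greek text into words using Unicode character categories. It is a generic example built on Python's standard `unicodedata` module, not WCopyfind's own code, and the function name `split_words` is hypothetical.

```python
import unicodedata


def split_words(text):
    """Split text into words, treating Unicode letters, digits, and combining marks as word characters."""
    words, current = [], []
    for ch in text:
        # General categories starting with L are letters, N numbers, M combining marks.
        if unicodedata.category(ch)[0] in ("L", "N", "M"):
            current.append(ch)
        elif current:
            # Any other character (space, punctuation) ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(split_words("Καλημέρα, κόσμε; τι κάνεις;"))
    # prints ['Καλημέρα', 'κόσμε', 'τι', 'κάνεις']
```

Because the classification comes from the Unicode character database, the same function handles accented Latin, Greek, or Cyrillic letters without a per-language table.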
Lou,
If I download and then use your application to compare two documents, do you then have record of those documents?
No, the documents remain entirely on your computer. They are never sent anywhere else. In fact, you can run WCopyfind without any internet connection.
Lou
I found a reference or two to “Copyfind” for linux. Is this available? Thanks
I have written a command-line version of my software, Copyfind, but I haven’t had a chance to post it on this site yet. It’s written in Microsoft C++ and contains a small amount of Windows-specific code. My hope is to remove even that and make the program machine-independent. But right now I have so many things to do that I’m not sure when I’ll get to it.
Lou
Lou,
I’m eager to get Copyfind working on Linux. If you’ll share your source code I’ll cheerfully scrub out any Windows-isms I find. But I’d rather not duplicate work you’ve already done.
Was the Linux-based version ever posted to the site? Is that what all of the source code files are for? Sorry, I’m not well versed in Linux yet.
I finally posted Copyfind, a version of this software that runs in a console window and can be scripted. It still needs to run under Windows, since I haven’t extracted all the Windows-specific C++ extensions, but it’s much more coder-friendly.
Lou
As a teacher, I have been using WCopyfind for more than five years. I currently have a 64-bit Dell system, and most of my documents are .docx files (Word 2007/2010). The software hangs if I load more files, say, 5 or more.
If I use version 2.6 compatible on my desktop, the side-by-side file comparison window shows characters other than the report text.
Please help.
I am aware of a bug in the reading of .DOCX files, but haven’t had time to fix it. I’ll try to get to it as soon as I can get my head above water.
Lou
Do you know if anyone has used Parallels or some other Windows emulation software to run your software on a Mac successfully?
Thanks for the fantastic program! I am currently using it in some text analysis research and was wondering if there is a simple way to tell the program that I am only interested in comparing “old” documents with “new” and to ignore “new” to “new” comparisons? Again, great program!
Is there a database of papers that can be downloaded to use in your program? I want to compare my students’ work to each other, but I also want to make sure they didn’t plagiarize from internet sources. Any thoughts? Thanks!!
Hi Louis
Recently, Copyfind has been cutting words out of the reports. When I look at the side-by-side or the single-page view, I notice that the sentences are incomplete, almost as if they were ending at the word wrap rather than at the end of the sentence. Is this a known bug?
BTW, I’m the one who had the problem with memory overrun with .docx files a few years ago 🙂 I can send you examples if you like.
Matt
Hi Lou,
This is a great idea for a site. I think there’s not enough “good” info out there on how to properly check for plagiarism.
I came across this post here, which reviews some of the free tools online:
http://www.grammarcheck.net/review-10-sites-that-check-for-plagiarism/
I tried most of them out. The free ones that worked all delivered the same results, but most of them were not working at all.
I find it extremely difficult to find a “truly” free site, so I’m sticking to Google for now.
Do you have any guide on how to use Google for plagiarism checking properly?
Thanks and please keep me posted!
Sincerely,
Martha
teachermartha82 [at] gmail [dot] com
I am interested in learning whether this program will work on a Mac. Thanks.
Hello,
I am comparing two docx documents, one of which was converted from a pdf. They both contain addresses but the number part of the address is only occasionally highlighted along with the street name. For instance, when 111 S Main is on both documents, only S Main would be highlighted. Within the same list, 123 N Carson might fully be highlighted when on both documents. Is there something I am doing wrong?
Thank you for your help!
Kevin
Hi everyone,
I am a Mac user and use WCopyfind on my computer. All you have to do is download and install “Wine” for Mac and then you’ll be able to run WCopyfind on your Mac.
Hope it helps,
Luc
Hi, everyone.
I have a “newbie” question, which is simply how do I post a NEW question on the site? I can’t find any info at all on the site that explains how to do this (but maybe I am not looking in the right places).
Thanks in advance.
Steve
Is there a way to make this program scriptable?
I am looking to run it several thousand times via CLI, and I can’t find where in the source I should be modifying it. Can you do this/where should I look?
The Copyfind version of this program is scriptable. Copyfind reads a script file, so you can easily make it do thousands of separate comparisons. At present, it only works under Windows (I wrote it in Visual C++), but I may try to port it to a more generic C++ (probably under Eclipse). Having it work properly on a Linux box would make lots of sense. It’s just a time problem for me: a zillion things to do and endless pressure to publish or perish.
Lou
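For anyone planning thousands of scripted runs, the sketch below generates one script file per document folder. The command names (Documents, ReportFolder, Document, PhraseLength, PrepareForComparisons, Compare, Done) are copied from a user's script quoted elsewhere on this page; their exact semantics, the meaning of the Documents count, and the way Copyfind is launched with a script are not documented here, so all of that is an assumption to check against the Copyfind instructions. The function names and folder layout are hypothetical.

```python
from pathlib import Path


def build_script(doc_folder, report_folder, doc_count, phrase_length=6):
    """Return the text of one Copyfind-style script (syntax assumed from a script on this page)."""
    lines = [
        f"Documents, {doc_count}",
        f"ReportFolder, {report_folder}",
        f"Document, 1, {doc_folder}",
        f"PhraseLength,{phrase_length}",
        "PrepareForComparisons",
        "Compare,1,1",  # assumed to compare group 1 internally, as in the quoted script
        "Done",
    ]
    return "\n".join(lines) + "\n"


def write_scripts(doc_folders, out_dir):
    """Write one script file per document folder and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for folder in map(Path, doc_folders):
        # Assumption: every file in the folder counts toward the Documents total.
        count = sum(1 for p in folder.iterdir() if p.is_file())
        script_path = out_dir / f"{folder.name}.txt"
        report_folder = out_dir / "reports" / folder.name
        script_path.write_text(build_script(folder, report_folder, count), encoding="utf-8")
        written.append(script_path)
    return written
```

Generating the scripts separately from running them makes it easy to inspect a few by hand before committing to a long batch.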
Hi,
If I need to compare documents in Latvian, which language should I select?
Thanks.
I’m afraid that I have no idea. The language selection affects punctuation and accents, but otherwise it should have relatively little effect on the comparison process. I suggest that you find the language that most closely resembles Latvian and see how effective the comparison is. It should work pretty well, even if you select English.
Lou
I use version 4.1.1 and would like to update to version 4.1.4. However, when I try to run it, I get a message saying that this is not a valid Win32 application (I took care to download the Win32 version, not the Win64). The same thing happens with versions 4.1.2 and 4.1.3. I downloaded version 4.1.1 again, and it works. Is there something that I can do to run the latest version? (I use Windows XP 32-bit, SP3; I know, I should upgrade…). Otherwise, thanks for this nice application!
Peji
Maybe I built it incorrectly. I just tried it on my Windows 8 computer and it worked. Perhaps it has trouble under XP? I’m not sure how to enforce compatibility with older operating systems.
Lou
Version 4.1.4 does not work on my Windows XP SP3 machine either. I get the same error that Peji posted: ‘Not a valid Win32 application’. It is good to hear that 4.1.1 works, so I will give that a try.
Apart from that, thanks for making the program. I expect it will be very useful.
Wow! Just discovered and downloaded Wcopyfind. Tried it out on a small set of test tiles and it performed flawlessly. As a Code of Conduct officer, I am called on to deal with plagiarism, among many other issues. We have Turnitin available, but this software is far easier to use and allows me to work more independently.
Thanks a million.
Sloppy typing – I meant “a small set of test files.” Apologies to all.
Hello Sir,
I need your help with your plagiarism detection software. Can you please tell me how to run the source code? Also, please tell me which algorithm you use to check for plagiarism.
Can I contact you via email?
I am trying to build the source code in Visual Studio, but it does not compile, and I get the following errors:
Error 1 error C1083: Cannot open include file: ‘afxwin.h’: No such file or directory e:\wcopyfind.4.1.4\wcopyfind\stdafx.h 23 1 WCopyfind
2 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 23 1 WCopyfind
3 IntelliSense: cannot open source file “afxext.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 24 1 WCopyfind
4 IntelliSense: cannot open source file “afxdisp.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 27 1 WCopyfind
5 IntelliSense: cannot open source file “afxdtctl.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 32 1 WCopyfind
6 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 35 1 WCopyfind
7 IntelliSense: cannot open source file “afxcontrolbars.h” e:\WCopyfind.4.1.4\WCopyfind\stdafx.h 38 1 WCopyfind
8 IntelliSense: #error directive: “include ‘stdafx.h’ before including this file for PCH” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 8 3 WCopyfind
9 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 12 31 WCopyfind
10 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 13 35 WCopyfind
11 IntelliSense: identifier “WM_APP” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 14 36 WCopyfind
12 IntelliSense: identifier “HWND” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 22 2 WCopyfind
13 IntelliSense: identifier “CDialog” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 23 2 WCopyfind
14 IntelliSense: not a class or struct name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 30 WCopyfind
15 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 35 10 WCopyfind
16 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 39 2 WCopyfind
17 IntelliSense: expected a ‘;’ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 40 1 WCopyfind
18 IntelliSense: identifier “CALLBACK” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 18 WCopyfind
19 IntelliSense: identifier “CListCtrl” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 59 WCopyfind
20 IntelliSense: identifier “CString” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 77 WCopyfind
21 IntelliSense: identifier “UINT” is undefined e:\WCopyfind.4.1.4\WCopyfind\FileDropListCtrl.h 35 93 WCopyfind
22 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 8 1 WCopyfind
23 IntelliSense: cannot open source file “afxcmn.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfindDlg.h 9 1 WCopyfind
24 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.h 28 7 WCopyfind
25 IntelliSense: cannot open source file “afxinet.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 6 1 WCopyfind
26 IntelliSense: cannot open source file “afxdialogex.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 7 1 WCopyfind
27 IntelliSense: cannot open source file “afxwin.h” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 8 1 WCopyfind
28 IntelliSense: explicit type is missing (‘int’ assumed) e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 1 WCopyfind
29 IntelliSense: identifier “CWinApp” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 29 34 WCopyfind
30 IntelliSense: expected a ‘{‘ e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 30 2 WCopyfind
31 IntelliSense: identifier “m_dwRestartManagerSupportFlags” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 2 WCopyfind
32 IntelliSense: identifier “AFX_RESTART_MANAGER_SUPPORT_RESTART” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 39 35 WCopyfind
33 IntelliSense: identifier “BOOL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 53 1 WCopyfind
34 IntelliSense: identifier “INITCOMMONCONTROLSEX” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 58 2 WCopyfind
35 IntelliSense: identifier “ICC_WIN95_CLASSES” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 62 20 WCopyfind
36 IntelliSense: identifier “InitCommonControlsEx” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 63 2 WCopyfind
37 IntelliSense: name followed by ‘::’ must be a class or namespace name e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 65 2 WCopyfind
38 IntelliSense: identifier “AfxEnableControlContainer” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 68 2 WCopyfind
39 IntelliSense: identifier “CShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 2 WCopyfind
40 IntelliSense: identifier “pShellManager” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 17 WCopyfind
41 IntelliSense: expected a type specifier e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 72 37 WCopyfind
42 IntelliSense: identifier “SetRegistryKey” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 2 WCopyfind
43 IntelliSense: identifier “_T” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 81 17 WCopyfind
44 IntelliSense: identifier “m_pMainWnd” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 84 2 WCopyfind
45 IntelliSense: identifier “INT_PTR” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 2 WCopyfind
46 IntelliSense: class “CWCopyfindDlg” has no member “DoModal” e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 85 26 WCopyfind
47 IntelliSense: identifier “IDOK” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 86 19 WCopyfind
48 IntelliSense: identifier “IDCANCEL” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 91 24 WCopyfind
49 IntelliSense: identifier “FALSE” is undefined e:\WCopyfind.4.1.4\WCopyfind\WCopyfind.cpp 105 9 WCopyfind
I have been using WCopyfind quite successfully for my usage. In fact you might say I have been abusing it.
When I download ebooks in batch mode I find that there are occasions that books are split into chapters and I am getting not only separate chapters but also the whole book. I then use WCopyfind to find the files with the (duplicated) separate chapter(s) and delete them keeping only the whole book.
It would help me tremendously if there would be a way to sort the percentages, descending/ascending, of the left “Perfect Match” column. This could be in the program window, the html window, or preferably both.
Is this something you could/would add to WCopyfind?
Sincerely,
Juergen
Sorting the table of results makes sense and I’ll put it on my to-do list.
Lou
I am a fan of your program, very nice solution!!!! In one of my various applications of it I got a message saying: “Error 8 Occurred During Report Initialization.” However, everything seems to be OK, the master report and its details. What does it mean then? Perhaps I am missing something in the “CopyFind” script, which is very simple:
// Total number of documents
Documents, 115
// Reporting folder
ReportFolder, C:\Users\Miguel\Desktop\Prueba\Details
// Specify documents and comparison specifications
Document, 1, .\Corpus_IPBES\key_texts
// Comparison parameters
PhraseLength,6 // default 6
WordThreshold,6 // default 100
SkipLength,20 // default 20
MismatchTolerance,2 // default 2
MismatchPercentage,80 // default 80
BriefReport,0 // default 0 = No/False
IgnoreCase,1 // default 0 = No/False
IgnoreNumbers,0 // default 0 = No/False
IgnoreOuterPunctuation,1 // default 0 = No/False
IgnorePunctuation,1 // default 0 = No/False
SkipLongWords,0 // default 0 = No/False
SkipNonwords,0 // default 0 = No/False
// Documents comparison language
Locale, English
PrepareForComparisons
Compare,1,1 // Compare group 1 internally
Done
I’ve been so busy with my day job that I haven’t had time to work on these programs for ages. When I have a chance, I’ll look at that error message. It’ll be a while though…
Lou
Hi Mr. Bloomfield
I have a collection of about 1 million files totaling 2 TB.
I would like to use your software to find plagiarism in this collection.
However, the software crashes on faulty PDFs.
So I would need a new version that does not crash on faulty PDFs but skips them and continues, or better, writes their filenames to a log file for later review.
Is that possible?
Thanks a lot
Andreas in Switzerland
Are you using pdftotext.exe? It’s a much better reader of PDFs than my intrinsic code. See the instructions for WCopyfind for (brief) discussion of pdftotext.exe.
Lou
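One way to keep a large run going despite faulty PDFs is to convert everything to .txt beforehand and log the files that fail. The sketch below does this with the `pdftotext` command-line tool, called as `pdftotext input.pdf output.txt`; it assumes the tool is installed and on the PATH. This is a preprocessing step outside WCopyfind, not a description of how WCopyfind itself uses pdftotext.exe, and the function names and log format are hypothetical.

```python
import subprocess
from pathlib import Path


def run_pdftotext(pdf_path, txt_path):
    """Convert one PDF with the external pdftotext tool; raises if the tool fails."""
    subprocess.run(
        ["pdftotext", str(pdf_path), str(txt_path)],
        check=True,          # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        timeout=120,         # give up on PDFs that make the tool hang
    )


def convert_all(pdf_dir, txt_dir, log_path, convert=run_pdftotext):
    """Convert every .pdf in pdf_dir, skipping failures and logging them. Returns the failure count."""
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        target = txt_dir / (pdf.stem + ".txt")
        try:
            convert(pdf, target)
        except (subprocess.SubprocessError, OSError) as exc:
            # Record the faulty file and carry on with the rest.
            failed.append(f"{pdf}\t{exc}")
    Path(log_path).write_text("\n".join(failed), encoding="utf-8")
    return len(failed)
```

The resulting .txt files can then be compared directly, and the log lists the PDFs that need a second look.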
Hi Dr. Bloomfield,
I also sent you an email about this; but can you or someone assist me in understanding the metrics readout for the document L and R comparisons, please? I want to make sure I am interpreting these numbers correctly.
Hi, I’m a high school teacher from Argentina. I LOVE this program; it really makes my job faster, and it made my 2020 a little easier when I found it. But today it is not working with any .docx file. Do you know how I can fix it? I really need it.
I haven’t changed the program for several years, so I don’t know what has gone wrong for you. Perhaps there has been some change in the .docx file format or on your computer. My code reads the .docx format simplistically and it may not handle some .docx files properly. If I had infinite time and resources, I would find more information about the .docx file structure and be sure that I follow it perfectly.
Perhaps you can convert the .docx files into another format that my code reads more reliably. A .txt file is ideal.
Lou
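For readers who want to try the .txt route, here is a minimal sketch that pulls the plain text out of a .docx file using only the Python standard library. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. It ignores headers, footnotes, and other parts of the file, and the function name `docx_to_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as w:p and w:t.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Join every text run (w:t) inside the paragraph.
        paragraphs.append("".join(node.text or "" for node in para.iter(W + "t")))
    return "\n".join(paragraphs)
```

Writing the returned string to a UTF-8 .txt file gives a document in the format described above as ideal.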
Thank you for finding this problem and identifying its cause. I haven’t worked on the code for ages and it looks like I’m not handling the character sets properly. I’ve transitioned some of my various web sites to modern character sets (e.g., utf-8 or utf-16), but I think I’ve left WCopyfind behind. I’ve put that change on my to-do list and will try to get to it soon. The usual problem: there are only 24 hours in a day. Plus, I don’t have the crazy energy I had when I was a kid. Ah, the story of life…
Lou
Hi,
I have version 4.1.5. I try comparing documents, and I systematically get the following message: “Error: File cannot be opened, perhaps because it is already opened by other software.” Interestingly, this only happens when I compare documents in French. Things seem to work fine when I compare English documents. I don’t know what the problem is. I have already compared documents in French before and I had no issue. Can you help me?
Thanks,
Charles
Hi Lou,
I think I found the problem. It seems to have something to do with the encoding of some “É” and “é” in the file names. Once these “É” and “é” are replaced with “E” or “e” (without the accent), it works fine. It looks like not all the “É” and “é” cause problems, but some of them do. Again, probably something about the encoding.
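The observation that only some “É” cause trouble would be consistent with two different Unicode encodings of the same visible letter: a single precomposed character (U+00C9) versus a plain “E” followed by a combining acute accent (U+0301). That explanation is an assumption, not something confirmed on this page. The sketch below, with hypothetical function names, detects names that are not in composed form and produces the accent-free names used in the workaround above.

```python
import unicodedata


def differs_by_normalization(name):
    """Return True if name is not already in composed (NFC) form."""
    return unicodedata.normalize("NFC", name) != name


def ascii_name(name):
    """Return name with accents stripped, so 'É' becomes 'E'."""
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks that NFD splits off from their base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ["\u00c9tude.docx", "E\u0301tude.docx"]:
        print(repr(name), differs_by_normalization(name), ascii_name(name))
```

Letters that have no accent-free decomposition, such as “ø”, pass through `ascii_name` unchanged.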