Saturday, August 10, 2013

PLAGIARISM--ALL My Posts in One Place, From Earliest to Latest

0. [Turnitin is a computer application, provided in many universities I believe, that takes a paper and searches the net (à la Google) for sources that are similar to what is written in the paper. You can discover others who have quoted you, and you can determine how much of your work is similar to other work--even excluding parts of your work that are in quotation marks and the bibliographic notes.] I ran the manuscript of my new book The Scholar's Survival Manual through Turnitin, and after excluding all the people who quoted from my blog, or sources I had acknowledged, I was at the 1% level at most.


0.1 Another example: I ran a page from a filed dissertation and received 15% similarity. Looked at carefully, and given the information from Turnitin, we have the following situation. On that one page:

Joe Schmoe had almost said the obvious when he wrote "junk is junk....." (p. 145 of Schmoe)

But it turns out that there is another source, PostSchmoe, which says exactly that whole line (what is underlined above). That is, while the quotation from Schmoe is referred to and quotation marks are used, the fact that the whole sentence is taken from PostSchmoe is not referenced nor are there quotation marks around the underlined sentence. This is not acceptable. It is ok to use the quotation from Schmoe with reference and quotation marks, and you can make up your own sentence around it. It is ok to indicate that your original source for the Schmoe quotation "junk is junk...." is from PostSchmoe, but you had better check the Schmoe source to be sure PostSchmoe quoted Schmoe correctly. You can even leave out PostSchmoe if you go to the original source, Schmoe, and are assured that Schmoe is quoted correctly by PostSchmoe--although I am not sure this is really acceptable.


I have deliberately not used "plagiarism" and "theft" and "property" in this posting. I leave that to committees at universities, and to theorists about the nature of copying.

An excellent source on copying is provided by the Harvard College Writing Program.

1. I have put a number of student papers through Turnitin, a program that finds similarity among sources and papers. It is possible to eliminate quoted passages and the bibliography and even similarities that are, say, 15 words or less. I get anywhere from 1% similarity to 5, 6, 7, 12, 27, 31% similarity after removing quotes, bibliography, and passages with 15 words or less in common. You have to look at the paper to know the significance of the copying. A 1% score is almost never a problem. But, for example, the 5% paper copies passages from a variety of internet sources, while the 6% paper seems to coincide with lots of student papers but no other sources, and those coincidences are not serious. The 31% paper quotes long passages from authoritative sources, and all that would be needed is quotation marks or indents.

2. Namely, if you need to give exact quotes from a source, as in a government document or a writer, use quotation marks. If you borrow sentences from a source, and they are not meant to be authoritative, you will need to use your own formulation or put quotes there (usually a sentence such as The sky is a golden yellow need not be in quotes, but if you are referring to a line in a song, or to a description given on a website, you must give the source and use quotes).

You cannot copy word for word and expect to fulfil your obligations by giving a reference. You MUST have quotation marks if it is word for word, and if you are paraphrasing, a source should be given.

3. To one class, I wrote: Had I used Turnitin before sending in your grades, I would have had to send many of you to the university committee that is concerned with academic integrity. And as indicated in the syllabus, you would likely have received an F in the course. I had not thought to Turnitin the papers, since none of your copying was so obvious to me.

Some of the students were scrupulous (Turnitin similarity is 1% or less after quotations and bibliography are taken out), so this is not a blanket statement, by far.


4. I am not sure what the similarity level is for most papers that are original and use quotation marks around material that is taken from elsewhere. But what is striking to me is that there are other forms of borrowing that ought to be acknowledged. So if you adopt a periodization of an institution's history, you probably want to indicate your source. So if you are more or less reproducing a map, but now redone in your own hand, so to speak, you probably need to acknowledge the source. And if you are making an argument that is derived from another source's argument, in your own words, you want to acknowledge that other source.

5. For students, it makes sense to put your paper through something like Turnitin to know how it might be rated. If you've done all the right things, and you find that you are borrowing more than you would expect, you'll need to think about how to acknowledge that borrowing without saying that you got it from there. "After writing this essay, I found a similar argument in..." might work.

For the most part, none of this is subtle. And if you have forgotten a source but you are now channeling its language, the Turnitin test will help to keep you honest. Few people can channel more than a few sentences, in any case.
In sum: Practical advice, repeating some of the above:

1. You can put your paper through Turnitin (I believe this is available to students before they hand in papers, but I am not sure) to find places where you quoted from sources, but did not use quotation marks, or where you might put in a source reference if you have not done so already.

2. If you are copying more than a few words from a source, quotation marks and a reference are needed. If those few words are distinctive, quotation marks and a reference may be needed even so.

3. Internet sources are almost sure to be detected. Moreover, I have discovered that nowadays they often dominate a paper's references. That does not help the authority of your paper.

4. If you are taking a general structure from a source--for example the four periods of citizen participation over the last 50 years, or an argument--you want to give a reference. "This division of historical periods is taken from X", "Here I follow the argument given by Z"

5. If you discover a source that mirrors what you are saying, after you have written it, you might say, "After drafting this passage, I found that Y had made similar points. However...."

6. Don't borrow sentence structures, etc. Don't try to compose a paper out of bits and pieces of other papers and sources. You can borrow and use whatever you want to if you use quotation marks and references. But if it appears that the paper is a patched-together set of quotations, you won't do well.

University rules re academic integrity will force your instructor to send your work to a university office to review such apparent violations. Your instructor may call you in to discuss. But the marked-up Turnitin version of your paper, which indicates sources and quoted passages (say, just those without quotation marks) is in effect your being caught red-handed. I have never heard a good excuse for such.

Ia//In my university, quoting from the regulations:

"Plagiarism in a graduate thesis or dissertation. ... Expulsion from the university when discovered prior to graduation; revocation of degree when discovered subsequent to graduation."

Hence, doctoral students should not assume that yet-undetected plagiarism in their schoolwork will never have consequences. Their final grades may be in, and their dissertation filed and awarded, but ... if it is discovered that plagiarism occurs in their thesis/dissertation, they are likely to have their degree revoked, and if in their schoolwork prior to the dissertation they might well be expelled. Good academic integrity habits have to be inculcated from the beginning of their graduate work, so they do not set themselves up for future disaster. Dissertations are public documents, and anyone might discover plagiarism--although it is remarkable that anyone at all will read the dissertation after it is filed. Of course, once plagiarism is detected, then it will be searched for in other documents. (Martin Luther King, Jr.'s, story is telling and troubling.)

There have been several very public cases in Europe of distinguished politicians having their doctorates revoked. This site has various German dissertations that were revoked, including a map of the plagiarized pages:

If you are interested in more complex issues, the Committee on Publication Ethics has a large number of cases.

Percentage of "similarity" from almost all the papers I received this semester--see below.

Note that each paper needs to be examined carefully since the Originality Report
from Turnitin does not tell you the nature of the copying. Also, it does not tell you about
appropriating an argument without giving credit. And if the quotations are indented
they are treated as if they do not have quotation marks. In any case, these percentages
are for copying that does not have quotation marks, it does not touch the bibliography,
and it says that copying that is 10 words or less in sequence is not to be counted
in the percentage. One can adjust all of these, and again one must look at the paper.
Often, one finds a reference but no quotation marks around the copied passage.
Sometimes, arguments or structures are not referenced and that will not be picked
up by Turnitin.
I have discovered that even in doctoral theses, there is perhaps 5 to 8% or more similarity, usually because of copying without quotation marks. I am not sure there would be warrant to revoke these doctorates, but there's no reason to be in this position. Slide past blank space to next posting.


II//Mathematics of Turnitin Originality Scores:  Say that you have C(n) copied passages of length n words. The total number of copied words in passages of length n is just n times C(n), and the percentage of the W-word paper that is similar, where we count only similar passages of at least N words and ignore shorter ones, is just

Sim(N) = (100/W) * SUM from n=N to W of (n x C(n)) percent.
dSim(N)/dN = - (100/W) N C(N)
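
A minimal Python sketch of the sum above, assuming the copied passages are available simply as a list of their lengths in words. The function name and the sample numbers are made up for illustration; this is not how Turnitin itself computes its score.

```python
from collections import Counter


def similarity_percent(passage_lengths, total_words, cutoff):
    """Percent of a paper counted as similar when only copied passages
    of at least `cutoff` words are counted.

    passage_lengths: length in words of each copied passage (hypothetical input)
    total_words:     W, the length of the paper in words
    cutoff:          N, the shortest passage length that still counts
    """
    counts = Counter(passage_lengths)  # counts[n] plays the role of C(n)
    copied = sum(n * c for n, c in counts.items() if n >= cutoff)
    return 100.0 * copied / total_words


# Made-up example: a 1,000-word paper with five copied passages.
lengths = [5, 5, 12, 30, 48]
cutoffs = (1, 6, 13, 31, 49)
scores = [similarity_percent(lengths, 1000, n) for n in cutoffs]
```

Under this formula the score can only fall or stay flat as the cutoff rises, and the drop from one cutoff to the next is (100/W) times N times C(N), which is the derivative line above read as a finite difference.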

I checked one longish paper with a high similarity score and
Sim(N)= 0.0011 N x N - 0.4N + 49
was a decent fit from N=5 to N=200, and so
dSim(N)/dN= 0.0022 N - 0.4 (obviously this makes no sense for large N).
The basic point is that you start out with 49%, and lose about 0.4% for each additional word in the cutoff, while at the 25 word cutoff, you lose about 0.35%/word.
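
The arithmetic of the fit can be checked with a short Python sketch. The coefficients are the ones quoted above; the function names are only for illustration.

```python
def sim_fit(n):
    """Quadratic fit quoted in the text, in percent (a decent fit for N = 5 to 200)."""
    return 0.0011 * n * n - 0.4 * n + 49


def sim_fit_slope(n):
    """Derivative of the fit: percentage points gained (negative = lost)
    per extra word of cutoff."""
    return 0.0022 * n - 0.4


slope_at_25 = sim_fit_slope(25)   # about -0.345, i.e. roughly 0.35% lost per word
turning_point = 0.4 / 0.0022      # about 182 words; beyond this the fitted curve rises
```

The turning point near 182 words is one way to see why the fit makes no sense for large N: a real similarity score cannot go back up as more passages are excluded.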

Two different papers, Originality Scores vs. Length of quotation that does not count
This one is for a paper that had a 29% score at 10 words. It should be monotonic, but the Originality Scores I get are not. Why? The chart at the bottom had a score of 42% for 10 words and is the example in the main text.

This one below is for the example in the text.


III//In reading publicly available documents, such as dissertations, or speeches by prominent individuals, or scholarly articles, I am sometimes tempted to check them out on Turnitin. Perhaps their author seems cavalier about the problem of copying. Perhaps you have a run of students all of whom seem to have no idea of scholarly practice--something I am told you learn in high school--about quotation and reference.

For example, here below are two examples of similarity of more than one or two words, where the source is given but the fact of exact quotation of a phrase is not indicated by quotation marks. If it happens once or twice in a piece of work, it's likely to be carelessness. But if it is pervasive, it indicates that the author does not fully appreciate the consequences of such unacknowledged exact quotation. We see it often when students are writing in a language that is not their native one. But for scholars, there is at least the problem of having your work questioned, your degree rescinded (!), your reputation besmirched. I will call this brick wall plagiarism, as suggested by a friend, since the Turnitin marked-up manuscript looks like a brick wall. See below.


Also, Secondary Quoting: You see a useful quote in source S1 taken from source S0. You had better go back to S0 to be sure that S1 has quoted S0 correctly. You might say, S0 as quoted by S1, but you then are depending on S1 being scrupulous. And keep in mind the problem of using quotation marks, for S1 may not have done that correctly. Scrupulous scholars (actually, any properly trained writer) are likely to indicate that they were alerted to the S0 quote by S1's reference and quotation (with or without quotation marks--but no need to indicate that!).

The source is first, the text checked is second. It may be that both have copied from the same source without quotation marks, or we have an S1/S0 situation.

IV//A distinguished colleague wrote me:

" [I]t is simply not always possible to paraphrase somebody (even in English there are only so many synonyms for many words and they all don't sound right). But I don't think it is necessary to put quotes around things that aren't paraphrased.
"I suspect this is a spectrum (like so much in the world). I'm not sure how many words from one source I would consider to be ok to use without quotation marks--somewhere between 30 and 70 I think (again as long as sources are given).

"SO the number of sources is very relevant to me, in that I care about the ratio of sources to identical words. Many authors and a few identical words--no problem. Using a few authors with a lot of identical words requires quotation marks."
I was surprised that someone would disagree with me on this. So, the distinguished colleague has made me think twice. I am not convinced their argument is correct, but perhaps others will find it persuasive.

V//Here is a repeat in different words of an earlier posting.

Last semester I encountered a curious form of excuse for what is manifestly copying without quotation marks but with a reference. Perhaps all of you know this already and I am a latecomer?


Perhaps several sentences are copied, but transitions or adjectives or ... are paraphrased (but most of the main text is copied verbatim). A footnote gives a reference, although perhaps not the page. But there are no quotation marks around the verbatim phrases. I was told by one person that this was standard in their field (health), reference but no quotation marks, that is how they were taught. But I could not find evidence for this norm, and in any case writers outside the field do the same thing in papers and dissertations.

Most students and scholars I spoke with knew this was a no-no, and learned that in high school. They tell me the copying practice is endemic to consultants and bureaucrats, but I don't know.

I call this "brick wall" plagiarism, since if you put the paper through Turnitin the verbatim phrases are highlighted in color, and so the paragraph looks like a brick wall, the now paraphrased parts being the "mortar." Here the copied text is covered by blue paint (the source is above, the paper is below), and the mortar for the copied text is in silver.

It would be useful for us to emphasize to students that this is not acceptable.

One has to be careful in using these similarity scores, since copied passages that are indented and footnoted are "similar", but this is not plagiarism. Here, also, I have set it that passages in quotation marks do not count for similarity, and similarity less than 10 words in sequence is set not to count (but you can change both of these readily). And if someone has copied the work in question, you also get this original taken as "similar", but obviously the copying is in the reverse order. You have to look at the Turnitin markups--they also give you the "original" source. I also did a sample of dissertations and theses and found this problem legion (and so set people up to have their degrees rescinded, although I suspect this is not what universities would consider actionable). The chart below indicates the distribution of similarity scores in a corpus of 40 student papers. (More than half have scores greater than 10%.) One-time/ten-pages probably does not matter, and is just sloppy work. More is a real problem.

VI//Copying and Sampling and Influence in the Arts and Music. Authority in Scholarship. Plagiarism as Sloth...

It is interesting how art, composition, and jazz are filled with various sorts of imitation and copying, and that is acceptable. It is never plagiarism unless it is total copying. If you take a passage from another musical work, you might be thought to copy, but it is what you do with that passage that is crucial. Plagiarism has little to do with art, music, performance. Steal This Music is a title of a book by a colleague. Think of current practice with sampling etc.

It's a no-no to publish a fiction or nonfiction book where you take the plot or a good hunk of text from elsewhere without references and quotation marks. People usually say that they had research assistants, or they have eidetic memory, as an excuse. It usually is not credible. If you are doing a reworking of Dante's Inferno, as does Gloria Naylor in Linden Hills, you don't have to mention it at all, by the way. The same for Jane Smiley in A Thousand Acres and King Lear. The novelist and essayist Jonathan Lethem has a pastiche article on pastiche in Harper's Magazine, February 2007.

But academics/scholarship demand sources for ideas and for quotations. This comes out of a tradition of biblical commentary and legal discourse, giving authority for what you say. Namely, it's not about stealing so much as being authoritative, where your own authority depends on how you stand on the shoulders of giants (see Merton's book, On the Shoulders of Giants). To paraphrase without giving a source of ideas, to copy text without quotation marks and sources, hurts you because your ideas are then just yours, and why should I take you seriously? Idiosyncratic originality per se is not much of an academic virtue. Scholarly originality takes what everyone knows (and you have referred to it in your review of the literature) and moves forward the argument and discovery.

There is a problem of copying to avoid work, which we see in students and sometimes in dissertations (Germany has had a bunch lately, politicians and social science degrees, but Merkel is a card-carrying physical chemist with an earlier career). That is what we want students to avoid. Consultants and bureaucrats seem to copy and give few references if it serves the work they are doing in internal memos etc. No one seems to be too concerned about that, but those standards do not apply to schoolwork or scholarly work.

VII//1. Perhaps this is a useful analogy: You marry, and you yourself have a notion of a more open marriage, while your spouse is more conventional about matters of faithfulness and monogamy. You promise to be faithful to your spouse, even though you have a notion of faithfulness that is different than your spouse's. For you and your spouse fulfil your conjugal duties on Friday and Monday, happily, at least as often as most married couples. On Wednesday and Saturday late afternoons you meet with other partners, telling your spouse you have to stay late at work on Wednesdays and you go to horror movies on Saturdays (your spouse hates horror movies). And your spouse has little reason to check up on you, since your work demands that you stay late other days as well, and your knowledge of the latest horror movies is legendary among your friends.

When it is discovered that you seem to be working less late than promised on Wednesdays, and your knowledge of horror movies is taken from the New York Times, you plead, either that your spouse never said anything about her notion of faithfulness (and it is old fashioned, to boot), or you surely do work extra on Wednesdays and you do keep up with horror films. ...

Analogously: So you surely put stuff in your own words in your papers (on Friday and Monday), but regularly (on Wednesday and Saturday) copy with attribution parts of the referred to sources, but no quotation marks. 

2. From:

"The problem is that the PhD system is designed for people who intend to become researchers. For these cases, plagiarism is not at all a common problem. You are expected to publish your research, and you will not have a successful career unless it is widely read and cited. That gives lots of opportunities to get caught, and the penalties for plagiarism are a huge deterrent.
"To the extent you find plagiarism, it's generally people who do not want a research career, but instead view the PhD simply as an obstacle on the road to a teaching (or other) career. Probably the community should scrutinize these sorts of theses more carefully, but it can be hard to work up the energy to do so when most of them are OK, and when these theses really don't matter much for the research world.
"The German politicians are pretty much the worst case scenario. In the US, the stereotypical case is educational administrators. Typically, you have a distinguished person who starts to feel the need for a PhD. Perhaps it's because they associate with academics and feel looked down upon, or perhaps it's because an academic endorsement would make the public value their expertise more. This student is very smart and accomplished, and nobody suspects them of any dishonesty. However, they are also very busy, often working on a PhD while pursuing other projects as well, and academic research is not a priority. At some point, they succumb to pressure and start taking shortcuts. Probably it starts with small things, but the shortcuts gradually grow larger. They rationalize that the thesis doesn't really matter anyway, because they have no intention of following an academic career track. After all, they have the knowledge and experience, and they deserve the PhD title, so what difference does one document make anyway? Meanwhile, the advisor probably doesn't spend that much time working with the student, and has no reason to suspect anything. The advisor really ought to be extra careful in cases like this, but that would seem like an insult to the student, so it's easiest just to trust them.
"So my take on this is that plagiarism is not as widespread as news stories might suggest. It's just particularly likely to happen in cases where it would attract media attention."

3. My university's guidance for graduate students