Wikipedia:Spotting possible copyright violations



This is a guide to spotting violations of the Wikipedia copyright policy that are simple cut-and-pastes from other websites.

Signs that an article might be cut-and-pasted

There are a number of signs that an article might be cut-and-pasted. None of them is conclusive on its own, but a cut-and-pasted article usually shows more than one.

Signs of concern, but possibly innocent:

  • The article is not wikified.
  • Or, if it is wikified, it is excessively linked, with every occurrence of a word or phrase made into a wiki link (as if search-and-replace had been used to insert the links).
  • The article was submitted all at once in finished form, rather than "growing" in stages with multiple users editing.
  • The page history lacks even minor edits, such as corrections of spelling mistakes.
  • The writing style is "too good to be true".
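
The two linking signs above can be roughly approximated from an article's wikitext. The following is a minimal sketch, assuming links are written in the usual `[[Target]]` or `[[Target|label]]` form; the function name `link_profile` and the shape of its result are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Captures the target of [[Target]], [[Target|label]] or [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_profile(wikitext: str) -> dict:
    """Summarise how a piece of wikitext uses internal links."""
    targets = [t.strip().lower() for t in WIKILINK.findall(wikitext)]
    counts = Counter(targets)
    return {
        "total_links": len(targets),
        # No links at all: the text has not been wikified.
        "unlinked": len(targets) == 0,
        # Targets linked more than once: possible search-and-replace linking.
        "repeated_targets": sorted(t for t, n in counts.items() if n > 1),
    }
```

A result with `unlinked` set, or with a long `repeated_targets` list, is only a reason to look more closely. Many legitimate new articles arrive without links.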


Strongly suggestive signs of cut-and-paste:

  • HTML tags that are not commonly used or valid in this Wiki, such as "HEAD", "BODY", "TITLE" and "HTML" tags. These suggest an attempt to cut-and-paste from a Web page's HTML source, rather than from the rendered page.
  • Give-away phrases like "this booklet".
  • Isolated or out-of-context words or phrases such as "top", "go to top", "next page" or "click here", which were originally part of the navigation structure of the source website.
  • Non-standard characters such as Microsoft "smart quotes" (but the text may have been composed in Microsoft Word or something similar, and smart quotes in particular are becoming increasingly common in valid articles).
  • Trademark signs (™, ®) and similar typical signs of commercial text.
  • ASCII art that does not render properly when copied to the Wiki (but this may be a newbie who doesn't understand wiki-formatting).
  • A writing style that is that of an advertisement or press release.
  • Other recent suspected copyright violations by the same contributor.

Dead giveaway:

  • Some copied pages still contain the original site's copyright notice, copied intact! In this case, you can assume that the page is almost certainly a copyright violation unless the poster is in fact the copyright owner, in which case the burden of proof should lie on them to show that they are that person.
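
Several of the textual signs listed above lend themselves to a simple pattern scan. The sketch below is illustrative only: the check names, the regular expressions, and the `scan` function are assumptions made for this example, not an established tool. The single word "top" is deliberately left out because it is too common in ordinary prose.

```python
import re

# Each entry maps a hypothetical check name to a pattern for one sign.
CHECKS = {
    "html_document_tags": re.compile(
        r"</?\s*(?:html|head|body|title)\b[^>]*>", re.IGNORECASE),
    "navigation_residue": re.compile(
        r"\b(?:go to top|next page|click here)\b", re.IGNORECASE),
    "giveaway_phrase": re.compile(r"\bthis booklet\b", re.IGNORECASE),
    "trademark_signs": re.compile("[™®]"),
    # Curly single and double quotation marks ("smart quotes").
    "smart_quotes": re.compile("[\u2018\u2019\u201c\u201d]"),
    "copyright_notice": re.compile(
        r"©|\bcopyright\s+(?:\(c\)\s*)?\d{4}|\ball rights reserved\b",
        re.IGNORECASE),
}


def scan(text: str) -> list:
    """Return the names of all checks whose pattern occurs in the text."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]
```

For example, `scan("<HTML><BODY>Welcome to AcmeWidget™. Click here for the next page. Copyright 2004 Acme Inc.")` reports the HTML tags, the navigation residue, the trademark sign and the copyright notice. A hit is a prompt to check the article by hand, never proof of a violation.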

Checking it out

Once alerted by one or more of these suspicious signs, you can check the article: highlight a sentence or non-trivial sentence fragment that is unlikely to be found by chance in many documents, copy and paste it into Google, and search for it (the Mozilla browser is good if you do this often). You should then check the matching pages, if any, for further correspondence to the submitted article.
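
The search-and-compare procedure above can be sketched in a few lines. This is a rough illustration under stated assumptions: the longest sentence stands in for "a sentence unlikely to be found by chance", the search URL format is assumed rather than taken from this guide, and the function names are hypothetical. Wrapping the phrase in quotation marks asks the search engine for an exact-phrase match.

```python
import re
from urllib.parse import quote_plus


def pick_search_phrase(text: str, max_words: int = 12) -> str:
    """Pick the longest sentence (naive split) and trim it to max_words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    longest = max(sentences, key=lambda s: len(s.split()))
    return " ".join(longest.rstrip(".!?").split()[:max_words])


def exact_phrase_query(phrase: str) -> str:
    """Build a search URL for the phrase in quotes (URL format is assumed)."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


def shingle_overlap(article: str, candidate: str, n: int = 5) -> float:
    """Fraction of the article's n-word sequences that also occur in candidate."""
    def shingles(text):
        words = re.findall(r"\w+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    a, b = shingles(article), shingles(candidate)
    return len(a & b) / len(a) if a else 0.0
```

Once a matching page has been found and its text saved, a `shingle_overlap` value close to 1.0 suggests that most of the article appears word for word in the candidate source. A low value does not rule out copying with light rewording.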

For extra thoroughness, you may also want to check out the "groups" option in Google, to check that the article is not copied from Usenet.

Images taken from other websites are often uploaded here under their original filenames. If you suspect that an image is a copyright violation, try searching Google Images for its filename to see whether the same image appears on other websites. Even if the image was uploaded under a different name, a Google Images search for relevant search terms might help you find the original.
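
Both image searches described above start from the uploaded file's name. The sketch below derives an exact-filename query and a looser keyword query from an image page title. The function name `image_search_terms` is hypothetical, and stripping an `Image:` or `File:` namespace prefix is an assumption about how the title is written.

```python
import re
from pathlib import PurePosixPath


def image_search_terms(title: str) -> dict:
    """Derive two search strings from an image page title."""
    # Drop a leading namespace prefix such as "Image:" or "File:".
    name = re.sub(r"^(?:image|file):", "", title.strip(), flags=re.IGNORECASE)
    # Filename without its extension.
    stem = PurePosixPath(name).stem
    # Split on punctuation and underscores; drop purely numeric parts.
    keywords = [w for w in re.split(r"[\W_]+", stem) if w and not w.isdigit()]
    return {
        "exact_filename": '"' + name + '"',
        "keywords": " ".join(keywords),
    }
```

For a title such as `Image:Eiffel_tower-night_2003.jpg`, this gives the quoted filename for an exact search and `Eiffel tower night` as fallback search terms.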

If you suspect that a page is a copyright infringement

It is not the job of rank-and-file Wikipedians to police every article for possible copyright infringement, but if you suspect one, you should at the very least bring up the issue on that page's talk page. Others can then examine the situation and take action if needed. The most helpful piece of information you can provide is a URL or other reference to what you believe may be the source of the text.

  • Remember: please don't bite the newbies -- many cut-and-paste contributors may not understand that what they are doing is wrong, and some may turn into valuable contributors if educated rather than punished. You can use the user's talk page to discuss your concerns with them.
  • Some cases will be false alarms. For example, if the contributor was in fact the author of the text that is published elsewhere under different terms, that does not affect their right to post it here under the GFDL. Material from public domain resources is sometimes republished with unclear or misleading copyright notices which may obscure the origin. An article from another language's Wikipedia might be translated and published here (bringing with it seemingly suspicious anomalies, particularly if the contributor's understanding of English and/or wikification is limited). Also, sometimes you will find text elsewhere on the Web that was copied from Wikipedia. In these cases, it is a good idea to make a note on the talk page to discourage such false alarms in the future.
  • Please see the Wikipedia copyright policy document for what to do in difficult cases, such as where a user continues to post copyrighted material in spite of warnings.

See also: Wikipedia:Boilerplate request for permission, Wikipedia:Confirmation of permission
