Text Mining for plagiarism at arXiv

There’s an article in Nature that shows yet another way technology is transforming scholarly communication: this time, by quantifying the incidence of plagiarism.

Nature 444, 524-525 (30 November 2006) (subscription required)

Examining 280,000 documents in the arXiv archive, researchers report that blatant deception is quite rare (667 cases, or about 0.2% of the archive). Substantial text reuse (defined as significant matching text, but with at least one author common to both papers) is quite common (approximately 10% of the archive), but this can be explained by the fact that conference abstracts and later publications of the same work are often both in the database.