January 23, 2007

Text laundering

Yesterday I asked:

Is there a word for thesaurus-driven mis-substitution to disguise authorship? I've used the neologism "thesaurusizing" to describe the process of replacing words with fancier equivalents in order to impress readers. You could use the same word here, but the motivation is different, and it would be nice to have a word that expressed more directly the dishonesty involved.

I've gotten several excellent suggestions, including some words and phrases that I plan to start using right away.

Jim Roberts suggested:

Would "sinonymizing" be too puritanical? I suppose it's a bit cludgy as well and could be taken for an alternate spelling of "synonymizing," the term used to describe the practice by plagiarists and anti-plagiarists alike. (http://www.plagiarismtoday.com/?p=137)

Jim's link describes "synonymized plagiarism", an obvious and useful phrase which was new to me (though I'm familiar with the phenomenon it describes):

... the plagiarism war is entering a new, and frightening, territory as thieves discover its usefulness in gaining search engine ranking.

One of the critical tools in this new war is synonymizing software, which is software that takes a work and modifies it using synonyms of key words, producing a work that says practically the same thing but in a way that can’t be easily detected by search engines. This aids the plagiarist by greatly reducing the odds of their copyright infringement being discovered and prevents them from absorbing the “duplicate content” penalty some believe search engines apply.

Could such synonymizing software have played a role in developing the mysteriously botched presidential biographies at goppresident.org? A key aspect of the mystery is why a multi-million-dollar political fund-raising effort would have hired such an incompetent writer to create the copy for their site. But maybe they hired a competent spammer, not an incompetent writer.

(Well, a semi-competent spammer, anyhow. There are obvious algorithms that would do a lot better than thesaurus-driven substitutions, and I imagine that the more advanced designers of splogs and so on have thought of them.)

Brett Spyker suggested:

How about "camouplage"? Camouflage + plagiarism

That's a nice blend, with echoes of "plague" as well, but I don't think it's going to make it.

Michael Covarrubias wrote:

Of course every teacher of introductory composition has come across this doubly deceptive plagiarising technique. Not only trying to take credit for someone's work without attribution, but also covering up the tracks in case someone tries to find the source.

My wife, who teaches Spanish, has noticed a related technique when grading writing assignments. Students apparently love Babel Fish. So a story about "hanging out" with friends becomes "colgando afuera" in Spanish: hanging/suspended outside. To "drop the book" is to "gota el libro": drop(noun) the book.

I would guess that "Babel Fishing" has been used for this translating method from language A to language B. If there's to be a counterpart when "translating" from language A into language *A, might we call it synonym fishing? Or synfishing if we want to highlight the damnability of such fraud.

Synfishing is a promising combination -- perhaps too promising. It could also refer to virtual fishing games, for example; and as Michael observes, "syn" might also be "sin".

Susan Harrelson:

I would go with "thesaurism." While thesaurisizing conjures up harmless self-aggrandizement, and even fantasizing about how impressed people are going to be, thesaurism, besides having the right ending, describes something a little reptilian and nasty.

I like Susan's suggestion as a term for thesaurus-driven misuse of words, like "works hard to ascend Medicare and Social Security" in place of "works hard to improve Medicare and Social Security".

But my favorite neologism comes from Ran Ari-Gur: text laundering. He observes that it can be applied more generally to the practice of "superficially modifying a text so as to obscure its source". Beyond thesaurisms and other word substitutions, this would also cover minor rephrasing, such as changing "the end of the draft" to "the draft's end", or "strengthening local control" to "make local control stronger"; and other forms of disguise as well. And you could do your text laundering using synonymizing software, or with old-fashioned human labor.
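For the concretely minded, here is a minimal sketch of the crudest sort of text laundering: blind, thesaurus-driven word substitution. The one-entry thesaurus and the function name `synonymize` are my own inventions for illustration, not a description of any real synonymizing package, and the substitution ignores context, word sense, and capitalization entirely (which is exactly how you get "ascend Medicare").

```python
import re

# Hypothetical toy thesaurus (an assumption for illustration only);
# a real synonymizing tool would presumably draw on a much larger one.
THESAURUS = {"improve": "ascend"}

def synonymize(text, thesaurus=THESAURUS):
    """Swap every word found in the thesaurus, with no regard for context."""
    def swap(match):
        word = match.group(0)
        # Words missing from the thesaurus are passed through unchanged.
        return thesaurus.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

original = "works hard to improve Medicare and Social Security"
laundered = synonymize(original)

print(laundered)
# works hard to ascend Medicare and Social Security

# An exact-phrase search for the original wording no longer matches.
print("improve Medicare and Social Security" in laundered)
# False
```

A single swapped word is enough to defeat an exact-phrase search for the whole passage, though a shorter phrase that happens to survive intact (here, "Medicare and Social Security") would still match.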

Is the phrase already in use, in this or some other meaning? Not to any significant extent. Google has only 124 hits for {"text laundering"}. One source uses the phrase to mean "copy as plain text":

You've heard of "money laundering" — how about "text laundering"!

How many times has this happened to you? On your Windows computer, you copy some text from one document into another; and when the text appears in the new document, the fonts are all wrong. Or, you copy some information from a Microsoft Excel spreadsheet into an e-mail message, and everything is enclosed in little boxes.

There’s an easy way to fix problems like these. Just paste the text from the original document into a Notepad document; then copy the text from Notepad into the document where you want it. It will show up as plain, unformatted text, and you can then apply any formatting that you want.

But I think that "copy as plain text" expresses this concept pretty well already, and Jeremy Gillick's "copy as plain text" add-on for Firefox, which I use frequently and enthusiastically, is much more convenient than copying through Notepad or another plain-text editor.

Others appear to use text laundering to mean things like "removing errors from the text of non-native writers", "removing possibly-offensive material", or "removing possible privacy violations". These are all plausible meanings, but none of them are very common. And none of them make as nice an analogy to "money laundering", it seems to me, as Ran Ari-Gur's suggestion: disguising the source of plagiarized text by removing the string-identity that would permit easy discovery by web search.

Posted by Mark Liberman at January 23, 2007 05:53 AM