August 21, 2007

That was a stupidity, certainly

In their never-ending quest for fresh sources of filter-fooling text, spammers have apparently started to mine Esperanto translations of Russian science fiction novels.

Gene Buckley sent in this example that arrived in his inbox (and mine) a few days ago, having succeeded in fooling the local spam filters:

Subject: Tio estis stultajxo, certe, kvankam kauxzita de katastrofaj sociaj movigxoj de la dua kvarono de la antauxa jarcento.

Gene notes:

Based on what I learned more than 20 years ago (and the apparent convention that "x" means "accent previous letter"), it says "That was a stupidity, certainly, although caused by catastrophic social movements of the second quarter of the previous century." Properly, <Tio estis stultaĵo, certe, kvankam kaŭzita de katastrofaj sociaj moviĝoj de la dua kvarono de la antaŭa jarcento.>
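The "x" convention that Gene mentions is mechanical enough to sketch in a few lines of Python. This is only an illustration, assuming the usual six Esperanto letters that take a diacritic (c, g, h, j, s, u); the names `X_SYSTEM` and `from_x_system` are made up for this example.

```python
import re

# Assumed mapping: a letter followed by "x" stands for the accented letter.
X_SYSTEM = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}

def from_x_system(text):
    """Replace each letter+x digraph with the corresponding accented letter."""
    return re.sub(
        r"([cghjsuCGHJSU])[xX]",
        lambda m: X_SYSTEM[m.group(1)],
        text,
    )

subject = ("Tio estis stultajxo, certe, kvankam kauxzita de katastrofaj "
           "sociaj movigxoj de la dua kvarono de la antauxa jarcento.")
print(from_x_system(subject))
```

Run on the subject line, this should give back the properly accented sentence that Gene quotes.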

A quick web search finds the source, apparently the fourth chapter of the (translated) novel Gravitavio «Carido» by Vjacheslav Rybakov.

The message was sent to a local mailing list that we both subscribe to. I'm puzzled about why it got through the spam filters on our mail server, because the body of the note uses one of the usual recent text-hiding techniques, starting with the headline:


and continuing with a screenful of stuff like this:

N*o_t o_n'l'y d,o'e_s t_h,i,s f+i.r*m h.a.v-e fun,damental's,
b+u_t getti*ng t'h+i*s oppor*tun_ity at t'h,e righ_t t*i.m+e*,
righ t befor'e t.h'e is w_h.a_t m+akes t'h,i+s d-e.a+l so swee*t!
T-h-i,s a grea+t opp*ortu_nity to at leas,t dou ble up!

I thought that the current generation of spam filters looked at character n-grams, among other features, and surely this style of substitution should be easily identifiable by that technique. If you know exactly what weakness the spammers are exploiting here, please tell me.
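To make the n-gram point concrete, here is a rough Python sketch of how visible the trick is at the character-trigram level. It says nothing about what any real spam filter actually does; the set of "intruder" characters is just read off the sample above, and the names `INTRUDERS`, `intrusion_rate` and `strip_intruders` are invented for the illustration.

```python
import re

# Assumption: the punctuation marks seen wedged between letters in the sample.
INTRUDERS = "*_'+,.-"

# One or more intruders with a letter on each side.
_INTRA = re.compile(r"(?<=[A-Za-z])[*_'+,.\-]+(?=[A-Za-z])")

def intrusion_rate(text):
    """Fraction of character trigrams of the form letter, intruder, letter."""
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    if not trigrams:
        return 0.0
    hits = sum(
        1 for t in trigrams
        if t[0].isalpha() and t[1] in INTRUDERS and t[2].isalpha()
    )
    return hits / len(trigrams)

def strip_intruders(text):
    """Delete intruders that sit between two letters.

    Crude: this also removes legitimate apostrophes, as in "don't".
    """
    return _INTRA.sub("", text)

spam = "N*o_t o_n'l'y d,o'e_s t_h,i,s f+i.r*m h.a.v-e fun,damental's,"
plain = strip_intruders(spam)

print(plain)
print(intrusion_rate(spam), intrusion_rate(plain))
```

On the first line of the sample, roughly a third of the trigrams have the letter-punctuation-letter shape, while the cleaned-up version has none. Ordinary prose would presumably score close to zero as well, which is why the success of this trick is puzzling.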

Posted by Mark Liberman at August 21, 2007 05:45 AM