February 06, 2005

Never pronouncing east Thursday?

The web site of El Sol de Zacatecas, a Mexican newspaper, appears to serve up automatically translated English versions of Spanish-language wire service reports. Whatever MT system they're using, it needs a better model of the contingencies of Spanish/English correspondence. According to the English version of the AFP story on Mel Martinez' historic first official use of Spanish on the floor of the Senate, his words "constituted the first speech in foreign language never pronouncing in the Senate by one of their members" ("el primer discurso en idioma extranjero jamás pronunciado en el Senado por uno de sus miembros"), attributing this to "the official transcription published east Thursday" ("la transcripción oficial publicada este jueves").

Now, the meaning of a text may sometimes be hard to pin down, and the correctness conditions of a language may sometimes be difficult to reduce to simple prescriptions, but I hope we can all agree that in this case the English translation is syntactically incorrect, stylistically aberrant, semantically incoherent and also not an accurate reflection of the content of the Spanish original. I'm assuming that this was a computer-generated translation rather than simply a bad human one, because I can't imagine that any well-intentioned human being with access to a Spanish-English dictionary and a working knowledge of the English language would translate "este jueves" as "east Thursday".

As Geoff Pullum has pointed out, nearly all strings of words are ungrammatical. Also semantically incoherent and stylistically aberrant. The College Board should have no trouble finding genuine problems for its "sentence error" questions, even without resorting to MT output.

The (original) Spanish version of the story:

Primer discurso en idioma extranjero de un congresista en el Senado de EEUU
Por: AFP
Publicado: Viernes, 4 de Febrero de 2005 8:39 AM

WASHINGTON - Unas palabras en español pronunciadas por un congresista estadounidense el miércoles, constituyeron el primer discurso en idioma extranjero jamás pronunciado en el Senado por uno de sus miembros, según la transcripción oficial publicada este jueves.

El discurso fue pronunciado por el senador de origen cubano Mel Martinez durante el debate sobre la designación de Alberto Gonzales como fiscal general.

"El juez Gonzales es uno de nosotros, el representa todos nuestros sueños y esperanzas para nuestros hijos...", afirmó Martinez, uno de los dos legisladores de origen hispano en el Senado estadounidense.
Martinez explicó que con sus palabras quiso dirigirse directamente a los electores hispanos para defender a Gonzalez, de origen mexicano, acusado por la oposición demócrata de formular una doctrina jurídica permisiva que habilitó las torturas a prisioneros.

Según el historiador Donald Ritchie, citado por el diario The New York Times, fue la primera vez en la historia que un senador pronunció en Cámara un discurso en un idioma que no sea inglés.
Estados Unidos no tiene lengua oficial, y la minoría hispana es la más numerosa del país, por lo que muchas organizaciones políticas, oficinas y servicios a los consumidores ofrecen servicios bilingües.

The English version:

First speech in foreign language of a congressman in the Senate of the U.S.A.

The WASHINGTON - words in Spanish pronounced by a American congressman Wednesday, constituted the first speech in foreign language never pronouncing in the Senate by one of their members, according to the official transcription published east Thursday. The speech was pronounced by the senator of Cuban origin Mel Martinez during the debate on the designation of Alberto Gonzales like general prosecutor. "judge Gonzales is one of us, represents all our dreams and hopes for our children...", affirmed Martinez, one of both legislators of Hispanic origin in the American Senate.

Martinez explained that with his words it wanted to go directly to the Hispanic voters to defend to Gonzalez, of Mexican origin, accused by the democratic opposition to formulate a permisiva legal doctrine that qualified the tortures to prisoners.

According to the historian Donald Ritchie, mentioned by the newspaper The New York Times, it was the first time in the history that a senator pronounced in Camera a speech in a language that is not English.
The United States does not have official language, and the Hispanic minority is most numerous of the country, reason why many political organizations, offices and services to the consumers offer bilingual services.

[In fairness to the anonymous MT system, the English version is readable, at least to the point of permitting the probable nature of the event to be inferred. But "east Thursday"? Give me a break.]

[Update: Google's Spanish/English translation is identical, presumably because Google licenses the same MT technology. Whatever that technology is, we can be pretty sure it doesn't use sensible statistical techniques. In deciding whether "este" should be translated as "this" or "east" when followed by "jueves", a large enough corpus gives a reasonable approximation to common sense just by counting: 693,000 Google hits for the English-language string "this Thursday", vs. 8,740 for "east Thursday". More sophisticated models are available, but should not be needed in this case.]
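
[For the concretely minded, the counting argument fits in a few lines of Python. This is only a sketch of the idea, not a description of how any real MT system works: the function name pick_translation and the little table of counts are made up for illustration, and the only data in it are the two hit counts quoted above.

```python
# Hit counts quoted in the post for the two candidate English strings.
# The table is illustrative; a real system would draw on a large corpus.
BIGRAM_COUNTS = {
    ("this", "Thursday"): 693_000,
    ("east", "Thursday"): 8_740,
}


def pick_translation(candidates, next_word, counts):
    """Return the candidate that most often precedes next_word.

    Pairs missing from the table are treated as having a count of zero.
    """
    return max(candidates, key=lambda word: counts.get((word, next_word), 0))


# Spanish "este" can be either the demonstrative "this" or the compass
# point "east"; the following word, "jueves" -> "Thursday", settles it.
choice = pick_translation(["east", "this"], "Thursday", BIGRAM_COUNTS)
print(choice)
```

With the counts above, "this" beats "east" by about 79 to 1, so the choice is not a close call.]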

[Update #2: Ray Girvan says that the translation in this case (also the same as Altavista's Babelfish) was done by a SYSTRAN engine, as can be seen here. SYSTRAN, as I understand it, is an old-fashioned rule-based transfer system; though what rule maps "este" to "east" in front of "jueves" I can't imagine.]


Posted by Mark Liberman at February 6, 2005 04:44 PM