February 28, 2006

"Zwicky" on the Slut-o-Meter

Mark Liberman has reported here on the Franusic-Smith Slut-o-Meter and on Jean Véronis's little studies in web pornometry using this tool, which assigns a sluttiness value to an expression based on how big a reduction in webhits is caused by turning on Google SafeSearch (or some similar filter) in searches on that expression.  Mark also calculated this metric for the last names of several of the Language Loggers: "Pullum", with a value of only 6.82%, then in descending order, "Liberman", "Zwicky", and "Thomason".

I am stunned to discover that "Zwicky" gets only what Mark calls a "positively demure" 4.35%.  Since I'm moderately famous for producing racy stuff on soc.motss, ADS-L, and Language Log too, this is at first baffling.  How could this happen?

Well, first of all, I'm not the only Zwicky you can Google up, and the others, decidedly more decorous than I am, might well be swamping my plain talk.  Actually, my guess was that Fritz Zwicky, the astrophysicist (and distant cousin of mine), would be the most-referred-to Zwicky, then Elizabeth Daingerfield Zwicky, the computer scientist (and my daughter), then me, and then the two Zwicky poets, Jan (the Canadian) and Fay (the Australian).

I was wrong.  I win, but by the tiniest of margins over Elizabeth, who holds a substantial lead over Fritz.  Then comes the dark horse, Richard Zwicky of Metamend Software, then Jan, and then a whole bunch of Zwickys, including the antipodal Fay, with less than a thousand hits.  In the #3 position, between Elizabeth and Fritz, is the Zwicky thread company.  (The Zwicky muesli folks and the Zwicky chocolate people have tiny numbers.)  Here are the raw figures:

1.    Arnold M. Zwicky: 28,100
       Arnold Zwicky: 17,900
       Zwicky, Arnold: 879
       total: 36,879

2.    Elizabeth Zwicky: 22,300
       Elizabeth D. Zwicky: 13,300  
       Zwicky, Elizabeth: 501
       Daingerfield Zwicky: 30
       total: 36,131

3.    Zwicky + thread: 23,900

4.    Fritz Zwicky: 20,800
       Zwicky, Fritz: 684
       total: 21,484

5.   Richard Zwicky: 15,100
      Zwicky, Richard: 80
      total: 15,180

6.   Jan Zwicky: 9,930
      Zwicky, Jan: 3,880
      total: 13,810

There are under a thousand hits for (in descending order): Karl Z., Chuck Z., Fay Z., Eric Z., Michael Zwicky Hauschild, Stefan Z., Monica Steinmann-Zwicky, and Charles Z.

Total hits from all these sources: 150,561, roughly half of the 316,000 hits for plain old Zwicky.  I'm not sure what the other half are about, but they probably include references to the galaxy Zwicky 18 (named for Fritz), occurrences of "Zwicky" with no first name, and so on.
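As a rough check on that tally, the per-name totals can be added up in a few lines of Python. This is only a sketch: the variable names are invented for illustration, and the figures are the totals exactly as reported in the list above.

```python
# Per-name totals as reported in the list above (variable names are hypothetical).
reported_totals = {
    "Arnold": 36_879,
    "Elizabeth": 36_131,
    "Zwicky + thread": 23_900,
    "Fritz": 21_484,
    "Richard": 15_180,
    "Jan": 13_810,
}

all_sources = 150_561    # total hits from all the named sources
plain_zwicky = 316_000   # hits for plain old "Zwicky"

listed = sum(reported_totals.values())
remainder = all_sources - listed      # what is left for the under-a-thousand names
share = all_sources / plain_zwicky    # fraction of plain "Zwicky" hits accounted for

print(f"six listed totals: {listed:,}")            # six listed totals: 147,384
print(f"left for the smaller names: {remainder:,}")  # left for the smaller names: 3,177
print(f"share of plain Zwicky: {share:.1%}")       # share of plain Zwicky: 47.6%
```

On these figures the eight smaller names would share about 3,200 hits between them, and the named sources cover a little under half of the plain "Zwicky" count.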

I am surprised at the strength of my net presence.  I always tell people that I'm extremely famous in a very tiny world.

Now looking at what happens when I turn SafeSearch on, I see that the reductions in webhits are quite modest.  For "Arnold M. Zwicky", the number goes from 28,100 to 26,300, and for "Arnold Zwicky", from 17,900 to 16,800. 

Here, I think, we can see some evidence for a second contribution to the low sluttiness value for "Zwicky":  most of this stuff is about me, not by me.  Notice the large number of hits that include my middle initial, which I don't use in e-mail or in postings; a lot of these hits are from bibliographies and similar sources.

So, despite my occasional use of fuck in postings and my even more occasional references to cocksucking (note that I'm improving my sluttiness score a tiny bit by saying this here), most of what I post passes through SafeSearch; and, anyway, most of the time my name comes up on the net, it's in material about me, not by me; and, in addition, there's tons of stuff by or about my unslutty relatives.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:22 PM

Cinderella's slippers: glass or squirrel fur?

I'm a huge fan of Snopes. And Cinderella's "glass slipper" is one of the most striking and absurdly effective details in the whole history of storytelling. So it's with a heavy heart that I must now register some doubt about Snopes' defense of the glass slipper against the claim that it's actually a linguistic mistake for an original fur slipper:

Claim:   Cinderella's slippers were made of fur in the original versions of the fairy tale, but they became glass slippers in later versions as the result of a mistranslation.

Status:   False.

This came up in conversation a few days ago, so I looked into it a bit. And alas, though mistranslation is not the culprit, it seems pretty clear to me that the slippers must originally have been fur, and turned into glass through a misunderstanding.

Snopes' discussion of the point is clear and well researched, as usual:

The standard explanation for Cinderella's famous footwear is that it is the result of a mistranslation, someone having mistaken pantoufle de vair, fur slipper, for pantoufle de verre, glass slipper, when making an English version of Charles Perrault's Histoires ou contes du temps passé avec des moralités (1697). (The title of Perrault's collection — in English, Stories or Tales of Olden Times with Morals — also is known as Tales of My Mother Goose, after a line that appears on the frontispiece of the original, Contes de ma mère l'oye.)

The principal difficulty with the standard explanation is that pantoufle de verre appears in Perrault's original text, so this is definitely not a question of mistranslation. Nor does it seem to be a case of mishearing, with Perrault writing verre for vair when transcribing an oral account, since vair, a medieval word, was no longer used in his time. (Vair, variegated fur, from the Latin varius, varied, also is a root of miniver, originally menu vair, small vair, which referred initially to the fur — perhaps squirrel — used as trim on medieval robes and later was applied to the prized ermine, or winter weasel fur, on the ceremonial robes of peers.)

Indeed, the original text of Perrault's tale "Cendrillon ou la petite pantoufle" does use pantoufles de verre ("glass slippers") not once but three times (see below), so it's clearly neither a mistranslation nor a (simple) misprint. However, the argument against mishearing seems to me to be extremely weak. Though I'm not any sort of expert in the history of French, a bit of poking around on Gallica suggests that vair was still used to describe a glamorous and valuable kind of squirrel fur, in the context of talk about the olden days, quite a bit later than 1697. If the word had indeed gone out of everyday usage, then that creates exactly the sort of context in which a creative mishearing would be likely.

Specifically, in the Analyse raisonnée de l'histoire de France by François-René de Chateaubriand (1768-1848), a discussion of medieval society says that

Les chevaliers prenaient les titres de don , de sire , de messire et de monseigneur . Ils pouvaient manger à la table du roi ; eux seuls avaient le droit de porter la lance, le haubert, la double cotte de mailles, la cotte d'armes, l'or, le vair, l'hermine, le petit-gris, le velours, l'écarlate ; ils mettaient une girouette sur leur donjon ; cette girouette était en pointe comme les pennons pour les simples chevaliers, carrée comme les bannières pour les chevaliers-bannerets.

and a description of Edward's invasion of France explains that

Rien n'échappa, par mer et par terre, aux ravages de ce monarque, qui se disait roi des Français, et qui venait pour régner sur des Français ; par mer, tous les vaisseaux, depuis le plus grand navire jusqu'à la plus petite barque, furent pris et réunis à la flotte anglaise ; par terre, toutes les villes et les villages furent saccagés et brûlés. Barfleur succomba la première, et, quoiqu'elle se fût rendue sans coup férir, elle n'en fut pas moins pillée elle perdit or, argent et chers joyaux ; il se trouva si grande foison de richesses, que compagnons n'avoient cure de draps fourrés de vair .

Here are the contexts in Perrault where "pantoufles de verre" is used:

(1) ...sa maraine ne fit que la toucher avec sa baguette, et en même tems ses habits furent changez en des habits de drap d' or et d' argent, tout chamarrez de pierreries ; elle luy donna ensuite une paire de pantoufles de verre, les plus jolies du monde.

(2) Elle se leva, et s' enfüit aussi legerement qu' auroit fait une biche. Le prince la suivit, mais il ne put l' attraper. Elle laissa tomber une de ses pantoufles de verre, que le prince ramassa bien soigneusement. Cendrillon arriva chez elle, bien essouflée, sans carosse, sans laquais, et avec ses méchans habits, rien ne lui estant resté de toute sa magnificence qu' une de ses petites pantoufles, la pareille de celle qu' elle avoit laissé tomber.

(3) Quand les deux soeurs revinrent du bal, Cendrillon leur demanda si elles s' estoient encore bien diverties, et si la belle dame y avoit esté ; elles luy dirent que oüy, mais qu' elle s' estoit enfuye lorsque minuit avoit sonné, et si promptement qu' elle avoit laissé tomber une de ses petites pantoufles de verre, la plus jolie du monde ; que le fils du roy l' avoit ramassée, et qu' il n' avoit fait que la regarder pendant tout le reste du bal, et qu' assurément il estoit fort amoureux de la belle personne à qui appartenoit la petite pantoufle.

Note that the OED glosses vair as

A fur obtained from a variety of squirrel with grey back and white belly, much used in the 13th and 14th centuries as a trimming or lining for garments.

while the Dictionnaire de l'Académie Française (Huitième Édition, 1932-35) has

VAIR. n. m. Il se disait autrefois d'une Fourrure blanche et grise. Un manteau, des pantoufles de vair.
Il ne s'emploie aujourd'hui qu'en termes de Blason, pour désigner Une des fourrures de l'écu, figurée par de petites cloches alternées d'azur et d'argent, disposées de telle sorte que la pointe des pièces d'azur est opposée à la pointe des pièces d'argent. Tel porte de vair.

The fr.wikipedia entry for Cendrillon says that

Beaucoup de gens affirment que la pantoufle de Cendrillon était de vair et non pas de verre. L'édition de 1697 des contes de Charles Perrault s'intitule bien "la pantoufle de verre", donnée traditionnelle dans le folklore, puisqu'on retrouve des pantoufles de verre ou cristal dans les contes catalans, écossais, irlandais. Il n'empêche que Balzac et Littré voulaient, au nom de la raison, corriger cette graphie en vair (petit-gris, écureuil). Cendrillon irait alors danser en chaussures fourrées. Cette correction n'apporte cependant pas toute satisfaction, car outre que l'on ne fourra jamais par le passé de petit-gris des chaussures, de tels souliers ne semblent pas adaptés à la danse. Sagement, il faut conserver ces poétiques et merveilleuses pantoufles de verre.

[Update: Chris Waigl argues for a different conclusion:

... this seems to be a case of erudition run wild. Balzac's and Littré's (a nineteenth-century man of letters, author of an important dictionary), to be precise. They stipulated the verre/vair confusion. But "pantouffles de verre" (though in various spellings) are in Perrault's tale, and also in Catalan, Irish and Scottish versions. The Grimm brothers' has golden slippers -- not much better than glass, I'd think, to dance in all night. Wikipedia tells me that there are over 400 versions from all over the world, the oldest from China.

I'm not entirely convinced. The fact that the Grimm bros. have "golden slippers" resonates with de Chateaubriand's observation that only knights "avaient le droit de porter ... l'or, le vair, l'hermine, le petit-gris, le velours, l'écarlate" ("had the right to wear gold, vair, ermine, gray squirrel fur, velvet, scarlet"). Note that "glass" is not on the list (though in fairness, I guess that glass was also a luxury item in medieval times). And I wonder what the collection date of the Catalan, Irish and Scottish versions is. I believe that there has been much more diffusion of folk tale details in recent centuries than is commonly assumed. Unless the other versions are from the era of 1700 -- which seems unlikely, since the collection of such tales was more typically a late 18th or 19th-century activity -- it seems just as likely that Perrault's invention spread to other cultures as that there was a common pre-Perrault source for the idea of "glass slippers".

I'm sympathetic to fr.wikipedia's conclusions that "il faut conserver ces poétiques et merveilleuses pantoufles de verre": "we need to preserve these poetic and marvelous glass slippers." At this point, though, their survival is guaranteed. The small question in front of us is how they were born. ]

[Update #2: Trevor suggests that the slippers were really amber!

DH Green (Language and History in the Early Germanic World) notes that both Pliny and Tacitus used glaesum/glesum to refer to amber, despite being aware of the difference in manufacture between it and glass. This conscious confusion was based on the transparency of both materials, and in the competition between products manufactured thereof–native beads and Roman glass objects.

The relevance of this is to be found a while later in Lope de Vega’s La Dorotea. Published in 1632, 65 years before Perrault, it recreates the author’s passionate and disastrous fling with actress Elena Osorio in the early 1580s and has the heroine worrying of having to trade in her amber slippers for crudely bound sandals (“Si don Bela quiere, tú verás estos pies que celebrabas trocar las zapatillas de ámbar en groseras sandalias de cordeles”).

Similarly, Quevedo in El mundo por de dentro (1612) has amber slippers being used to disguise sweaty feet (“a veces los pies disimulan el sudor con las zapatillas de ámbar”). Amber slippers were still available in Regency England, and are evoked in contemporary advertising for Miss Natasha Perfume (“this princess of perfumes makes her way on Amber slippers and Lily négligés. A warm and slow burning temptress that stands on her own”).

I, too, have made my way on lily négligés and hope one day to acquire a pair of amber slippers, which must surely have been what Cinderella aspired to wear. Too bad her fairy godmother skimped and gave her the cheaper, glass alternative.

I'm impressed by the quotations from Pliny, Tacitus, Lope de Vega and Quevedo. And I certainly wish Trevor well in his quest to complete his wardrobe. But I wonder whether those Regency slippers had any real connection with the hard translucent fossil resin, rather than being merely amber in color. After all, the same outfit features an "amber crape dress" and a "demi-turban formed of plain amber satin". And I'm just a bit skeptical that slippers carved from actual fossil amber would stand up to dancing with or without foot sweat.]

Posted by Mark Liberman at 08:37 AM

February 27, 2006

Far from Slutty

Hmm. Poser rates a spectacularly negative -70.8% on the Slut-o-Meter. Does this mean that ghit counts are unreliable?

Posted by Bill Poser at 08:51 PM

Pornometry and the Slut-o-meter

Jean Véronis has recently carried out two short studies in pornometry (one and two). The data comes from counting web hits with and without Google's SafeSearch (or similar porn filters) turned on. The ratio for different words varies quite a bit, which forms the basis of the Slut-o-meter created by Joël Franusic and Adam Smith. This is a frivolous little web app that evaluates "promiscuity" based on a formula that they give as

promiscuity = (unsafe results - safe results) / unsafe results

Jean was tickled to find that his last name rates a spectacular 61.94% on the Slut-o-meter. In contrast, for example, "Pullum" rates only 6.82%, "Liberman" an even stodgier 4.61%, "Zwicky" is a positively demure 4.35%, and "Thomason" is the lowest I've had time to try, at 1.16% -- though the authors of the Slut-o-meter observe that negative values are sometimes encountered:

If you're wondering why some subjects have a negative promiscuity, well, you're not alone. In general, this happens when the number of safe results is greater than the number of unsafe results (or if there are no unsafe results whatsoever). We're not quite sure why this is the case, but we believe that Google is not telling us the truth.

Google finds 628,000 hits for {"language log"} without SafeSearch, and 620,000 with it, for a rating of 1.27%, while {"language hat"} is slightly racier at (418000-395000)/418000 = 5.5%.
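The arithmetic here is easy to reproduce. Below is a minimal Python sketch, assuming the rating is simply the fraction of hits that vanish when the filter is on, which is what the worked figure above implies. The function name `promiscuity` and its argument names are my own, not the Slut-o-meter's code, and the counts in the last line are invented to show how a negative value can arise.

```python
def promiscuity(unsafe_hits: int, safe_hits: int) -> float:
    """Fraction of hits that disappear when the porn filter is turned on."""
    if unsafe_hits <= 0:
        # Assumption: refuse to divide by zero rather than guess a rating.
        raise ValueError("need a positive unfiltered hit count")
    return (unsafe_hits - safe_hits) / unsafe_hits

# Counts quoted above for {"language log"} and {"language hat"}.
print(f"{promiscuity(628_000, 620_000):.2%}")  # 1.27%
print(f"{promiscuity(418_000, 395_000):.1%}")  # 5.5%

# Invented counts: more "safe" hits than "unsafe" ones gives a negative rating.
print(f"{promiscuity(1_000, 1_708):.1%}")      # -70.8%
```

Since the reported hit counts are themselves estimates, small differences between two ratings probably shouldn't be taken very seriously.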

Jean's investigations are more serious, using these counts as a (narrow but meaningful) window on the continuing war between black-hat SEOs and search engines. The high number for "Veronis" requires some special explanation, but it's broadly in tune with a trend that Jean observes for Google (and other American-based search engines) to filter a larger percentage of non-English (well, anyway French) pages. Jean's explanation:

It seems to me that the explanation for these differences is twofold. Firstly, the search engines undoubtedly go too far: since they are unable to work with the level of delicacy required (it’s difficult, I admit!), they have a tendency to overfilter, perhaps using criteria that go beyond simple lexis (as is clearly the case for the European Constitution with Google). This is a general trend, particularly with Google: under pressure from the web-surfing public, filters were put in place very quickly, and apparently, the only way to make a filter work without a particularly discriminating linguistic technology behind it is to bring out the biggest ladle you can find and skim off a lot more than just the cream. I have mentioned this type of problem before when discussing splogs (here and here).

The other part of the explanation comes from the fact that, in terms of linguistic competences, the different search engines vary considerably. I’ve already had cause to mention that Google doesn’t seem to be very good at handling languages other than English (for instance here). The results above would seem to confirm this. Conversely, we can see how Exalead, which is a French search engine, is better with French than with English. Yahoo! is more or less stable from one language to the other.


Posted by Mark Liberman at 06:56 PM

Multiple choice

Three scholars in social psychology (Barry Schwartz, Hazel Rose Markus, and Alana Conner Snibbe) contributed a column to the Sunday New York Times Magazine under the headline, "Is Freedom Just Another Word for Many Things to Buy?" The writers summarize their findings on how class (or at least educational attainment) shapes American perspectives on the concepts of "freedom" and "choice." It's not always easy boiling down research conclusions for a popular audience, but something definitely seems amiss in this paragraph:

In a recent study with Nicole Stephens at Stanford University, we asked college students to pick "three adjectives that best capture what the word 'choice' means to you." A higher percentage of those who had parents with a college education said "freedom," "action" and "control," while more of those whose parents had only a high-school education responded with "fear," "doubt" and "difficulty."

Okay, but what percentage of those college students know what "adjectives" are?

Actually, I doubt the college students are at fault for the apparent mismatch between the adjectival request and the nominal responses. I bet that most of them really did respond with adjectives like "free," "tough," or "afraid." In fact, the students might even have been presented with a checklist of adjectives from which to select. But then the researchers likely did what social scientists usually do with survey results based on open-ended questions: they lumped these responses into several different conceptual categories. And since we tend to label concepts with nouns, that's the part of speech the research team used for coding and categorizing.

Unfortunately, somewhere in the process of writing and editing the column, this adjective-to-noun move must have gotten lost in the shuffle. My guess is the writers originally said that the students responded with "words having to do with 'freedom,' etc.," but that extra verbiage got edited out for space considerations. Of course, it's still possible that the students provided a range of responses including adjectives, nouns, verbs, clauses, and emoticons. But I'm willing to give them the benefit of the doubt here, and I'd even wager that the questioners used an adjective checklist (as behavioral psychologists are wont to do), taking the part-of-speech hassle out of the students' hands entirely.

Why am I so ready to blame the editors and not the students for misrecognizing nouns as adjectives? (It's not that I have an axe to grind with the editors at the Times Magazine, even though I've needled them in the past.) The reason is simply that well-educated adults regularly get confused on this precise point of grammar. Nouns that don't denote substantive things sometimes don't seem "noun-y" enough to qualify for that part of speech. Hence Jon Stewart can tell a graduating class that the word "terror" is "not even a noun," while Timothy Noah can write on Slate that words like "humbug" and "poppycock" are adjectives. So it wouldn't be surprising if an editor looking to tighten up the writers' prose made a quick redaction that ended up treating such nebulous terms as "freedom" and "fear" as adjectives rather than nouns.

The writers of the column assert that "Americans are increasingly bewildered — not liberated — by the sheer volume of choices they must make in a day." Add choices of grammatical description to that bewildering list.

Posted by Benjamin Zimmer at 01:08 AM

February 26, 2006

Language Log has the story

Over the past few months, I've been watching with amusement -- tinged with dismay -- as various splogs pick up fragments of Language Log posts, either directly or from quotations and references elsewhere on the web. Many of these use a paragraph-collage technique that must be harder for search engines to detect and reject than the random word salad technique in the "en language log splitter" quote that Ben Zimmer gave.

For example, on May 28, 2005, Geoff Pullum posted a funny story under the title: "Gone to get pants: a handwriting recognition story". The (genuine, non-splog) Mirabilis.ca immediately picked this up under the title Gone to get pants:

A certain driver ought to follow up on those handwriting improvement tips. Language Log has the story: Gone to get pants.

Since then, the mirabilis.ca item has been reprinted, along with its title, in a number of pants-themed splogs (which seem to be oriented, in the end, towards improving the page rank of some porn sites). These splogs included "Helpful Tight Pants Blogs", "Our Pants Info", "Internet Plastic Pants Articles", and so on. These particular splogs seem to specialize in finding whole posts about pants and then concatenating them -- thus the post at "Our pants info" amalgamated the "Language Log has the story" with a half-dozen other pants-related fragments, e.g. this mutated item taken from a Joe Lavin column:

There Are Tiny Robots in My Pants
Don't worry. That's not some strange pickup line. It's the truth. My new pants from The Gap contain special stain-resistant particles. Yes, somewhere on a molecular level, these nano-particles are working hard to get mandy bright whatever I happen to spill on them. I always suspected there was something exciting happening in my pants, and now I have proof. [where "mandy bright" has been substituted for "rid of", and links to another collage-splog whose theme is people and places named "Lynn".]

Then a non-clothing splog called "Our best used car resources" picked the same fragment up in a dissociated form:

Gone directly from pants
A certain Advances to follow up on those {link25} Poulter tips. Language Log has the Pants Gone to get pants....

Posted by Mark Liberman at 05:22 PM

It's all grammar, redux

Why do many people hate grammar so much?  Possibly because (as I noted back in 2004) they think "grammar" embraces everything to do with language, so long as it's regulated, and that's an awful lot of stuff.  Back then it was punctuation that was at issue.  This time it's naming conventions, in a 2/25/06 entry "Grammar question I should know the answer to" (marked as "OT", that is, off-topic) on a message board devoted to baby care:

I am addressing envelopes. My mom's best friend is married; it is her second marriage and after divorcing her first husband, she went back to using her maiden name and kept it after marrying her current husband. So, I know her husband is "Mr. John Smith" but should she be "Mrs. Jane Jones" or "Ms. Jane Jones"? She is married, but is not really "Mrs. Jones" since her husband is "Mr. Smith", so should I use "Ms."?

I'm inviting them to a formal event, so I would really like to use salutations, not just their first and last names. TIA!

(My thanks to Elizabeth Daingerfield Zwicky for pointing me to this site.  People who replied recommended "Ms. Jane Jones and Mr. John Smith", by the way.)

Back in 2004, I reported:

To PITS, People In The Street, "grammar" embraces pretty much everything having to do with language, spoken or written, so long as it's regulated in some way: syntax, morphology, word choice, pronunciation, politeness, discourse organization, clarity and effectiveness, spelling, punctuation, capitalization, bibliographic style, whatever.

I suppose choosing the correct name form comes in the "whatever" category.  In any case, you'd expect to find the answer to the poster's question in an etiquette book, not a grammar book.

For a while, I entertained the idea that what the ordinary-language non-technical word "grammar" means to PITS is 'everything you were taught about in English class, minus the literature', but I see now that even this broad characterization is insufficient.

In a somewhat different context, it's too broad.  What I have in mind here are the many multiple-choice quizzes on grammar that you can find in magazines and on websites, for instance the "How grammatically sound are you?" on-line quiz, which I took a couple of years ago in preparation for teaching a sophomore seminar on prescriptivism, and which Bill Poser has reported on here.  I got a perfect score -- like Bill, I am, ahem, a Grammar God -- but then I did a lot of second-guessing of the test designers.

About half of the test items have nothing to do with syntax; they're about spelling, punctuation, choosing the right variant for an inflectional form of a verb (lie vs. lay), or choosing the right word (bring vs. take, shall vs. will).  The other half concern word choices that have something to do with syntax (relative who vs. that), government of case forms of pronouns, subject-verb agreement, and modifier placement (only, split infinitives).  So, on the one hand, "grammar" takes in lots of things that have little to do with the system or structure of the language, while, on the other, it's remarkably constrained.  In fact, this quiz pretty much reproduces the form and content of the "grammar" sections of multiple-choice standardized tests.

So what DO we call the domain that takes in spelling, punctuation, choice of inflectional form, word choice, syntactic usage, and actual grammar?  "Usage" is a bit too broad; in fact, usage dictionaries are reluctant to discuss more than a few common misspellings, since there are just too many of them.  "Usage and style" takes in even more.  (Some day I'm going to have to post about the very many senses of "style" that make it so hard to figure out what a book that claims to be about style in language is going to be about.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:35 PM

en language log splitter

Anyone who has used a blog search engine or set up a blog feed knows that spam has thoroughly infested the blogosphere. Spam blogs, or splogs, have been running rampant for the past year or so, especially since last October's "splogsplosion" (to use a lovely double-blend coined by Tim Bray of Sun Microsystems). As with email-borne spam, much of the text of splogs is randomly generated, or at least generated according to a set of esoteric rules known only to the splogger.

By way of example, entering "language log" on Google Blog Search turns up a recent post appearing on a splog with the perfectly Dadaist name of "Separates on Crustal Gerald." Like so much spam text these days, the post reads like found poetry:

en language log splitter
February 25th, 2006

Captain combination upward language log space buy becoming chapter thirty. Tomorrow satisfied draw lie language log castle whispered. Given act wish log splitter establish discover bent park. Pleasant mood lungs funny log splitter splitter Mike variety soon uncle. Wonder flew promised en language tonight half fifty load. Brain fallen fort drawn en language development catch deeply tonight imagine. Attached concerned. Fifty yard log splitter ago Autumn forgot curious obtain. Mississippi constantly en language accept baseball beside. Indicate stop deeply vessels log splitter passage. Exact report american splitter ourselves completely grandfather language log thread continued black. Nearby exclaimed earlier record orange en language meet.

The two embedded links lead to pages full of Google ads for retailers selling — surprise — log splitters. The two pages are similarly packed with spam text, and one of them even shares the rubric of "en language log splitter" with the referring splog. If I had to guess, I would think the splogger's text-generating algorithm loosely relies on collocational frequencies to determine how to string words along. So the target phrase of "log splitter" is preceded by "language" because of the frequency of the collocation "language log" on the Web. (Now there's some dubious fame for you.) The "en" preceding "language" is more difficult to explain, though I see that the collocation "en language" is favored by spammers for some reason.
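To make that guess concrete, here is a toy Python sketch of a generator that pads a target phrase with whatever word most often precedes it. This is purely my speculation about the general idea, not the splogger's actual algorithm; the tiny corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

def bigram_counts(words):
    """Count how often each word is immediately followed by each other word."""
    follows = defaultdict(Counter)
    for left, right in zip(words, words[1:]):
        follows[left][right] += 1
    return follows

def preceding_word(follows, target):
    """Return the word that most often comes right before `target`, or None."""
    best, best_count = None, 0
    for left, rights in follows.items():
        if rights[target] > best_count:
            best, best_count = left, rights[target]
    return best

# Toy corpus invented for illustration; a real generator would presumably
# draw on web-scale collocation counts.
corpus = "language log has the story language log posts a log splitter ad".split()
follows = bigram_counts(corpus)

# Pad the target phrase with the most frequent left neighbour of its first word.
target = ["log", "splitter"]
prefix = preceding_word(follows, target[0])
print(" ".join([prefix] + target))  # language log splitter
```

In the toy corpus "language" precedes "log" twice and "a" only once, so "language" wins, which is the same mechanism I'm imagining behind the real thing.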

(I'm loath to direct any additional traffic the sploggers' way, so you needn't bother clicking on the above links. Better yet, you can follow the advice given by Wired from the October splogsplosion and report offenders to the proper authorities. Splog Reporter is fighting the good fight, accumulating examples of splogs to help search engines identify and exclude likely suspects. And sploggers like dear old Crustal Gerald who take advantage of Google ads should be reported to Google's AdSense program.)

[Update: Bruce Rusk sends along some useful reflections on "en language":

The "en" is somehow parasitized from the undergirdings of the web: "en" is the generic code for English (vs "en-us" for US English, etc.), and often appears in HTML tags and in URLs. Googling "language en" will reveal lots of URLs with the string "Language=en" and of course Google ignores the "="; many such hits are from pages available in multiple languages, with the "en" indicating a directory containing English-language file. Now why that translates into the collocation "en language" and why an "en language" search yields mostly sploggy sites I am unsure, but it must have something to do with how Google finds similar pages. ]

Posted by Benjamin Zimmer at 01:42 AM

New evidence for animal language...

While Mark has been checking out monkey business from the comfort of his top floor suite in Language Log plaza, I just happened to spend a couple of nights with my family at the home of some chimpanzees.

Well, it's not the chimps' home in a human legal sense, but they do live in some beautifully appointed cages on the grounds. The following conversation took place between my wife, Moni, and one of the chimps' owners and carers, Karon. You may or may not take it as a summary of the ongoing (never ending?) animal language debate:

Karon: Some of these chimps can sign.
Moni: Oh really! What do they say?
Karon: I don't know. I don't speak sign language.

Posted by David Beaver at 01:22 AM

February 25, 2006

Learnable and unlearnable patterns -- of what?

A couple of weeks ago, I suggested that some much-discussed recent research on "grammar learning" by monkeys might involve sensitivity to relations of sameness and difference in strings, which can't be represented as grammatical constraints (of whatever complexity) on sequences of specific items. Mark Seidenberg pointed out to me that this same issue came up in a 1999 discussion in Science following a paper by Gary Marcus et al. Links to the discussion are available on Mark's website, on a page of readings for a presentation on Language and the Mind -- the relevant items are #5, #6, #7.

The first paper is Marcus, G.F., Vijayan, S., Bandi Rao, S., & Vishton, P.M. (1999). "Rule learning by seven-month-old infants", Science 283, 77-80. The abstract:

A fundamental task of language acquisition is to extract abstract algebraic rules. Three experiments show that 7-month-old infants attend longer to sentences with unfamiliar structures than to sentences with familiar structures. The design of the artificial language task used in these experiments ensured that this discrimination could not be performed by counting, by a system that is sensitive only to transitional probabilities, or by a popular class of simple neural network models. Instead, these results suggest that infants can represent, extract, and generalize abstract algebraic rules.

The approach was to familiarize infants with syllable sequences having either the pattern ABA or ABB, where ABA might correspond to "ga ti ga" or "li na li". ABB might correspond to "ga ti ti" or "li na na". Then the infants were tested for their relative interest in sets of new ABA or ABB patterns, by measuring how long they looked at a flashing light associated with the source of the sound. In another experiment, AAB patterns were compared with ABB patterns. The point was that "a system that was sensitive only to transitional probabilities between words could not account for any of these results, because all the words in the test sentences are novel and, hence, their transitional probabilities (with respect to the familiarization corpus) are all zero".

This led to a set of responses: "Do Infants Learn Grammar with Algebra or Statistics?" / Letters from: Seidenberg & Elman, Negishi, Eimas, Marcus, (1999) Science 284, 433.

The letter from Seidenberg and Elman argued that

... the conclusion by Marcus et al. that the infants had learned rules rather than merely statistical regularities is unwarranted. ... these "grammatical rules" created other statistical regularities. AAB, for example, indicated that a syllable would be followed by another instance of the same syllable and then a different syllable. Thus, in the pretraining phase, the infant was exposed to a statistical regularity governing sequences of perceptually similar and different events.

The letter from Peter Eimas observed that

... there is evidence that 7-month-old infants can discriminate objects by means of the abstract relations, same or different.

citing D. J. Tyrrell, L. B. Stauffer, L. B. Snowman, Infant Behav. Dev. 14, 125 (1991).

In a later issue, we get Altmann, G.T.M. (1999), "Rule learning by seven-month-old infants and neural networks", Science 284, 875a. This paper shows that a previously developed PDP network for learning sequential patterns -- published in 1995 -- behaved similarly to the 7-month-olds in the cited paradigm:

Like the infants studied by Marcus et al., our networks successfully discriminated between the test stimuli. The conclusions by Marcus et al. stated in the report are premature; a popular class of neural network can model aspects of their own data, as well as substantially more complex data than those in the report. The cognitive processes of 7-month-old infants may not be so different from statistical learning mechanisms after all.

It seems to me that it's not very helpful to try to draw a distinction between learning "abstract rules" and learning "statistics", without being very precise about what kind of "rules" and what kind of "statistics" are at issue. Marcus et al. tried to create patterns that "statistics" couldn't learn -- and indeed it's true that counting syllable n-grams would not distinguish their patterns, as long as disjoint sets of syllables are used in familiarization and test. But for exactly the same reasons, no formal grammar constraining sequences of a specific terminal vocabulary could solve their problem either. In contrast, a learner who pays attention to patterns of same and different items can learn their distinctions by trivial methods -- either "statistical" or "grammatical".
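A small sketch may make the point concrete. It assumes a learner that counts syllable bigrams and a second learner that only records which positions in a string hold the same item; the test syllables and function names are my own choices for illustration, not taken from Marcus et al.

```python
from collections import Counter
from itertools import combinations

def bigram_counts(strings):
    """Count adjacent syllable pairs across a list of syllable sequences."""
    counts = Counter()
    for s in strings:
        counts.update(zip(s, s[1:]))
    return counts

def relation_pattern(seq):
    """Recode a sequence as same/different relations between all position pairs."""
    return tuple(a == b for a, b in combinations(seq, 2))

# Familiarization strings follow ABA, using the syllables quoted above.
familiarization = [("ga", "ti", "ga"), ("li", "na", "li")]

# Test strings use syllables absent from familiarization (hypothetical choices).
test_aba = ("wo", "fe", "wo")
test_abb = ("wo", "fe", "fe")

# Learner 1: syllable bigram statistics. Every test bigram is unseen.
counts = bigram_counts(familiarization)
seen_aba = sum(counts[bg] for bg in zip(test_aba, test_aba[1:]))
seen_abb = sum(counts[bg] for bg in zip(test_abb, test_abb[1:]))

# Learner 2: remembers only the same/different patterns it has met.
familiar_patterns = {relation_pattern(s) for s in familiarization}
aba_is_familiar = relation_pattern(test_aba) in familiar_patterns
abb_is_familiar = relation_pattern(test_abb) in familiar_patterns

print(seen_aba, seen_abb)                # both counts are zero
print(aba_is_familiar, abb_is_familiar)  # only the ABA test item matches
```

The second learner could equally be described as storing a "rule" (first and third items match) or a "statistic" (the frequency of each same/different pattern), which is the sense in which the choice of word is a matter of taste.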

In the case of the Fitch & Hauser paper that we discussed two years ago, the patterns ABAB and ABABAB, vs. AABB and AAABBB were the same in both familiarization and testing phases. (More exactly, A and B represent classes of syllables, with A being one of {ba di yo tu la mi no wu} spoken by a single female speaker, while B is one of {pa li mo nu ka bi do gu} spoken by a single male speaker; strings of syllables are formed by random selection from the respective sets without replacement.) As a result, the monkeys (and the undergraduates) might have been learning either sequences of item-classes (probably just "female voice" vs. "male voice") or sequences of same vs. different item-classes or (most likely) both. And they might have learned (or failed to learn) the patterns either by attending to "statistics" or to "rules" -- though every algorithm that I can think of, in this case, could be described using either word, depending on taste.
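As a rough illustration, the two string types can be generated from the syllable classes listed above, and a very crude cue, the number of switches between classes, already separates them. The function names and the switch-counting cue are assumptions of mine, offered as one possible "trivial method" rather than as a model of what the monkeys or undergraduates did.

```python
import random

# Syllable classes as listed in the text (A: female voice, B: male voice).
A = ["ba", "di", "yo", "tu", "la", "mi", "no", "wu"]
B = ["pa", "li", "mo", "nu", "ka", "bi", "do", "gu"]

def make_string(pattern, n, rng):
    """Build one stimulus, sampling syllables from each class without replacement."""
    a = rng.sample(A, n)
    b = rng.sample(B, n)
    if pattern == "(AB)^n":
        return [syl for pair in zip(a, b) for syl in pair]
    if pattern == "A^nB^n":
        return a + b
    raise ValueError(f"unknown pattern: {pattern}")

def class_sequence(string):
    """Replace each syllable by its class label."""
    return "".join("A" if syl in A else "B" for syl in string)

def count_switches(string):
    """Count adjacent positions where the class changes."""
    labels = class_sequence(string)
    return sum(x != y for x, y in zip(labels, labels[1:]))

rng = random.Random(0)
alternating = make_string("(AB)^n", 3, rng)
nested = make_string("A^nB^n", 3, rng)

print(class_sequence(alternating), count_switches(alternating))
print(class_sequence(nested), count_switches(nested))
```

For any n, an (AB)^n string has 2n-1 class switches and an A^nB^n string has exactly one, so a learner that tracks nothing but "same class or different class as the last syllable" can tell the familiarization sets apart.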

Posted by Mark Liberman at 04:27 PM

No snowclone left behind

"On rare occasion, a political phrase becomes a template for a variety of causes," writes William Safire in his Sunday "On Language" column. Safire presents two examples of such "template phrases," the first of which is "No X left behind," modeled on "No child left behind." (Safire says the phrase was popularized in 1993 by Marian Wright Edelman of the Children's Defense Fund, but he gives ultimate credit to Ronald Reagan in his 1983 remarks to the National Council of Negro Women: "[We] have begun to outline an agenda for excellence in education that will leave no child behind.") The second "template phrase" is "We are all X now," harking back to Milton Friedman's 1965 declaration, "We are all Keynesians now," and echoed by Le Monde after 9/11: "We are all Americans now." The latest iteration is "We are all Danes now," an expression of solidarity with the Danish cartoonists who notoriously caricatured the Prophet Muhammad.

Gee, wouldn't it be great if someone came up with a catchy designation for these "template phrases"? I dunno, something like "snowclones"? Apparently this felicitous LanguageLogism is good enough for the Times of London but not for the Times of New York.

[Update #1: Safire missed a much earlier example of "We are all X now." Some time around 1888 the British Liberal leader Sir William Harcourt is credited as saying, "We are all Socialists now." That would be the obvious model for Friedman's comment about Keynesians, but the recent declarations of transnational empathy ("We are all New Yorkers / Americans / Londoners / Danes now") seem much more evocative of John F. Kennedy's "Ich bin ein Berliner" speech.]

[Update #2: Two more predecessors for "We are all X now" (though the first lacks the "now")...

"We are all republicans -- we are all federalists." -- Thomas Jefferson in his First Inaugural address, 1801 (sent in by Language Hat)

"We are all sons of bitches now." -- Trinity A-bomb test director Kenneth Bainbridge to Richard Feynman, 1945, according to James Gleick's Genius (sent in by Blake Stacey)]

Posted by Benjamin Zimmer at 03:14 PM

February 24, 2006

The House of Fame

I didn't know the marvelous passage from Charles Babbage's Ninth Bridgewater Treatise that Ben Zimmer quotes at the end of his post on the ancient-pottery-as-sound-recording urban legend. But Babbage's argument that "The air itself is one vast library, on whose pages are for ever written all that man has ever said" reminded me of a similar idea expressed in Chaucer's House of Fame. The narrator is being instructed (in his dreams) by a giant eagle who has grabbed him "with his grimme pawes stronge / Within his sharpe nayles longe" and is carrying him off to what will turn out to be a special place, a sort of Elephants' Graveyard of the spoken word.

The eagle explains, in a modern-English translation:

... speech is sound, else no man could hear it. Now hearken to what I shall teach you. Sound is naught but broken air: and every speech that is uttered, aloud or privily, good or ill, is in substance nothing but air. For as flame is but lighted smoke, sound is broken air. But this may be in many ways, of which I will tell you two; as sound that comes of pipe, or of harp when a pipe is blown strongly, the air is twisted and rent with violence; lo, this is mine interpretation. And when men smite harp-strings, heavily or lightly, lo, the air breaks apart with the stroke. Even so it breaks when men speak, thus you have learned what speech is.

Next now I will teach you how every word or noise or sound, though it were piped by a mouse, must needs through its multiplication come to the House of Fame. I prove it thus: take heed, now, by experiment; for if now you throw a stone into water, you know well that anon it will make a little round spot like a circle, peradventure as broad as a pot-lid; and right anon you shall see how that wheel will cause another wheel, and that, the third and so forth, friend, every circle causing another wider than itself was. And thus from small circle to great, each circumscribing the other, each caused by the other's motion, but ever increasing till they go so far that they be at both brinks. Although you cannot see it from above, these circles spread beneath the water as well, though you think it a great marvel. And whoever says that I vary from the truth, bid him prove the reverse. And even thus, of a certainty, every word that is spoken, loud or privy, first moves a circle of air thereabout, and from this motion anon another circle is stirred. As I have proved of the water, that every circle causes a second, even so is it with air, my dear brother; each circle passes into another greater and greater, and bears up speech or voice or noise, word or sound, through constant increase till it come to the House of Fame; take this in earnest or no; it is truth.

And in the original:

762 ... speche is soun,
763 Or elles no man mighte hit here;
764 Now herkne what I wol thee lere.
765 `Soun is noght but air y-broken,
766 And every speche that is spoken,
767 Loud or privee, foul or fair,
768 In his substaunce is but air;
769 For as flaumbe is but lighted smoke,
770 Right so soun is air y-broke.
771 But this may be in many wyse,
772 Of which I wil thee two devise,
773 As soun that comth of pype or harpe.
774 For whan a pype is blowen sharpe,
775 The air is twist with violence,
776 And rent; lo, this is my sentence;
777 Eke, whan men harpe-stringes smyte,
778 Whether hit be moche or lyte,
779 Lo, with the strook the air to-breketh;
780 Right so hit breketh whan men speketh.
781 Thus wost thou wel what thing is speche.
782 `Now hennesforth I wol thee teche,
783 How every speche, or noise, or soun,
784 Through his multiplicacioun,
785 Thogh hit were pyped of a mouse,
786 Moot nede come to Fames House.
787 I preve hit thus -- tak hede now --
788 Be experience; for if that thou
789 Throwe on water now a stoon,
790 Wel wost thou, hit wol make anoon
791 A litel roundel as a cercle,
792 Paraventer brood as a covercle;
793 And right anoon thou shalt see weel,
794 That wheel wol cause another wheel,
795 And that the thridde, and so forth, brother,
796 Every cercle causinge other,
797 Wyder than himselve was;
798 And thus, fro roundel to compas,
799 Ech aboute other goinge,
800 Caused of othres steringe,
801 And multiplying ever-mo,
802 Til that hit be so fer ygoo
803 That hit at bothe brinkes be.
804 Al-thogh thou mowe hit not y-see,
805 Above, hit goth yet alway under,
806 Although thou thenke hit a gret wonder.
807 And who-so seith of trouthe I varie,
808 Bid him proven the contrarie.
809 And right thus every word, y-wis,
810 That loude or privee spoken is,
811 Moveth first an air aboute,
812 And of this moving, out of doute,
813 Another air anoon is meved,
814 As I have of the water preved,
815 That every cercle causeth other.
816 Right so of air, my leve brother;
817 Everich air in other stereth
818 More and more, and speche up bereth,
819 Or vois, or noise, or word, or soun,
820 Ay through multiplicacioun,
821 Til hit be atte House of Fame; --
822 Tak hit in ernest or in game.

Chaucer and Babbage have got hold of a simple idea -- that all sounds ever made ought to be somewhere, somehow recoverable -- which is pretty obviously false, but not easy to show to be impossible.

Chaucer flourishes some obviously funky medieval science, about how sound, being light, will always rise ("fyre or soun, Or smoke, or other thinges lighte, Alwey they seke upward on highte"), and about how everything has a particular place that it naturally tends to seek ("every river to the see Enclyned is to go, by kinde. ... Thus every thing, by this resoun, Hath his propre mansioun, To which hit seketh to repaire"). Babbage's argument is in tune with the physics and mathematics of his own era:

these aerial pulses ... are ... demonstrated to exist by human reason ; and, in some few and limited instances, by calling to our aid the most refined and comprehensive instrument of human thought, their courses are traced and their intensities are measured. If man enjoyed a larger command over mathematical analysis, his knowledge of these motions would be more extensive ; but a being possessed of unbounded knowledge of that science, could trace every minutest consequence of that primary impulse.

I guess that classical statistical mechanics is enough to demonstrate that this requires more exactness of measurement than could ever plausibly be achieved; and quantum theory presumably turns this from a practical to a theoretical impossibility; but I haven't seen a rigorous demonstration of either of these ideas. This reminds me a bit of Olbers' paradox, where the correct argument for a simple conclusion is not trivial to find.

And intuitive notions of what "mathematical analysis" can and can't do are not very reliable either -- how many people would have imagined that a two-dimensional Fourier transform of the human body's re-radiation of appropriately shaped radio-frequency pulses in a magnetic field gradient would produce a tomographic image of the internal anatomy, i.e. MRI?

Anyhow, I've sometimes wondered whether some wag at the NSA might have put up a sign in their computer room: The House of Fame.

Posted by Mark Liberman at 09:46 AM

February 23, 2006

A phonographic phony

There's a Belgian video clip (in French) that's been making the rounds, purporting to show an amazing new archaeological find. Here is a description of the video from The Raw Feed:

Belgian researchers have been able to use computer scans of the grooves in 6,500-year-old pottery to extract sounds -- including talking and laughter -- made by the vibrations of the tools used to make the pottery.

The link to the video got passed around online quite a bit, even showing up on the moderated Linguist List. The possibility of hearing voices from 6,500 years ago would obviously be an unprecedented breakthrough in the study of ancient languages. Unfortunately, the whole thing is a hoax, as acknowledged by the clip's creator, Bilge Sehir. On Sehir's website the video is described under the title "Poisson d'avril de journal televise" ("April Fool's Day newscast"), indicating that the story was concocted last year for Belgian television as an April Fool's prank. Sehir is a videographer who evidently came up with the stunt as an art project of sorts (an apparent accomplice is Phillipe Delaite, an art historian at the Liège Royal Academy of Fine Arts who poses in the video as a pipe-smoking archaeologist).

One tipoff for the careful viewer is a Latin phrase at the end of the video's voiceover: "Credo quia absurdum," or "I believe it because it is absurd." And the story no doubt has an appealing absurdity to it, so much so that it has been kicking around in one form or another since at least the late 1960s, with speculative roots back to the 1830s.

A search on the Usenet archive finds numerous discussions of the idea of phonographic pottery, in such newsgroups as sci.archaeology, sci.archaeology.moderated, sci.lang, sci.physics, soc.history.what-if, rec.audio.tech, misc.writing.screenplays, and most informatively, alt.folklore.urban. (Googling on paleoacoustics finds even more online discussion.) Everybody seems to have a vague memory of an old science-fiction plot relying on the possibility of unlocking ancient voices from the grooves of pottery or some other manmade creation. But the earliest speculations along these lines come from two science journals, both in 1969.

In the Feb. 6, 1969 issue of the New Scientist, the idea was discussed in the humorous "Daedalus" column by David E. H. Jones. (The column, in which Jones puts forth various harebrained yet somehow plausible scientific schemes, later moved to Nature, where it continued until recently.) As recounted in The Inventions of Daedalus, Jones wrote:

[A] trowel, like any flat plate, must vibrate in response to sound: thus, drawn over the wet surface by the singing plasterer, it must emboss a gramophone-type recording of his song in the plaster. Once the surface is dry, it may be played back.

As it turns out, an independent scholar named Richard G. Woodbridge III had already been working quite seriously on this matter. Woodbridge wrote in to "Daedalus," saying that the replication of his idea must be "one of those very, very odd coincidences":

How very odd, that I should have sent to Nature, a paper (dated 13 January, 1969) entitled "Acoustic Recordings from Antiquity"; which paper was perfunctorily rejected as being 'too specialized'. In my paper I noted my early experiments (1961) in the recording of sound (music, voices, etc.) on clay pots and on paint strokes applied to canvas (as in oil paintings) and the successful reproduction of such sound using a crystal phono pickup and a spatulate, wooden 'needle'.

Woodbridge would eventually find an outlet for his paper, in Proceedings of the IEEE, Vol. 57(8), August 1969, pp.1465-6. In truth, the "paper" is no more than a letter describing his experiments with making pots and paintings that could then be "played back." From analyzing the grooves of a pot, he claimed to have extracted the humming sound of the pottery wheel. Even more improbably, he said he could discern the word "blue" from an analysis of a painting's blue patch. ("Acoustic Recordings from Antiquity" is available in PDF form for those with an institutional subscription to IEEE Xplore.)

From these rather dubious origins, paleoacoustics was then explored in science fiction, notably in Gregory Benford's short story "Time Shards," appearing in the 1979 anthology In Alien Flesh. In the story, pottery from medieval England is analyzed to reveal conversations between the potter and his assistant in Middle English. (I haven't read Benford's story, but Kari Kraus on her Wordherder blog Accidentals and Substantives found it "really unsatisfying.")

More recently, the story line has shown up twice on American television. An episode of The X-Files ("Hollywood AD," Apr. 30, 2000) revolves around a bowl supposedly made in the presence of Jesus when he was resurrecting Lazarus. When the "Lazarus bowl" was played back, the rumors went, it had the power to raise the dead. And just last year CSI: Crime Scene Investigation had a similar plot ("Committed," Apr. 28, 2005), where the CSI team is able to recover voices from a pot made by a mental patient. On Digg, one of the forums discussing the Bilge Sehir hoax, a screenwriter for the CSI episode named Uttam Narsu wrote in to share this footnote he included in the original story treatment:

Recovering sound from pottery was suggested by Richard Woodbridge in "Acoustic Recordings from Antiquity", Proceedings of the I.E.E.E. 1969, pp. 1465-6). Years later, similar experiments were made in Gothenburg, Sweden, by archeology professor Paul Åström and acoustics professor Mendel Kleiner (see The Brittle Sound of Ceramics - Can Vases Speak? by Mendel Kleiner and Paul Åström, Archeology and Natural Science, vol. 1, 1993, pp. 66-72, Göteborg: Scandinavian Archaeometry Center, Jonsered, ISSN: 1104-3121). They were able to recover some sounds.

Narsu said he included the footnote "so the staff was sure it was science, not science-fiction." And sure enough, in the script for the episode, Grissom explains: "In the '60s, experiments were done on clay pots and painted canvas. Scientists were able to ferret out sounds that were captured during the creative process in the clay and the paint." So Woodbridge's tenuous claims live on in popular science-y fiction.

There are other paleoacoustical predecessors that do not necessarily rely on pottery or paintings to recapture the sound of voices from the past. For instance, the BBC aired a movie in 1972 called "The Stone Tape" about an old room made of a type of stone that could store both sounds and images. And a commenter on Accidentals and Substantives noted a much earlier forerunner for such speculation, from Charles Babbage's Ninth Bridgewater Treatise, written by the computing pioneer in 1837 (discussed by John Picker in Victorian Soundscapes). Here is an excerpt from Chapter 9, "On the Permanent Impression of Our Words and Actions on the Globe We Inhabit":

The pulsations of the air, once set in motion by the human voice, cease not to exist with the sounds to which they gave rise. Strong and audible as they may be in the immediate neighbourhood of the speaker, and at the immediate moment of utterance, their quickly attenuated force soon becomes inaudible to human ears. The motions they have impressed on the particles of one portion of our atmosphere, are communicated to constantly increasing numbers, but the total quantity of motion measured in the same direction receives no addition. Each atom loses as much as it gives, and regains again from other atoms a portion of those motions which they in turn give up. The waves of air thus raised, perambulate the earth and ocean's surface, and in less than twenty hours every atom of its atmosphere takes up the altered movement due to that infinitesimal portion of the primitive motion which has been conveyed to it through countless channels, and which must continue to influence its path throughout its future existence.

But these aerial pulses, unseen by the keenest eye, unheard by the acutest ear, unperceived by human senses, are yet demonstrated to exist by human reason ; and, in some few and limited instances, by calling to our aid the most refined and comprehensive instrument of human thought, their courses are traced and their intensities are measured. If man enjoyed a larger command over mathematical analysis, his knowledge of these motions would be more extensive ; but a being possessed of unbounded knowledge of that science, could trace every the minutest consequence of that primary impulse. Such a being, however far exalted above our race, would still be immeasurably below even our conception of infinite intelligence. But supposing the original conditions of each atom of the earth's atmosphere, as well as all the extraneous causes acting on it to be given, and supposing also the interference of no new causes, such a being would be able clearly to trace its future but inevitable path, and he would distinctly foresee and might absolutely predict for any, even the remotest period of time, the circumstances and future history of every particle of that atmosphere. Let us imagine a being, invested with such knowledge, to examine at a distant epoch the coincidence of the facts with those which his profound analysis had enabled him to predict. If any the slightest deviation existed, he would immediately read in its existence the action of a new cause ; and, through the aid of the same analysis, tracing this discordance back to its source, he would become aware of the time of its commencement, and the point of space at which it originated. Thus considered, what a strange chaos is this wide atmosphere we breathe! Every atom, impressed with good and with ill, retains at once the motions which philosophers and sages have imparted to it, mixed and combined in ten thousand ways with all that is worthless and base.
The air itself is one vast library, on whose pages are for ever written all that man has ever said or woman whispered. There, in their mutable but unerring characters, mixed with the earliest, as well as with the latest sighs of mortality, stand for ever recorded, vows unredeemed, promises unfulfilled, perpetuating in the united movements of each particle, the testimony of man's changeful will. But if the air we breathe is the never-failing historian of the sentiments we have uttered, earth, air, and ocean, are the eternal witnesses of the acts we have done.

A very enticing absurdity indeed.

[Update: Tensor, said the Tensor has also blogged about the phony phonography, helpfully providing the text of the 1969 Woodbridge letter. Additionally, Bill Poser sends along a citation for another speculative paper on paleoacoustics:

Heckl, W. M. (1994) "Fossil voices", in Krumbein, W. E., Brimblecombe, P., Cosgrove, D. E. and Staniforth, S. (eds.) Durability and change: the science, responsibility, and cost of sustaining cultural heritage. Chichester and New York: John Wiley & Sons. Appendix 3, pp. 292-8.

An abstract to an article by Heckl with the same title can be found here.]

[Update, 2/28: Many people have said they have vague recollections of a 1950s science-fiction television show using a paleoacoustic plot line. Doug Throp and Barbara Zimmer helped pinpoint the likely source of these recollections: an episode of Science Fiction Theater called "The Frozen Sound," originally airing nationwide in late July 1955. Here is the plot summary as it appeared in the July 31, 1955 Washington Post:

Also, Martin Ternouth sends along another sci-fi variation:

One fiction reference you may not know is a short story by the English writer JG Ballard called The Sound Sweep. It's about a boy who is deaf and dumb and is employed to hoover away the sounds of everyday activity so that the audio patina of the centuries is left pure in cathedrals and concert halls. I understand that it was also the inspiration for the 1970's hit "Video Killed the Radio Star" by The Buggles - which was the first video ever played on MTV. ]

Posted by Benjamin Zimmer at 06:55 PM

W's Conundrum

This all started with a bit of presidential mis-morphology:

And I want those who are questioning it to step up and explain why all of a sudden a Middle Eastern company is held to a different standard than a Great British [sic] company.

W came up with the wrong answer to the question "what's the form for Great Britain as a modifier?", but the right answer is by no means obvious.

As a modifier of company, the English language sometimes prefers to use place names in their base form, instead of using the corresponding adjective. Thus the string "Pennsylvania company" has 16,078 MSN hits ("mhits"?), while "Pennsylvanian company" has only 30. However, this is emphatically not true in certain cases. Consider "a French company" with 66,807 mhits, compared to "*a France company" with only 132. Likewise we expect "a German company", "a Russian company", "a Chinese company" -- not "a Germany company", "a Russia company", "a China company".

So there seems to be a general rule: when the place name is the name of a country, always use the adjectival form. But where the name of a country doesn't have any adjectival form, we're normally fine with using the name as a modifier: "a UK company" has 97,286 mhits, and there's likewise nothing wrong with "a U.S. company" (80,732 mhits), or "a Cayman Islands company" (2,510 mhits).

But what about Great Britain? There's no directly corresponding adjective, it seems -- "Great British" is (alas for W) a bad joke. And yet the base form of the name is not a plausible modifier: "a Great Britain company" garners 0 mhits, though Google manages to find a paltry 49.

One hypothesis would be that words like UK and US can be zero-derived adjectives as well as nouns, while Great Britain can't, presumably because of blocking by British. But "Great British" is also ruled out, perhaps because British is in fact the (irregular) adjectival form of "Great Britain" as well as "Britain", or perhaps for some other reason -- though Adj+N place names generally accept adjectivization of the N part with good grace: "West Virginian", "Northern Irish", "East Anglian", etc.

An alternative hypothesis would be that this is just another set of quasi-regular facts about English morphology.

On the subject of weird facts, note this asymmetry in mhits:

(with periods)      mhits     (without periods)   mhits     with/without ratio
"a U.K. company"    4,867     "a UK company"      97,286    0.05
"a U.S. company"    80,732    "a US company"      73,898    1.09

In the context "a __ company", U.S. is 22 times more likely to retain its periods than U.K. is. What's up with that?
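For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratios from the counts in the table above. The variable and function names are mine, and the counts are simply the mhits reported in this post.

```python
# MSN hit counts ("mhits") copied from the table above.
counts = {
    "U.K.": {"with": 4867, "without": 97286},
    "U.S.": {"with": 80732, "without": 73898},
}

def period_ratio(entry):
    """Hits for the form with periods divided by hits for the form without."""
    return entry["with"] / entry["without"]

ratios = {name: period_ratio(entry) for name, entry in counts.items()}

# How much more strongly U.S. keeps its periods than U.K. does.
relative = ratios["U.S."] / ratios["U.K."]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print(f"U.S. relative to U.K.: {relative:.1f}")
```

The "22 times" figure is the quotient of the two with/without ratios, rounded to the nearest whole number.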

It's not a Europe/U.S. difference, or at least not entirely, since American initialisms vary widely in this measure.

[Table of American initialisms: (with periods), (without periods), with/without ratio]

It might well involve a desire to avoid confusion with a capitalized version of the pronoun us, but again, the difference between ACLU and NCAA must be simply a matter of convention, so it's hard to be sure that the treatment of U.S. isn't purely conventional as well.

And one more strange fact:

"a U.S. company"
"a US company"
"a U.S.A. company"
"a USA company"


Personally, I feel that U.S.A. (with or without periods) is not a possible modifier in this case. And the web agrees with me, 96.2% of the time anyhow. Why in the world should there be anything wrong with "a USA company" as an instantiation of the "a <country name> company" pattern?

(If you're not already sick of the quirks of toponymic morphosyntax, take a look at the 5/15/2004 post "All your base are belong to which lexical category".)

[To forestall anguished emails, let me say that I'm aware that the correct solution to W's conundrum would have been to use the form British (or maybe to pick another phrasing, like "a company based in Great Britain".) The point is that this particular fact of usage apparently has to be learned by rote -- if you try to solve the puzzle by rule, or by intelligent inference from other examples, you're very likely to get an ambiguous or flat-out wrong answer. Many of W's reported morphological miscues are over-generalizations, apparently substituting intelligent guesswork for rote memorization.]

Posted by Mark Liberman at 04:01 PM

Trent Reznor Prize nomination

Tim Macdonald has nominated a candidate for the Trent Reznor Prize for Tricky Embedding.

It's a bit of dialog from R.K. Milholland's 2/15/2006 Something Positive strip. Davan MacIntire comes in the door and has this exchange with his father Fred:

Davan: Hey Dad. I'm back from dropping Peejee off at the airport. I'm just gonna go get the mail and then we can head over to the book store.
Fred: I already got the mail. Your boss mailed ya your paycheck.
Davan: Really? Where is it?
Fred: Mom's holdin' on to it for ya.
Davan: Uh, Dad... I don't really see how -- [sees the check under his mother's funeral urn] Damn it, Dad! You've gotta stop using Mom's urn to mark where you left things! It's just ... really weird.
Fred: Yeah, you're right. I'd hate to make payin' a man an idiotic sum of money to burn my wife into a fine powder and stick her in a $400 bowlin' trophy 'cuz she requested it into somethin' weird.
Davan: Dad, I have some wonderful news. I'm positive it won't be Alzheimer's that kills you.

In the last line, I think that Davan is making a hyperbolic joke about his father's tendency to irritate him: "keep talking like that and I'll kill you". But in fact there's some evidence connecting high values of "idea density" and "grammatical complexity" with resistance to Alzheimer's.

Posted by Mark Liberman at 03:19 PM

Ride and smile

[Guest post by Ben Yagoda]

The most extravagant linguistic virtuoso I’ve recently encountered is Hannah Teter, a 19-year-old Vermonter. Teter won the Women's Halfpipe gold medal at the Olympics last week, listening through her iPod to a band called Strive Roots ("They’re super good," she would tell the Today Show’s Matt Lauer) as she took big air and made amazing 900-degree spins.

Afterwards she faced the obligatory questions at the obligatory press conference. She gave anything but the obligatory answers. Borrowing what she wanted from African-American, surfer and hippie vernacular -- and throwing into the mix the gnomic lingo of her sport -- she talked the way she snowboards: with full-throttle abandon and improvisational joy.

Asked about the injury she overcame to win, she said, "I had a little thing going with my knee. I nursed it and iced it and got in the pool and did yoga and lit the candles at night and zoned in. Ain’t no big thang."

And her thoughts before she made her first of two runs?

"I was standing up there, and Gretchen [Bleiler, her teammate, who won silver] threw down so hard. It was like, 'Whoa, I'm going to have to step it up…' It was like, 'Represent, U.S.'
"I wanted to try to go as big as I could and tweak my grabs hard.
"I just wanted to ride and smile. That’s what’s going on. That’s what’s rolling."

At one point, the Times reported, Teter’s ponytail came undone. She took a hat from a police officer and put it on. "I’m superstoked," she explained.

Teter’s first run was so good that she could have just coasted in the second one and still taken gold. "My coach said, 'OK, victory lap,' and I was like, 'No way, victory lap,'" she said. "I wanted to step it up and do my thing." And she did—surpassing her first score by a significant margin.

A reporter asked the ultimate cliché question, Olympic division: what will you do with your medal?

Teter: "Hang it on the wall. Is that a good thing to do with it? I'll probably take it up to our house in Vermont. I'll probably just staple it to the wall."

Finally it came time to bring the press conference to an end. Teter nailed that one too. "Gotta go pee in a cup, guys," she said by way of goodbye.



Posted by Mark Liberman at 11:14 AM

February 22, 2006

Not your usual modifier attachment problem

Collectors of infelicities in writing are especially fond of modifier attachment problems, which often lead to entertaining misinterpretations.  In one classic form of the problem, a modifier M is intended to apply to a phrase P1 preceding it, but ends up being understood as applying to a nearer phrase P2, as in this example from the Palo Alto Daily News of 4/19/05, in the Atherton police blotter:

(1) First block of Mosswood Way, 7:52 p.m.: A resident reported a large animal in a tree with tall and pointed ears.

The writer's intention was that the M with tall and pointed ears should modify the P1 a large animal, but unfortunately another modifier of P1, in a tree, intervenes between P1 and M and provides a nearer P2, a tree, for M to attach to, and for a moment we are visualizing a tree with tall and pointed ears.

Often, the seductive P2 is in fact inside P1, at the end of it, and we get "low attachment" rather than the intended "high attachment", as in this example from "Top officials to have salaries made public" in the PADN 7/7/05:

(2) Last night, Domanico presented a history of the hospital district and how the hospital is governed to dispel misconceptions that the public might have had.

In this case, the M to dispel misconceptions that the public might have had was intended to apply to the whole clause Domanico presented a history of the hospital district and how the hospital is governed, but is likely to be (temporarily) understood as applying just to the clause the hospital is governed.

Now, for a change, a modifier attachment problem of the reverse sort, where M is intended to apply to P2 (at the end of P1) but ends up being mistakenly understood, for a moment, as applying to all of P1:

(3) Ruisdael's canvases of Bentheim Castle, for instance, which he [Ruisdael] saw on a trip to the Dutch-German border, are invariably acclaimed...

(From Sanford Schwartz, "White Secrets", a review of a Jacob van Ruisdael exhibition, in the New York Review of Books 2/9/06, p. 8.  Thanks to Johannes Fabian for pointing out the example to me.)

Modifier attachment problems are often treated as a type of dangling modifier, on the grounds that they exemplify a failure to put modifiers adjacent to (or at least very close to) the phrases they modify.  This is not a satisfactory analysis for some types of dangling modifiers, and it's generally unsatisfactory for modifier attachment problems, which illustrate a very different sort of problem.

[Digression: I suspect that the reason so many advice writers lump modifier attachment problems together with classic dangling modifiers is that they want to group writing problems under a few umbrella admonitions, like "Keep modifiers as close as possible to the things they modify".  In my experience, these abstract generalizations are not particularly useful for writing students, who have to learn to distinguish the various subtypes and find fixes for them separately.]

In Type 1 modifier attachment problems, as in (1), the difficulty is that there are two parallel postmodifiers for P1, and they cannot BOTH be adjacent to it.  If we move with tall and pointed ears up next to a large animal, we just end up with pointed ears in a tree instead of a tree with tall and pointed ears.  If we want to avoid the possibility of misinterpretation, some more significant rewording is called for, say:

A resident reported (that) a large animal with tall and pointed ears had been sighted in a tree.

In Type 2 modifier attachment problems, as in (2), both P1 and P2 are already adjacent to M; the question is whether M attaches to the bigger and higher P1, or to the smaller and lower P2.  One way to ensure that M attaches to P1 rather than P2 is to make it a premodifier of P1:

Last night, to dispel misconceptions that the public might have had, Domanico presented a history of the hospital district and how the hospital is governed.

[Digression: In much of the advice literature, Type 2 modifier attachment problems are viewed as having P1 further from M than P2, so that the problem could be fixed by moving M up and making it adjacent to P1.  This view is made possible by the common assumption, in this literature, that modification is a relationship between a modifying expression and a modified WORD, not phrase.  In (2), the infinitival modifier to dispel misconceptions that the public might have had would be seen as modifying not a clause, but just the head verb of that clause, presented.  The simplest fix would then be to move the infinitival modifier up to follow the verb immediately, perhaps setting it off by commas:

Last night, Domanico presented, to dispel misconceptions that the public might have had, a history of the hospital district and how the hospital is governed.

This version, with the verb separated from its direct object by considerable intervening material, is awkward at best.  Further modification is needed.

There are good reasons for rejecting the idea that modification is, in general, a relationship between a modifying expression and a WORD (rather than the whole phrase this word heads), but exploring these reasons would take me too far afield.]

Now, finally, to (3).  Why is this different from (1) and (2)?  Why do we have a moment of contemplating the absurd idea that Ruisdael saw his own canvases of Bentheim Castle on a trip to the Dutch-German border, when clearly it must have been the castle that he saw on this trip (and then painted)?

The source of the problem is the parenthetical for instance.  If we remove it, there's no problem:

Ruisdael's canvases of Bentheim Castle, which he saw on a trip to the Dutch-German border, are invariably acclaimed...

Now, low attachment, which was the problem in (2), gives the intended interpretation, without fuss or mess.  (High attachment is still possible, but unlikely to be made by most readers.)

Why is the for instance such a troublemaker here?  Because it sets off the whole NP Ruisdael's canvases of Bentheim Castle and focuses on the canvases denoted by this NP, so that the reader is at first likely to take the relative clause to be adding information about the canvases.

Such a subtle effect.  Throwing in a parenthetical, in particular the parenthetical for instance, which is so useful in organizing discourses, tilts parsing in the wrong direction.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:09 PM

Richard Grant White

Just off the phone with Erin McKean, talking Oxford University Press business, during which I suggested that someone should write about the life and works of Richard Grant White, the great American grammar ranter of the 19th century.  White turns out to be an even more interesting character than I'd imagined.

I came across RGW in the "Brief History of English Usage" essay at the beginning of MWDEU.  From p. 9a:

    The most popular of American 19th-century commentators was Richard Grant White, whose Words and Their Uses, 1870, was... compiled from previously published articles.  [The book went through dozens of editions.]  He did not deign to mention earlier commentators except to take a solitary whack at Dean Alford for his sneer at American English.  His chapters on "misused words" and "words that are not words" hit on many of the same targets as [Edward S.] Gould's chapters on "misused words" and "spurious words," but White's chapters are longer.  Perhaps his most entertaining sections deal with his denial that English has a grammar, which is introduced by a Dickensian account of having been rapped over the knuckles at the age of five and a half for not understanding his grammar lesson.  White, who was not without intellectual attainments--he had edited Shakespeare--was nevertheless given to frequent faulty etymologizing, and for some reason he was so upset by the progressive passive is being built that he devoted a whole chapter to excoriating it.

To my mind, RGW's chapter on the progressive passive is definitely the high point of the book. Delicious stuff.  I do a little performance about it for my sophomore seminars on prescriptivism.

(I am indebted to Elizabeth Daingerfield Zwicky for finding a copy of Words and Their Uses for me a few years ago.)

RGW was not only a Shakespearean scholar and a loose cannon on the grammar front, but also clearly a man of culture.  He was a cellist, and founded a string quartet (named after him) that survived into the 1930s.

In addition, he was the father of Stanford White, who is famous for being the great architect of the Gilded Age in America and for having been shot dead in 1906, on the roof garden of Madison Square Garden (which White designed), by Harry Thaw, whose wife, Evelyn Nesbit Thaw, White had been having an affair with.  (See Ragtime for a fictionalized version of some of this story.)

RGW came back to my attention last week when I finally laid hands on a copy of J. Lesslie Hall's English Usage (1917) -- more on Hall and his book in a later posting -- and discovered that Hall spends a lot of space critiquing RGW, not at all kindly; one of Hall's longer sections is on the progressive passive, in fact.

In any case, RGW deserves a book.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 06:02 PM

Linguistic enforcement, Canadian style

When the Organisation Internationale de la Francophonie wanted to ensure that French was strictly implemented as one of the official languages of the Winter Olympics, they dispatched just the right type of person to do the job: Lise Bissonnette, a noted writer, journalist, and library director from Quebec. What better place to find an enforcer of public French usage than a province where business owners are fined if their commercial signs don't have French lettering that is at least twice as large as any English lettering?

AP and Reuters detail how Bissonnette has been making the rounds at Turin as La Francophonie's Grand Témoin, or "Great Witness." Bissonnette makes sure that every Olympic sign, announcement, and document appears in French as well as English (the other official language of the Games) and Italian (the language of the host country). And it seems that Bissonnette has been so successful in her linguistic monitoring that Quebec-style signage is the norm: AP reports that in the main media center, "all the signs are in French, with the English and Italian written in smaller letters below." (This despite the fact that French is spoken by only about 3 percent of the world's population.)

As Bissonnette told Reuters, it's all done by the book, just as it is in Canada (where the French and English languages by law enjoy "equal rights and privileges"):

"The opening ceremony was very much in the three languages," she said, referring to the English, French and Italian used by the announcers and on TV screens. "It was 'a la canadienne', rigorously calculated."

But as in Canada, where despite the language laws only 18% of all citizens are bilingual, the insistence on the public display of French doesn't do much to stem the tide of Anglophone hegemony. To Bissonnette's chagrin, English is still the spoken language of choice among the international participants (except, she says, at ski events). And she has no control over the English-only signs from Olympic sponsors, or even the English articles from the official Olympics news service.

Even among locals seeking to boost the "romantic" image of Turin/Torino, French gets no respect: the city's motto chosen for the Games appears only in English: "Passion Lives Here."

Posted by Benjamin Zimmer at 12:15 PM

Omit offensive reading

According to The Chronicle of Higher Education (no link; it's behind a subscription wall), change may be coming to the higher ed biz in Arizona. Personal taste may be introduced, through legislation, as grounds for a student to decline a reading assignment without having seen the material in question:

College students in Arizona may be able to opt out of required reading assignments they consider personally offensive, under a bill approved on Wednesday by the State Senate's Higher Education Committee. The measure would allow students to decline assignments that "conflict with the student's beliefs or practices in sex, morality, or religion." Critics say the legislation is too broad and could undermine the integrity of courses.

That's going to come as a blessed relief to those many students who are still being told to read Strunk and White's vile collation of stupid advice and false claims about grammar in The Elements of Style, isn't it? The work is unquestionably offensive, as I have so often pointed out on Language Log (random examples here). And I'll write a letter saying so for any Arizona student who otherwise might be under threat of having to look at it. Don't put up with being told to read anything offensive. (Sex, morality, and religion are all considerations here, by the way. Trust me. To see that I'm right you would of course have to read it, and if you're a college student in Arizona that would undercut the whole point of the planned legislation, so don't.)

Posted by Geoffrey K. Pullum at 10:35 AM

Hey buddy, can you spare an adjective?

There's some recent evidence that President George W. Bush really does believe in morphological regularization of toponymic adjectives:

In other words, we're receiving goods from ports out of the UAE, as well as where this company operates. And so I, after careful review of our government, I believe the government ought to go forward. And I want those who are questioning it to step up and explain why all of a sudden a Middle Eastern company is held to a different standard than a Great British [sic] company. I'm trying to conduct foreign policy now by saying to people of the world, we'll treat you fairly. And after careful scrutiny, we believe this deal is a legitimate deal that will not jeopardize the security of the country, and at the same time, send that signal that we're willing to treat people fairly. [emphasis added]

(This transcript, with its [sic] and all, is from the White House web site. I've expressed some skepticism about earlier examples, but this is surely an authoritative source.)

In this case, I'd like to point out that our president has highlighted a genuine linguistic problem. In the first place, the British Isles have got the most confusing nomenclature around. There are at least 15 names of major overlapping political and geographical entities here, ignoring all the counties and bailiwicks and islands and the like. But the real problem is the endemic shortage of adjectives. Of the 15 names, 8 have no adjectival form, as far as I can tell. One (Scotland) has three different adjectival forms: Scots for the language and (mostly) the people; Scotch for the local distilled liquor; Scottish for everything else, more or less. There are four other (ambiguous) adjectives, all irregular formations with -ish or similar endings: British, English, Irish, Welsh. But the large-scale formal political entities centered in London -- United Kingdom, Great Britain -- are entirely bereft of corresponding adjectives, except for the jokey UKish and the irregular, ambiguous and confusing pair British and Britannic.

Name | Type | Adjective
1. British Isles | geographical: includes everything in this table | no specific form
2. United Kingdom of Great Britain and Northern Ireland | political | are you kidding?
3. United Kingdom | political: short form of 2 | none
4. U.K. | political: shorter form of 2 | none (though UKish is sometimes seen as a joke)
5. Great Britain | geographical: the big island | no specific form
6. Great Britain | political: ((England + Wales) + Scotland) | no specific form
7. Britain | an informal term for 6 or 2 | British or Britannic (also used for 1, 5, 6 and sometimes 2, 3, 4)
8. England and Wales | political: one of the United Kingdoms (?) | none
9. England | A nation constituting half of the Kingdom of England and Wales, occupying the southeastern 2/3 of 5; or short for 8 | English
10. Scotland | A constituent part of the United Kingdom, in the northern part of 5 | Scots or Scottish, depending
11. Wales | A constituent part of the United Kingdom, in the southwestern part of 5; less independent than Scotland since it doesn't have its own legal system | Welsh
12. Republic of Ireland | political: the southern part of 15 (but see note below: this is apparently the official English "description" for the state whose official name is Éire) |
13. Northern Ireland | political: province of 2, northern part of 14 | none (?)
14. Ireland | political: short form of 12 | Irish
15. Ireland | geographical: the western island | Irish

If we go into more details, it gets worse. Is there an English adjectival form for the Bailiwick of Jersey? The Wikipedia entry for Jèrriais says:

Jèrriais is often called "Jersey French" or "Jersey Norman French" by English-speakers (who lack an adjective for Jersey in the English language) and "jersiais" or "normand de Jersey" by French-speakers. Care should be taken to distinguish between Jèrriais and the Jersey Legal French used for legal contracts, laws and official documents by the government and administration of Jersey. For this reason, some prefer using the term "Jersey Norman" to avoid ambiguity and to disassociate the language with French. [emphasis added]

I believe that this modifier shortage is a specifically English problem, not to be blamed on the other inhabitants of the British Isles. The Irish and Welsh have got their adjectives, the Scots have frugally retained at least three of them (leaving Gaelic out of it), and even the inhabitants of the Bailiwick of Jersey are plentifully supplied. But over the past few centuries, the English have been creating a bewildering agglomeration of half-digested acquisitions and new organizational initiatives -- a sort of political Enron -- while completely neglecting their duty to supply these entities with adjectives. The NGOs are nowhere to be seen; U.S. unilateralism is out of style, so an adjectival Marshall Plan is not in the cards; this is clearly a case for U.N. intervention.

[Seriously, I think the standard approach is to use the form British to refer to people, companies etc. based in Great Britain. (Or should that be the United Kingdom of etc.?) And of course the British Empire, with its current residue, was built by Scots and Irish and Welsh as well as English people, not to mention the Hessians and Sikhs and so on. But it's the English language that's lacking, here, and it's easy to see how a United Statesish president could slip up on this one.]

[Update: Several people have written to straighten me out on various aspects of this question. For example, Aidan Kehoe pointed out

"Northern Irish" in adjectival usage exists -- c.f. {http://google.com/search?q="northern+irish+politics"} -- but I venture that people find it ugly, since {http://google.com/search?q="northern+ireland+politics"} has ten times as many hits.

and also asserted that:

The political adjective form for the "United Kingdom of Great Britain and Northern Ireland" is "British." Ditto "United Kingdom." Ditto "U.K." Ditto "Great Britain" in its political and geographic senses. Search for any of these things in UK law, and look for the adjectival usage around them.

As background, Aidan explained that

"Britain" is equivalent to "Great Britain" in English, with the caveat that it’s slightly less high-register. The "Great" is a calque from French, where "Grande" is needed to distinguish Britain from Brittany. The Wikipedia is wrong in describing it as "informal"; were it so, http://www.number-10.gov.uk/output/Page39.asp in introducing the Prime Minister’s home would not use it. [in the phrase "Britain's 51 Prime Ministers", presented sans "Great" -- ed.]

OK, now we're getting somewhere -- we can blame it all on the French!

On the other side of the argument, Lora Totten-Schwartz wrote that

I'm emailing you to pester you (as I'm sure others are/will be) about calling people from Great Britain "British." 

I'm trying very hard not to say, "G'wan, say that to a Scotsman!" with a giggle.  People who live outside of England are not British.  That's why they have that "United Kingdom" thingy in the official name of the country.  Scots are Scottish, Welsh are Welsh, Irish are Irish, English are either English or British.

Of course she's talking about people, whereas Aidan is talking about places and political institutions. [But see below for a strongly contrary opinion from "the Pedant-General in Ordinary".] So my ignorance is becoming deeper and more nuanced by the minute.

Aidan Kehoe closes by observing that

Now, certainly that’s not as clear as it could be, but someone doing foreign policy for a living through English can reasonably be expected to figure it out.

I believe this is intended to be a criticism of our president. Although I'm not persuaded that "doing foreign policy for a living through English" is an appropriate description of his job, I'll certainly grant that any literate citizen of the English-speaking world needs to know that whatever the adjective corresponding to Great Britain might be, if any, it's not "Great British". Alas. ]

[It's also been pointed out to me that "Great British" is hardly a true regularization, which would have to be something like "Great Britainian". Instead, it's merely a sort of over-generalization, or perhaps a blend whereby great leaks from the nominal Great Britain over into the adjectival British. Whatever.]

[Another update: Lane Greene writes

... while we're piling on, the "Republic of Ireland" may be a "political" name in your taxonomy, but isn't official. The country is officially just Ireland, and Eire. So 14: "Ireland - Short form of 12" isn't quite right. "Ireland" is the full form.

Incidentally, I admire that. Few countries do it - Canada is one that springs to mind. Knock it off with the People's Democratic Wonderful Republic of Whatever.

I think I see. The Wikipedia article says that

The Republic of Ireland (Irish: Poblacht na hÉireann) is the official description of the sovereign state which covers approximately five-sixths of the island of Ireland, off the coast of north-west Europe. The state's official name is Ireland (Irish: Éire), and this is how international organisations and citizens of Ireland usually refer to the country.

So if Wikipedia is right, Lane is not strictly correct that the "Republic of Ireland" denomination "isn't official" -- it's just the "official description", not the "official name". But I think we've split this particular hair into enough slices.

Meanwhile, I wonder -- are there any other countries that have "official descriptions" different from their official names? ]

[What I hope is the last update: Rich Alderson weighs in on behalf of the history of languages and populations:

I'd like to point out that from the point of view of historical linguistics, Lora Totten-Schwartz has it wrong when she writes

Scots are Scottish, Welsh are Welsh, Irish are Irish, English are either English or British.

The English are English, of course, but the Welsh are British (as may be some of the Scots, though they are sort of English when they aren't Gaelic). The Cornish used to be British, as are the Armorican (not to be confused with the American) expatriates, who moved to Armorica and renamed it Brittany when the English showed up in Britain.

I hope that clears that all up.

Me too.]

[TG Gibbon writes:

I am no expert but I do use national adjectives everyday at work and I really have to strongly disagree with Lora Totten-Schwartz; 'British' is widely accepted, even in Scotland, as being the adjective applying to nouns from out the UK or GB. True, you'll get a Glasgow Kiss if you call a punter 'English' on the Cowgate in Auld Reekie, but much as they may regret it they do understand they are part of both an island and a state for which the adjective is 'British.' This was the rule in my (American) home growing up, it was reenforced by my (American and Scottish) education, and is currently the policy of my (American and English) company. Of course now that I think about you stand a pretty decent chance of running into an Englishman on the Cowgate.

Is Tony Blair not British? Is Edinburgh not a British university, John Cale a British musician, and Calders a British beer?

With Northern Ireland there is some added ambiguity. Those fellas could be called, correctly, either British or Irish. One for the state, the other the island. At work I would certainly 'apply the term' (as we say) 'British' to both Liam Neeson and Kenneth Branagh as we base our classifications strictly on states.]


[Update: The Pedant-General in Ordinary writes to disagree in the strongest terms with the view that Scots are not British:

I am an enormous fan of the Language Log, and have shamelessly purloined some articles in the past. But to allow Lora Totten-Schwartz to describe Scots as not British is simply absurd.

She has it completely the wrong way round: To accuse a Scot of being "English", or to use the term "English" when you mean British (e.g. the "English Army" as opposed to the British Army, though oddly of course, the Armed Services pertain to the UK, not GB, and assuming of course that such mention relates to the Army post 1707), is a grave offence, not the other way round.

You compound the problem in your attempts to clear up the matter:
"Of course she's talking about people, whereas Aidan is talking about places and political institutions."

Nonsense. The adjectives apply across the board. Scots, English and Welsh people are ALL British. Scots are not English. English people are not Scots. But they are both British.

To give a simple example (for the sake of argument and without prejudice to your real actual place of birth), you are Pennsylvanian. A resident of LA could be described as Californian. But you are both American. The simplest analogy is that Scotland, Wales, England and Northern Ireland are similar to States in the US, with the UK being at the level of Federal Govt. The analogy is far from perfect since we do not have the strict separation of powers or subsidiarity that is enshrined in the US Constitution, but you get the gist. Creditting California specifically with an achievement of the US as a whole, or worse still of Pennsylvania, would be obviously wrong.

This is a howler of such epic proportions that it profoundly discredits the academic merit of the site in my eyes. It displays an ignorance of things British that I can scarcely credit. An apology and promise to do some fairly basic research would be in order: the updates with the text of emails almost suggest that there is debate on this or a legitimate difference of opinion. There is not.

(Disclaimer: I am a Scot, but I'm proud of my British Passport...)

I apologize for getting this distinction wrong. It's apparently a serious matter: over at Infinitives Unsplit, the same author is (mock-) threatening violence. ]

Posted by Mark Liberman at 08:37 AM

February 21, 2006

Reclassifying Linguistic History

Today the New York Times reports (see here) that for the past seven years our intelligence agencies have been hard at work at the National Archives, secretly expunging from public access thousands of historical documents that have been available for years. If they can do it, why can't we? It seems like a good idea for linguists to clean up all the erroneous ideas we've held over the years--especially the ones that embarrass us. For starters, we could excise the foolish notions of Eighteenth Century linguists who wasted their time trying to figure out which language was spoken in the Garden of Eden. Then we could remove from memory the past theories of tagmemics, all those early transformations, the misguided uses of the word, "Negro," by early sociolinguists, and maybe even the errors in Jim McCawley's "Important Dates to Remember in the Month of May" (see here).

What do you think? Are we up to this?

— Roger Shuy

Posted by Roger Shuy at 03:18 PM

The Antiquity Illusion

Geoff Pullum recently returned to the Recency Illusion (if you've noticed something only recently, you believe that it originated recently) and the Frequency Illusion (once you notice a phenomenon, you believe that it happens a whole lot), to add "a kind of inverse counterpart to the Frequency Illusion", the Infrequency Illusion, characterized by Daniel Ezra Johnson as "the belief that something you've found is less common, and therefore perhaps a more interesting 'discovery', than it probably really is."

You're probably guessing that I'm going to add a kind of inverse counterpart to the Recency Illusion, call it the Antiquity Illusion: if you use some linguistic feature naturally and regularly, you believe that it has been in the language for a long time -- at least since your early years.  And so I am.

The larger lesson here is that impressions and memory are both deeply unreliable, and linguists are not in general less subject to the distortions of selective attention and the flaws of memory than non-professionals.

The Infrequency Illusion and the Antiquity Illusion can both be observed wherever linguists chat about the facts of their language -- for instance, on the American Dialect Society mailing list (the archives of which can be viewed on the ADS website).  Someone writes in with some odd turn of phrase in English, only to be told that it's been discussed on ADS-L several times, or that MWDEU has an excellent entry on it, or that it's in DARE or the OED, or that there was a substantial article on it in American Speech two decades ago (or, worse, two months ago).  Someone notices a fabulous eggcorn, only to be told that it was one of the early entries in the eggcorn database and gets thousands of Google webhits.  The Infrequency Illusion at work.

I forbear to name names here, though I'll note that mine is one of them.

The illusion is closely related to a phenomenon familiar in scientific research.  From Gina Kolata's 2/5/06 column "Pity the Scientist Who Discovers the Discovered" in the NYT Week in Review, p. 4:

The discovery that your discovery has already been discovered is surprisingly common, said Stephen Stigler, a statistician at the University of Chicago who has written about the phenomenon.  Not only does it occur in every scientific field, he said, the "very fact of multiple discoveries has been discovered many times."

The Infrequency Illusion is a manifestation of hope.  The Antiquity Illusion is a manifestation of familiarity: what's familiar and routine for you must have been around forever, well, at least throughout your life.  You will be inclined to reconstruct a past in which current linguistic items (lexical items, idioms, pronunciations, constructions, meanings) are projected back from the present.  You may well "remember" conversations from some time ago in which these items occur.  I do.

Try this little experiment: ask a number of friends (linguists or not) how long they think the idiom the whole nine yards has been around; if they're over 30, ask them if they remember reading or hearing it when they were young.  I myself believe that it was in common use when I was in high school and college (in the 50s and early 60s).

My belief is almost surely false, since much tedious digging by lexicographic types has gotten attestation of the idiom back only to 1966.  I still find this astonishing.  (The origins of the idiom are also in dispute, and might never be clarified.  PLEASE, DO NOT WRITE ME WITH PROPOSALS ABOUT ITS ORIGIN.  Look at the discussion in Michael Quinion's Ballyhoo, Buckaroo, and Spuds.)

More recently, Jon Lighter started a discussion on ADS-L about 'phobe or phobe as a clipping of homophobe.  He had it back to 1989, which struck me as awfully recent.  I was pretty sure it had been used in gay publications and on the newsgroup soc.motss before that.  But searches through soc.motss got me back only to 1992, so maybe my memory is wrong.  Maybe not.  What is clear is that my memory can't be trusted.

(By the way, the clipping 'phobe/phobe is an item whose origin will probably never be traced, simply because, like the compound foodcast recently discussed here by Mark Liberman, it's likely to have been invented independently by different people on different occasions.  The best we can do is get a rough time period for its origin, plus a rough estimate of when it became relatively frequent.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 11:12 AM

Into the future at McFriendly's

Garry Trudeau's Doonesbury cartoon of 2/18/06 has two characters working at a restaurant called McFriendly's:

Zonker: Hey, Newt, why do you suppose they changed all the phrases in our manual to the future tense? Like, "This will be your table," and "We'll be offering three kinds of bread" and "The chef will be preparing it on a bed of peanut shells."
Newt: Well, they use the future tense in all the high-end restaurants now.  It's supposed to make us sound classy.
Zonker: Oh.
Newt: It's also handy if the food's late.
Zonker: Good point!  It gives the customer hope!
Posted by Arnold Zwicky at 09:23 AM

Pinging people

Talking of ping (as Mark just was), I recently noticed a friend of mine — an audio production expert in Australia (not in the computer business) — using the transitive verb ping to mean "send email to". That's not the generally accepted meaning. There is a Unix command ping that sends a series of packets to a designated IP address and reports on what comes back (some versions give a message like "unix.ucsc.edu is alive"), so you can check on whether that machine is working, independently of whether it is serving up web pages, or whether a person using it is replying to your emails. The English verb mostly comes from that (reinforced by the earlier sonar use, perhaps, but that wasn't commonly used in this sense before the modern age of ubiquitous net technology). Pinging and emailing are different. But you can see the route of semantic change to this very reasonable secondary meaning: we often send emails to each other just to say "Are you there?" or "Did you get that last message of mine", and that is just like pinging a human being. My friend appears to have just generalized this to sending an email for any purpose, not just to check whether someone is alive.

[Update: Nori Heikkinen of Swarthmore College tells me that in fact this usage is known elsewhere: in the community of people who work at Google in Mountain View, California, it is now quite common. Doubtless it is spreading fast, and has reached Australia via the international techie grapevine. I am not assuming that just because I noticed it this month it therefore originates recently. That would be the recency illusion again.]

By the way, with regard to the false story about the word originating as an acronym from packet Internet gopher, first, that seems syntactically odd (surely Internet packet gopher would sound more like English), and second, Internet has never been two words (Inter- is a prefix), so the result should have been a Unix command called pig, right? I'm glad that isn't what happened. "I pigged him, but he didn't answer." Ugly.

Posted by Geoffrey K. Pullum at 09:22 AM

Ping ding

To Roger Shuy's list of maxims for would-be linguistic experts (be informative, be relevant, be sincere, be clear), we can add one more, which Roger took for granted, but William Safire too often ignores: be careful. Don't assert, in an authoritative voice, something that anyone with an internet connection and 15 seconds of spare time can discover to be wrong.

As Language Hat notes, Safire's 2/19/2006 On Language column asserts that

A ping is not just the word for a sound anymore. It is also an acronym for "packet Internet gopher," a program that tests whether a destination is online ...

Hat links to a LiveJournal post by Jonquil Serpyllum, who typed {ping history} into Google and found, as the top link, "The Story of the PING Program", by Mike Muuss, whose second line says

I named it after the sound that a sonar makes, inspired by the whole principle of echo-location. In college I'd done a lot of modeling of sonar and radar systems, so the "Cyberspace" analogy seemed very apt. It's exactly the same paradigm applied to a new problem domain: ping uses timed IP/ICMP ECHO_REQUEST and ECHO_REPLY packets to probe the "distance" to the target machine.
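For readers curious what one of those ECHO_REQUEST packets looks like, here is a minimal sketch in Python. It only builds the bytes of an ICMP echo request and computes the standard Internet checksum over them; it does not send anything, since that needs a raw socket and usually administrator privileges. The function names `internet_checksum` and `build_echo_request` are made up for this illustration and are not taken from Muuss's program.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request (type 8, code 0)."""
    icmp_echo_request = 8
    # First pass: header with a zero checksum field.
    header = struct.pack("!BBHHH", icmp_echo_request, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    # Second pass: same header with the real checksum filled in.
    header = struct.pack("!BBHHH", icmp_echo_request, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=1, sequence=1, payload=b"hello")
```

A receiver can check such a packet by running the same checksum over the whole thing: a correctly built packet sums to zero. The "timed" part of Muuss's description is just a matter of noting the clock when the request goes out and again when the matching ECHO_REPLY comes back.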

If Safire's research assistant had wanted to do more than 15 seconds of research on this topic, (s)he might have looked in the OED, which would explain that before being adopted by WWII sonar operators, the word ping started life as "An abrupt ringing sound, such as that made by a rifle bullet in flying through the air, by a mosquito, the ringing of an electric bell, etc.", with citations from 1835 onward:

1835 J. E. ALEXANDER Sk. Portugal xi. 262 If a button was shown, ‘ping’ went a bullet at it immediately.
1909 KIPLING Rewards & Fairies (1910) 272 Ping-ping-ping went the bicycle bell round the corner.

(Note, by the way, these lovely precursors of quotative go.)

I've defended William Safire more than once. But he sometimes gets simple things wrong, where a quick glance at an easily-available reference work like the OED would reveal the truth. In fairness to Safire (or more properly, to his research assistant), {"packet internet gopher"} gets 684 hits, including some things that purport to be glossaries of internet terms. The second item on today's results debunks the false etymology, but the mistake is a widespread one.

I haven't been able to find out who was originally responsible for the mistake. It seems likely to have started as a joke that someone took seriously, or perhaps a mnemonic device that escaped from captivity.

[Update: many readers have pointed out that the Jargon File entry for ping, along with many other sources, makes it clear that the apocryphal acronym should be 'Packet INternet Groper', not 'Packet INternet Gopher'. In other words, the claimed "gopher" etymology is not only wrong, it's also wrong, as is clearly indicated by the Google counts (84,900 for "groper" compared to 684 for "gopher").

More than one correspondent suggested that gopher might have been substituted for groper out of prudery; and several also observed that there was a pre-web document search and retrieval protocol called gopher, now almost extinct, which may have played a role in this bit of malapropic comstockery. Finally, the indefatigable Ben Zimmer observes that

Dave Mills (mentioned by Mike Muuss) claims to have coined ping = "Packet InterNet Groper" in 1979, documentable from 1980:


If true, that would predate the Muuss usage by a few years.

Most people's usage -- including mine -- stems from the unix ping program, which Muuss wrote, so I guess that even if his coinage is somewhat later, it still counts as the basic one, at least for people like me: I've been using "ping" since the mid-1980s, figuring that it was named after the sonar pulse, and I never knew about the groper/gopher business until today. And even Mills' acronym must originally have started with the goal of finding a phrase whose first(ish) letters would echo the sonar "ping".]

[Update #2: Several other readers have pointed out that if Safire's research assistant had checked the AHD entry for ping, (s)he would have found the etymology given as "p(acket) in(formation) g(roper)", which is contested but at least historical, as opposed to "... g(opher)", which is entirely bogus. ]

Posted by Mark Liberman at 06:36 AM

February 20, 2006

Washing the Unwashed

Nobody has to tell linguists that it's difficult, maybe even impossible, for us to use our specialized, insider technical language when we talk to outsiders. For example, some months ago Arnold Zwicky stressed this in his post (see here). His focus was on how one planetary scientist at MIT (of all places) pondered whether the newly discovered planetary orb was a planet or something else, calling the quandary "just a matter of semantics."

It's easy for us to cringe at such mangling of our field but it's a very serious problem for those of us who try to translate our field's knowledge to the linguistically unwashed. Take the courtroom, for example.

Many times I've heard lawyers try to discount arguments with the same mantra, "it's just a matter of semantics." I take this to mean that the expert witness chair may not be the best place for linguists to represent that we're offering a semantic analysis--at least not until we've first clarified what this means and doesn't mean. It may be more effective to use a slow start-up, first using word meanings or something like that. The same goes for terms like phonology, lexicography, syntax, sociolinguistics, and other ways we have to deal with language. After we've caught their attention, we can say what linguists call these things. It's more understandable to juries if we first use terms more familiar to them, such as the sound system of language, words, sentence formation, language variation, among others. On the cusp of possible jury understandability is psycholinguistics but it, unfortunately, also can conjure up images of mental illness. And dialectology almost invariably leads to the familiar Henry Higgins response. When we mention grammar, juries, along with educators and most of the people we meet at dinner parties, usually roll their eyes and mistake it to refer to usage, spelling and punctuation, or anything else they dreaded about their high school English classes.

I suspect that linguists aren't doing a very good job of representing our field to the general public and sometimes I wonder if at least part of our communication problem may be less in our stars than in ourselves. We like to sound brilliant to anyone within hearing range. For example, whenever I try to present my linguistic analysis at trial in ways that the jury will understand, I worry that another linguist might be out there in the audience noticing how I seem to be dumbing things down for jury consumption. When linguists see other linguists out there, the natural tendency is to show off their technical knowledge. Wrong. Wrong. Wrong. At least in this context anyway. To be effective, we have to start with jurors where they are and use terms and illustrations that they can understand. The courtroom is a long way from an LSA meeting.

The trick is to testify in a way that gives the jury enough about our field to justify lawyers using expert linguists in the first place, at the same time taking great care to be truthful, accurate and effective as we talk. For me this is a juggling act where I try to apply Grice's maxims as much as possible:

Be informative. Tell the jury only what is necessary, no more and no less. Don't be more informative than is required or else the testimony won't be effective, maybe not even listened to.

Be relevant. Make all the testimony relate directly to the topic, which is determined by the dimensions of the lawsuit. This should be worked out clearly in advance with the lawyer.

Be sincere. Never give the jury false information, opinions or guesses and don't testify about anything for which there is inadequate evidence.

Be clear. Testimony should avoid obscurity and ambiguity while being brief and orderly.

Academics, and maybe linguists in particular, who are called on to give testimony in depositions or trials often are not familiar with the strange courtroom culture where only lawyers can introduce topics, interrupt, argue, change the subject, ask all the questions and have access to the many other important language rights that everyone else has when they're not in the courtroom. When we do get a turn to talk in court, we're tempted to toss in more information than is needed and sometimes we wander off topic to discuss points that may be important to us but not to the jury or even to the case itself. To juries, academics often seem obscure and anything but brief. We're probably more scrupulous about adhering to the maxim of sincerity because this is more consistent with our academic work. But even then we often say more than is necessary in a way that seems obscure to jurors.

Sure, linguists like Arnold have every right to complain that semantics wasn't used properly by MIT's planetary scientist. But this misuse of semantics may rank rather low in the scale of misunderstandings that linguists confront. The problem is a whole lot broader, as those of us on the firing line of applying linguistics to legal matters can attest.

— Roger Shuy

Posted by Roger Shuy at 10:22 PM

Collateral damage

The usual complaint about the expression for free is that it's pleonastic.  Lose the for.  Omit needless words.  Not You can get it for free, but You can get it free.

Richard Lederer and Richard Dowis (Sleeping Dogs Don't Lay: Practical Advice for the Grammatically Challenged, St. Martin's Griffin, 1999) go beyond this pedestrian complaint to maintain that the expression is syntactically ill-formed.  On page 40:

for free (never)
Free is an adjective, not the object of a preposition.  We're not charging you extra for that information.  We're giving it to you for nothing; we're giving it to you free.  Free works best when it is free of adornment.

GENERALIZING a proscription against a particular expression -- and especially providing an EXPLANATION for the proscription -- is a potentially dangerous step.  There can easily be collateral damage, extending to all sorts of expressions the proscriber didn't have in mind.  So it is in this case, as in many others.  Innocent bystanders will be wounded.

According to MWDEU, the campaign against for free began in 1943 (OED2 has cites for the expression from 1887, "chiefly U.S."), probably in reaction to a fashion for it.  "Wordy slang" is a typical slam.  The alternatives are (at least) free, gratis, without charge, and for nothing, and MWDEU notes that these are often unsatisfactory.  Gratis and without charge are stiff and formal (in addition, without charge really works only in a selling context, not in a buying one; I got it without charge is possible, but odd); for nothing has a potential for ambiguity (I shoveled the neighbors' snow for nothing could mean 'to no purpose, with no good result'); and plain free is sometimes (to my ears) barely acceptable at all (??I shoveled the neighbors' snow free -- vs. I shoveled the neighbors' snow for free).  In addition, for free is (like for nothing) parallel in structure to for X, where X is an amount of money (I got it for twenty dollars), which is a point in its favor.

Lederer & Dowis hint at the pleonasm criticism ("works best when it is free of adornment"), but focus on a perceived syntactic flaw in the expression: "Free is an adjective, not the object of a preposition."  Let's pass over the odd juxtaposition of syntactic category ("adjective") and syntactic function ("object of a preposition" -- why not "noun"?) and get right to the core of the objection, which is that for free is a combination of a preposition and an adjective, and that's just not the way English syntax works.

This is deeply silly.  For free is an IDIOM, and idioms fairly often show bizarre syntax.  By and large is a textbook example, and others are easily listed.

But it's worse.  As Tommy Grano (who came across the Lederer & Dowis bit while browsing in advice books for something completely different) pointed out to me, there's in vain, also apparently P + Adj.  We then immediately came up with for real and possibly for good (it's not clear whether good is an adjective or a noun here), and since then I've thought of for sure/certain, in short/brief, and at first/last.  Probably there are more.  Another seven (or eight) will do.  [Yes, there are more: at large and in full, for example.  You can stop sending me more cases.  Please.]  The point is that if for free is bad because it's P + Adj, so are all the others.  This is fairly severe collateral damage from the proscription against for free.

Once proscriptions against particular expressions are generalized (and, often, provided with some sort of grounding explanation in grammar), they very often take in a host of other expressions the proscriber never had in mind.  The red pencil turns into a prediction.  This is usually not a good thing.

Another case: Garner's Dictionary of Modern American Usage (Oxford, 2003) warns against relative pronouns with possessive antecedents.  Simplifying Garner's example from real life: "There may have been inimical voices raised among the committee, such as Nikolaus Esterhazy's, who just then had had an unpleasant brush with the composer."  I am as unhappy with such examples as Garner is.  (The history is complex: relative clauses with possessive heads are a survival from much earlier English, and are occasionally to be found in recent times, but now strike most readers as at best awkward.  Things are even worse when the possessive has an overt head: "Nikolaus Esterhazy's voice, who just then had had an unpleasant brush with the composer, was especially strong."  Still worse: "Nikolaus Esterhazy's, who just then had had an unpleasant brush with the composer, voice was especially strong.")

Ok so far.  Such relative clauses are fairly often deprecated in the advice literature.  But Garner doesn't stop there; he goes on to say that the proscription is necessary, citing a more general proscription, with an explanation for it:

The relative pronoun who stands for a noun; it shouldn't follow a possessive because the possessive (being an adjective, not a noun) can't properly be its antecedent.

Eek.  Now Garner has invoked the Possessive Antecedent Proscription (which he does not otherwise seem to espouse) in its full power, and he's set himself against innocent bystanders like Mary's father adores her.  The problem is that if possessives are bad antecedents for relative pronouns because they are adjectives -- they aren't, of course, but Garner thinks they are -- then they're also bad antecedents for personal pronouns like her and she.  That's SERIOUS collateral damage.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 09:02 PM

Not your mother's snowclone

In an entertaining piece on shelter magazines in the March 2006 Atlantic Monthly ("Home Alone: The dark heart of shelter-lit addiction"), Terry Castle reports (on p. 122) that

    If you enter the words "not your mother's" on Google, you'll get nearly 200,000 results, a huge number of which point you immediately toward shelter-mag articles.  "Not your mother's [whatever]" turns out to be an established interiors trope, endlessly recycled in titles, pull quotes, advertisements, photo captions, and the like.  "Not Your Mother's Tableware" is a typical heading--meant presumably to assure you that if you acquire the featured cutlery you will also, metaphorically speaking, be giving your mom the finger.

Castle is exploring the message of the shelter magazines that you can free yourself from your mother's influence and make "your own space".  Along the way she has stumbled upon a snowclone that we haven't discussed here: "not your R's X" (where R is a kin term) conveying that this X is new, unprecedented, improved, superior, unconventional, etc.

We haven't commented on Not Your R's X here, but at least once we've used it: Mark Liberman, in a brief posting "Word Wars" (6/16/04) linking to a site on Scrabble competitions, quoted "This is not your grandmother's scrabble."

Back on the mother front, Castle enumerates:

Other online items that are not your mother's: wallpaper, mobile homes, Chinette, faucet sponges, slow cookers, backyard orchards, and Tupperware parties.  Beyond the realm of interior decoration--it's nice to learn--you can also avoid your mother's menopause, divorce, Internet, hysterectomy, book club, Mormon music, hula dance, antibacterial soap, deviled eggs, and national security.  Thank you, Condi.

On 2/19/06, I got 256,000 raw Google webhits for "not your mother's" and 70,000 for "not your grandmother's".  Some of these occurrences are literal, but an astonishing number of them are snowclones.

Surely the most famous instance of Not Your R's X is the ill-fated "Not Your Father's Oldsmobile" advertising campaign that General Motors launched back in the 90s, a campaign that alienated older car buyers and didn't attract enough younger ones.  The Oldsmobile is no more, but the snowclone thrives; "not your father's Oldsmobile" gets 9,190 hits!

In any case, "father's" beats out "mother's" on Google, 492,000 to 256,000, while "grandfather's" trails "grandmother's" by a bit (57,600 to 70,000).  "Parents'" gets a respectable 79,000, and then the numbers drop precipitously:

grandparents' - 818
brother's - 826
sister's - 609
uncle's - 481
aunt's - 176

For example: "not your brother's GameBoy" (referring to a Nokia handheld), "not your sister's Nancy Drew" (referring to America's CryptoKids, a National Security Agency website for children -- I am not making this up), and "not your uncle's Lexus" (will they never learn?).

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:15 PM

Genericide unsweated at Google?

If you're not a regular TV watcher, you should still take notice of the recent 30-second Pontiac ad, which closes with a brief glimpse of Pontiac being typed into the Google search box, with the voice-over "Don't take our word for it, Google 'Pontiac' to find out!" (There's some commentary here.)

One dog that isn't barking in the discussion: Google doesn't seem to be making a fuss in this case about genericide. The company has been worried about the problem at least since early 2003, when Google's lawyers asked Word Spy "to make sure that when people use 'Google,' they are referring to the services our company provides and not to Internet searching in general", by adding "Note that Google™ is a trademark identifying the search technology and services of Google Technologies Inc." to the entry for google v. See here, here, here for more discussion about this ancient history.

In contrast, the coverage of the Pontiac ad features stuff like this:

Google tells me Pontiac did seek their permission to use Google in the ad. Google did not pay to be in the ad, nor was it any type of comarketing activity. Says Google:

"We are happy that Pontiac has featured Google search in their television ad campaign. This is evidence that mainstream brand advertisers are increasingly realizing the close relationship between broadcast advertising and search usage."

But the ad's voice-over just invites us to "google Pontiac", not to "check out 'Pontiac' using Google ™ brand search" or whatever. The ad does show a glimpse of Google's search page, which might count legally as a substitute for all the TM verbiage, I don't know. But it seems to me that there's also a key practical difference between Google's situation and the situation of Xerox or Kleenex. As long as Google controls the domain names, anyone who generically googles something will in fact do it via Google rather than a rival service.

See this BusinessWeek article for additional context.

Posted by Mark Liberman at 12:01 PM

Popular lexicography on the internet: when neologisms aren't new

From a newsletter sent out today by the folks at eGullet:

eG Radio is here. We recently launched our series of eG Radio foodcasts (we thought we invented the word, but then we Googled it and 135 other people already had). You can download the first one any time and listen at your leisure. It so happens our guest from that foodcast is here this week for the eG Spotlight conversation, so . . . Listen. Chew. Discuss. [emphasis added]

I recognize that it's bad manners to quote yourself, unless it's part of an attempt to clarify a misunderstanding. On the other hand, it's also bad manners to cut-and-paste without citation, even from yourself, and gratuitous paraphrase is a waste of time -- so forgive me for quoting from a Language Log post of 12/6/2004:

Robert Merton once wrote that "Anticipatory plagiarism occurs when someone steals your original idea and publishes it a hundred years before you were born". One of the less widely recognized consequences of internet search is a significant increase in the rate of (a shorter-term variety of) anticipatory plagiarism.

More seriously, the note on foodcast is a clear example (among many) of parallel independent coinage of a new word. People often assume that a new word must have some single specific source, but I suspect that this is not always true, and may even be the exception rather than the rule.

Posted by Mark Liberman at 11:50 AM

German and Italian and Ladin and...English?!...in Bolzano

In the Provincia Autonoma di Trento, the southern half of the Trentino-Alto Adige region of Italy, roughly everything is Italian and nothing is German, though one does find a few things like Krapfen in pastry shops and the occasional Imbiss sign on a little free-standing snack shop; we have not (yet) heard German spoken on the streets of Trento. But if you take the train north to Bolzano, less than an hour away, you cross the line between Trentino and the Provincia Autonoma di Bolzano-Alto Adige/Autonome Provinz Bozen-Südtirol, where things are very different.

First you notice the place names, which are bilingual -- on maps, either first Italian and then German (as on our big Michelin map of northern Italy, which even has the Italian name underlined in red) or first German and then Italian (as on our big Kompass map of Trentino and environs, but with no red emphasis on the German name). In the train stations themselves, the signs have Italian first, German second. If this signage is a residue of the old fascist government's efforts to Italianize the South Tirol by transplanting thousands of Italians there, it doesn't seem to have worked awfully well: on the streets of Bolzano we heard a lot of German and no Italian (though someone must have been speaking it somewhere there). In the official tourist office and all but one of the shops, people addressed us in German -- possibly, of course, because we look so extremely unfashionable as to be implausible Italian speakers. In the restaurant where we had lunch, the menu was as German (or rather Austrian) as the spoken language, though it was a bilingual German/Italian menu. We were especially struck by the local bookstore system: on one street there was an Italian bookstore, with no visible word of German in the place; a few doors away there was a German bookstore, with no visible (or audible) word of Italian.

Then there's Ladin. According to one of our maps, there are seven Ladin villages in Trentino, and there must be many more in Bolzano-Alto Adige; according to Ethnologue, there were 30,000-35,000 speakers of Ladin in Italy as of 1976. It has some official status in Bolzano-Alto Adige, because there are laws regarding Ladin education in Ladin villages; Ladin schools have been in operation for over thirty years, and the Uniun Generela di ladins dles Dolomites was founded in 1946. But a Ladin presence on signage in Bolzano, at least on the signs we saw in our brief visit to the city center, was confined to the exit sign on the railway platform, which was in German, Italian, and Ladin -- whose orthography is striking, for a Romance language, because of the umlauts over some of the e's.

Bolzano has a new university, with several highly successful departments, including Computer Science. And we heard yesterday from a computer-scientist friend in Trento that the official language of the university is...English. And yet some people claim that English is not taking over the world, at the expense of all those other really really interesting languages!

Posted by Sally Thomason at 04:43 AM

Why it's usually the first that lasts

I've been thinking about first and last lines of novels. Are endings less striking than beginnings, in the sense that we don't so easily or so often remember them verbatim? I've convinced myself, at least, that endings are less likely to be quoted.

One of the most striking last sentences that I can think of is from Ernest Hemingway's The Sun Also Rises:

“Isn’t it pretty to think so?”

This is widely enough quoted or otherwise re-used that {"pretty to think so"} gets 16,900 Google hits. Another candidate is the last line of Charles Dickens' A Tale of Two Cities:

"It is a far, far better thing that I do, than I have ever done; it is a far, far better rest that I go to than I have ever known."

which yields 38,500 for {"a far far better thing"}.

But Moby Dick's opening {"call me Ishmael"} gets 155,000 hits, and the start of Pride and Prejudice {"in want of a wife"} gets 74,300. And the opening of A Tale of Two Cities gets 1,780,000 for {"the best of times"}, and even 405,000 for {"the best of times" "the worst of times"}.

Of course, there are plenty of great beginnings that are rarely quoted -- James Joyce's {"stately plump buck mulligan"} gets only 10,900 hits, and Dodie Smith's {"I write this sitting in the kitchen sink"} gets a mere 805. So you'd need a much more systematic survey to conclude reliably that beginnings are more often quoted or snowcloned than endings. But beginnings do have an obvious advantage: they come first.

Beginnings are designed to work out of context, or rather in the context of no context. The first line needs to seize our attention and keep it, even if we don't entirely understand what it says. And if we remember the opening line when we've forgotten most of what follows, then we're just back where we were when it first impressed us.

Endings are different: they depend on the fact that we've experienced the rest of the story. When we're not immersed in that context, a brilliant ending may seem dull. Consider the end of Vladimir Nabokov's Pale Fire:

But whatever happens, wherever the scene is laid, somebody, somewhere, will quietly set out -- somebody has already set out, somebody still rather far away is buying a ticket, is boarding a bus, a ship, a plane, has landed, is walking toward a million photographers, and presently he will ring at my door -- a bigger, more respectable, more competent Gradus.

If we've read the book, we know who Gradus is, and whose door his superior counterpart will ring at, and why. Or more precisely, we don't know, because everything we've learned comes from the most unreliable narrator ever, created by a writer who once explained that

A creative writer must study carefully the works of his rivals, including the Almighty. He must possess the inborn capacity not only of recombining but of re-creating the given world. In order to do this adequately, avoiding duplication of labor, the artist should know the given world. Imagination without knowledge leads no farther than the back yard of primitive art, the child's scrawl on the fence, and the crank's message in the market place. Art is never simple. To return to my lecturing days: I automatically gave low marks when a student used the dreadful phrase "sincere and simple"-- "Flaubert writes with a style which is always simple and sincere"-- under the impression that this was the greatest compliment payable to prose or poetry. When I struck the phrase out, which I did with such rage in my pencil that it ripped the paper, the student complained that this was what teachers had always taught him: "Art is simple, art is sincere." Someday I must trace this vulgar absurdity to its source. A schoolmarm in Ohio? A progressive ass in New York? Because, of course, art at its greatest is fantastically deceitful and complex.

So at the end of Pale Fire, our ignorance of Gradus and Kinbote and Shade is as richly structured and emotionally overwhelming as our ignorance of real life. In that context, its last sentence is one of my favorites, but without the rest of the book, it's inert and lifeless.

The last line of The Sun Also Rises -- "Isn't it pretty to think so?" -- is intrinsically striking because it uses pretty as an evaluative adjective in the frame "it's __ to think so", where we expect something like nice. In effect, the decontextualized quote gets its force from subverting an idiom and forcing us to think about why the speaker chose that slightly ill-fitting adjective. But the effect changes, I think, with just enough additional context to give us a human frame for reasoning about word choice:

“Oh, Jake,” Brett said, “we could have had such a damned good time together.”
Ahead was a mounted policeman in khaki directing traffic. He raised his baton. The car slowed suddenly pressing Brett against me.
“Yes,” I said. “Isn’t it pretty to think so?”

And the impact changes again if we've read the whole book, and know why they didn't have a good time together, and why they aren't about to start.

One more example -- the last sentence of Patrick O'Brian's The Letter of Marque:

She urged West out of the cabin and on deck, and there he and the amazed foremast hands saw a blue and gold coach and four, escorted by a troop of cavalry in mauve coats with silver facings, driving slowly along the quay with their captain and a Swedish officer on the box, their surgeon and his mate leaning out of the windows, and all of them, now joined by the lady on deck, singing Ah tutti contenti saremo cosí, ah tutti contenti saremo, saremo cosí with surprisingly melodious full-throated happiness.

When I read this at the end of the book, in the context of the preceding 11 books in the series and the plot of Le Nozze di Figaro, knowing who the lady on deck is and what her relationship has been to the men coming along the quay, this brought tears to my eyes. Out of context, it's just a long sentence describing an implausible and rather silly scene.

[Update: John Cowan points to this compendium of notable last lines. I used it to try a little experiment on myself. Of the 100 works cited in the last-lines list, I believe that I've read 97; but only 18 of the quoted last lines seemed familiar to me as specific word-sequences. (Note that the last-lines list contains several poems, many Shakespeare plays, and at least one short story.) In the case of the Pantagraph list of the 100 best first lines, I've read only 79 of the works, but remembered the wording of 65 of their openings.

John also cited this blog discussion of last lines, which in turn references an 11/21/2005 article about last lines in The Telegraph, which ends this way:

All-time favourite ending? Too many to choose among, but it might be hard to beat the end of The Code of the Woosters, with its drowsy confusion of quotation and happy oblivion, now that everything has been tidily sorted out: "And presently the eyes closed, the muscles relaxed, the breathing became soft and regular, and sleep, which does something which has slipped my mind to the something sleeve of care, poured over me in a healing wave." You'd sleep soundly after that, no doubt about it, whether you'd written it, or merely read it.

That one I remember. ]
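The self-experiment above comes down to two proportions, and a quick calculation makes the gap explicit. This is only a back-of-the-envelope sketch using the counts given in the update (18 of 97 last lines, 65 of 79 first lines); the function name recall_rate is made up for illustration.

```python
def recall_rate(remembered, read):
    """Fraction of the works read whose quoted line seemed familiar verbatim."""
    return remembered / read

# Counts taken from the update above.
last_lines = recall_rate(18, 97)   # last lines remembered, of works read
first_lines = recall_rate(65, 79)  # first lines remembered, of works read

print(f"last lines:  {last_lines:.1%}")   # 18.6%
print(f"first lines: {first_lines:.1%}")  # 82.3%
```

One reader and two hand-picked lists prove nothing general, of course, but the difference is large enough to be suggestive.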

Posted by Mark Liberman at 12:01 AM

February 19, 2006

Currents in the sea of speech

Robert Siegel interviewed Bill Labov on All Things Considered, 2/16/2006: "American Accent Undergoing Great Vowel Shift". Siegel is an intelligent and skillful interviewer, and Bill gives a terrific performance. Listen to it!

Unfortunately, Bill chose to publish the Atlas of North American English with Mouton de Gruyter, who in turn felt compelled by the economics of publishing -- as they see it -- to price the result at $620 a copy (until March 31, 2006, and $749 thereafter, plus $125/year for access to the online site), and to ask Bill to remove the summary chapter that he used to have on his web site about this stuff. So if you want to learn more, the price of admission is pretty steep.

Mouton did invest quite a bit in producing the maps and so on for the print version, and in supporting the development of the CD-ROM by Jürgen Handke at the University of Marburg. On the other hand, the underlying research was funded for years by the U.S. government (through NSF), and it's a shame to see the results -- which are of interest not only to linguists and linguistics students, but to many others as well -- locked up behind such an extraordinarily expensive wall.

Posted by Mark Liberman at 08:30 AM


Language Log readers are by now well acquainted with the Secret Annual Cabal, but are perhaps not aware that the American Association for the Advancement of Science has a section devoted to Linguistics and the Language Sciences. Indeed, our fellow Language Loggers Lila Gleitman and Arnold Zwicky are Fellows of the AAAS.

I'm in St. Louis at the annual meeting of the AAAS. On Friday there was a symposium entitled In Search of Genes that Influence Language: Phenotypes and Molecules in the morning and one entitled Language Evolution: New Perspectives from Genetics, Neuroscience, and Human Infants in the afternoon.

Of course there are interesting things outside of linguistics. Today I went to the symposium Amazonian Dark Earths: New Discoveries, a session on political interference in science by the Bush administration, and a plenary lecture on The History of Nature: Why Aren't We Teaching It in Our Schools? by Ursula Goodenough. Tomorrow I'm planning on going to a symposium on First Human Entry into the Americas: A Critical Assessment of New Models and New Evidence.

You don't have to be an academic to find it interesting. I just had dinner with a forester from northern British Columbia I met in Vancouver airport on the way down. He comes to the AAAS just out of interest.

Posted by Bill Poser at 01:36 AM

February 18, 2006

The infrequency illusion: a linguist's prayer

Arnold Zwicky has mentioned on Language Log a few times (like here and here and here) that there are "two seductive effects of selective attention" often noted among the soi-disant language mavens and blowhard usage pontificators: "the Recency Illusion (if you've noticed something only recently, you believe that it in fact originated recently) and the Frequency Illusion (once you notice a phenomenon, you believe that it happens a whole lot)." His point is that "your impressions are unreliable; you need to find out what the facts are." Daniel Ezra Johnson of the University of Pennsylvania just made a lovely observation by email: that linguists may often be prone to a kind of inverse counterpart to the Frequency Illusion about linguistic data: "the belief that something you've found is less common, and therefore perhaps a more interesting ‘discovery’, than it probably really is." It's a lovely point. It makes me want to say a prayer each morning (and I adapt to this purpose a saying of Confucius about noticing treachery): "Lord, let me be the first to notice a phenomenon when it is rare or unusual, and the last to think a phenomenon is rare or unusual when it is not." Amen.

I'll give you an example. A while back I noticed that I could only think of three businesses whose names were indefinite noun phrases: A Friendly Inn, a B&B on Cambridge Street near Harvard Square that I pass every day on my way to the Radcliffe Institute; A Clean Well-Lighted Place for Books, a bookstore with a few locations in the San Francisco Bay area; and A Pacific Café, a gourmet restaurant in Kapa‘a on Kaua‘i in the Hawaiian islands. But my feeling that this was an extremely rare phenomenon was completely wrong. Just look under "A" in the business white pages of any large city and you'll see. Because I had noticed something, I thought it was special. It wasn't. You need to find out what the facts are. Indefinite NPs are much less common than definite NPs as business names, but not so rare as to occasion comment by linguists.

Posted by Geoffrey K. Pullum at 03:42 PM


Near the end of Maureen Dowd's 2/18/2006 column "Hunting for a Straight Shooter":

Our message is supposed to work because it has moral force, not because we pay some Lincoln Group sketchballs millions to plant propaganda in Iraqi newspapers and not because the press here plays down revelations of American torture. [emphasis added]

Though I don't think that you'll find the term sketchball in any dictionary on your shelf, it's not MD's invention.

Sketchballs has 683 Google hits, in contexts like an April 2005 Interview article that describes Jessica Alba as "the ball of fire who has seduced, outfoxed, and overpowered a litany of criminals, creeps, and sketchballs on her way to becoming the new face of Hollywood heroines", or a November 2005 piece in the Auburn student paper headlined "The Survivor's Guide to Sketchballs". And the singular sketchball has 865 more, including this meta-commentary on the term's transparency:

Its funny, when you switch firms, all of the sudden, everyone starts picking up on terms that you've been using for years and you realize you've come up with your own lingo or borrowed some from others and made it your own. I called someone a "sketchball" the other day and Fred was all over it. However, the term "sketchball" itself isn't that interesting to me, so I'm not going to make it a Charlieism. I think if I call someone a sketchball, you'll know what it means.

[You can also find a number of sketchball-related contributions over at the Urban Dictionary.]

The first part of sketchball is from sketchy, which has recently (?) taken over from sleazy as the favored slang expression for "untrustworthy", "shady", "of questionable character" (though some people seem to use it to mean nothing more than "deprecated" or "unpopular"). The -ball part is older. Quoting from J.E. Lighter in The Atlantic of June 1998, "Eyes on the ball":

As the essayist and novelist Nicholson Baker has noted, the suffix -ball has become an important resource for the slangy smart-alecks of our time. Think of the belittling butterball, cheeseball, cornball, dirtball, goofball, hairball, nutball, oddball, sleazeball, slimeball, and weirdball. Most of these arrived well after mid-century, and in most the -ball element is only a metaphor. The spiritual progenitor of this burgeoning array of ball-bearing compounds, though, is undoubtedly a real ball -- the familiar screwball, first noted in print in 1928. Originally this designated the deceptive baseball pitch that breaks in a direction opposite to that of an ordinary curve ball. The screwball (similar to the earlier fadeaway) gained its nom de guerre largely through the efforts of Carl Hubbell, the left-handed Hall of Fame ace who pitched for the New York Giants from 1928 to 1943. The winner of twenty-four consecutive games stretching over the 1936 and 1937 seasons, Hubbell perfected his signature screwball in the minors; he remembered having a catcher in Oklahoma tell him the pitch was "the screwiest thing [he] ever saw." Shortly thereafter the descriptive screwball was bestowed on human beings -- people who display an unpredictable twist. The linguistic leap was made easy by the prior existence of screwy, which had the same connotation. Screwy derives from the nineteenth-century expression "having a screw loose"-- that is, "having something missing or defective," as in machinery ("There's a screw loose somewhere ...").

Lighter also mentions scuzzball, which was the occasion of his column.

Most of the cited examples involve adding -ball to a two-syllable adjective ending in -y, but with the -y removed: corny, dirty, goofy, nutty, sleazy, slimy, scuzzy, sketchy. There are a few where the base seems to be a monosyllabic adjective: odd, weird. There are some cases where it's not entirely clear what the base is: are hairball, cheeseball and nutball from hairy, cheesy and nutty, or are they influenced by the independently derived hairball (the ball of hair that a cat spits up), cheeseball (a ball made of cheese), etc.? Jerkball is pretty common (more than 4,000 Google hits) but the adjective jerky is much less prominent than the noun jerk.

The Xy → Xball transformation is not foolproof, though: silly doesn't yield *sillball, presumably because sill is not a morpheme here. And in general polysyllabic insults don't take -ball. You could use the made-up word idiotball to describe a particular approach to the strategy or tactics of baseball -- along the lines of Michael Lewis' neologism moneyball -- and in fact Google will tell us that on 6/25/2004, in the Sox vs. Cubs Game Day Thread on whitesoxinteractive.com, HomeFish did so:

Garland gives up the run. Idiotball at its finest.

But it seems totally implausible to refer to someone as an idiotball -- or a bastardball or an a**holeball either. In contrast, polysyllabic nouns for nasty substances seem plausible as a base. Thus mucousball ought to work, it seems to me, even though it's not to be found in Google's index. Corpus linguistics still has some limitations, I guess.
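For the computationally inclined, the pattern described above (strip the -y from an adjective, check that what remains is a real base, then add -ball) can be sketched as a toy rule. Everything here is an assumption made for illustration: the function name ball_compound, the hand-picked KNOWN_BASES set standing in for "is this a morpheme?", and the crude spelling repairs for silent e and doubled consonants. It is not a serious model of English derivational morphology.

```python
# Hypothetical base inventory, taken from the adjectives listed above.
KNOWN_BASES = {"corn", "dirt", "goof", "nut", "sleaze", "slime", "scuzz", "sketch"}

def ball_compound(adjective, known_bases=KNOWN_BASES):
    """Return the -ball compound for a -y adjective, or None if the rule fails."""
    if not adjective.endswith("y"):
        return None  # the rule only covers adjectives in -y
    stem = adjective[:-1]
    # Candidate spellings of the base: as is, with a restored silent e
    # (slimy -> slime), or with consonant doubling undone (nutty -> nut).
    candidates = [stem, stem + "e"]
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])
    for base in candidates:
        if base in known_bases:
            return base + "ball"
    return None  # no recognizable base, as with silly -> *sillball

for word in ["sketchy", "nutty", "slimy", "silly", "idiot"]:
    print(word, "->", ball_compound(word))
```

The interesting linguistic work is all hidden in the base inventory: the sketch says nothing about why sill is unavailable, or why polysyllabic insults resist -ball.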

[A note on the subject matter of MD's column. The Lincoln Group, which self-identifies as "a strategic communications and public relations firm providing insight & influence in challenging & hostile environments", has a plausible claim to sketchballosity. It was flagged around Thanksgiving by the LA Times for "secretly paying Iraqi newspapers to publish stories written by American troops in an effort to burnish the image of the U.S. mission in Iraq". And back around Memorial Day, Billmon at Whiskey Bar was already muttering darkly about the Lincoln Group in the context of remarks about the spoils system, political "oppo" research, "propaganda and disinformation", and so on.

On the other hand, psychological warfare has a long history, much of which would probably be viewed positively by the same people who were scandalized by the idea of planting stories in the Iraqi press. For example, Ralph Ingersoll, who helped Harold Ross set up the New Yorker, and founded the leftwing newspaper PM, worked on psychological warfare operations for the U.S. Army in WW II.]

[Update: Mark Jason Dominus writes

Bonnie Webber of the Penn CIS department used to have a clipping on the door of her office that tabulated which combinations were permissible, something like this:

                -ball   -wad    -bag    -head   -bucket
        Screw-    X                        X
        Cheese-   X              X        (1)
        Scuzz-    X              X                  X
        Scum-     X      X       X                  X
        Slime-    X                                 X
        Douche-          X       X

I don't remember which prefixes and suffixes were actually tabulated, though.

I'll ask Bonnie (who is now at Edinburgh) if she remembers, or if she has a more up-to-date version. Certainly this appears to be a much-needed gap in the literature on English derivational morphology.]

[Update #2: Claire, commenting at Languagehat's site, observes that the Australian equivalent of "sketchy" is "dodgy", whereas "dodgeball" is (she thinks, and I agree) not a likely neologism -- due to blocking by a prior coinage, I guess, as well as dialect incompatibility.]

Posted by Mark Liberman at 01:12 PM

Best first lines -- and last?

A couple of weeks ago, The American Book Review, a "nonprofit journal published at the Unit for Contemporary Literature at Illinois State University", came up with a list of the 100 best first lines from novels. I was surprised and pleased to see that they included the openings of William Gibson's Neuromancer, Flann O'Brien's At Swim-Two-Birds and Dodie Smith's I Capture the Castle. Some may be surprised to see that #22 is from Edward Bulwer-Lytton's Paul Clifford, "It was a dark and stormy night".

I don't know any similar compendium of endings.

[Update: Michael de Mare writes

I read your post on the Language Log about best last lines, and was sufficiently inspired to start a compendium:


Good last lines seem rarer than good first lines; I only managed to find six of them so far.

At least, good last lines don't make as much of an impact and are harder to remember. I can't think of many of them unprompted, though I have a feeling that when I start pulling books down from the shelf and looking at the end, it will turn out that there are plenty of excellent endings.]

Posted by Mark Liberman at 09:39 AM

February 17, 2006

This week's baffling grammatical advice

Paula Bell's Hightech Writing: How to Write for the Electronics Industry (Wiley, 1985) has a short section on relative clauses in its chapter on grammar.  It begins (p. 110):

Relative clauses--which are introduced by the relative pronouns what, whose, who, whom, that, and which--should immediately follow the nouns they modify.

Careful readers will have noticed that in this list.  That certainly can introduce relative clauses, but it's also certainly not a pronoun.  However, this mis-categorization is very common in writing about English grammar -- even MWDEU gets this wrong -- so let's just pass over it, scowling.  (Meanwhile, she fails to mention "zero", or "contact", relatives, with no relative marker, as in the book I just read.)

Really careful readers will also have noticed what on the list: when can what be used as a relative pronoun?  In fact, pretty much every book of grammatical advice observes that the relative pronoun what, though quite old in English, is now thoroughly non-standard, though sometimes rustically colorful: "Dance with the ones what brung you".  Is that what Bell is thinking of?

No, and we don't have to wait to find out.  She takes up the relative markers in the order she listed them, so what comes first.  This is the ENTIRE subsection on what:

Don't use what in sentences like [The following paragraphs tell what steps to take]; instead use the.

Baffling.  It's thoroughly confused, and stunningly unhelpful to boot.

First, what steps to take in the example above is not a headed relative clause; it does not follow a noun, much less a noun it modifies.  It's a subordinate clause serving as the direct object of the verb tell.

Second, what steps to take in the example above is not even a headless, or "free", relative.  Now, what can indeed introduce free relatives -- as in What she had in her hand sparkled 'that which she had in her hand sparkled, the thing (that/which) she had in her hand sparkled' -- so it does indeed belong on a list of relative pronouns.  It's just that what steps to take in the example above is not a free relative; it's an interrogative clause, with the WH determiner what modifying the following noun steps.  (Which would be possible in the place of what, though with a slightly different meaning.)  The determiners what and which cannot, in fact, occur in restrictive relative clauses, whether headed or free; they are specifically interrogative:

  the solution which occurred to them
 *the solution which/what idea occurred to them

  What she had in her hand sparkled.
 *What/which thing she had in her hand sparkled.

A further relevant difference between free relatives and subordinate interrogatives: both can be infinitival rather than finite, but infinitival free relatives (as in The person to see is Kim) do not allow a relative marker (*The person who/which/that to see is Kim), while subordinate interrogatives must of course have their WH words (I don't know who/which to see).

Corresponding to the differences in syntax between free relatives and interrogative subordinate clauses (of which I've given only a sampling here), there's also a subtle semantic difference: free relatives denote ordinary individuals, like the sparkly thing in her hand, while subordinate interrogatives denote answers to questions, like the answer to the question "What steps should I take?"

To summarize so far: Bell's example is utterly irrelevant to a discussion of relative clauses.  It has an interrogative construction, not a relative one, in it.

And there's nothing wrong with it, grammatically or stylistically.  Yes, it has a near-paraphrase with the in place of what: The following paragraphs tell the steps to take.   This is a kind of covert interrogative, and I suspect that it's harder to process than the version with what, which is overtly interrogative.

Finally, Bell just tells her readers not to use what in "sentences like" The following paragraphs tell what steps to take.  How on earth can the reader be expected to generalize from this one example?  What other sentences, exactly, are like this one?  (Does the proscription extend to I don't know what steps to take?  To The following paragraphs tell which steps to take?  To What steps to take is a mystery? And on and on.)

Yes, I understand that Bell might have been reluctant to introduce a lot of grammatical terminology.  But if she wants to pick out subordinate interrogatives with WH determiners for special treatment, she's going to have to get technical.  Otherwise, her advice is simply useless, and the little subsection on "relative" what would have been better omitted.

(Once again, thanks to Elizabeth Daingerfield Zwicky, for continuing to feed me obscure, out-of-print, used, and remaindered books on grammar, usage, and style.)

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:44 PM

Roses of Muhammad, bread of Vienna

You've probably heard that the Tehran confectioners union has ordered "Danish pastries" (shirini danmarki in Farsi) to be renamed gul-e-muhammadi, or "roses of Muhammad", in an echo of "freedom fries" and "liberty cabbage".

Since this 1/7/2006 AKI story announced the change as a fact (and it was reported on other wire services on 2/14/2006), I'm puzzled about why Aljazeera and other sources reported it today as if it was new news. Also, I don't understand the relevant theology, but it seems to me that calling pastries "roses of Muhammad" inches uncomfortably close to promoting idolatry, at least among those who are excessively fond of the items in question.


Danish Pastry is in Danish called Wienerbrød, Viennese bread, though it is completely unknown in Vienna. In Denmark, it has been known since 1840 and is said to have been created by immigrant bakers from Vienna, perhaps strike breakers.

[Update: Ben Zimmer points out to me that there is a certain amount of controversy about whether what used to be called Persian should be called Farsi or not. I was once instructed in the sternest terms by an Iranian that "Persian" is to be shunned as a relic of Orientalist prejudices; but I've seen equally strong complaints that "Farsi" is insulting because it's false exoticizing. As far as I can see, "Farsi" has become the standard American term for the national language of contemporary Iran, and so that's what I use.]

[On the other hand, Anders Ringström writes that

I have translated a fair amount of documents from Iran, first translated into English by Government approved translators, embellishing their output with lots of official stamps. They invariably testified that the document was translated from the _Persian_ of the original.

And checking on the Islamic Republic News Agency (IRNA) web site, I see things like

Thursday's edition of the Persian language daily `Iran' ...
a morning daily quoted the Persian service of the Islamic Republic News Agency (IRNA)
A three-day international seminar on "Satire and Humour in Persian Literature" kicks off its work here on February 14, 2006, at the Department of Persian Language in Delhi University]


Posted by Mark Liberman at 01:55 PM

Annals of prescriptivism: a Roman instructs the Germans in their own language

It's about 460 or 470 A.D., and Sidonius Apollinaris writes a letter to his friend Syagrius, who's been spending his time among the Burgundians, a German tribe who have set up housekeeping in "Sapaudia", which was near present-day Geneva (or maybe Lyons, different sources tell me different things about this). Sidonius exclaims that

... immane narratu est, quantum stupeam sermonis te Germanici notitiam tanta facilitate rapuisse.
... it's frightful to say how astonished I am that you have picked up knowledge of the German language with such facility.

and he adds that

aestimari minime potest, quanto mihi ceterisque sit risui, quotiens audio, quod te praesente formidet linguae suae facere barbarus barbarismum.
You can hardly imagine how much I and the others laugh to hear that in your presence the barbarian fears to commit a barbarism in his own language.

This is an interestingly complex little joke.

(In passing, note it as another piece of evidence against the idea that linguistic correctness was an 18th-century invention.)

Lewis and Short's entry for barbarismus tells us that the word is borrowed from Greek barbarismos, and gives some relevant history:

    I. an impropriety of speech, barbarism; esp. of pronunciation (acc. to Gell. 13, 6, 14; cf. id. 5, 20, 1, not in use before the Aug. per.; in Nigidius, instead of it, rusticus sermo)

On this story, barbarismus was borrowed from Greek during the Augustan period, and before that the corresponding idea would have been "rusticus sermo", i.e. "hick speech". Thus we start with urban disdain for rural ways of talking, and then move on to make fun of the mistakes made by barbarian foreigners, at a time when the Empire was globalizing.

But I wonder what it might have meant, 500 years later, for Burgundians to fear making an error in their native German in front of a Roman visitor. There's a clue in Sidonius' next sentence:

adstupet tibi epistulas interpretanti curva Germanorum senectus et negotiis mutuis arbitrum te disceptatoremque desumit.
The stooped elders of the Germans are astounded at you interpreting their writings, and choose you as mediator and judge in their disputes.

Nothing is now left of Burgundian except some names, but it was an East Germanic language that should have been pretty close to Gothic, for which we have quite a few texts, e.g. Matthew 6:1

Atsaiƕiþ armaion izwara ni taujan in andwairþja manne du saiƕan im; aiþþau laun ni habaiþ fram attin izwaramma þamma in himinam.
Take heed that ye do not your alms before men, to be seen of them: otherwise ye have no reward of your Father which is in heaven.

The main reason that we have Gothic texts is that Bishop Ulfilas created a Gothic bible around a hundred years before the time of Sidonius' letter, and the Goths prospered over the next few centuries, resulting in enough copies of this work for some to have survived at least in fragmentary form to the present day. Syagrius' Burgundians, meanwhile, had survived a rough century, having been nearly annihilated by the Gepids in around 370, and then nearly wiped out again by Hun mercenaries hired by the Romans in 437.

I don't know much about this period, and there may be evidence to contradict this hypothesis, but perhaps the Burgundians were illiterate in their own language, and were impressed by Syagrius' ability to read the Gothic bible and other Gothic texts. After all, Gothic was associated with a larger and more powerful tribe as well as with the prestige of a written form, and was probably close enough to their own dialect to be mutually intelligible, or nearly so. Ulfilas had been an Arian Christian, just like the Burgundians were, and he was the one who was mainly responsible for converting the Germans who had been in contact with Constantinople during its Arian phase. And it's plausible that Syagrius' Gothic was better than their own, especially in terms of pronunciation, which he could get from the orthography. This might also be true of morphology, where Syagrius could probably reproduce Ulfilas' forms more reliably than the Burgundians could, and even syntax, where Ulfilas' bible is said to have been influenced by Greek patterns.

We might also learn something by looking at Greek barbarismos, which Liddell and Scott gloss as "use of a foreign tongue or of one's own tongue amiss". They point to a section of Aristotle's Poetics where barbarismos is used to refer to the results of combining rare words, and is rendered as "jargon" in this version of the passage, translated by W.H. Fyfe:

The merit of diction is to be clear and not commonplace. The clearest diction is that made up of ordinary words, but it is commonplace. ... That which employs unfamiliar words is dignified and outside the common usage. By "unfamiliar" I mean a rare word, a metaphor, a lengthening, and anything beyond the ordinary use. But if a poet writes entirely in such words, the result will be either a riddle or jargon; if made up of metaphors, a riddle and if of rare words, jargon. ... We need then a sort of mixture of the two. For the one kind will save the diction from being prosaic and commonplace, the rare word, for example, and the metaphor and the "ornament," whereas the ordinary words give clarity.

The key to understanding this passage seems to be something that Aristotle writes just before, suggesting that "rare words" often or usually means "words from a different dialect" (or perhaps "from a different time"):

An "ordinary" word is one used by everybody, a "rare" word one used by some; so that a word may obviously be both "ordinary" and "rare," but not in relation to the same people. sigunon, for instance, is to the Cypriots an "ordinary" word but to us a "rare" one.

Note that Aristotle uses glôssa for what Fyfe translates as "rare word":

all' an tis hapanta toiauta poiêsêi, ê ainigma estai ê barbarismos: an men oun ek metaphorôn, ainigma, ean de ek glôttôn, barbarismos.
But if a poet writes entirely in such words, the result will be either a riddle or jargon; if made up of metaphors, a riddle and if of rare words, jargon.

And LSJ glosses glôssa as follows (abbreviated, emphasis added):

A. tongue
         b. g. larungos, = glôttis, larynx
      2. tongue, as the organ of speech
      3. of persons, one who is all tongue, speaker
   II. language; dialect
      2. obsolete or foreign word, which needs explanation
      3. people speaking a distinct language
   III. anything shaped like the tongue

So Aristotle is telling us that text or speech full of obsolete or dialect words will be barbarismos: but surely that is exactly what Gothic would have been to the Burgundians. And also what Burgundian would have been by reference to the norms of Gothic -- especially the Gothic of a century earlier. So if we assume that Gothic had become a prestige dialect from the point of view of the Burgundians, associated with the bible and perhaps other formal writing, then Sidonius' letter makes perfect sense.


Those whose interest is limited to the history of prescriptivism can stop reading here. But there are some left-over goodies that are fun to read, like this from the Preface to the Online Edition of Dalton's translation of Sidonius' letters:

Sidonius Apollinaris was a Roman aristocrat of the 5th century AD. Born around 431 AD, he held estates in Gaul. He pursued an official career under the emperors Avitus (a kinsman), Majorian, and Anthemius, rising to be Prefect of Rome. But all these emperors were murdered in turn by the sinister Ricimer, a barbarian general holding the highest office in the state, that of Patrician, or Prime Minister. Ricimer ostensibly governed in the Roman interest. In reality he pursued no interest but his own, and his murder of the capable Majorian ensured the collapse of the empire.

As Roman rule weakened, the barbarians occupied more and more of Gaul. Sidonius had returned to Gaul under Anthemius. Like so many other aristocrats, he had reluctantly become Bishop in his local town, Clermont in Arvernia. The advancing Visigoths under their king Euric moved into the region; Sidonius helped organise resistance, since none of the Roman forces paid for from the crushing taxation of the time were available to defend them. But after enduring a siege, he found to his appalled horror that the imperial government was plotting to betray the Arvernians, some of their strongest supporters. (His outraged letter to Bishop Graecus, one of the go-betweens, is included in this edition). And so it proved. Sidonius himself was imprisoned by Euric.

States prepared to sell their own allies to appease an advancing enemy have little prospect of survival. In less than a dozen years, Roman rule had ceased everywhere in the West; the consequence of its rulers placing themselves in the power of those whose loyalties were ultimately non-Roman. Sidonius lived long enough to outlive the last emperor, Julius Nepos. He died, sometime after 480, and is canonised as a saint.

There's the stuff of a mini-series there, don't you think?

Here's the full Latin text of the letter (book V, letter V):

Sidonius Syagrio suo salutem.

1. Cum sis consulis pronepos idque per virilem successionem (quamquam id ad causam subiciendam minus attinet), cum sis igitur e semine poetae, cui procul dubio statuas dederant litterae, si trabeae non dedissent (quod etiam nunc auctoris culta versibus verba testantur), a quo studia posterorum ne parum quidem, quippe in hac parte, degeneraverunt, immane narratu est, quantum stupeam sermonis te Germanici notitiam tanta facilitate rapuisse.

2. atqui pueritiam tuam competenter scholis liberalibus memini imbutam et saepenumero acriter eloquenterque declamasse coram oratore satis habeo compertum. atque haec cum ita sint, velim dicas, unde subito hauserunt pectora tua euphoniam gentis alienae, ut modo mihi post ferulas lectionis Maronianae postque desudatam varicosi Arpinatis opulentiam loquacitatemque quasi de + harilao vetere novus falco prorumpas?

3. aestimari minime potest, quanto mihi ceterisque sit risui, quotiens audio, quod te praesente formidet linguae suae facere barbarus barbarismum. adstupet tibi epistulas interpretanti curva Germanorum senectus et negotiis mutuis arbitrum te disceptatoremque desumit. novus Burgundionum Solon in legibus disserendis, novus Amphion in citharis, sed trichordibus, temperandis amaris frequentaris, expeteris oblectas, eligeris adhiberis, decernis audiris. et quamquam aeque corporibus ac sensu rigidi sint indolatilesque, amplectuntur in te pariter et discunt sermonem patrium, cor Latinum.

4. restat hoc unum, vir facetissime, ut nihilo segnius, vel cum vacabit, aliquid lectioni operae impendas custodiasque hoc, prout es elegantissimus, temperamentum, ut ista tibi lingua teneatur, ne ridearis, illa exerceatur, ut rideas. vale.

Here's Dalton's 1915 English translation (vol. 2. pp. 48-78):

Book V letter v, to his friend Syagrius, undated:

THOUGH you descend in the male line from an ancestor who was not only consul----that is immaterial----but also (and here is the real point) a poet, from one whose literary achievement would certainly have gained him the honour of a statue, had it not been secured for him already by his official honours,----witness the finished verse that he has left us; and though on this side of his activity his descendants have proved themselves no wise degenerate, yet here we find you picking up a knowledge of the German tongue with the greatest of ease; the feat fills me with indescribable amazement.

I can recall the thoroughness of your education in liberal studies; I know with what a fervid eloquence you used to declaim before the rhetor. With such a training, how have you so quickly mastered the accent of a foreign speech, that after having your Virgil caned into you, and absorbing into your very system the opulent and flowing style of the varicose orator of Arpinum, you soar out like a young falcon from the ancient eyrie?

You can hardly conceive how amused we all are to hear that, when you are by, not a barbarian but fears to perpetrate a barbarism in his own language. Old Germans bowed with age are said to stand astounded when they see you interpreting their German letters; they actually choose you for arbiter and mediator in their disputes. You are a new Solon in the elucidation of Burgundian law; like a new Amphion you attune a new lyre, an instrument of but three strings. You are popular on all sides; you are sought after; your society gives universal pleasure. You are chosen as adviser and judge; as soon as you utter a decision it is received with respect. In body and mind alike these people are as stiff as stocks and very hard to form; yet they delight to find in you, and equally delight to learn, a Burgundian eloquence and a Roman spirit.

Let me end with a single caution to the cleverest of men. Do not allow these talents of yours to prevent you from devoting whatever time you can spare to reading. Let your critical taste determine you to preserve a balance between the two languages, holding fast to the one to prevent us making fun of you, and practising the other that you may have the laugh of us. Farewell.

I'll suggest, by the way, that Dalton mistranslates the last phrase:

... ut ista tibi lingua teneatur, ne ridearis, illa exerceatur, ut rideas

It seems pretty clear that Sidonius is saying that if Syagrius lets his Latin slip, he'll be laughed at by the Romans, but if he continues to work on his German, he can laugh at the Burgundians. So a better translation would be something like

... holding fast to the one so as not to be laughed at by us, and practicing the other so that you can laugh at them.

Whatever the exegesis of who might laugh at whom, this is the earliest description that I've seen of the role of linguistic norms in teasing and other forms of one-upmanship.

[Note: I came upon the quote from Sidonius on p. 420 of Peter Heather's The Fall of the Roman Empire, which I've recently read with pleasure. Heather's thesis, according to the OUP editorial review, is that contact with Rome "turned the neighbors it called barbarians into an enemy capable of dismantling the Empire that had dominated their lives for so long". Apparently linguistic prescriptivism was one of the things that the barbarians learned along the road to success. It's clear that many cultures have independently developed a fear of "artificial errors" in speech or writing -- errors defined by dialect prejudice, by resistance to historical change, or by a self-appointed arbiter's arbitrary fiat. But did the Germans invent this meme on their own, or did they pick it up from the Romans?]

Posted by Mark Liberman at 12:03 AM

February 16, 2006

I didn't write shit today

The sentence I didn't write shit today is ambiguous: the idiomatic meaning typically says you didn't write (or at best, you wrote essentially nothing); the literal meaning typically says that you did write, and what you wrote could not be described as shit. But, I just noticed today, the two roughly opposed meanings can both be true in the same situation! Consider someone who expects daily output to be between 15 and 20 pages, and today they wrote only a page and a half, though it was of high quality. Then on the idiomatic meaning, they didn't write shit (because a page and a half counts as approximately nothing). But on the literal meaning, they didn't write any shit: it was all good stuff, not excrement. Both meanings are true! So... umm... I have no idea why I told you this. It just occurred to me on the walk home, after a couple of drinks, that's all. I guess it's just another of those observations about the English language not exactly being a perfectly designed instrument for the precise expression of thought. You know, rather like the old observation (which comedian first said it?) that, paradoxically, we drive on the parkway but we park on the driveway.

The people who have commented (here, for example) that there is a difference in intonation that can keep the two meanings apart are right, of course.

Posted by Geoffrey K. Pullum at 10:23 PM

The nesting of clauses that lay in the sentence that Cheney said

"Well, ultimately, I'm the guy who pulled the trigger that fired the round that hit Harry."

When I heard Dick Cheney's admission to Brit Hume on FOX News, my first thought was: "Why is Cheney snowcloning 'The House That Jack Built'?" The nursery rhyme is a classic example of recursively nested relative clauses — indeed, the whole purpose of the rhyme seems to be to teach children how clause-nesting works in English. (Such cumulative ditties pop up in other languages as well.) The structural similarity is obvious:

This is the cat (that killed the rat (that ate the malt (that lay in the house (that Jack built)))).

the guy (who pulled the trigger (that fired the round (that hit Harry))).
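For readers who like to see the recursion spelled out, here is a minimal sketch in Python of how the bracketings above can be built. The function name `nest` and the way the clauses are split into a list are my own assumptions for illustration, not anything from the rhyme or from a grammar formalism; the point is only that each relative clause is wrapped inside the one before it.

```python
def nest(head, clauses):
    """Attach each relative clause inside the preceding one (right-branching).

    `head` is the phrase being modified; `clauses` is a list of relative
    clauses in surface order. Both names are hypothetical, for illustration.
    """
    if not clauses:
        # Base case: nothing left to embed.
        return head
    first, rest = clauses[0], clauses[1:]
    # Recursive case: the next clause (with everything after it nested
    # inside) is bracketed and attached to the current head.
    return f"{head} ({nest(first, rest)})"


jack = nest("This is the cat", [
    "that killed the rat",
    "that ate the malt",
    "that lay in the house",
    "that Jack built",
])
cheney = nest("the guy", [
    "who pulled the trigger",
    "that fired the round",
    "that hit Harry",
])
print(jack)
print(cheney)
```

The same function handles both the nursery rhyme and the Cheney sentence, which is just another way of saying that the two share a structure and differ only in depth and vocabulary.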

A quick blog search finds that I'm far from the only one who had this reaction:

His first and most important sentence should have been I shot Harry. Not, "I pulled the trigger, that shot the round...that milked the cow. That lay in the house that Jack Built."
 (Body Language Lady)

All Cheney has admitted is the obvious: "I'm the guy who pulled the trigger that fired the round that hit Harry." And all in the house that Jack built.

"I'm the guy who pulled the trigger that fired the round that hit Harry."
Wait I know this one... "that lives in the house that Jack built."

"I'm the guy who pulled the trigger that fired the round that hit Harry..."
...who fell on the ground, that scared the quail, who flew through the air, that carried the rain, that fell on the roof, that covered the house that Jack built.

"I'm the guy who pulled the trigger that fired the round that hit Harry."
...that milked the cow that pushed the plow that lived in the house that jack built.

I see Der Sneer has finally fessed up: "I'm the guy who pulled the trigger that fired the round that hit Harry (that tumbled the house that Jack built)."

"I'm the guy who pulled the trigger that fired the round that hit Harry,"...who sat in the house that Jack built.

Did Cheney have Mother Goose in mind when he concocted his quail-hunting confession? It's possible, though it would be an oddly playful allusion to make at such a serious moment. More likely he hit upon the clause-nesting formulation as a way to establish a clear chain of causality: from his hand to the trigger to the shotgun round to poor Harry Whittington. This was evidently Cheney's method of disowning the suggestion previously made by White House spokesman Scott McClellan that somehow Whittington was at fault for not following "the protocol [of] notifying the others that he was there." Instead, Cheney traces the causal path of the injury back to the gun in his hands. This was made more explicit in the full context of the admission:

Q: Right, and so you know all the procedures and how to maintain the proper line and distance between you and other hunters, and all that. So how, in your judgment, did this happen? Who — what caused this? What was the responsibility here?

A: Well, ultimately, I'm the guy who pulled the trigger that fired the round that hit Harry. And you can talk about all of the other conditions that existed at the time, but that's the bottom line. And there's no — it was not Harry's fault. You can't blame anybody else. I'm the guy who pulled the trigger and shot my friend. And I say that is something I'll never forget.

There's another allusive echo in Cheney's causal chain — it evokes the old NRA slogan, "Guns don't kill people; people kill people." Geoff Pullum suggests the new elaborated version: "Guns don't kill people; rounds don't even kill people; not even triggers. People do." Thus Cheney's manful assertion of responsibility absolves not only Harry Whittington but also his own shotgun, lest any gun-control types seek to exploit the incident for political gain. To this I can only add the line from British comedian Eddie Izzard: "Guns don't kill people; people kill people. But monkeys do too... if they've got a gun."

Posted by Benjamin Zimmer at 02:51 PM

Not my kind of Spanish

A colleague of mine has spent a not inconsiderable part of the last six weeks arm-wrestling by phone with one of these disgusting health insurance management contracting companies that seem to work as if their fiduciary duty to their stockholders is best performed by impeding customers' access to payment on their insurance claims. The claim in question involves a lung operation that had to be carried out on my colleague's daughter while she was in Argentina. And one of the myriad excuses he got over the phone (during six weeks of "We've had some trouble reading the fax" and "We're still awaiting authorization" and "We're having difficulties finding out the exchange rate") was from a Spanish-speaking woman on the staff who said she was having trouble reading the bills because she is a speaker of Mexican Spanish and the bills are in Argentinian Spanish. Just about the lamest language excuse I ever heard. I've had a look at the bills. They're impeccably clear, and they say things like "HABITACION INDIVIDUAL VISTA AL JARDIN" (single garden-view room), "HIDROCORTISONA 500MG" (500 milligrams of hydrocortisone), "DEXTROSA 5% EN AGUA X 500 ML" (500 milliliters of a 5% solution of dextrose in water), "LLAMADAS TELEFONICAS" (phone calls), "OXYCONTIN 10 MG" [O.K., it's time you tried some Spanish translation; see if you can guess what this item might have been] . . . In other words, absolutely nothing arises in the bills that could possibly be thought to vary between the Spanish of Mexico and Argentina. Do you think the employee really had a language barrier?

Posted by Geoffrey K. Pullum at 12:47 PM

Tannen in the New York Times

The NYT recently featured a Q & A by Claudia Dreifus with Deborah Tannen, motivated by the publication of "You're Wearing That? Understanding Mothers and Daughters in Conversation", which is currently #5 on amazon.com's book-sales list. I'll wait until I've read the book to post anything more about it, but there were a couple of nice things in the interview.

First, the last exchange:

Q. How did you become a linguist?
A. I'd always loved words and talk. After my first marriage ended, I wanted to reclaim my intellectual life. I'd been teaching remedial writing at Lehman College in New York. With my newfound freedom, I registered for a summer institute in linguistics and fell in love with the discipline.
That summer, I'd found a calling. Linguistics combined my lifelong fascination with talk and my interest in people. Thirty-two years later, I can't imagine any other life.

The popularity of her books shows that many people share Tannen's tastes. In fact, the analysis of interpersonal interaction must be high on any list of human interests. Given that there are many other things as well that fascinate people about speech and language, and plenty of practical applications for the results of science and scholarship in this area, it's downright bizarre that the field of linguistics has ended up in such a marginal position. Deborah Tannen is one of the people who are doing something effective to improve the situation. And her books are interesting and fun to read. So three cheers for her and for her books.

In the first exchange in the interview, Tannen makes a striking point in a way that gives an interesting twist to a usage that we've observed several times before. The general pattern is a sort of hyperbolic lexicostatistical metonymy, in which the alleged (non) occurrence of words in a certain context is used to imply something about the relationship of the things they denote. A common version involves claiming that certain words or phrases always or never occur in the same sentence. The new twist (well, new to this weblog -- I'm sure it's hundreds if not thousands of years old) involves asserting that certain words are not used by certain people:

Q. Many of the women you've interviewed for your new book complain of mothers who criticize their appearance. Are they right to be annoyed?
A. "Right" and "wrong" aren't words a linguist uses. My job is to analyze conversations and discover why communications fail.

Prof. Tannen may be making a point about description vs. prescription, which is absolutely correct as a characterization of the attitudes of most linguists. And she also may be reinforcing her own generally non-judgmental approach to disagreements: she typically describes arguments in terms of different understandings of the goals of the interaction and the meaning of contributions to it, rather than in terms of rights and wrongs or even in terms of conflicting interests.

So to say that linguists don't use the words right and wrong is a striking way to emphasize analysis over evaluation. However, as a factual statement about the lexical habits of our profession, it's not entirely accurate. In fact, it's downright, well, uh, wrong. For example, the Linguist List archives have 1076 instances of the word "wrong", starting in Sept. 1991 (emphasis added):

I've been hanging back from the sound-change-teleology discussion because I have a feeling that the question of teleology in sound change -- regardless whether argued pro or con -- may be the wrong question.

and ending in Feb. 2006:

The authors strongly recommend a multilingual and multicultural perspective toward issues related to language and aging, contending that ''[A]s with monolingualism, the assumption of mono-culturalism in any society is wrong, and that applies to the elderly population as much as it does [to] other populations'' (p. 77).

That's about one "wrong" every five days over 15 years (which seems remarkably civil, now that I think about it).
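For anyone who wants to check that arithmetic, here is a rough sketch in Python. The archive hits are dated only by month, so the exact start and end days below are assumptions (the first of September 1991, and the date of this post); the count of 1076 is the figure given above.

```python
from datetime import date

# Assumed endpoints: only the months are known, so the days are guesses.
first_hit = date(1991, 9, 1)    # "Sept. 1991" (day of month assumed)
last_hit = date(2006, 2, 16)    # "Feb. 2006" (date of this post assumed)
wrong_count = 1076              # instances of "wrong" reported above

# Subtracting two dates gives a timedelta; .days is the elapsed day count.
span_days = (last_hit - first_hit).days
days_per_wrong = span_days / wrong_count

print(f"{span_days} days, one 'wrong' every {days_per_wrong:.1f} days")
```

Under these assumptions the interval comes out a little under five days, and shifting either endpoint by a few weeks does not change the rounded figure.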

Scanning a paper by Barbara Partee that I happen to be reading at the moment, I find at least one relevant use of the word "right":

But at this point we should probably bear in mind the “Janus-faced” nature of the genitives that we noted in section 5: for “pure” non-elliptical predicate genitives, it may not be right to call this a “genitive” relation at all; this is where the distinction between “genitive” and “possessive” may become important.

Amazon's "Search inside the book" tells me that Noam Chomsky's Minimalist Program has the word "wrong" on 12 pages, and mentions of "right" on 34 pages, at least some of which mean "correct" rather than "later in time and thus to the right in English orthography":

"... it will hold only for A-chains, not for A-chains. That conclusion seems plausible over a considerable range, and yields the right results in this case. Let us return now to the problem of binding-theoretic conditions at S-Structure. We found a weak ..."

Linguists mostly agree that there are not "right" and "wrong" ways to talk and write (though of course there are "standard" and "non-standard" patterns, and confusions and mistakes of many kinds as well). But linguists, like any other scholars or scientists, frequently discuss the "right" and "wrong" ways to analyze the phenomena they study. And even those who analyze argumentative discourse while trying not to take sides are likely to find reasons to use words like right and wrong in an evaluative sense from time to time.

Indeed, Prof. Tannen herself is a relatively frequent user of right and wrong: according to Amazon's "Search inside the book", in You Just Don't Understand there are 29 pages on which "wrong" occurs, and 69 pages on which "right" occurs. This is one right or wrong for each 3.6 pages (compared for example to one right or wrong for every 6.3 pages in Strunk & White's Elements of Style). For that matter, in the cited NYT interview we can find one additional use of each word:

The mother feels she's caring. The daughter feels criticized. They are both right.

So when mothers and daughters spend a lot of time talking about personal matters, it gives them countless opportunities to say the wrong things to each other.

Now, these examples are not really inconsistent with what I think she meant, which is that her professional interest is in description and analysis, not moral evaluation or judgments about winners and losers. The first example ("they are both right") is an attempt to put moral evaluation aside; the second example ("countless opportunities to say the wrong things to each other") uses wrong to mean something like "ineffective in advancing the speaker's goals", or "leading to unnecessary conflict".

And yet...

Posted by Mark Liberman at 07:01 AM

February 15, 2006

Odium against "podium"

Based on my previous Olympic-y posts I've received two independent queries from readers wondering, "What's the deal with these Olympics people using podium as a verb?" It's also come up on the Usenet newsgroup alt.usage.english and has rankled several bloggers:

Dear Mr. Announcer,
Just because you are working at the snowboarding competition, it doesn't mean you can talk like a 16-year-old who's strung out on Red Bull and skipped school to spend her days in a halfpipe. "Podium" is a noun — not a verb. Please don't say "She can definitely podium" again.
(Montecore the Tiger)

These linguistic absurdities continue, aided and abetted by the network people. The latest is truly bizarre, from the olympic coverage:
"She was unable to podium."
Arrrrgh. Grrrrr. Comment unnecessary.
(New Spew)

You know you are getting older when you find new language inventions (such as 'verbing') very distracting. As a linguist, I try to be descriptive instead of prescriptive in my attitude about other people's language use but I admit that the use of podium ("I'm just so happy to podium in this race" when finishing in the top 3) as a verb during the Olympic coverage very distracting — it catches my attention every time.
(BJ's Musings & Meanderings)

When so many people are suddenly and simultaneously peeved by a particular usage, it's time for us language kibitzers to sit up and take notice.

As with many other usage peeves, those darn kids get the blame — this time it's those darn kids with their snowboards and weird slang. Here's the commentary of Bob Wolfley, sports columnist for the Milwaukee Journal-Sentinel:

There has been a horrible development at the Olympics and it has nothing to do with Michelle Kwan's withdrawal.
You're reluctant to blame it on the snowboarders, because you imagine these poor kids and their sport have been blamed for something or another their whole lives and they do talk in a language all their own.
However, it's undeniable this troubling development came out of the halfpipe competition.
A few of the competitors said they were hoping to "podium," as in win a medal. They used "podium," a noun, as a verb.
You should have seen this coming.
For years athletes and others have said they want to "medal" at the Olympics, yet another noun used as a verb.
It's a little step from there to "podium" as a verb.

Not surprisingly, it looks like we've got another case of the Recency Illusion on our hands, combined with its comrade-in-arms the Adolescent Illusion. Even if those young snowboarders in Turin/Torino are using podium as a verb, that doesn't mean they invented it or even popularized it. The Factiva newspaper database turns up Olympics-related examples going back to 1992:

Sydney Morning Herald, Feb. 21, 1992
Kirstie Marshall may have failed to gain a place at the Winter Olympics, but she's right up there with the stars when it comes to TV commentary. On Channel 9, she gave us the word "podiumed" as in, "She hasn't won an event this season, but has podiumed a couple of times". We suppose this means the person in question came either second or third.

Kirstie Marshall was competing at the Winter Olympics in Albertville, France in aerial skiing, which was then only a demonstration sport. (She would later gain notoriety in Australia as a member of the Legislative Assembly, when she was ejected from the Parliament chamber for public breastfeeding.) Marshall was a 22-year-old at the '92 Games, competing in a flashy new winter sport, so the lineage to the young snowboarders of today seems pretty clear. But in the meantime, the usage has spread to various other sporting competitions. As noted by Peter Duncanson on alt.usage.english, the verb podium is commonly used in Formula One racing in the UK and elsewhere. In racing, too, the usage has its detractors. Chris Zelkovich of the Toronto Star has complained about it on at least two occasions:

Toronto Star, Sep. 3, 2001
Global auto racing announcer Chris McClure did a fine job during yesterday's Indy, but let's hope he never again uses the word podium as a verb, as in "Carpentier podiumed in Detroit."

Toronto Star, July 14, 2003
Varsha uttered the ugliest "word" of the day when he referred to the number of drivers who have "podiumed" this year. Ugh!

So what makes the use of podium as a verb, whether in winter sports or auto racing, so "ugly" for some listeners? A lot of English speakers have a general predisposition against innovative verbing (it "weirds language," as Calvin of "Calvin & Hobbes" famously put it). Some acts of verbing are considered more offensive than others, however. In the case of podium, one problem might be that the relation between noun and verb is rather indirect in a metonymic sort of way: 'to podium' means 'to make it to the (medals) podium,' which itself is allusive for 'to win a medal.' The metonymic predecessor of podium noted by Bob Wolfley, medal, has had better success as a verb, perhaps because the sense of 'winning a medal' is more transparent to noncompetitors. (This isn't to say that medal as a verb doesn't bug some people — it gets commented on pretty much every Olympics, despite the fact that it can be found in sports pages going back to 1966.)

Another problem some might have with the verbing of podium is its length. As Mark Liberman has noted, just about any monosyllabic noun can get verbed. Two-syllable examples also abound — particularly if, like medal, the second syllable is unstressed. But a three-syllable base form seems beyond the pale for many speakers, even if it ends in two unstressed syllables. Another trisyllabic noun-to-verb that rubs people the wrong way is leverage (attested since 1937), though in that case the peeve seems to be more directed against buzzwordy corporate-speak in general.

Even among frequent users of podium as a verb, its length may be a bit of a problem. Extending the word to four syllables by adding -ing seems relatively rare, as there are far more relevant examples of podium preceded by a modal or in the form podiumed. (I suspect the podium-gripers don't like the past-tense form very much, probably because the final consonant cluster /md/ sounds a bit awkward.)

Though Olympic coverage may help to mainstream this usage, for now it's still mostly an ingroup thing. But the cachet of athletic ingroupness has proven enticing in the past (cf. my bad) and the verb podium may spread in unforeseen directions, despite its critics. To quote a commenter in a French archery forum: "Bravo à tous les podiumers!"

[Update, 2/17: More on podiuming from the Sacramento Bee. Language Logger Geoff Nunberg notes another trisyllabic noun-to-verb form that gets people's goat: dialogue.]

[Update, 2/26: And now more from Jan Freeman in the Boston Globe.]

Posted by Benjamin Zimmer at 06:02 PM

Peter Ladefoged and Big Fierce Animals

The best Peter Ladefoged story ever has to be the one about his days as a graduate-student guinea pig in a phonetics lab in Edinburgh, long before the days of Institutional Review Boards and their dominion over research involving human subjects, when his mentors stuck large needles into his intercostal muscles to test Stetson's (I think it was Stetson's?) chest-pulse theory of the syllable. But although I heard the story from Peter and his wife Jenny, I probably have some of the details wrong, and I'm sure others have told it elsewhere in recent days, in the outpouring of respect, affection, and reminiscences that was set off by Peter's death. So I won't try to repeat that one here. I do have a small story that no one else would know about, though, from the visit Peter and Jenny made to Montana in the early 1990s so that Peter could accompany me on a field trip to record phonetic data on Montana Salish from the elders I work with. The visit was primarily a linguistic one, but the story isn't -- although it does suggest some of the hazards Peter encountered in his fieldwork in Africa.

Peter and Jenny stayed with us at our place way back in the woods of northwestern Montana, one mountain range east of the Flathead Reservation and just west of the huge Bob Marshall wilderness. It's a remote and fairly wild location, and we are almost the only people in our part of the valley who haven't clear-cut most of our land, so our woods have become a much-traveled wildlife corridor for (among other critters) black bears, grizzlies, and mountain lions moving from the Swan Mountains on the eastern side of the valley to the Mission Mountains on the western side. When Peter and I weren't over on the reservation working with elders, he and Jenny were enthusiastic about exploring, so they came along when I went knapweeding (they laughed a lot at the idea of weeding the woods) and gathering wild raspberries, and then when I went through thick brush to a small gooseberry patch so I could get berries for muffins. As we pushed through the tangle, I remarked that the reason I was deliberately making so much noise was to alert any nearby bears or lions to our presence, so they'd get out of our way. ``Hm!'', Peter observed. ``We would never do that in Africa.'' I asked why not. ``Well,'' he said, ``There, the lions [and other dangerous animals] wouldn't get out of the way.'' A different vision of wildlife. And although there are lions and grizzlies and black bears on the Flathead Reservation too, they are mostly confined to the still-woodsy streamsides, and even the occasional story of a mountain lion that wandered into town doesn't raise the possibility of danger from wildlife as a feature of fieldwork. Peter and I stayed at the cabin to work and so missed the excitement on the day that -- if I'm remembering correctly -- Jenny went on a hike up the steep Swans with Rich and a few others; I'm not sure she even saw the two grizzly cubs that came bouncing down a slope toward the trail, but she did turn and run with the others when the person in front shouted, ``Run back! Grizzlies!'' Unlike a lone bear or mountain lion, a mama grizzly would be inclined to hang around if she discovered humans near her cubs and couldn't get the cubs away quietly (and I've noticed that bear cubs tend not to want to go anywhere quietly).

Posted by Sally Thomason at 05:38 PM

Jane Austen and the Super Bowl

As far as I know, Jane Austen was not a Pittsburgh Steelers fan, so that's not why I link her name with the 2006 Super Bowl. Instead, it's all about me me me, and specifically about my current focus on Italian, which makes sense because Rich and I are spending a month in Trento, Italy, admiring spectacular views and even getting some work done. So I am trying to reduce my brimful cup of ignorance of Italian. Mostly this effort involves things like reading all signs and notices that I run across, hurling barely intelligible Italian sentences at shopkeepers and others who (unfortunately for me) then turn out to speak quite good English, and reading, verrrry slowly and with much dictionary consultation, Rex Stout's mystery Some Buried Caesar (boringly titled La Guardia al Toro, `The Watch Over the Bull', or maybe `The Bull Guard', in Italian). But Jane Austen and the Super Bowl have provided further entertainment along the way.

First, Austen. My daughter Lucy, who always seems to find such tidbits, reports that an Italian phrase in Austen's Emma was incorrect in the original manuscript and was then corrected by editors. The phrase, uttered by the deliciously awful Mrs. Elton, is in this passage: ``I must do my caro sposo the justice to say that he need not be ashamed of his friend. Knightley is quite the gentleman.'' But apparently Austen actually wrote cara sposo, with a gender mismatch (feminine `dear', but masculine `spouse'). According to Lucy, `titanic battles are waging between the modern editors of Jane Austen who believe in following the original manuscript, and the modern editors of Jane Austen who believe in following the first edition of her novels that made her wildly popular.' The obvious question is whether Austen intended to get the phrase wrong; that would be perfectly in keeping with her portrayal of Mrs. Elton's pretentious chatter....or would erroneous Italian be too subtle for an early 19th-century audience? Maybe someone out there knows?

Now, about Italian and the Super Bowl. Although Jane Austen could not have been a Steelers fan, I am, and have been for the last thirty years or so, and for much of that time I have been fuming about the team's loss the last time they were in the big game. So I couldn't stand the thought of not seeing this year's Super Bowl, but there were obstacles, even aside from the fact that it started at 12:30 A.M. Trento time. I finally tracked down the only hotel in town that seems to have the SKY sports network; a friendly desk clerk checked her own personal SKY schedule at home to make sure they would indeed be broadcasting the game; she arranged with the bartender for me to have the run of the bar through the wee and not-so-wee hours of the morning; and so on Feb. 5 I checked into the hotel, napped, and then settled down to watch the game. The commentary was in Italian, which is what I expected, given the Italians' zeal for, and skill in, dubbing American (and other) films and TV shows and translating Rex Stout and Jane Austen and just about everyone else you've ever heard of into Italian. This was good: since I don't understand much Italian, I was unable to process the uncomplimentary comments the announcers presumably made about the Steelers when they played badly, which their offense did, for instance, for the entire first quarter of the game. But some basic phrases were easy to understand: primo e dieci `first and 10', secondo e sei `second and six', and the like. The best phrase of the entire game, though, was their comment on the disastrous interception thrown by the other team's quarterback late in the game: ``Intercetto doloroso!'', they hollered, or rather uttered urbanely. (The Steelers' quarterback, the generally terrific Big Ben Roethlisberger, threw two interceptions, not just one, but since the Steelers won -- Go Stillers!, as Pittsburghers say -- neither of those turned out to be all that doloroso.)

Posted by Sally Thomason at 05:34 PM

Doomed to mediocrity by accent?

The following remarkable statement, pointed out to me by Jan Freeman, was reported in the Boston Globe from an affidavit given by Mrs Priscilla Matterazzo to State Police Detective Lieutenant James Connolly:

Priscilla Matterazzo told Connolly that her daughter returned to Massachusetts with her husband and baby in part because, the affidavit said, "Neil would never amount to anything in England because of his accent: He was obviously a coal miner's son from a working class background."

Here's the background, which relates to a case that is in the news a lot in the Boston area. In 1999 a young man named Neil Entwistle met a young American woman named Rachel Souza at the University of York (my undergraduate alma mater), where she was spending a year abroad in Britain. In 2003 they married, and in April 2005 they had a baby, Lillian. They moved to Massachusetts. In early January 2006 they moved into a rented colonial-style home in Hopkinton, a suburb of Boston. But Neil left to return to his family in the English midlands early in the morning of January 21. That evening friends and family arrived at the Hopkinton house for a dinner party and found no one answering the door. On January 22, a search of the house revealed Rachel and 9-month-old daughter Lillian dead in bed under an obscuring pile of blankets. They had been shot. On January 25, Massachusetts detectives flew to England to investigate; on February 9, the British police arrested Neil; and today he has been returned to the Boston area. Priscilla Matterazzo is Rachel Souza's mother. The quote above is from an affidavit in the case against Entwistle.

Our interest here is the claim about language. As Jan Freeman comments, it is really difficult "to imagine a 27-year-old American with a university degree in electrical engineering thinking, much less telling his mother-in-law, that his accent doomed him to mediocrity in his home country." I certainly agree. There is some accent prejudice in the USA, but not quite to this extent.

I wish I could dismiss it as nonsense to say that having an accent that marks you out as being from a working-class home in Worksop, Nottinghamshire (near Sherwood Forest, in the middle of England) might alter your employment prospects in a downward direction. But it is undeniable that if you elide initial [h] and pronounce putt the same way you pronounce put, speakers of British English will instantly draw a few conclusions about your likely intelligence level, reliability, morals, etc. Such things form the subject matter of sociolinguistics. The potency of sociolinguistic facts should not be underestimated in any country or culture, but the effects of both region and class are particularly well known and well studied in the case of Britain.

It may sound like a ridiculous hard-luck story that should be lent no credence whatsoever (and certainly, I am not suggesting for a moment that it has any relevance to the issue of the killings); but it is not completely impossible to imagine that Entwistle believed this part of what he told his mother-in-law, or even that he might have had some justification for believing it.

Posted by Geoffrey K. Pullum at 03:36 PM

A quantifier for every season

Fernando Pereira recently emailed the headline of a BusinessWeek story, "A Search Engine for Every Subject", with the comment:

1) Google's replacement, or a 2) swarm of specialized engines? You have to read past the first paragraph to realize that they mean 2).

In the language of the old-fashioned predicate-calculus treatment of quantifiers, expressed in "heavy English" rather than pretend-mathematics, this is the difference between

1) There exists a search engine x such that for every subject y, x is for y.
2) For every subject y, there exists a search engine x such that x is for y.
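
For readers who like to see the two scopings checked mechanically, here is a minimal sketch in Python. It evaluates both readings against two toy models; the engine names and the subject list are invented purely for illustration and come from neither the BusinessWeek story nor Fernando's note.

```python
def exists_forall(engines, subjects, is_for):
    """Reading 1: there is a single engine x that is for every subject y."""
    return any(all(is_for(x, y) for y in subjects) for x in engines)


def forall_exists(engines, subjects, is_for):
    """Reading 2: every subject y has some engine x, possibly a different one each time."""
    return all(any(is_for(x, y) for x in engines) for y in subjects)


# Hypothetical toy data: each engine maps to the set of subjects it covers.
subjects = ["travel", "medicine", "law"]
swarm = {"TripFinder": {"travel"}, "MedSearch": {"medicine"}, "LawLook": {"law"}}
replacement = {"OneBox": {"travel", "medicine", "law"}}


def readings(model):
    """Return (reading 1, reading 2) for a model of engines and their subjects."""
    is_for = lambda x, y: y in model[x]
    return (exists_forall(model, subjects, is_for),
            forall_exists(model, subjects, is_for))


print(readings(swarm))        # (False, True)
print(readings(replacement))  # (True, True)
```

The swarm of specialized engines makes 2) true and 1) false, while the single all-purpose engine makes both true: whenever 1) holds, 2) holds as well, but not the other way around.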

As far as I can tell, given that I'm biased by reading Fernando's note, I agree with his judgment that 1) is more natural than 2) as an interpretation of the BusinessWeek headline. This makes sense, given that 1) preserves the surface order of the operators. However, there are plenty of examples, some of them very common, that go the way of 2). There's the phrasal template "if I had a X for every Y":

If I had a nickel for every time that ...

and hundreds of thousands, if not millions, of ("an X in every Y") variants of

A chicken in every pot.

In fact, in these cases there's the additional implicature that all the nickels or chickens are disjoint, which presumably does not apply in e.g.

She has a finger in every pie.

in which different figurative fingers may be involved in different figurative pies, but there might well be more pies than fingers.

Of course, there are plenty of examples that would require interpretations analogous to 1):

We have entered into a unique partnership with Lumisource, Inc to bring you a must have toy for every weather enthusiast.
Here is a staple for every touring paddler.
The two obvious secrets of every service business
Five Ways to Profit from Every Meeting
America, A Home for Every Culture

In structures of this general sort, each inevitably (?) follows pattern 2):

The doctor or hospital will be paid a fee for each service rendered to the patient.
The new Circular requires an agency to make a formal public announcement for each competition.

often emphasized with modifiers like "separate" or "unique":

Do a separate worksheet for each student.
An extraction of the chest spot pattern allows the generation of a unique biometrical identifier for each penguin.

while all usually follows pattern 1):

A man for all seasons.
This week I'm happy to announce a solution for all other Visual Studio users!

often emphasized with modifiers like "single" or "complete":

A single solution for all your document needs
A complete announcement for all openings is available on the West Human Resources Web.

though all also sometimes follows pattern 2), especially when the head noun in the first noun phrase is plural:

No need to carry separate auto power adapters for all your portable electronics.
The foundation maintains separate accounts for all restricted funds.

The crude method of checking Google counts for a few particular sequences generally supports this picture for each and all, while every seems sometimes to be each-like and sometimes all-like:

  "a solution for __" "a separate solution for __" "a single solution for __"
  "an application for __" "a separate application for __" "a single application for __"
  "a program for __" "a separate program for __" "a single program for __"
  "an answer for __" "a separate answer for __" "a single answer for __"

[Update: David Hargreaves writes

"Life has a paddle for every behind."

My hallway survey leads me to conclude this reading is preferred:

For every behind y, there exists a paddle x such that x is for y.

One interpretation went as follows: since every bottom is unique, there has to be an anatomically correct paddle that fits it.

I first heard musician Wynton Marsalis use the phrase, but my Google search came up with "author unknown."]


Posted by Mark Liberman at 08:11 AM

"Torino" rolls along

Confusion still reigns over NBC's decision to refer to the Italian city hosting the Olympics as Torino rather than the traditional English name of Turin. As noted on the copy-editing blog A Capital Idea, many American newspapers are following their style guides and using Turin rather than Torino. (USA Today is one notable exception.) On the other hand, the International Olympic Committee has gone along with NBC's choice of Torino, displaying it on the English-language version of its official site, though even the IOC isn't very consistent: the logo and titles all say "Torino 2006," but much of the text uses Turin.

So why did NBC opt to keep the Italian name of Torino (while not, for instance, calling Italy Italia)? It's been widely reported that the choice was made in 1999 by NBC sports chairman Dick Ebersol on a trip to the city shortly after it was awarded the 2006 Games. Here are two direct quotes from Ebersol about his executive decision (emphasis mine):

"And we'll call them [the Games] Torino. It rolls off the lips a lot smoother than Turin." (Boston Globe, 8/27/04)

"I was just swept away with how that sounded: Torino. It just rolls off your mouth. It talks about a wonderful part of the world. It has a romanticism to it. And I just thought that that was a wonderful way to name these games." (Hartford Courant, 1/26/06)

When Mike McCarley, a spokesman for NBC Sports, was asked about Torino, he cleared up the lip/mouth confusion of his boss by choosing the more common folk-phonetic idiom:

"Dick made the decision in '99 because of the way Torino rolls off the tongue. It's Italian. It sounds Italian." (Milwaukee Journal Sentinel, 1/21/06)

Regardless of which articulatory organ is imagined to have words rolling off of it, the perceived euphony of Italian (and Romance languages in general) is an old stereotype among Anglophones, the flip side to descriptions of Germanic or Semitic languages as "harsh" or "guttural." In the case of Torino, we might imagine that the "rolling" refers to the trilled r, though that's clearly not sufficient as an indicator of linguistic sweetness -- even a so-called "guttural" language like Arabic has voiced alveolar trills. There must be something about prosodically lilting vowel-consonant alternation, particularly with a vowel in word-final position, that strikes many ears as pleasing. A product naming consultant's blog entry (which naively claims to give the "unwritten rules for translating foreign city names") characterizes the word-final vowels of Italian place names like Torino as both "flamboyant" and "appetizing"!

Anthropologists might talk about Ebersol's choice of Torino over Turin in terms of "exoticization of the Other" or "the fetishization of authenticity." But Italians themselves are taking full advantage of the "romanticism" that Ebersol detects in the name Torino. As a Dec. 8, 2005 Wall Street Journal article explains, local boosters see the use of Torino as an opportunity to "rebrand" the economically depressed city, and they successfully lobbied the IOC to emulate Ebersol's decision to use Torino instead of Turin as the city's official name in all languages, including English:

The name choice is about "taking back our identity," says Giuseppe Gattino, a spokesman for the city's Olympics organizing committee, officially dubbed Torino 2006. In Italy, it also represents a small victory for the national language amid concern about the growing use of English words such as "email" and "weekend," despite the existence of perfectly fine Italian equivalents.

Another reason is more prosaic: Singsong "Torino," with its classically Italian rolling R and vowel ending, sounds better than the hard, Anglo-Saxon "Turin" — and might help soften the city's scruffy, industrial reputation. "I'm convinced that part of the reason people love cappuccino is for the joy of pronouncing the word," says Beppe Severgnini, a columnist for Italian daily newspaper Il Corriere della Sera. "The word Torino is less sexy, but it's better than Turin."

Ah, so this explains why I don't drink cappuccino — I experience no "joy in pronouncing the word"! But besides providing folk-phonetic perceptions of Torino vs. Turin (does the latter really sound "hard"?), the Wall Street Journal article also delves into some of the actual history of the city's naming:

Italians cringe at English names for their cities, such as Florence for Firenze and Leghorn for Livorno. The irony is that Turin isn't an anglicized form of Torino at all. The area around the city was first settled by Celtic tribes in the third century B.C., and the name Turin derives from the Celtic word "tau" for mountains. Torino is the Italian derivation, and happens to mean "little bull." The city was known as Turin when it became the first capital of the Kingdom of Italy in 1861.

David Kertzer, a professor of anthropology and Italian studies at Brown University, notes that, in the fading dialect of the local Piedmont region, the city is still known as Turin, with the accent on the second syllable. Historically, he says, the region "is closer to France than Italy linguistically and geographically."

So there's some fodder for those of you who would like to buck the tide and stick to Turin: it has a longer history than Torino and even has the seal of approval from speakers of Piemontese (which is actually not a "dialect" but a language distinct from both Italian and French). But if you're fetishizing authenticity and want to pronounce Turin the Piemontese way, give it final-syllable stress. Ben Sadock thinks that he's heard the name pronounced tuRIN by English speakers before the Olympics (as in discussions of the famous shroud), but American Heritage and Random House corroborate my sense that the preferred pronunciation in American English is TURin. And that's the pronunciation I'll continue to use, as I'm immune to the charms of Torino (and cappuccino).

(Looking ahead to the 2008 Summer Olympics, is there any way to nip "Beizhing" in the bud?)

[Update #1: More on Turin and Pie(d)montese from Ben Sadock here.]

[Update #2: Matt Weingarden, aka Mr. Fine Wine, suggests another reason why Americans might romanticize Torino: Starsky & Hutch drove a Ford Gran Torino (a muscle car manufactured from 1968 to 1976).]

Posted by Benjamin Zimmer at 12:57 AM

February 14, 2006

The fujigmotic chigger of scholarship

The Feb. 11 Economist has a story about taxonomic marketing, leading with the fact that a Canadian online casino made the winning bid of $650,000 in an internet auction for the name of a newly-discovered Bolivian monkey. (Which therefore became Callicebus aureipalatii, or the Golden Palace titi. The Wikipedia entry calls it the "GoldenPalace.com Monkey", but I don't see any .com in aureipalatii. And titi is too nice a common name to pass over in favor of monkey, it seems to me.)

The Economist would like to suggest that the Bolivians who sold off naming rights to their discovery were on to something. ... Notwithstanding recent discoveries in New Guinea ... few biologists these days have flashy mammals and birds to hawk around. But a little imagination might find sponsors for lesser creatures. For, while a wealthy airline (if any still exist) might aspire to a Papuan bird of paradise, its low-cost confrere could consider something a bit more within its budget—a butterfly, perhaps? And which building society would not be seriously tempted by its own bee?

Dismissing concerns that this would be somehow infra dig, the writer observes that

Last year, for example, America's president, vice president and defence secretary each got a beetle (Agathidium bushi, A. cheneyi, A. rumsfeldi) courtesy of two Republican coleopterists. Admittedly, the beetles in question eat slime mould, which caused a few titters among taxonomists of a Democrat persuasion, but it is clearly an act of gross speciesism to criticise the dining habits of other organisms, so the titters were sotto voce. And it is not only politicians who are benefiting. Sting, a musician, has his own tree frog (Hyla stingi), and several spiders also bear the names of entertainers (Calponia harrisonfordi, Pachygnatha zappa) who clearly have taxonomists as fans.

The best fun is saved for the end:

Detractors of such horrid commercialisation there will no doubt be. But they might consider that taxonomists have been amusing themselves quietly for years, as names such as Colon rectum (a beetle), Ba humbugi (a snail), Oedipus complex (a salamander) and Ytu brutus (a beetle) attest. Besides, how much disrepute could commerce really bring to the discipline that brought the world Trombicula fujigmo, a mite whose name is an acronym for “fuck you Jack, I got my orders.”

I wonder if this means that The Economist can no longer be sold at Walmart?

Anyhow, Trombicula may be mites, but their larvae are chiggers, so this is potentially serious stuff. However, I'm having a little trouble verifying this excellent story. Trombicula fujigmo figures in a list of "Funny or Curious Zoological Names" attributed to "Arnold S. Menke ... with additions by Neal L. Evenhuis", and reprinted here, on a site otherwise devoted to "Research on Chalcidoid Systematics". However, the citation for the list says that it has been "Reprinted from BOGUS Volumino Negatori Doso, pages 24-27. (April Fool's Day 1993)", which is not a citation to inspire confidence.

There are three Trombicula species in the list, as follows:

Trombicula doremi Brennan and Beck, 1955 (musical chigger number one)
Trombicula fasolla Brennan and Beck, 1955 (musical chigger number two)
Trombicula fujigmo Philip and Tucker, 1950 (another chigger--ask any WWII U.S. veteran what "fujigmo" stands for)

Google Scholar can't yet find a paper indexed by the terms {Philip Tucker Trombicula}, but of course 1950 was quite a while ago.

[Update: Blake Stacey writes

A notable entry in this genre is, of course, Strigiphilus garylarsoni, a species of biting louse found only on owls. According to The Prehistory of the Far Side, S. garylarsoni was named for the well-known cartoonist by Dale H. Clayton, head of U Chicago's evolutionary biology committee.

And, naturally, it has its own Wikipedia article.

Gary Larson fans may be surprised to find him put into the same genre as an on-line casino and an impolite WWII acronym, but such is fame, I guess.]

Posted by Mark Liberman at 04:42 PM

The allusive butterfly of love

Happy Valentine's Day to all Language Log readers. I had a good time today, during a small part of the morning of this day of romance, correcting the proofs for a short book review to be published in The Mathematical Intelligencer. There were two hilarious aspects of it. The less staggering of the two concerned what happened to the LaTeX source file I submitted. They printed it out on paper and got a copy-editor to mark it up in pencil ("umlaut" written beside my umlauts; "Greek delta" written beside my Greek delta; and so on); and then the typesetter had started over, composing the pages without using LaTeX! So the final look of the page was amateurish, much worse than what I submitted. I could have done the typesetting for them myself and it would have been better. Apparently the editors cannot get the current printer to switch to using LaTeX for production.

But the more staggering thing I found was that a word I had used correctly was altered to a different word with a different meaning because the copy editor thought I couldn't spell and she knew better. (It was a she. I found out her name. She is on my ON NOTICE board now. She will never be my valentine.) You see, I had made a reference to "increasingly allusive journal articles" in generative linguistics, meaning that many papers are contenting themselves with indirect hints and allusions rather than explicit definitions. And the copy-editor changed "allusive" to "elusive" (the nerve!), and compounded the felony by writing a marginal note to me:

There is no English word "allusive" that means hard to pin down.

Indeed there is not; but I didn't mean hard to pin down, I meant tending to allude rather than state, so that's what I said. And "elusive" means hard to catch, as said of, for example, a butterfly (I guess they are easy enough to pin down, but you have to catch them and kill them first).

Well, I had a good laugh at the combined editorial arrogance and lexicographical ignorance of the editor. "Allusive" is of course in all dictionaries designed for grownups, as Googling it will rapidly reveal to you; I haven't yet tracked down a dictionary or thesaurus so puny that "allusive" is not in it. I was definitely tempted to scrawl on the proofs in red, "Do you realize who I am? Do you realize that I work for the Language Log corporation, which could crush you like a bug?"; but I didn't. I decided, as Superman so often had the self-control to do, not to reveal my superpowers merely out of pique. I just wrote "STET", the Latin word (meaning "let it stand") that authors use to tell a bossy copy-editor "CHANGE IT BACK TO WHAT I SAID IN THE FIRST PLACE".

Footnote: There are in fact three words in English pronounced almost the same, or close enough for them to be confused: allusive (compare the verb allude and the noun allusion), elusive (compare the verb elude), and illusive (compare the noun illusion). The meanings are:

  • Allusive: containing or characterized by indirect references.
  • Elusive: (1) Tending to elude capture, perception, comprehension, or memory; (2) Difficult to define or describe.
  • Illusive: Illusory, having the property of being an illusion.

This is a pretty disastrous temptation to confusion (worse than having the almost identical-sounding verbs affect and effect with different meanings, for example). So be careful. For you word-lovers, the logophilic butterfly hunters among you, it's a lexicographical jungle out there.

Posted by Geoffrey K. Pullum at 12:08 PM

February 13, 2006

Feeling all Olympic-y

The Winter Olympics is looking more and more like the trendy X Games, with new sports like snowboarding contributing to the "extreme makeover" of the Olympic Games. The English language is getting an extreme makeover too, as anyone who has heard 19-year-old snowboarding phenom Shaun White, aka "The Flying Tomato," can attest. At his news conference after winning the gold medal for the men's halfpipe event, White displayed a typical array of snowboarder slang (a subcultural offshoot of West Coast surfing and skateboarding lingo): gnarly, stoked, pumped, amped, and so forth. A more innovative usage appeared in an offhand comment he made between qualifying rounds, admitting his nervousness after an unimpressive first run. Three media sources gave slightly different versions of White's overheard remark:

"I'm feeling all Olympic-y," he confessed to one of his Burton snowboards support crew. (Los Angeles Times)

Standing at the base of the mountain, he said he felt "all weird and Olympic-y." (New York Times)

"I wasn't thinking straight," White told Finch before jumping on a chair lift for his second qualifying run. "I got all Olympic-y. I got that out of the way." (Yahoo Sports)

Regardless of White's exact wording, he apparently used "all Olympic-y" to mean 'suddenly overwhelmed by the experience of participating in the Olympics.' Once White overcame that anxious feeling of Olympic-iness, he was able to relax for his second qualifying run and win the gold.

White combined the use of two forms common in the casual speech of young American speakers: the intensifier all and the productive suffix -y. Intensive all should not be confused with quotative all ("So I'm all, 'Huh?'"), though both of these usages have been investigated by the "Changing All" project at the Stanford Humanities Lab. (See Arnold Zwicky's discussion of the project here and here.) The intensive usage is nothing new, despite what the Recency Illusion might lead you to believe. It's easy enough to find 19th-century examples of intensive all, particularly in representations of Scots and other British dialects. And it turns out that intensive all has long been attached to adjectives ending in -y describing some sort of emotional or physical state. Here are two early dialectal examples I found from a full-text search on Oxford English Dictionary citations:

1866 THORNBURY Greatheart lviii, I felt all whizzy and sleepy like. (s.v. whizzy)

Leeds Loiners' Comic Olmenac 24, I went all wimley-wamley e me head. (s.v. wimbly-wambly)

The pattern get/go/feel all X-y was extended in the 20th century to a wide variety of forms, as in these three examples from well-known novels:

1922 JOYCE Ulysses 445 Eat it and get all pigsticky.

1932 S. GIBBONS Cold Comfort Farm v. 71 She will only go and keep a tea-room in Brighton and go all arty-and-crafty about the feet and waist.

1934 A. CHRISTIE Murder on Orient Express I. vi. 59 This is where I'm supposed to go all goose-fleshy down the back.

I'm afraid I don't have a clue what Joyce might have meant by "get all pigsticky," but Stella Gibbons' "go all arty-and-crafty" and Agatha Christie's "go all goose-fleshy" are transparent enough.

Beginning in 1997 the productivity of the suffix -y got a tremendous boost from the television show "Buffy the Vampire Slayer," as meticulously detailed by Michael Adams in his book Slayer Slang. (Followers of the truthiness wars will remember Adams as the wordanista who dared to define "truthiness" without reference to Stephen Colbert, instead relying on the Buffyesque formulation of "truthy, not facty.") Many of the examples of -y usage cited in Adams' book also make use of intensive all:

"Oh, so who are we to be all judgy?"

"And none of us look all passiony murdery, so we're probably safe here."

"So, you gonna say good-bye this time, or just split all secret agenty like last time?"

"Wanna come and get all unwindy?"

"Darla's cute until she turns all vampiry."

As the above lines demonstrate, the "Buffy" writers felt free to attach -y to just about anything (more complex examples given by Adams include twelve-steppy, out-of-the-loopy, and stay-iny). Context is sometimes necessary to establish the exact semantic relation between the base form and the suffixed form, but the use of intensive all at least helps establish that a particular -y innovation refers to a mental, emotional, or physical state of being.

Context is also key in the case of Shaun White's comment. Out of context, one might assume that "feeling all Olympic-y" is an emotionally positive state of being, something like "feeling all warm and fuzzy" or "feeling all gooey (inside)." If White had used the expression during the opening ceremony or on the medal stand (as well he might), then the expression could indeed have had this positive sense. But coming immediately after his shaky qualifying run, the usage was properly understood by his snowboarding colleagues and other overhearers as referring to a state of heightened anxiety or "stage fright" that can afflict Olympic athletes at crucial moments.

By the way, White wasn't the only American Olympian to use the -y suffix in an innovative fashion this past weekend. Outspoken figure skater Johnny Weir used the word princessy in comments to the Associated Press (offering an implicit critique of athletic norms of gender and sexuality):

"I am very princessy as far as travel is concerned and having a nice room and things like that. Sorry to say 'princessy,'" he added, laughing, "but that's what we do."

Posted by Benjamin Zimmer at 11:50 PM

Presenting linguistic evidence in court

[This is a guest post by Roger Shuy, Language Log's forensic linguistics correspondent.]

Here's a novel idea. Why not use experts to do experts' work?

Language Log has done a lot to show how reporters don't quite get it right when they quote what their sources say (see here and here and here and lots of earlier posts). We might expect better of them, but the sad truth is that they are no worse than our courts when it comes to reducing the spoken word to writing.

We all know that lawyers try their cases in the courtroom using spoken language. But after the trial ends, only the written form of the oral exchanges is preserved in memory. This written version is what is used as a basis for settlement, sentencing, and appeals. In the courtroom, once speech is written down, it freezes, assumes authority and becomes resistant to change. In his excellent book, Legal Language, Peter Tiersma tells us that it wasn't until the end of the eighteenth century that judges began to issue written opinions systematically. He also points out that what today's American appellate judges say is now virtually irrelevant, that all that really matters is what they write and that this development has led to the doctrine of precedent, among other things.

Even if the legal community's preference for the written word is thought to have advantages, when the major evidence in a criminal case is tape-recorded spoken language, a number of difficult problems can make things ugly. For example, how can these undercover conversations between law enforcement officers and suspects be best communicated to a jury? Playing the tapes to a jury sounds okay, except for a few serious problems, including the following.

1. Task unfamiliarity. The jurors are not used to listening to tapes this way. Speech goes by very fast and they can easily miss the crucial things that expert linguists are trained to hear. Jurors often get to listen to the tapes only once during trial, although they can take them into the jury deliberation room if they ask for them.

2. Quantity. In some cases there can be many, many tapes to hear. In the case of John Z. DeLorean, the automaker tried for narcotics conspiracy, there were 63 hours of conversation over a period of about a year. Simply following the convoluted development of themes and topics discussed is painstaking for even a trained analyst, much less for a jury. The BCCI money laundering case had over a thousand hours of taped conversation between many speakers using different varieties of foreign-sounding English. The sheer quantity of tapes often leads the prosecution to excerpt only those portions of the conversation that best suit their case, allegedly making the jury's task simpler. True, the defense is allowed to use those deleted and sometimes redacted portions, although they often don't know how to do this. In addition, sometimes the undercover agent has control of when to turn the tape on and off, making it impossible to know what might have happened in the off-tape conversations.

3. Quality. The quality of the tape recordings sometimes makes listening to them daunting for jurors. Often conversations are recorded in noisy restaurants or inside automobiles with the radio blaring. Telephone calls are usually a bit easier to hear, but not always. The task of just listening, much less understanding, is difficult for the untrained ear.

4. Transcripts. Okay, if tapes are so hard to hear, why not provide a written transcript? Trying to accommodate these problems, judges in most jurisdictions permit a transcript of the conversation to be provided as an aid to the jury, introducing still another problem. Although transcripts (when accurate) can indeed be helpful in keeping track of who said what to whom and when, my experience shows that invariably the persons making the transcript produce many errors. Among other things, they sometimes attribute turns of talk to the wrong speakers, leave out words, sentences and sometimes even longer passages, mishear a lot of what is said, and make creative but erroneous interpretations. In addition, they often guess at passages that the audiotapes do not and cannot indicate, such as "suspect counts the money," or "opens desk drawer." Nor do they include important timing information that would show that five seconds or even ten minutes elapse with no speech audible on the tape. Even worse, they sometimes correct the grammar of the police and delete their foul language while leaving these intact for the suspects.

The way the prosecution creates these transcripts is perhaps the most depressing thing of all. To my knowledge, no explicit procedure has ever been described but there is every reason to believe that many transcripts are made by secretaries, whose job skills may include taking dictation. They listen to the tapes and write down what they think they hear. Even when court reporters are used, the product is often inaccurate. After the initial transcript is made, the agent who participated in the undercover investigation reviews the transcript and corrects it to reflect what he thought was said. It is common for this to favor the prosecution, of course.

Why do such practices continue? Largely because not enough outrage and criticism has moved the courts to understand that important aspects of language are omitted when spoken language is written down. Audiotapes don't contain the information found on videotapes, which are better in indicating where the speakers are located in relationship to each other and who was present when the talk took place. Is it possible that the suspects couldn't even hear the agents' representations of illegality? But even audiotapes contain much more information than written transcripts.

I don't have to tell you that there are several serious problems with the way transcripts are used in our court system. For one thing, based on my experience the final transcripts most frequently err on the side of making the suspect look guiltier than he might be. Why? Goodwin's notion of professional vision springs to mind. Most of our jobs in life pass along to us the professional vision of our group — how to think and act like it. Police have a professional vision that leads them to spot, even anticipate, aggressive behavior because this can be helpful to them in their work on the street. The professional vision of lawyers is to advocate. The professional vision of prosecutors is that suspects must be guilty because it's their job to put them in prison. Defense lawyers have the opposite vision because it's their task to be sure that their clients go free. And to be fair, I should mention that transcripts prepared by the defense can be equally biased and wrong.

When the courts accept transcripts made by people whose job it is to prosecute, objectivity, neutrality and accuracy suffer. To combat advocacy, the courts need objective and knowledgeable third parties to make the transcripts. What is needed is a more neutral, objective vision. Does the job of the linguist spring to mind here?

It seems clear to me that linguists could be extremely helpful to our legal processes, as long as they aren't sucked into the advocacy of whichever side they work for. The obvious idea is to have linguists working for the court, not as expert witnesses for either the prosecution or the defense.

What useful linguist-produced transcripts for the jury to use might look like is another matter. They can't contain phonetic notations, for example, because they have to be easy for laypersons to read. That's a topic for another post on another occasion.

— Roger Shuy

Posted by Geoffrey K. Pullum at 10:24 PM

Do you have a language barrier?

From the Palo Alto Daily News of 2/11/06 (p. 24), the first item on the police blotter for the city of Atherton for the preceding Thursday, in its entirety:

El Camino Real, 9:11 a.m.:  Someone with a language barrier called police to report a medical emergency.  The person then drove to the Stanford Medical Center.

What does having a language barrier mean here?  Ordinarily, a "language barrier" is created when people don't share a language to interact in.  Maybe that's what happened here, but then how did the staffer who answered the phone know that the caller was reporting, or trying to report, a MEDICAL emergency?  (Note also that the police attribute the communication problem to the caller, not jointly to the caller and their staffer.)

Alternatively, maybe the caller had a speech defect severe enough to make understanding difficult (though not impossible).  If so, "language barrier" is an odd expression to use, though in this situation it would at least be fair to attribute the source of the problem to the caller.

Why, in fact, does the entry mention communication difficulties at all?  Why not just say, "Someone called police to report a medical emergency and then drove to the Stanford Medical Center"?

Sometimes, calculating implicatures just makes your head hurt.

[zwicky at-sign csli period stanford period edu]

Posted by Arnold Zwicky at 01:21 PM

February 12, 2006

Hanging up a participle without getting a subject

Read the following paragraph from Jonathan Kellerman's 1997 novel The Clinic (Bantam Books; see page 23); the narrator, Alex Delaware, is trying to get some information about a controversial university committee, and has phoned the office of a dean of students at UCLA, but is being blown off by the dean's secretary:

When I told the dean's secretary what I was after, her voice closed up like a fat-laden artery and she said she'd get back to me. Hanging up without getting my number, I phoned Milo again.

Does that strike you as linguistically bizarre, or at least striking? Am I alone out here? Hello?

I don't know what The Fellowship of the Predicative Adjunct will think, but I find it such an extreme case of a dangling participle that I have to assume some kind of word-processing error. The second sentence came at me like a smack in the face. The reason may have to do with syntax, semantics, pragmatics, or etiquette — I'm actually not quite sure, because the whole topic is a bit of a mystery to me. But the problem of understanding it is that we get the wrong subject for the verb hang. You see, Alex is not trying to get a number [people keep emailing me to suggest that interpretation, but it does not fit], and he is not the one who first hangs up. The dean's secretary knows the dean will not want to talk to Alex on the subject he is inquiring about, so she says she'll get back to Alex, but she hangs up without taking down a number that would make that possible. That's the point. And the syntax just won't support the point.

This way of putting things would have been fine, given the intended sense:

When I told the dean's secretary what I was after, . . . she said she'd get back to me, hanging up without getting my number. I phoned Milo again.

That correctly identifies the dean's secretary as the one who hung up without getting Alex's number. And so would this:

When I told the dean's secretary what I was after, . . . she said she'd get back to me. Then she hung up without getting my number. I phoned Milo again.

And if you insist on having the second sentence begin with "Hanging up..." but you want to get the right subject, you could do something like this:

When I told the dean's secretary what I was after, . . . she said she'd get back to me. Hanging up without getting my number, she left me with nothing I could do but phone Milo again.

If the subject of "Hanging up..." really was intended to be (implausibly) Alex Delaware — i.e., if he refused to supply his number to the secretary — then this would have conveyed the correct sense:

When I told the dean's secretary what I was after, . . . she said she'd get back to me. Hanging up without giving her my number, I phoned Milo again.

But what Kellerman writes is extraordinarily far from expressing anything like what he wants to express, in my opinion. That second sentence in the original version quoted above makes it sound like Alex Delaware hung up the phone without getting his own number, which makes no sense. This surprised me: Kellerman is not a clunky and inexpert writer the way Dan Brown is. Kellerman writes well; his characterizations are rich and generally plausible, and his descriptions literate and generally effective. I have enjoyed a number of his novels recently. I'm inclined to try and track him down and ask him what went wrong with the paragraph quoted. I'm interested in whether he actually considers what he wrote to be a well-chosen, grammatical way of expressing his meaning accurately (he might; stranger things have been known), or whether it was just some kind of unnoticed slip by him or an editor. I'll have the Investigative Services Division at Language Log Plaza try to track him down, and I'll keep you posted.

Added later: The ISD did succeed in tracking Jonathan Kellerman down, and he was kind enough to take a look at the passage in question. What's really interesting is that at first he read it without the context (he had a different edition of The Clinic in front of him, and the sentence was not on page 23), and he misunderstood his own intent just like everyone else, figuring that Alex must have been trying to get a number. The story absolutely rules that out. And when the full context was supplied to him, Jon agreed that he simply could not read the text as having the right meaning. He had no idea how the error could have survived into print despite all the usual editorial scrutiny.

The main relevance of this is that dangling participles are not some kind of silly invention of grammarians. There is a real phenomenon here. If you position a non-finite clause adjunct somewhere that does not allow easy access to the right noun phrase to provide it with an understood subject, you get really serious difficulties of understanding. So serious that even a highly expert best-selling author cannot tell, nine years later, what the hell he could have meant by the sentence, and has to guess. Notice, though, we still don't know where this error came from. Jon cannot remember, even under interrogation by Language Log's ISD. We don't know whether he typed a period where he meant a comma and the copy editor failed to catch it, or whether the copy editor misunderstood and replaced a comma by a period, or what happened. We do know that what happened destroyed intelligibility. So some dangling participles are more than minor style imperfections; they are crashingly impermissible.

Posted by Geoffrey K. Pullum at 02:40 PM

Trent Reznor Prize to Bernard-Henri Levy

The prestigious Trent Reznor Prize for Tricky Embedding (Delayed Gratification Division) has been awarded to Bernard-Henri Levy and his translator, Hélène Brenkman, for this sentence:

And it is hard not to link this provocation, the deliberate circulation of these cartoons, the quasi-home-delivery of a Danish paper that no one could have guessed had so many readers in the Muslim world, it is hard not to link this self-inflicted blasphemy, this calculated offense (calculated, mind you, by the organizers of the distribution of the cartoons), it is hard not to link this blasphemy to a new planetary configuration, itself determined by three recent and major events.

[Hat tip to Semantic Compositions]

Posted by Mark Liberman at 08:40 AM


David Brooks' 2/12/2006 column "Bring Back the Gang of 14" includes this:

On the right you have Dick Cheney worrying about the return of Frank Church and on the left you have Howard Dean vaporizing about the return of Dick Nixon.

Some may object to vaporizing on the grounds of word usage, but I think that it's just mildly subversive of Brooks' proposal for bipartisan cooperation.

The OED has an appropriate sense for the verb vapor (well, "vapour" in their stick-in-mud orthography :-):

5. intr. To use language as light or unsubstantial as vapour; to talk fantastically, grandiloquently, or boastingly; to brag or bluster.

The AHD has a similar gloss for vapor the verb:

3. To engage in idle, boastful talk.

For the verb vaporize, the AHD gives us only the literal sense

To convert or be converted into vapor.

and the OED gives us only a few figurative evocations of things thus converting or being converted (making allowances again for British spelling):

1831 CARLYLE Sart. Res. II. vi, In figurative language, we might say he becomes..spiritualised, vaporised.
1866 FELTON Anc. & Mod. Gr. I. x. 175 They have not only vaporized her husband into a myth, but have consolidated a myth into a lover.
1888 DOWLING Miracle Gold III. xxvii. 15 The family estates and honours had been vapourized before that last of the Poniatowskis fell under Napoleon.

Brooks surely doesn't mean that Dean has been vaporizing in anything like this sense, while the idea of idle and fantastic talk fits his phrase perfectly.

But in his defense, let's note that Brooks isn't alone in using vaporizing to mean vaporing. Over at NRO, David Klinghoffer wrote (emphasis added):

A fellow with a beard like Bluebeard the Pirate hawks a deluxe bong that operates on the vaporizer principle. Vaporizing about the virtues of hemp, he explains that "hemp seed lubricates your brain. It actually helps you think more clearly."

On the left, and without the excuse of a reference to the vaporizer principle, Barbara Ehrenreich at LiP magazine wrote (emphasis added again):

Whatever the psychology of this new type of war—and there has been much vaporizing about a recrudescence of "evil" in the world—one particular innovation has made it possible, and that is the emergence of an international market in small arms.

So you could see Brooks' usage as a malapropism (vaporizing substituted for vaporing), perhaps caused by an errant copyeditor or spellchecker. On the other hand, perhaps Brooks is joining an appropriately bipartisan effort to extend the English language in new directions. It's your call: I'm going with #2, myself. Why, as Horace asked, should we grant to Plautus a privilege denied to Virgil? (Not that Carlyle is much like Plautus, or Brooks like Virgil.)

But either way, isn't Brooks' phrasing inappropriately partisan? Cheney is "worrying" while Dean is "vaporizing"? To stay with the bipartisan mood, as well as the matter-phase metaphor, shouldn't Cheney be going to pieces about the return of Frank Church? Or, for added morphological parallelism, how about fragmentizing?

[Update: Aaron "Dr. Whom" Dinkin writes that

I bet Brooks didn't mean "vaporizing" to be understood as 'engaging in boastful talk'; I bet he meant it as a nonce formation for 'having the vapors' - i.e., 'experiencing depression or hysteria'. This is more parallel with Cheney's "worrying", and I think it's more recognizable and common than the meaning of "vapor" which has to do with boastful talk.

If that's right, it improves Brooks' score for word choice while bringing him down a notch for covert gender-baiting in bipartisan disguise.]

Posted by Mark Liberman at 08:30 AM

Top Arafat climate scientists

Hoping to find some more George Deutsch audio, I searched Podzinger for {Deutsch NASA}, and got results that set me to wondering. First, let me tell you what happened. Then I'll explain what's so striking about it.

The third item on Podzinger's first page of results was the podcast of WNUR's This is Hell for 2/4/2006, where Podzinger found the following at about 53:20 into the mp3. In each pair of lines below, the first line is Podzinger's transcript and the second is what I heard in the original audio:

last sunday    there's an article in the times       about         the the nasa's top arafat climate scientists
last sunday uh there's an article in the times uh uh about a uh uh the     nasa's top uh     climate scientist

               began in the thirty years    and   --   respected throughout the community in effect of the  community
the guy that's been there   thirty years uh and and is respected throughout the community uh the scientific community  

You'll recall that Podzinger uses BBN automatic speech recognition (ASR) to turn podcasts into text, and then indexes the results in a roughly Googlish way. Actually, the text retrieval is probably more like the old Altavista algorithms, since there doesn't seem to be any equivalent of page rank here, but never mind... The TiH commentator has an extraordinary density of uhs for a radio personality, but Podzinger's ASR software manages to ignore the first five of them. That's 5 uhs in 15 words, by the way -- George Deutsch, aggressive rises and all, is a hell of a lot more fluent than this guy.

And Podzinger's recognition of the word "NASA" is right on the money, demonstrating again that ASR-based audio indexing has gotten to the point of being really useful. Sometimes. Because then, on that sixth uh, the ASR system does something really weird. It renders "NASA's top uh climate scientist" as "nasa's top arafat climate scientists".

Now the "fundamental equation of speech recognition" says that

W* = argmax_W P(W|O) = argmax_W P(O|W) P(W) / P(O)

In other words, in deciding which string of Words corresponds to some Observed sound, we should pick the words that maximize (our estimate of) the conditional probability of O given W, multiplied by the probability of W, divided by the probability of O. And since the probability of the sound -- P(O) -- is the same for all hypotheses about word strings, we can ignore it for the purpose of this decision.

We're left with a product of two terms. One term comes from an "acoustic model" that defines, for an arbitrary word string W, a probability distribution over possible stretches of sound. The other term comes from a "language model" that defines, independent of any considerations of sound at all, a probability distribution over possible word strings. In order to make it practical to create these models and to compute with them, we use pretty crude approximations. As you can see, the current state of the art nevertheless often works quite well.
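As a minimal sketch of that decision rule, the following picks the word string with the largest product of acoustic-model and language-model scores. The candidate strings come from the example above, but every number is invented for illustration; a real recognizer would estimate them from trained models and search over a vastly larger set of hypotheses.

```python
import math

# Hypothetical scores for two candidate word strings W given one stretch of
# audio O. All numbers are invented for illustration only.
candidates = {
    "nasa's top climate scientist": {"acoustic": 1e-9, "language": 1e-8},
    "nasa's top arafat climate scientists": {"acoustic": 4e-9, "language": 1e-14},
}

def log_score(scores):
    """log P(O|W) + log P(W); P(O) is dropped because it is the same for every W."""
    return math.log(scores["acoustic"]) + math.log(scores["language"])

def decode(cands):
    """Return the word string W that maximizes P(O|W) * P(W)."""
    return max(cands, key=lambda w: log_score(cands[w]))

best = decode(candidates)
print(best)
```

With these made-up numbers the acoustic model slightly prefers the "arafat" string, but the language model penalizes it so heavily that the product favors the other hypothesis. Working in log space avoids underflow when many small probabilities are multiplied.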

(I believe that Fred Jelinek is the one who started calling this the "fundamental equation of speech recognition"; for further explanation, see Daniel Jurafsky and James Martin, "Speech and Language Processing", chap. 9, p. 47 of cited .pdf; the equation, of course, is basically just "Bayes' rule" applied to P(W|O), and Bayes' rule is either a trivial consequence of the definition of conditional probability, or one of the most profound and controversial equations in the history of mathematics, take your pick.)

When the method goes wrong, its mistakes sometimes turn out to be sensible when we look into them closely. For example, in my earlier post on Podzinger, I noted a case where the system rendered "when you said Jesus is" as "when music scene it is". Well, in the first place, these two sequences are phonetically a lot closer than you might think at first:

when you said Jesus is:  w ɛ n j u s ɛ d ʤ i z ɪ s ɪ z
when music scene it is:  w ɛ n m j u z ɪ k s i n ɪ ɾ ɪ z

So it's plausible for an acoustic model to be pretty happy with the second one as a substitute for the first. And as for the language model, using counts from a corpus of 4,444,962,381 words of news text, a bigram language model estimates the sequence "when you said Jesus is" as only about 2.2 times more probable than "when music scene it is" -- as such things go, this is a dead heat.
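One rough way to put a number on the phonetic closeness is to count the insertions, deletions and substitutions needed to turn one phone string into the other. The sketch below uses plain Levenshtein distance over the two transcriptions, with the n of "when" assumed in the first one. Treating every phone mismatch as equally costly is a simplification of mine; an acoustic model works with graded similarity, so s versus z would count for much less than s versus k.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Phone strings for the two word sequences, one symbol per phone.
said_jesus = "w ɛ n j u s ɛ d ʤ i z ɪ s ɪ z".split()
music_scene = "w ɛ n m j u z ɪ k s i n ɪ ɾ ɪ z".split()

distance = edit_distance(said_jesus, music_scene)
print(distance, "edits across", max(len(said_jesus), len(music_scene)), "phones")
```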

(Using counts from MSN search, the bigram model rates "when you said Jesus is" as 2,112 times more probable than "when music scene it is". I guess this tells us that Jesus is about a thousand times more prominent on the web than in the news; and Podzinger's language model is probably based mostly on news text.)

All this doesn't predict the error, but it tells us that the error was a sensible one given the kind of model being used.
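For readers who want to see what such a comparison involves, here is a minimal sketch of a bigram model estimated from raw counts. The tiny corpus is invented and there is no smoothing, so unseen bigrams simply get probability zero; the news-text and MSN figures above came from far larger counts. The score also leaves out the probability of the first word, which cancels when both strings begin with "when".

```python
from collections import Counter

def train_bigram(tokens):
    """Count left-context words and adjacent word pairs."""
    contexts = Counter(tokens[:-1])
    pairs = Counter(zip(tokens, tokens[1:]))
    return contexts, pairs

def bigram_prob(w1, w2, contexts, pairs):
    """Maximum-likelihood estimate of P(w2 | w1); 0.0 if w1 was never a context."""
    if contexts[w1] == 0:
        return 0.0
    return pairs[(w1, w2)] / contexts[w1]

def sequence_prob(words, contexts, pairs):
    """Product of bigram probabilities along the word string."""
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigram_prob(w1, w2, contexts, pairs)
    return p

# Invented toy corpus, for illustration only.
corpus = ("when you said jesus is here when music scene it is loud "
          "when you said it is").split()
contexts, pairs = train_bigram(corpus)

for phrase in ("when you said jesus is", "when music scene it is"):
    print(phrase, sequence_prob(phrase.split(), contexts, pairs))
```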

Now consider the mistake that rendered "NASA's top uh climate scientist" as "nasa's top arafat climate scientists".

It's much harder to explain or excuse this one. We have to assume that the acoustic model was happy to regard this cough-like uh as a probable rendition of "Arafat" -- this is not the behavior of a healthy and effective stochastic model of the sound of the English language. And we also have to assume that the n-grams involved in "nasa's top arafat climate scientists" were estimated to be probable enough to yield a good language-model score for this string.

Now, I can't imagine that anyone faced with a cloze test based on a text like

NASA's top ___ climate scientist

would think to answer "Arafat" as a candidate to fill in the blank. And counts from MSN support this impression, yielding estimates like

P(Arafat | top) = 1.7*10^-6
P(climate | Arafat) = 1.3*10^-6

where by comparison, for example,

P(scene | music) = 5.2*10^-3
P(climate | top) = 6.7*10^-5
P(scientist | climate) = 2.7*10^-3
P(scientists | climate) = 3.9*10^-3
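To see how large the gap is, one can multiply out the relevant estimates. This is only arithmetic on the figures quoted above: it compares the bigram path from "top" through "Arafat" to "climate" with the direct step from "top" to "climate". A longer path always picks up an extra factor, so the comparison is a rough guide rather than a full language-model score.

```python
# Bigram estimates quoted above (from MSN search counts).
p_arafat_given_top = 1.7e-6
p_climate_given_arafat = 1.3e-6
p_climate_given_top = 6.7e-5

# "top Arafat climate" versus "top climate" under a bigram model.
via_arafat = p_arafat_given_top * p_climate_given_arafat
direct = p_climate_given_top

ratio = direct / via_arafat
print(f"path through 'Arafat': {via_arafat:.2e}")
print(f"direct path:           {direct:.2e}")
print(f"direct path is about {ratio:.1e} times more probable")
```

On these figures the direct path comes out roughly thirty million times more probable, so the acoustic model would have to favor "Arafat" by a comparable margin for the longer string to win.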

Of course, bigram statistics are a crude measure of what is or isn't plausible English; and people are capable of producing and perceiving very implausible word sequences. But in terms of its own simple-minded models, it's hard to understand why Podzinger mapped uh to Arafat in this context. (Unless maybe its language-modeling materials were unnaturally enriched in strings like "top Arafat aide" and "post-Arafat climate"?) I'm sorry to say that ASR error analysis is not infrequently like this -- I wish there were a clear path to a class of models that would make more lifelike, or at least more coherent, mistakes.

Posted by Mark Liberman at 08:10 AM

February 11, 2006

Whatever is not prohibited is permitted -- not!

Last week Tommy Grano, laboring on a Stanford honors thesis on case in English pronouns, came across an entertaining exchange on alt.video.tape-trading (6/14/04-6/16/04):
  • Poster 1 writes "against Brian and I";
  • Poster 2 explains (in effect) that the same rules govern case choice for coordinated NPs as for single NPs, so "I" is just wrong -- only "me" is grammatically correct here;
  • Poster 1 maintains that "against Brian and I" is the grammatically correct version, claiming (in effect) that the rules for coordinated NPs and single NPs are slightly different, defends "I" by citing a check run through MS Word XP (presumably, a grammar check in which MS Word did not flag "against Brian and I"), and adds the further defense that Abraham Lincoln and Mark Twain used "I" this way;
  • Poster 2 replies scornfully that Poster 1 is completely wrong, as are the grammar check, the person who set up that grammar check, Lincoln and Twain (if they used "I" this way -- Twain certainly did; I don't know about Lincoln), and the "many, many PROFESSIONAL WRITERS" who "every day" use "I" this way ("There's barely an hour going by that you won't hear it.").
Some of this is fascinating, but old news.  What's startling is the appeal to the Microsoft Word grammar checker as an arbiter of correctness.  That's silly enough on its own, but what's REALLY silly is the covert assumption that if the grammar checker doesn't flag it it's ok, disregarding the fact that there are lots of things (like producing word salad) that grammar checkers don't even try to warn you against, the fact that there are lots of things (like determining when an NP is a subject, when it's a direct object, and when it's an object of a preposition) that grammar checkers can't do easily, and the fact that neither the Libertarian Maxim covertly appealed to by Poster 1 -- WHATEVER IS NOT EXPLICITLY FORBIDDEN IS PERMITTED -- nor the parallel Authoritarian Maxim -- WHATEVER IS NOT EXPLICITLY PERMITTED IS FORBIDDEN -- is a good guide to the use of grammar checkers, usage manuals, and other resources of linguistic prescription.

First, the exchange from step 2 on.  (Poster 1 goes under the handle "The Slayer @ 25", Poster 2 "c a", by the way.)

Poster 2: You said "those who would join forces against Brian and I".  Separately, it would be join forces against Brian and join forces against ME, NOT join forces against I.  The use of I and me does not change because another name is added.  If you are truly a student, there should be a teacher somewhere about that you can ask about this.  You will find out that I am right.  Also, MORE gramatically [sic] correct is not right either.  MORE doesn't fit there because it is either right or nor [sic] right.  One is not MORE right than the other, one is just wrong.  Like you.

Poster 1: Oh--"I" feel so sorry for your students.  "against Brian and I" is the gramatically [sic] correct version, sorry about that.  You're right, you shouldn'thave [sic] gotten involved if you had nothing valid to contribute.  Need I repeat the check that I just ran through Microsoft Office XP (specifically MS Word XP) before I made this post?  The rules of proper grammar do change slightly if you're running a combination sentence instead of seperate [sic] objects.  You're right about the seperation [sic] being "Brian and Me" but the rules of proper American/English Grammar  ("I" don't know where Crazy Cat Lady is from, nor do "I" particularily [sic] care) would state that the proper form is "Brian and I".  Just look up some of Lincoln's (Abraham) old written, forever archived speeches if you don't believe me.  And don't forget Samuel Clemens [sic] Classic Essays under the name "Mark Twain".

Poster 2: Don't feel sorry for his students.  He has taught them the right way.  Brian and ME is correct.  You are 100%, completely and thoroughly wrong.  And so is any grammar check you did.  Whatever PERSON set up that grammar check is wrong (and it WAS a person).  If Lincoln and Twain used I and me in this way, THEY were wrong.  Many, many PROFESSIONAL WRITERS are wrong every day, as evidenced by all the movies, television, books and magazines which use the word I when they should use the word me.  There's barely an hour going by that you won't hear it.

Some of this is fascinating, but old news, for example, Poster 1's (flawed) belief that nominative case for pronouns in coordination is prescriptively correct, even especially formal and polite (see Angermeyer & Singler's 2003 article in Language Variation and Change), both posters' (correct) observations that nominative coordinate objects are widely attested, and Poster 2's (flawed) belief that examples lie thick on the ground all around us, which is an instance of the Frequency Illusion.  It's also possible that Poster 2 believes that in general, not just in this case, there's only one right way to do things; certainly Poster 2 denies any validity for the nominative pronoun as an alternative to the accusative.

New to me, however, is Poster 1's touching faith in the Microsoft Word grammar checker.  Poster 1 expected that if there was something wrong with "against Brian and I", MS Word would have flagged it.  (By the way, my version of Word -- Word X for the Mac -- doesn't flag nominative coordinate object pronouns, either.  I tried using the grammar checker on files that were jam-packed with the things, and it flagged not a one of them, not even the classic "between you and I".)

But of course a grammar checker doesn't check text against an actual grammar; it merely checks for certain types of violations, essentially just those on a (relatively short) list of problems in grammar, usage, and style.  The Word grammar checker on my Mac fails to find anything wrong with some of the (nonstandard) constructions that have been of special interest to me: WH-that ("I don't know how many people that were at the party") and GoToGo ("She's going to San Francisco and talk on firewalls"), for example.  I have files with dozens of examples of these, and they all pass through the grammar checker unscathed.

It's not just the things that haven't come to the attention of the Microsoft staff that get missed; the grammar checker doesn't catch some things that the Microsoft Manual of Style for Technical Publications warns against -- subordinator (as opposed to adverbial) once, as in "Once you save the file, exit from the program", for example.  (I know, you're asking, what's wrong with the subordinator once?  The short answer is: nothing, but somebody on the Microsoft technical writing staff thought there was.  I'll get to this topic on another day.)

It also doesn't check for things that take a lot of parsing and so are genuinely hard to check for.  It doesn't seem to catch any of the classic types of dangling modifiers, for example, even the ones that are laughably bad.  It doesn't catch any of the "government by the nearest" examples that have occasionally come to our attention here at Language Log Plaza: "She had never and was never going to wear it"; "I expect the simplified characters will or are becoming standard there"; etc.

Now, checking for case of pronouns in coordination requires determining, first, when coordination is inside a NP (you don't want to parse things like "Kim saw him and he saw Kim" as containing a NP "him and he"); then, what the boundaries of the coordinate NP are; and, finally what the syntactic function (in particular, subject, direct object, or object of a preposition) of the coordinate NP is.  These are hard tasks to automate, and there's every evidence that the Word grammar checker doesn't even attempt them, but instead looks for configurations that are much easier to find:
  • On a sampling of 42 instances of finite clauses with non-coordinate "me" as subject ("... the fact that me (a Brit) was her boyfriend..."; "I know that me for one am always looking at the negative"; "... but me for one is glad I found the place"; "Yeah, me too am very disappointed" -- yes, these are actually attested) unearthed by Grano on 2/10/06, the grammar checker detected not a single one of the anomalous subject forms, and flagged only two of the 42 as having a problem with subject-verb agreement: "Heya party people, the holiday season is upon us and me for one am excited" (perhaps the checker took "us and me" to be a phrase, in which case a plural verb would be called for) and "Me for one is a true party girl" (the only example with "me for one" in sentence-initial position and with the verb "is"; the checker doesn't flag "Me for one am...").
  • When I fed the checker 20 examples of "agreement with the nearest" examples ("the challenges each of them still face"; "the population of ocean fish and other marine species have suffered major declines"), it flagged only five of them.  I don't at the moment understand how it managed to catch these five.
  • When I fed the checker the first 50 examples from a large collection of "double is" and other "extra is" examples, it flagged one of them ("The only thing is, is I got the power to see the future") as a possible subject-verb agreement violation, I don't know why; flagged a few examples of "was, was" and "is, is" as possibly having an unnecessary or misplaced punctuation mark; and flagged a few others (with "The point is, is that..." and "The truth is, is that...") as possibly having a comma between subject and verb.  But most of the 50 made it through the grammar checker unmarked for errors involving the subject of a clause.
(The checker did catch every occurrence of is is that had no intervening comma, marking these as "repeated word" errors.  Unfortunately, it also flags perfectly grammatical occurrences of is is, as in "What the problem is is that we have to leave".  The checker is obviously literally searching for (certain) repeated words -- it doesn't flag that that, but does flag this this, so it's not totally simple-minded -- rather than determining syntactic structures.)
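For the curious, here is a minimal sketch of what such a surface search might look like.  It is a guess at the general approach, not Microsoft's actual code: the function name `repeated_words` and the exemption list are hypothetical, chosen only to reproduce the behavior reported above (flag "is is" and "this this", pass "that that", ignore repetitions separated by a comma).

```python
import re

# Hypothetical exemption list: the post reports that "that that" is not
# flagged, while "is is" and "this this" are.
EXEMPT = {"that"}

# A word, whitespace only (no comma), then the same word again.
REPEAT = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def repeated_words(text, exempt=EXEMPT):
    """Return the immediately repeated words in text, minus exempt ones."""
    return [m.group(1).lower()
            for m in REPEAT.finditer(text)
            if m.group(1).lower() not in exempt]

print(repeated_words("What the problem is is that we have to leave"))
print(repeated_words("The point is, is that we have to leave"))
```

A search like this flags the grammatical "What the problem is is that..." and misses "The point is, is that...", which is the pattern of hits and misses one would expect from string matching rather than parsing.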

But suppose we had a much much better grammar checker than this one, a grammar checker that actually did some parsing (correctly) and would catch nominative coordinate object pronouns and accusative coordinate subject pronouns ("me and him escaped") and the astonishing me-subject examples above.  We still wouldn't be justified in letting the Libertarian Maxim (or the Authoritarian Maxim, for that matter) guide us.  Any grammar checker is going to be based on some list of things to flag and some list of things to let pass, and meanwhile users of the language will come up with things that the compilers of these lists never contemplated -- some of them dubious, some of them fine, some of them up for discussion.  Anybody who studies variation, especially innovative, informal, spoken, and/or nonstandard variants, comes across new things (new to them, anyway) all the time.  (Yesterday's surprise for me was the "me too am..." and "me too is..." examples.  The day before it was the re-shaped idiom "taint with the same brush".  Let's see how today goes.)

A few words about the maxims.  There are situations where one of them is appropriate and situations where the other is, as well as situations (like grammar checking) where neither is a good guide. 

Public prohibitions generally assume the Libertarian Maxim.  If the sign says No Bicycles On Sidewalk (and nothing more), you are normally entitled to assume that scooters, rollerblades, skateboards, and so on are permitted on the sidewalk.  (Cars are barred from the sidewalk by a higher-level prohibition, made explicit in the traffic laws.) 

On the other hand, recursive definitions of sets work by the Authoritarian Maxim:  some entities are stipulated as being in the set; if certain entities are in the set, then others (related in a regular way to these) are as well; and NOTHING ELSE IS IN THE SET.  What is not expressly permitted is not allowed.

The two maxims are often contrasted with one another.  Here, for example, is Jeffrey Phillips on the Working Smarter site:

If not forbidden, it's permitted

I want to write today about taking the initiative and creating a culture that encourages risk taking and innovation.  Too often in our corporate bureaucracies, those who step up and take initiative with new ideas or new products are shunted aside.  It can be truly difficult to innovative and to change corporate processes if the culture and management team don't support innovation and change.

My father was a Marine, so maybe I'm biased, but there's a long running joke about how Marines see the world and how soldiers see the world.  Marines are taught that they must be prepared to improvise.  One of their core maxims is: "That which is not forbidden is permitted".  It's often been said that the Army mantra is:  "That which is not permitted is forbidden".

Meanwhile, various sites about computer systems caution that everything not explicitly permitted is forbidden; they're implicitly offering recursive definitions.  The xmlcoverpages site says, however,  that "Everything that is not forbidden is permitted in XML 1.1 names."

What you find on the web is mostly recent stuff.  But I was sure that I'd heard the Libertarian Maxim from way back.  A trip to a dictionary of quotations eventually led me back to Schiller, Wallensteins Lager (1798), Act 1, Scene 6, where the First Hunter explains: "Was nicht verboten ist, ist erlaubt" (original German courtesy of the Gutenberg Project).  That's 208 years ago, which is good enough for me at the moment.  Possibly Schiller got it from some earlier writer; certainly the IDEA is likely to be ancient.  In any case, Schiller was something of a phrase-maker, so maybe this compact formulation is original with him.

(Note perfectly grammatical occurrence of ist ist in Schiller's German formulation.)

[Update, 2/17/06: Jonathan Breit has written with a quote that does indeed push things back way before Schiller: Tertullian (the early Church Father), De Corona Militis, Chapter 2: "Sed quod non prohibetur ultro permissum est." -- Immo ['to the contrary'] prohibetur quod non ultro est permissum.  Apparently, Tertullian is entertaining an objection based on the Libertarian Maxim, and countering with an argument based on the Authoritarian Maxim.]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:41 PM

Angry rises

George C. Deutsch has fired back at his critics, in a radio interview on WTAW in Bryan, Texas. For those of you who haven't been following the story, Deutsch is a recent political appointee in NASA's public affairs department, who got crossways with NASA scientists over theories of global warming and the origins of the universe. His peroration:

This is -- an agenda! It's a culture war agenda! They're out to get Republicans, they're out to get Christians, they're out to get people who are {breath} helping Bush; anybody they perceive as not sharing their agenda, they're out to get!

This being Language Log, my focus is not on his politics, nor on his paranoia, but on his pitch.

Listen to his last sentence. (If you want to hear it in context, four sections of the interview are available on WTAW's web site -- part 1, part 2, part 3, part 4 -- and the quoted passage is at the very end of part 4.)

Now pay attention to the pitch contour. Below, I've given you some graphical aids, with an orthographic transcription and (from top to bottom) a display of fundamental frequency, a wide-band spectrogram, and an audio waveform.

We might crudely symbolize Deutsch's first phrase as

They're OUT to get _/ RePUBlicans _/

The boldface caps indicate that the main stressed syllables are "out" and "pub". The _/ notations mark the ends of two subphrases, and indicate that in each subphrase, Deutsch is using his locally lowest pitch on the stressed syllable, and then rising to the end of the phrase.

This passage is a good example of how rising intonation is not always (or even mostly) associated with what Taylor Mali called "invisible question marks" used by people who are unwilling to "declare things to be true". Deutsch continues with the same pattern:

They're OUT to get _/ CHRIStians _/

After the build-up from "Republicans" to "Christians", "people who are helping Bush" is a bit of a rhetorical ebb, not to say anti-climax, so the final rise is smaller:

(The plotted high pitch in the second syllable of "helping" is a pitch-tracker error.) Deutsch closes with a bang. There are two (gently) rising subphrases, and then a final high (fall?):

Anybody they perceive as _/ not sharing their agenda _/ they're out to get ^

It's hard to tell what the final contour direction is, since the strength of his convictions propels his last word into falsetto mode, but certainly the final "get" starts high.

This example is slightly unfair to the proponents of uptalk-as-self-doubt, since in the case just examined, final rises mark the members of a list (of conspiracy targets), and "list intonation" is another of the traditionally-recognized uses for rising contours in English. However, sequences of repeated poke-in-the-chest rises are frequent in this interview, and generally enumeration does not seem to be involved, except insofar as rhetorical repetition can be considered to be a sort of listing. (A small sample can be heard here, here, here.) The communicative crux in these examples is certainly not self-doubt, nor does it seem to be the mere implication that there is more to come. Perhaps the rises are intended to reach out and seize the listener's attention, like the Ancient Mariner's eye contact:

He holds him with his glittering eye--
The Wedding-Guest stood still,
And listens like a three years' child :
The Mariner hath his will.

Or perhaps they're intended to compel a response, if only an internal one. In any case, Deutsch's interview seems to confirm David Brazil's observation that "rise tones" can be used to "assert dominance and control".

Of course, not every one of the phrases in this interview is rising, and a responsible analysis (rather than a simple blog post) would try to explain the complete distribution of contours, not just present a few evocative anecdotes. The roughly 9 minutes of speech in Deutsch's four posted interview fragments may not offer enough data to distinguish among alternative hypotheses about what Mr. Deutsch's intonational repertoire really is, and how he deploys it in relation to the content and context of his communications. But he's a forceful and effective speaker with a lot on his mind, and I expect that we'll be hearing more from him in the future.

[More Deutsch links: Nick Anthis at The Scientific Activist 1/30/06, 2/6/06, 2/8/06, 2/9/06; NYT 2/8/06, NYT 2/10/06, WaPo 2/10/06, Houston Chronicle 2/10/06, AP 1/9/06. Curiously, as of mid-day 2/11/06, searches for "Deutsch" on the Fox News and the Washington Times sites turned up nothing.]

[Update: Several readers have written in to express some puzzlement (or disagreement) about what (they take me to be claiming as) the form and meaning of the intonational patterns involved here. Without going into detail, let me say that there are certainly several different patterns in these examples, which may differ in kind as well as in degree; and that final rises can certainly sometimes be used in asking questions, in signaling that a list isn't finished, and -- yes -- in expressing uncertainty. My main goal in this post was to undermine the widespread belief that final rises are always "question intonation", and that rises on statements must therefore represent uncertainty about their content. A much more plausible idea about final rises was proposed in McLemore (1991), where it was suggested that a metaphor of connection unifies such uses as marking non-final list items, evoking shared knowledge, and inviting a response. Whatever the correct description is, it's probably not a matter of position on scales of relative confidence and dominance. If final rises are sometimes used to signal self-doubt, or more often used for (perhaps benevolent) communicative control, it's for the same reasons that nearly any linguistic tool can be used for nearly any interactional purpose. ]

[Other Language Log "uptalk" posts:

This is, like, such total crap? (5/15/2005)
Uptalk uptick (12/15/2005)
Further thoughts on "the Affect" (3/22/2006)
Uptalk is not HRT (3/28/2006)]


Posted by Mark Liberman at 10:23 AM

February 09, 2006

Rhyme schemes, texture discrimination and monkey syntax

About two years ago, I posted (here, here and here) about Tecumseh Fitch and Marc Hauser's article "Computational Constraints on Syntactic Processing in a Nonhuman Primate". Last month, Geoff Pullum gave a talk here at Penn, under the title "Monkey Syntax", about some recent work with Jim Rogers on "some not very well known mathematical results which appear to be highly relevant to ongoing experimental work on precursors to syntax in non-human primates". This got me thinking about the questions again. My thinking has been considerably clarified by discussions with Geoff and with Barbara Scholz during their recent visit, and with Jim Rogers over breakfast yesterday. I've jotted down a few notes, which you can read after the jump, if you're interested in such things.

You'll recall that F&H tested the ability of two primate species -- humans and cotton-top tamarins -- to detect novelty in sequences of spoken syllables generated by grammars belonging to different mathematical classes. According to their abstract:

The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs ("grammars") vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, "phrase structure grammar" level.

I suggested at the time that this overinterprets their results. The particular stringsets used in their experiment were finite, and in fact contained only two short strings each, if their terminal vocabulary is divided into the high-pitched female-spoken words and the low-pitched male-spoken words that defined the two grammatically-relevant classes of syllables that they used. Specifically, I suggested this alternative interpretation of their experiments:

Given exposure to instances of the patterns ABAB and ABABAB, tamarin monkeys showed increased interest in patterns AABB and AAABBB, perhaps because these contained two to four copies of the salient (because repeated) two-element sequences (bigrams) AA and BB, which they had not heard before. By contrast, given exposure to instances of the patterns AABB and AAABBB, other tamarins did not show significantly increased interest in the patterns ABAB and ABABAB, perhaps because they contained only one or two copies of the previously-unheard bigram BA, which may also be less salient because it does not involve a repetition.

Given the same stimulus sequences, human subjects were able to categorize the new patterns as different, regardless of the direction of training and testing, perhaps because their threshold for noting statistical sequence differences was lower, and perhaps because they were able to remember longer sequences, thus noting that the training material AABB and AAABBB did not contain the four-element sequence ABAB.
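The bigram arithmetic in that alternative interpretation is easy to check mechanically. The following is a minimal sketch, with hypothetical function names of my own, that counts how many bigram tokens in each test pattern never occurred in the training patterns; it is only an illustration of the counting, not a model of what tamarins actually do.

```python
def bigrams(pattern):
    """All adjacent two-symbol substrings of a pattern such as 'AABB'."""
    return [pattern[i:i + 2] for i in range(len(pattern) - 1)]

def novel_bigram_counts(training, test):
    """For each test pattern, count bigram tokens unseen in training."""
    seen = {bg for pattern in training for bg in bigrams(pattern)}
    return {pattern: sum(1 for bg in bigrams(pattern) if bg not in seen)
            for pattern in test}

# Familiarized on ABAB/ABABAB, tested on AABB/AAABBB (novel AA and BB):
print(novel_bigram_counts(["ABAB", "ABABAB"], ["AABB", "AAABBB"]))
# Familiarized on AABB/AAABBB, tested on ABAB/ABABAB (novel BA):
print(novel_bigram_counts(["AABB", "AAABBB"], ["ABAB", "ABABAB"]))
```

In the first direction the test patterns contain two and four novel bigram tokens; in the second direction they contain only one and two, which is the asymmetry the alternative interpretation relies on.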

The recent work by Pullum and Rogers aims to "provide an introduction to some interesting proper subclasses of the finite-state class, with particular attention to their possible relevance to the problem of characterizing the capabilities of language-learning mechanisms". These are important issues for anyone interested in understanding familiarization/discrimination experiments, and I'm glad to hear that Geoff and Jim have been talking with Marc Hauser and his students. But some simple familiarization/discrimination results may not be straightforwardly characterized in any grammatical terms at all. And in fact, I think there's good reason to think that the F&H experiments happen to have this property.

Suppose you heard someone reading a list of sequences of six numbers, something like this

73 30 73 30 73 30
97 53 97 53 97 53
42 38 42 38 42 38
. . .

and then another list, something like this

98 98 98 22 22 22
77 77 77 84 84 84
71 71 71 70 70 70
. . .

You'd have no difficulty in detecting that the second one exhibits a different pattern from the first. The same would be true if what you heard were sequences made of random English monosyllables instead of sequences made of random 2-digit integers:

bits field bits field bits field
cots brunt cots brunt cots brunt
wheat spooked wheat spooked wheat spooked

must must must foist foist foist
hug hug hug peal peal peal
squat squat squat cranes cranes cranes

The patterns here are patterns of equivalence across positions in strings of elements drawn from a vocabulary that might as well be infinite, given that none of the elements used ever recur within the experiment. As a result, the standard mechanisms of formal language theory don't give us any direct way to characterize the patterns that we nevertheless so easily recognize.

As a start towards a more general characterization of the kind of patterns under discussion, observe that there are only two possibilities for sequences of length 2, either that both elements are the same or that the second one is different from the first. We can symbolize these options as

AA   AB


(Note that "A" and "B" here stand for any tokens that we like -- the terminal vocabulary is infinite, or at least is limited only by the length of the signals we're willing to sit around to listen to, and the signal-to-noise ratio of the channel on which we're listening.)

For length-3 sequences, there are 5 possibilities, which we can symbolize as:

AAA   AAB   ABA   ABB   ABC


More generally, for a sequence of length n, we're setting up equivalence classes among its positions, resulting in a sequence of what are called "Bell's numbers", as Eric Weisstein at Mathworld explains:

The number of ways a set of n elements can be partitioned into nonempty subsets is called a Bell number and is denoted Bn. For example, there are five ways the numbers {1,2,3} can be partitioned: {{1},{2},{3}}, {{1,2},{3}}, {{1,3},{2}}, {{1},{2,3}}, and {{1,2,3}}, so B3=5.

B0=1, and the first few Bell numbers for n=1, 2, ... are 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ... (Sloane's A000110).
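If you'd like to reproduce those values yourself, here is a short sketch using the Bell triangle, a standard way of computing Bell numbers with nothing but addition. The function name `bell_numbers` is my own; the method is the textbook one, not anything specific to Mathworld's presentation.

```python
def bell_numbers(count):
    """Return [B_0, B_1, ..., B_(count-1)] using the Bell triangle.

    Each row starts with the last entry of the previous row; every later
    entry is the entry to its left plus the entry above that one.
    The first entry of row n is B_n.
    """
    bells = []
    row = [1]
    for _ in range(count):
        bells.append(row[0])
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return bells

print(bell_numbers(11))
```

The list starts with B0, so the values for n=1 through 10 quoted above are the entries after the first.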

The "Bell" in question is Eric Temple Bell, who among his other accomplishments wrote Men of Mathematics, one of my favorite books when I was a child. It would be nice to have a new version without gender presuppositions, and with some additional sections.

The comments on sequence A000110 at Neil Sloane's magnificent On-line Encyclopedia of Integer Sequences include this one, which offers a different perspective on the relationship of Bell's numbers to patterns of equivalence classes in a sequence:

Number of distinct rhyme schemes for a poem of n lines: a rhyme scheme is a string of letters (eg, 'abba') such that the leftmost letter is always 'a' and no letter may be greater than one more than the greatest letter to its left. Thus 'aac' is not valid since 'c' is more than one greater than 'a'. For example, a(3)=5 because there are 5 rhyme schemes. aaa, aab, aba, abb, abc. - Bill Blewett (BillBle(AT)microsoft.com), Mar 23 2004
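The rule in that comment translates directly into a small enumeration. This is a sketch under the comment's own definition (first letter 'a', each later letter at most one beyond the greatest letter so far); the function name `rhyme_schemes` is hypothetical, and the letters will run past 'z' for poems longer than 26 lines, which hardly matters here.

```python
def rhyme_schemes(n):
    """Yield every rhyme scheme for a poem of n lines, e.g. 'abba'."""
    def extend(prefix, used):
        # `used` is the number of distinct letters already in `prefix`.
        if len(prefix) == n:
            yield prefix
            return
        # Reuse any earlier letter, or introduce the next unused one.
        for k in range(used + 1):
            yield from extend(prefix + chr(ord("a") + k), max(used, k + 1))
    yield from extend("", 0)

print(list(rhyme_schemes(3)))
print([len(list(rhyme_schemes(n))) for n in range(1, 7)])
```

The counts for successive lengths are the Bell numbers again, since each scheme is just a way of partitioning the line positions into rhyme classes.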

Short patterns of this type -- strings characterized in terms of position-wise equivalence classes of their elements -- are clearly very salient to humans. (And note that the equivalence-classes can be defined by any salient shared properties, like "starts with [k]" or "is an odd integer".) Given two random schemes from among the 15 possible patterns of length 4, or the 52 possible patterns of length 5, I suspect that after being familiarized to the first pattern, subjects will easily discriminate it from instances of the second, even if none of the local elements used in the experiment ever occurs more than once.

As the patterns' length increases, this task will clearly become harder and harder -- unless the patterns to be discriminated happen to have rather different local properties. For example, if one length-12 pattern happens to start with AAABBB while the other one starts with ABCABC, the discrimination task will be trivial.

One way to model this would be to assume that subjects are sensitive to the statistics of equivalence-class properties of local substrings -- what we might call schematic n-grams -- just as they are sensitive to the statistics of conventional n-grams. This might be as simple as noting when adjacent symbol pairs are the same vs. different, or it might be based on progressively more complicated sorts of calculations, organized in the ways familiar to formal language theorists.
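To make the idea of schematic n-grams concrete, here is a minimal sketch. It assumes the simplest possible equivalence relation (two tokens are equivalent only if they are identical), and the names `scheme` and `schematic_ngrams` are my own inventions for illustration. Each window of n tokens is reduced to its pattern of sameness and difference, and the patterns are then counted like ordinary n-grams.

```python
from collections import Counter

def scheme(tokens):
    """Map a token sequence to its equivalence pattern, e.g. 'aba'."""
    labels = {}
    for token in tokens:
        if token not in labels:
            labels[token] = chr(ord("a") + len(labels))
    return "".join(labels[token] for token in tokens)

def schematic_ngrams(tokens, n):
    """Count the equivalence patterns of all length-n windows."""
    return Counter(scheme(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

first = "bits field bits field bits field".split()
second = "must must must foist foist foist".split()
print(schematic_ngrams(first, 2))
print(schematic_ngrams(second, 2))
```

With n=2 this is exactly the "same vs. different" bookkeeping mentioned above: the first sequence yields nothing but "ab" windows, while the second is dominated by "aa", even though the two sequences share no vocabulary at all.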

If this is on the right track, then formal language theory will help us understand this sort of auditory texture discrimination after all -- but we'll need to take a broader view of the vocabulary of the "language", and how it's related to the particular sequences that we use as stimuli.

[For some other ideas about interesting things that might be going on in experiments of this type, take a look at this review of research in visual texture perception. Also relevant, I think, is some of the work on mismatch negativity, which offers an alternative method of measuring the perceived novelty of auditory subsequences.]

Posted by Mark Liberman at 09:33 AM

Multilingualism isn't dead yet

On the CBC French channel they just had a tour of the Olympic Village in Turin. The reporter encountered a Dutch speed skater and asked him in English if he could speak French. To her surprise, I think, he answered in French that he could speak "a little". In fact, though he wasn't entirely fluent, he managed an exchange of several turns quite well. Ring one up for Dutch multilingualism.

Posted by Bill Poser at 03:13 AM

Bella Coola Censored

CTV news just had a story about a log cabin being used to promote British Columbia in other countries, currently at the Olympics in Turin. It's made of Lodgepole Pine (Pinus contorta latifolia) from Bella Coola, which is on the mainland. They don't tell people that the logs come from Bella Coola though; they tell them that they come from Vancouver Island. Why? Because in Italian bella cula means "nice ass". That seems like a pretty weak reason to lie about the origin of the wood, but in any case, as far as I can tell (Italian being a language I can read but don't speak and don't know really well) there is no such phrase in Italian. The Italian word for "buttocks" is culo, and since it is masculine, "nice ass" is "bello culo". I guess this does sound somewhat like Bella Coola, but certainly not the same.

[Addendum: 04:20 LLT. Ben Zimmer turned up this story in the print media. It gives the problematic Italian phrase as bella collo, which I think is wrong - by my lights collo means "neck" - and doesn't mention the lack of gender agreement. On the other hand, it cites evidence that Bella Coola gives Italians the giggles. Of course, this seems to show declining standards of grammar on the part of Italian speakers. You would think that no self-respecting Italian speaker would entertain the possibility that a phrase could be Italian if it would violate the gender agreement rules. I suppose we'll be hearing from William Safire about this soon.

Incidentally, according to the article, Bella Coola is "named for the original First Nations inhabitants". That is sort of true, but like many such names it isn't their own name for themselves. It is an anglicization of the Heiltsuk name for the Nuxalk, who are the native people of the Bella Coola area.]

Posted by Bill Poser at 03:07 AM

February 08, 2006

Arabic in Kurdistan

From one of Tristan Mabry's dispatches in the Pennsylvania Gazette, a description of the Dec. 15 elections in Iraqi Kurdistan:

As the day wore on, it was clear that Kurds were very proud of their hard-won right to vote, though some had misgivings about the draft constitution. This document unequivocally gives Kurdish status as an official language of the country, but a few citizens were disappointed by a last minute amendment that also made Arabic an official language of government institutions operating inside the Kurdish region. Still, the future of Kurdish in Iraq is now more secure than at any time in the last century. Remarkably, there is an effort to merge the two dialects of Kurdish in Iraq into a bona fide national language. In this case, the strength of sharing a national identity as Kurds trumps any quarrels between Kurmanji and Sorani. They are united also by a strong dislike for Arabic: not the sacred language of their holy book, but the modern form that forcibly displaced their mother tongue for decades. While the first language of all school children is Kurdish, the Ministers of Education in both governorates told me they are developing a more robust curriculum for the second language of public education: English.

An earlier Language Log post remarked on the evident negative feeling that some Kurds feel towards Arabic, and linked to an exchange in the Financial Times on the subject of English vs. Arabic in Kurdistan. Bill Poser has posted here on the suppression of Kurdish (associated with the oppression of Kurds) in other places and times, including a recent decision in Turkey to fine 20 Kurds for displaying placards containing the letters Q and W.

[As the Gazette explained in connection with an earlier report:

Tristan Mabry is a former economics reporter for The Wall Street Journal and producer for CNN who is currently a Ph.D. candidate in political science. This is the first in a series of reports for the Gazette on his travels to meet with the leaders of Muslim independence groups as part of his research for his dissertation, “Nationalism and the Politics of Language in Muslim Minority Conflict.”]


Posted by Mark Liberman at 06:29 PM

Hellaciously fellatious

E-mail correspondent Bill Findlay recently came across this occurrence of fellatious in a web discussion of Intelligent Design:

To: Bellflower

"Don't bible at me in public school". Actually it has certain rhythm to it and could be made into a song line. Designer is not logical - for who/what then had designed that designer? In logic it is called "reductio ad infinitum", and such arguments are fellatious.

[posted on 01/06/2006 9:42:29 PM PST by GSlob]

Perhaps GSlob meant to say, in a tasteful way, that such arguments suck monkey dick, big time.  (By the way, "suck monkey dick" gets 1,460 raw Google webhits this morning, and "sucks monkey dick" another 1,290.  All the occurrences are, of course, both metaphorical and disparaging; no monkeys or blowjobs are involved, even in the forum where the proposition that "gay people suck monkey dick" was under discussion.)  Somehow I doubt it; could GSlob be that clever?

Findlay wondered if it was an eggcorn.  But it's hard to imagine that someone who can call up a reference to the reductio ad infinitum would fail to connect the adjective in question to the noun fallacy and instead hit on fellatio as the related noun.

Fond though I am of eggcorns, my guess is that GSlob is just not a very good speller; sometimes this is all that's going on.  There is a pattern of noun stems ending in ac (+y) related to adjective stems in at (democracy-democratic), so you might guess that the adjective stem related to fallacy works this way too.  Independently, there's a considerable tendency, which I reported on in "Toadying 3", to spell the first vowel in fallacious with an e instead of an a: fellacious.  (I find this astonishing, but there it is.)  Put these two spelling errors together, and you get the entertaining fellatious.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 02:26 PM

February 07, 2006

Looking for lies in all the wrong places

The following guest post comes from the distinguished sociolinguist and forensic linguistics expert Roger Shuy:

Mark Liberman's comments on the New York Times article "Looking for the Lie" reminded me that there are scads of programs offered to law enforcement agencies around the country, all of which claim to teach cops (and anyone else willing to pay the tuition) how to detect deception.

Avram Sapir's Scientific Content Analysis (SCAN) program is one of them. Curious about it, I paid my $350 and took his week-long seminar about ten years ago. It's an odd mixture of the techniques used by the FBI, called Statement Analysis, and a hopeful but largely misguided use of some of the good work coming out of Paul Ekman's laboratory in San Francisco. Mix in a touch of the Reid Technique, stir well, and you can teach the police how to tell when suspects are lying.

I found SCAN, Statement Analysis, and the Reid technique amusingly optimistic and often downright frightening. But they are hot items in our nation's police departments.

Perhaps SCAN's best attribute is that it begins by admitting that the police interview often messes up royally (I wondered then, and still do, why then it might not be better to try to improve the interviewing). To avoid this problem, Sapir suggests that before the interview begins, the police are to give the suspects pen and paper and have them write a description of everything that happened on the theory that it's much harder to write a lie than to say one. He also teaches that people lie indirectly, so the police are to notice omitted information, hedging, answering questions with questions, and ignoring certain things. Some (but by no means all) of his principles include the following highly suspicious conclusions:

  • The shortest way to write a sentence is the best way. Any deviation from short is meaningful.

  • Changes in language reflect change in reality, with examples such as changing "my wife" to "Louise" or changing "my son" to "my child".

  • Information that comes as an answer to a question is less reliable than information that comes without being asked.

  • Nobody can lie twice, which means that it is impossible to lie on two different layers of deception about the same issue. It is also impossible for a deceiver to say, "I'm telling you the truth."

  • The sequence of the statements reflects a deceiver's priorities. Sapir says that there are three time units to be described: before, during and after the incident. 85% of deceivers are said to devote more writing to what happened before and after the event.

  • Pronouns produce 80 to 90 percent of the deception. When "I" is changed to "we", this indicates that there is deception. The lack of "I" indicates that the subject might be lying.

  • Changing "my" to "the" is also a key.

  • The unnecessary use of connectors such as time markers is a sign of lying because they replace information that the deceiver left out of the statement.

  • After reviewing the writing, the cops can interview the suspect.

Statement Analysis does much the same. Again using a written statement, the analyst looks for these clues to deception: overly detailed statements, repetition, unusual details, unnecessary complication, irrelevant details, subjectivity, admitting memory loss, hedging, excessive self referencing, verbosity, unnecessary connectors, deviating pronouns from "I" to "we", and the imbalance of language on the beginnings and ends of the statements.

If you are still bothering to read this, an even greater surprise is in store for you: the Reid technique, developed by John R. Reid of Chicago's Scientific Crime Detection Laboratory. Unlike Statement Analysis, it takes allegedly known information about verbal and non-verbal communication and uses it during the police interrogation. The cop is to ask a set of 15 questions, noting throughout such non-verbal features as nervousness, body twitching, lack of eye contact, downward glances, eye blinking, and arm flailing. As for language clues to deception, long pauses, overly elaborated responses, off-topic responses and silence are all diagnostic clues. The interviewers are to watch for these responses at the same time they do their interviews, recording the answer to each question in a slot marked Deceptive, Truth or Don't Know.

These techniques are not only highly suspect but they also make no allowance for the cultural, social or individual differences of the suspects. The police are expected to do more than is humanly possible, such as watching for the frequency of eye contact (truth tellers are supposed to do this between 30 and 60 percent of the time). One wonders how this could possibly be measured on the spot. And nothing is said about the strong probability that it is the interviewer who stimulates the alleged clues.

— Roger Shuy

Posted by Geoffrey K. Pullum at 09:27 PM

How to countr orthographic offendrs

David Giacalone of f/k/a takes a break from his efforts to eradicate the word "blawg" to alert us to a new linguistic menace: the creeping conversion of the agentive suffix "-er" to "-r" in trendy online usage. Microsoft's Robert Scoble was the first to complain, after noticing that the popular photo-sharing site Flickr had seemingly infected several other "geek projects" like the screenshot utility Grabbr. The new tech-gossip blog Valleywag was next to pick up on Scoble's peeve, adding retrievr, gtalkr, talkr, flagr, Bloggr, gabbr and Frappr to the rogues' gallery of malefactrs. Digerati queen Esther Dyson is a particularly egregious "-r" user, at least on her Flickr feed where she calls herself "Esthr" and peppers her comments with words like "investr" and "dinnr".

Giacalone echoes Valleywag's urgent call to "donate 'e's to needy trendoids":

To help those who are particularly impervious to subtle gestures, we suggest that you dig out any old, unused Scrabble games and remove the 12 "e" tiles. Then, send one each to the dozen trendoids you know who most need a reminder of how our language operates (perhaps with instructions).

It's a funny suggestion, though not exactly a novel one. Spiritual predecessors include the "Vowels for Wales" campaign of the Society for Creative Anachronism, and the much-circulated Onion article, "Clinton Deploys Vowels to Bosnia."

[Update: See Polyglot Conspiracy on the related transformation of "-er"/"-or" to "-r" in the names of Motorola cellphones such as RAZR, ROKR, and SLVR.]

Posted by Benjamin Zimmer at 04:12 PM

Toadying 3: More of the fellat- family

By now I should know better than to write things like "fellatial is one of NINE attested adjectives in the fellat- family", without any hedging about that exact number.  So now e-mail correspondent Tacotortoise notes yet another adjective in the family: fellatious, with the spelling variant fellacious.  It even pushes fellative out of third place in Google web frequency, not far behind the adjective that started the whole thing, fellatial.

Meanwhile, since we already had the verb fellatiate in the family, we should have expected a derived nominal fellatiation.  It's a shy member of the family, but can occasionally be sighted.  As can the derived adjective fellatiatory.

It turns out that Tacotortoise thought of fellatious because he's used it himself, for instance in a LJ entry from 2004:

If we take the guitar to be a giant phallic symbol, as I have often (read: twice) seen it described, then the act of playing the guitar becomes overtly masturbatory. This is true not only of the 15-minute guitar solo, but also rhythm guitar, ... bass, and all other guitar variants. All the reed instruments are equally blatantly fellatious.

This is a literal use of the adjective ('concerning or resembling fellatio'), but among the 566 raw Google webhits I got on 2/1/06, there are a huge number of metaphorical occurrences in the toadying domain, for instance:

I just wonder whether Graham Capill would have got the kind of fellatious women's magazine coverage given to Dravitski. (link)

As sad as it is to watch the ongoing and fellatious fawning of the Royal Saudi family by the Bush Administration, they are hardly the only administration ... (link)

(I note in passing that "the... fawning of the Royal Saudi family by the Bush Administration" doesn't work for me syntactically.  The of is off for me; to or before would work, though.)

Tacotortoise also reported the spelling variant fellacious.  This one is harder to search for.  First, you search on "fellacious -DVD" to eliminate most of the (numerous) references to the DVD Fellacious.  On 2/1/06, that gave me about 412 raw webhits.  But a large number of these are just misspellings of fallacious.  So you hand-search through these.  That yields a few occurrences of fellacious in a sexual sense, apparently all of them literal.  Here's one:

... that we shall be caught in a fellacious act as said vehicle steals from darkened garage, and daylight reveals oral glandular massage ...  (link)

So much for the fellat- adjectives (for the moment -- but read on).  In the noun branch of the family, so far we have fellatio (the borrowing from Latin), fellation (the derived nominal built on the verb fellate), and of course the nominal gerund fellating (also built on fellate).  But there are some occurrences of a verb fellatiate (which I like to think of as fellate with bells and whistles), so we'd expect a few occurrences of a nominal derived from it.  Verbs in -ate almost always have derived nominals in -at-ion, so: fellatiation.  And on 2/1/06 I got 17 web hits for it, some literal, some metaphorical:

literal: I don't about fellatiation but I'm sure masterbation and fornication would be the ... (link)

metaphorical: ... and start saying nein and nyet to blind obedience, mass-fellatiation of The upper class--give them all downers--The kind that take you waaaaay down ... (link)

(Masterbation is a not uncommon misspelling of masturbation.  I've never been able to decide whether there's some eggcornish impulse, invoking the word master, behind this spelling.  And now Seinfeld, with its catchphrase "master of my own domain" alluding to masturbation, has thoroughly muddied the waters.)

Next, just as fellate serves as a base for the derived adjective fellatory (the most frequent of the ten adjectives so far found in the fellat- family), fellatiate is available to serve as a base for a derived adjective in -at-ory: fellatiatory.  And, yes, this one has a handful of attestations on the web, and they are both literal and metaphorical:

literal: Bill's heinous sin of Fellatiatory Prevarication wouldn't even rate a footnote in the twenty-volume edition of The Complete Lies and Deceptions of Bibi ... (link)

metaphorical: ... extensively and explicitly bought out the media is (using OUR tax money), the mutually fellatiatory relationship of the White House and Saudi Arabia. (link)

I am pleased to say that although fellate yields a derived adjective fellatial (the word that brought us into this extended visit with the fellat- family in the first place), the alternative verb fellatiate has apparently not yet served as the base for a derived adjective in -at-ial, at least on the Googleable regions of the web.  I view this as good news, because fellatiatial is just silly.  Anyway, we have (as of press time) eleven other cocksuckin' adjectives to choose from, so who needs the awkwardly clownish fellatiatial?

[Afterthought: fellatiatious isn't attested, either.  Whew!]

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 12:51 PM

A sculpture that "bears no resemblance" to its subject

According to a 1998 article by Washington Post staff writer Joan Biskupic:

In the Supreme Court's white marble courtroom, the nine sitting justices are not the only presiding presence. At the center of the nation's legal system, high above the justices' mahogany bench, the great lawgivers of history are depicted in marble friezes.

From Hammurabi to Moses to John Marshall, the stone sculptures commemorate written law as a force for stability in human affairs. The larger-than-life artworks, designed by architectural sculptor Adolph A. Weinman as the courthouse was being built in the early 1930s, convey the idea that, while the law begins with individuals, its principles never die.

The 18 lawgivers looking down on the justices are divided into two friezes of ivory-colored, Spanish marble. On the south wall, to the right of incoming visitors, are figures from the pre-Christian era -- Menes, Hammurabi, Moses, Solomon, Lycurgus, Solon, Draco, Confucius and Octavian (Caesar Augustus). On the north wall to the left are lawmakers of the Christian era -- Napoleon Bonaparte, Marshall, William Blackstone, Hugo Grotius, Louis IX, King John, Charlemagne, Muhammad and Justinian.

Interspersed with the lawgivers are angels representing concepts such as philosophy, liberty and peace.

Recent events underline the fact that one of these 18 figures is implicitly controversial. (There's a photo of the currently-relevant figure here. And at least one of the other 18 figures has also been the subject of controversy, though for different reasons.)

The same article gives some evidence to support the concerns of aniconists and even iconoclasts:

Occupying nearly the highest point of the luminous, gold-edged room, above the 30-foot Ionic columns, the friezes inspire awe. When Sherman Minton, a Supreme Court justice from 1949 to 1956, pointed out the friezes to his grandson, the 10-year-old asked the predictable question, "Granddaddy! Where's God?"

(though I don't personally share those prejudices.) In 1997,

A coalition of Muslim groups asked the court to sandblast or otherwise remove the depiction of Muhammad, contending that it was a form of sacrilege because graven images are forbidden in Islam and that believers might be encouraged to pray to someone other than God, or Allah in Arabic.

Chief Justice William H. Rehnquist rejected the request, saying the Muhammad sculpture "was intended only to recognize him, among many other lawgivers, as an important figure in the history of law; it is not intended as a form of idol worship."

According to the footnote on p.7-8 of K.M. Sharma, "What's in a name?: Law, religion and Islamic names", Denver Journal of International Law and Policy, 1998 (link from TPM):

Rehnquist also dismissed the objection to the curved sword in the marble Muhammad's hand as reinforcing the stereotypical image of Muslims as intolerant conquerors: "I would point out that swords are used throughout the Court's architecture as a symbol of justice and that nearly a dozen swords appear in the courtroom friezes alone." Rehnquist said that the description and literature, however, would be changed to identify Muhammad as a "Prophet of Islam," and not "Founder of Islam." The rewording, based upon "input of numerous Muslim groups," would also say that the figure "is a well-intentioned attempt by the sculptor Adolph Weinman to honor Mohammed, and it bears no resemblance to Mohammed."

The same events are discussed in the 10th Anniversary Report of the Council on American Islamic Relations (CAIR) here (on p. 22), which presents this interaction as the first of a series of cases where "CAIR continued to rely on the strength of its numbers to challenge inappropriate portrayals of Islam". The presentation implies that CAIR viewed the resulting change in description with satisfaction -- at least, it's placed first in a series of examples, the third of which is introduced as "another great success".

I don't recall that the "attempt to honor"/"no resemblance" argument was presented on behalf of the Buddhas of Bamiyan, or for that matter the religious statues destroyed during the Dutch Beeldenstorm in 1566, nor do I imagine that such an argument would have been effective. Its acceptance in this case may therefore reflect the logic of power more than the power of logic.

[Update: the fate of a similar statue in New York City was different, as this 2/12/2006 NYT story by John Kifner explains:

Perhaps the longest-running -- if least noticed -- depiction of Muhammad in New York City was an 8-foot-tall statue on the roof of the State Appellate Division courthouse on Madison Square. The building was erected at the turn of the 20th century, back in the days when graft got you some architecture.

Muhammad was one of 10 lawgivers -- among them Moses, Confucius, Justinian and Alfred the Great -- along with other allegorical figures like Peace, Wisdom, Justice and the four seasons, for a total of 21 statues adorning the building. His identity came to public light after more than 50 uneventful years when the Department of Public Works announced a $1.2 million project to repair the statues, clean the building and put up a five-story addition.

Ambassadors from Indonesia, Pakistan and Egypt went to the State Department to ask that the statue be destroyed rather than renovated. The justices in the court agreed.

So in 1955, Muhammad, who had a turban, a book, a scimitar and rather Old Testament-looking beard, lost his prominent place atop the courthouse's southwest corner. Everybody else was moved one spot to the left, leaving empty the pedestal that once held Justinian. Muhammad was lowered by block and tackle, wrapped in excelsior and trucked off to a stone company in Newark. In the last reported sighting -- in 1983 -- the statue was lying on its side in a stand of tall grass somewhere in New Jersey.]


Posted by Mark Liberman at 07:31 AM

What they talk about when they talk about the things that most people never talk about

In case you're among those who have unknowingly contributed to the rising tide of floopiness in our nation's prose, Francis Heaney will explain the problem to you. And here's another story in the same vein, where the ungrammarian is made of tougher stuff.

Posted by Mark Liberman at 06:18 AM

February 06, 2006

Toadying 2: Derived nominalization

Some follow-ups to "The vocabulary of toadying", including material from my e-mail correspondents and some things that didn't make it into the earlier posting because it was already awfully long:

    - a difference in the syntax of fellatio and fellation that indicates that fellation is not merely an anglicized version of Latin fellatio;

    - still more words in the fellat- family;

    - possible insights into the mind of John Podhoretz;

    - more toadying vocabulary, based on sexual vocabulary, on the vocabulary of body services, and the vocabulary of religious worship.

I'll take these topics up in separate postings.  First, fellatio vs. fellation.

Correspondent E. notes that my example of metaphorical fellation has "fellation of" followed by a NP denoting the recipient -- "the non-stop fellation of Brady and Belichick by Michaels" ("Michaels's non-stop fellation of Brady and Belichick" would also have been possible) -- and that "fellatio of" really wouldn't work here.  In fact, E. writes, "I can't think of a way to stick on a recipient of the action to the noun [fellatio]."  This is an astute observation.  Fellation and fellatio are both nouns denoting acts/events, but fellation has the syntax of most other English nouns in -ation, which can be followed by a preposition (most commonly of) plus a NP object of that preposition which denotes the person or thing affected by the act or in the event, while fellatio lacks this syntactic possibility. 

In fact, the noun fellation (but not fellatio) exemplifies a much-studied phenomenon in English morphosyntax, usually labeled DERIVED NOMINALIZATION (not a perfect name, by any means, though the derivation of nouns from verbs is certainly part of the story).  To describe what's going on here, I'm going to have to go through some fairly technical stuff, so if you want to avoid this, here's the conclusion: fellation is derived (in English) from the verb fellate, rather than being a simple anglicization of fellatio (fellatio being a Latin noun derived from a Latin verb meaning 'to suck', but that fact isn't relevant in English); fellation has the syntax of an act/event noun derived from a verb, and fellatio does not.

Ok, into those deep and dark technical waters.  We start with verbs.  Each verb is associated with a set of (LEXICAL) ARGUMENTS, which I'm going to indicate by numbers: 1 and 2 for what you probably want to think of as its subject and its direct object (if it takes one), respectively.  (Many verbs take other arguments as well, but 1 and 2 will do for my purposes here.)  What kinds of arguments a verb takes is a fact about that verb as a lexical item; it's a separate question how all this material gets put together into phrases and clauses.  As for fellate, it's an "absolute transitive" verb, taking not only a 1 but also, obligatorily (except in special circumstances), a 2.  An expression serving as its 1 denotes the cocksucker, and an expression serving as its 2 denotes the guy getting blown.

Now to put these three parts (V, 1, and 2) together into larger expressions.  This is a matter of SYNTACTIC FUNCTIONS for the constituent parts. In the simplest sorts of clauses (Tony fellated Joe), the 1 and 2 expressions serve as the SU (Subject) and DO (Direct Object) of the V, respectively, which means (among many other things) that the 1 comes first, followed by a VP consisting of the V followed by the 2.  There are (many) other ways to put the three parts together.  In passive clauses (Joe was/got fellated (by Tony)), the 2 serves as SU and the 1 is not obligatorily expressed -- but if it is it serves as an OO (Oblique Object), marked by a preposition, in this case the particular preposition by.  In nominal gerunds (Tony's fellating Joe (entertained their roommates), (The guys were surprised at) Tony's fellating Joe), the 1 is not obligatorily expressed (Fellating Joe (pleases me), (I'm enthusiastic about) fellating Joe), but if it is it serves as a (possessive) DET (Determiner) for the -ing form of the V.  And on and on.

Some verbs have their 2 expressed as an OO rather than a DO; the preposition marking the OO is one associated with the specific V: adhere to, rebel against, flee from, etc. 

Now: a great many verbs have related act/event nouns.  Morphologically, there are many V-N relations, depending on the V: the N can be identical to the V (capture-capture; this is "conversion" or "zero derivation") or can have one of a number of derivational suffixes, among them -ance/-ence (disappear-disappearance, adhere-adherence), -al (remove-removal), -t (flee-flight), -ion (rebel-rebellion, donate-donation), and -ation (confirm-confirmation).

Finally, the really important, very cool, fact: the syntax of these derived Ns is almost entirely predictable from the syntax of the Vs they are based on.  If it takes a 2, the 2 for the V serves as an OO of the N, immediately following it.  If the V is one that has its 2 expressed as an OO via the preposition P, then this P marks the OO of the N (adhere to - adherence to, rebel against - rebellion against, flee from - flight from); otherwise, the OO of the N is marked by of (capture - capture of, confirm - confirmation of).  If the V doesn't take a 2 (is "lexically intransitive"), then the N has no following object (disappear - disappearance).  If the V has another argument in addition to 1 and 2, or instead of 2, the N inherits it too (remove X from Y - removal of X from Y, donate X to Y - donation of X to Y). 

Meanwhile, the 1 associated with the V is not obligatorily expressed; if it is not expressed, the N is free to have the full range of appropriate determiners, including none (the capture of the enemy soldiers, this rebellion against authority, donation of money to the church), but if it is expressed, the 1 can serve as a (possessive) determiner (our capture of the enemy soldiers, the students' rebellion against authority, Margaret's donation of money to the church).   As in: "Michaels's non-stop fellation of Brady and Belichick".

Another possibility for the expression of the 1 with a derived nominal is as an OO marked by by, as in passive clauses: the capture of the enemy soldiers by our army, a rebellion by students against authority, donation of money to the church by Margaret.  As in: "the non-stop fellation of Brady and Belichick by Michaels".
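For readers who like to see rules spelled out mechanically, here is a minimal sketch of the generalizations in the last few paragraphs. It is only an illustration under stated assumptions: the class `Verb`, the function `nominal`, and all parameter names are hypothetical, the possessive and the by-phrase are supplied ready-made by the caller, and the by-phrase is always placed last (real English allows other orders, as in "a rebellion by students against authority").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verb:
    """Hypothetical lexical entry: just the facts the rules above need."""
    noun: str                          # related act/event noun, e.g. "capture"
    takes_2: bool = True               # False for lexically intransitive verbs
    prep_for_2: Optional[str] = None   # verb-specific preposition marking the 2
    extra_prep: Optional[str] = None   # preposition of an additional argument


def nominal(verb, arg2=None, extra=None, poss=None, by=None, det="the"):
    """Assemble a derived-nominal phrase from a verb's lexical entry."""
    # The 1 may surface as a possessive determiner; otherwise any determiner
    # (including none, det="") is allowed.
    parts = [poss if poss else det, verb.noun]
    # The 2 becomes an OO of the noun: the verb's own preposition if it has
    # one, otherwise the default "of".
    if verb.takes_2 and arg2:
        parts.append(f"{verb.prep_for_2 or 'of'} {arg2}")
    # Additional arguments are inherited along with their prepositions.
    if verb.extra_prep and extra:
        parts.append(f"{verb.extra_prep} {extra}")
    # Alternatively, the 1 may surface as an OO marked by "by".
    if by:
        parts.append(f"by {by}")
    return " ".join(p for p in parts if p)


CAPTURE = Verb("capture")
REBEL = Verb("rebellion", prep_for_2="against")
DONATE = Verb("donation", extra_prep="to")
DISAPPEAR = Verb("disappearance", takes_2=False)
FELLATE = Verb("fellation")

print(nominal(CAPTURE, "the enemy soldiers", by="our army"))
print(nominal(REBEL, "authority", poss="the students'"))
print(nominal(DONATE, "money", extra="the church", det=""))
print(nominal(DISAPPEAR))
print(nominal(FELLATE, "Brady and Belichick", by="Michaels"))
```

The point of the sketch is that nothing about the noun itself has to be stipulated: everything the function uses comes from the entry for the verb, which is what it means for the syntax of the derived nominal to be predictable.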

So fellation behaves syntactically like a derived nominal based on the verb fellate.  Fellatio, on the other hand, is just an ordinary act/event noun, and these don't automatically allow a 2 argument to be marked by of.  Some ordinary act/event nouns allow for marking 2 with a preposition other than of, usually on; surgery is like this: surgery on/*of an emergency patient, surgery on/*of his right arm.  But some ordinary act/event nouns are not comfortable even in this construction; for me, the names of most surgical procedures are like this (*An appendectomy on this patient is advisable, *The/Your nose job on Kim was not entirely successful), as are fellatio (*Fellatio on Tony is uncomfortable) and for that matter blowjob and handjob (*My morning blowjob/handjob on Tony is thoroughly enjoyable).  You can express the 2 argument for these nouns, but it takes an extra verb, which can then be used in a nominal gerund, or converted to a derived nominal: performing/doing an appendectomy on this patient, the surgeon's performance of an appendectomy on this patient, performing fellatio on Tony, providing fellatio to/for Tony, giving a blowjob/handjob to Tony.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:57 PM

Speaking the language of the Indians' enemies

According to The Economist, Mércio Gomes, president of the Brazilian government's Indian affairs agency FUNAI, recently remarked in an interview that the Brazilian Indians have "a lot of land", and that ultimately the Supreme Court would "have to set a limit". In response, Sydney Possuelo, a well-known senior FUNAI official, commented that Mr Gomes "spoke the language of Indians' enemies" (Mr Possuelo was promptly fired). What did he mean, "spoke the language of"? The language spoken by most of the enemies of the Brazilian Indians is of course Portuguese. Nearly everyone in Brazil speaks it. But that isn't what he meant. What he meant is clearly that Mr Gomes was saying the sort of things that the enemies of the Brazilian Indians say. Why didn't he just say that? The two claims strike me as utterly different. Yet ‘X speaks the language of Y’ is a very common locution (there are over 79,000 raw Google-hits for "speaks the language of", and a brief glance suggests that far more are of the figurative type like "he speaks the language of small government conservatism" than of the literal type like "a caregiver who speaks the language of their child"). It is a snowclone of the most simple and basic kind, in fact. I'm puzzled by the curious indirection that this figure of speech enforces. Surely it is not linguists' pedantry to point out that the language in which you speak does not determine any of the content of what you choose to say.

[Added later: lots of people are emailing me to say that I am being a pedant, this is a perfectly normal use of the word language, it is described in dictionaries (the Oxford English Dictionary's meaning 3.f says "Phr. to speak (talk) someone's language, to speak (talk) the same language: to have an understanding with someone through similarity of outlook and expression, to get on well with someone; to speak a different language (from someone): to have little in common (with someone)"), and so on. I don't think I'm saying anything inconsistent with all this. Note carefully, I am not saying anything here is "incorrect". No one is making a mistake. And the dictionaries are right to cover the use of "talk X's language" in the meaning "get along with X" or "have similar ideas to X". What I'm doing above is musing on how this idiomatic use got started. To me, it seems that speaking the language that you use when you say things is such a different idea from saying the sort of things you say. No argument from me about whether this is a well-established usage; it certainly is. It's just one that I don't feel the slightest bit tempted to use. It grates. Is it all right if I have that tiny little prejudice? Huh? Can I just have an aesthetic preference or two, please? Do I have to be Doctor No-Preferences-Or-Value-Judgments Serious Linguist all the frigging time? Huh?]

Posted by Geoffrey K. Pullum at 09:50 AM

Who needs grammar?

People who can't think for myself, according to a recent Get Fuzzy strip:

Note that Bucky Katt seems to have some grammatical idiosyncrasies in perception as well as in writing:

[via serendipity]

Posted by Mark Liberman at 09:39 AM

986,120 words for snow job

The subject of the Language Log final exam, loyal readers will recall, was a peculiar article in the New York Times real estate section on the power of buzzwords in the New York housing market. In need of a language expert to lend the imprimatur of authority, the reporter turned to one Paul JJ Payack, president of the Global Language Monitor:

Mr. Payack, who graduated from Harvard with a bachelor's degree in comparative literature, calculated the popularity of some 36 buzzwords chosen by a reporter. He used his Predictive Quantities Indicator, or P.Q.I., an algorithm that tracks words and phrases in the media and on the Internet in relation to frequency, contextual usage and appearance in global media. It is a weighted index that takes into account year-to-year increases and acceleration in the last several months.

Along with calculating the "popularity" of market buzzwords (using his magic "algorithm"), Payack also revealed to the credulous reporter that "as of Jan. 26 at 10:59 a.m. Eastern time, the number of words in the English language was 986,120." This is one of those pronouncements so exquisitely silly that you figure it has to be a put-on. Who would possibly claim to have determined the exact number of words in the language, and that the number would be anything like 986,120? But there it is, proudly featured on GLM's page of language statistics. And now this absurd declaration has spread far beyond the New York Times real estate pages, as the Times of London has spun an entire article out of Payack's number, trumpeting the news that the English language will be welcoming its millionth word some time this summer. Break out the party hats!

It's hard to know even where to begin in analyzing Payack's specious claim. The description of GLM's "methodology" in calculating the number of English words is hard to take seriously:

The Global Language Monitor has attempted to pinpoint the precise number of words in the English Language at a given point in time. To do so, it first established a base number of words in the language using the generally accepted unabridged dictionaries (the O.E.D., Merriam-Webster's, etc.), that contain the historic 'core' of the English language: every word found in the works of Shakespeare, the King James Bible, and the other 'classics'.  It then created a proprietary algorithm, the Predictive Quantities Indicator (PQI) that attempts to measure the language as currently found in print (including technical and scientific journals), the electronic media (transcripts from radio and television), on the Internet and, increasingly, in web logs (blogs). GLM then assigned a number to the rate of creation of new words and the adoption and absorption of foreign vocabulary into the language. The result, though an estimate, has been found to be quite useful as a starting point of the discussion for lay persons, students, and scholars the world over.

So GLM starts with a "core" number of words, evidently based on the sum of entries in unabridged dictionaries. Who knows what that number might be, since even if we consider one particular dictionary there is no simple answer to how many "words" it contains. The second edition of the Oxford English Dictionary has about 300,000 headwords, covering 640,000 words and phrases, according to AskOxford. (The Third Edition, now in preparation, will increase that number to 1.3 million or more.) So do we count headwords? All defined words and phrases? Every distinct sense and subsense of those words and phrases? Every spelling variant? Do archaic words make the cut, and if so, what's the chronological cutoff for "English"? In estimating the size of the lexicon, AskOxford remains admirably agnostic in its FAQ (emphasis mine):

How many words are there in the English language?

There is no single sensible answer to this question. It is impossible to count the number of words in a language, because it is so hard to decide what counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (dogs plural noun, dogs present tense of the verb). Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since we might also find hot-dog or even hotdog?
It is also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Youth slang? Computing jargon?

Once we step beyond the seemingly authoritative pages of the OED and other unabridged dictionaries, these questions only get muddier. Are we to count every nonce form of the sort found in Urban Dictionary or Merriam-Webster's Open Dictionary? If so, what about all the fleeting nonce usages that existed in those benighted pre-Internet days? Even a big historical dictionary like the OED won't tell you about those, since words and phrases still need to show some staying power before lexicographers will consider them for entry. Or can GLM's mysterious "proprietary algorithm" somehow discern what is nonce and what is here to stay?
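To make the counting problem concrete, here is a small sketch showing how the number of "words" in one tiny text shifts with each decision about what counts as a word. Everything in it is an assumption for illustration: the sample sentence is made up, the function `count_types` and its flags are hypothetical, and stripping a final "s" is a deliberately crude stand-in for real lemmatization. It has nothing to do with however GLM's undisclosed algorithm works.

```python
import re

# Made-up sample echoing the AskOxford examples: dog as noun and verb,
# plus hot dog / hot-dog / hotdog.
SAMPLE = "The dog dogs my steps. Dogs love a hot dog, a hot-dog, or a hotdog."


def count_types(text, split_hyphens, fold_case, strip_final_s):
    """Count distinct word forms under one set of counting decisions."""
    # Decision 1: is "hot-dog" one word or two?
    pattern = r"[A-Za-z]+" if split_hyphens else r"[A-Za-z]+(?:-[A-Za-z]+)*"
    tokens = re.findall(pattern, text)
    # Decision 2: are "Dogs" and "dogs" the same word?
    if fold_case:
        tokens = [t.lower() for t in tokens]
    # Decision 3: are "dog" and "dogs" the same word? (crude: drop a final
    # "s" from forms longer than three letters)
    if strip_final_s:
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return len(set(tokens))


for flags in [(False, False, False), (False, True, False),
              (False, True, True), (True, True, True)]:
    print(flags, count_types(SAMPLE, *flags))
```

Four sets of perfectly defensible decisions give four different totals for a fifteen-token text, and none of them even tries to separate the noun dog from the verb. Scale that up to the whole language and a figure precise to the last digit looks even odder.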

None of these questions get addressed by the Times of London correspondent, who seems happy to take Payack at his word as a reliable linguistic authority. (The article dubs him "a Harvard-educated linguist," but the bio on Payack's other Web venture, Yourdictionary.com, mentions no linguistic credentials.) Though we are left in the dark as to the criteria used by GLM to count "words," Payack does divulge that Chinese-English, or "Chinglish," is largely responsible for the lexicon's latest expansion:

Chinglish terms include "drinktea", meaning closed, derived from the Mandarin Chinese for resting; and its opposite, "torunbusiness", meaning open, from the Mandarin word for operating.
While some are amusing to the British ear, others are abrasive. Public toilets for disabled people in Beijing are marked "deformedman" and in Hong Kong "kweerboy" denotes a homosexual.
The Chinese government has vowed to sweep Chinglish from road and shop signs before the 2008 Beijing Olympics, but is fighting an uphill battle.
Payack...said 20,000 new English words were registered on the company's databases last year — twice as many as a few years ago. Up to 20% were in Chinglish.

Now there's obviously nothing wrong with incorporating various World Englishes in appraising innovations in the language. (See, for instance, the diverse international sources for neologisms catalogued on Double-Tongued Word Wrester.) But how many of the thousands of "Chinglish" words that Payack claims to have recorded are in common use even in China? Google turns up very few attestations of the examples given in the article. For instance, torunbusiness (a running together of to run business?) only shows up in two Chinese-language news articles about improperly used English on Chinese street signs (this article can be "gisted" with Google's translation). If GLM is going to include every novel use of half-learned English on the world's street signs, they've got an awful lot of work ahead of them (even if they restrict their attention to, say, Japan).

All of this merely elaborates on the eloquent observation made by OED pioneer J.A.H. Murray way back in 1888: "The circle of the English language has a well-defined centre but no discernible circumference." So why are ostensibly scrupulous news sources so eager to accept a calculation of this unknowable circumference, without at least asking the opinion of an established linguist or lexicographer? Apparently this is yet another language-related factoid that's just too good to check.

[Update: Grant Barrett notes that there are "millions of chemical names alone," and also takes on some of the other "low-hanging fruit" in the Times of London article.]

Posted by Benjamin Zimmer at 06:00 AM

Big Ben continues to hit that iceberg

Two years ago, I noted Ben Roethlisberger's use of a striking idiom blend in the news conference at the NFL Combines:

Q: Since you started so late as a QB, how much better can you get?
A: "That's the thing. A lot of people talk about quarterbacks who played their whole lives and can't get much better. I'm just starting to hit the iceberg and get going. I have a lot of development I can do, so I believe I can get a lot better."

Now, in addition to being the only person ever to have used that phrase -- at least as far as Google's index knows -- he's the youngest quarterback ever to have won a Super Bowl.

Posted by Mark Liberman at 05:13 AM

February 05, 2006

112 words for misunderstanding meaning?

Robin Marantz Henig's article in today's NYT magazine, "Looking for the lie", doesn't mention Eskimos. However, it presents an interesting -- if snowless -- example of the lexicographic fallacy. Henig opens a discussion of Daniel Langleben's research on the neuroscience of lying by telling us that "The English language has 112 words for deception, according to one count, each with a different shade of meaning".

The point is not that Albion's heirs are especially perfidious, just that lying can be individually and socially complicated, and that Langleben is interested in the possibility that different kinds of lies might have different kinds of neural correlates. Here are the first two paragraphs of the section:

The English language has 112 words for deception, according to one count, each with a different shade of meaning: collusion, fakery, malingering, self-deception, confabulation, prevarication, exaggeration, denial. Lies can be verbal or nonverbal, kindhearted or self-serving, devious or baldfaced; they can be lies of omission or lies of commission; they can be lies that undermine national security or lies that make a child feel better. And each type might involve a unique neural pathway.

To develop a theory of deception requires parsing the subject into its most basic components so it can be studied one element at a time. That's what Daniel Langleben has been doing at the University of Pennsylvania. Langleben, a psychiatrist, started an experiment on deception in 2000 with a simple design: a spontaneous yes-no lie using a deck of playing cards.

Thus the question of vocabulary size is, as usual, irrelevant. The logic of this passage would be unchanged if we English speakers were forced to distinguish different kinds of lies entirely by describing them phrasally -- as Henig in fact does by listing "lies of omission" and "lies that make a child feel better" -- rather than with single lexical items like "malingering" and "exaggeration". Henig's exposition falls into the common fallacy of implying that concepts can only be distinguished by being in a one-to-one relationship with dictionary entries.

This is not just semantic fussiness. It's implausible that there are 112 (or 37 or 243 or 14) distinct and atomic types of deception. Deception is presumably a structured concept with several aspects: the audience, purpose, content, scale, source and justification of the deception, among others. Each aspect can have many values, themselves sometimes complex. Perhaps there are several overlapping conceptual systems involved at once, or at different times, or to different degrees for different people. Deception presumably engages more general neural systems of emotion, memory, communication and so on. It's surely a mistake to think that understanding deception is a matter of listing its 112 atomic types and determining where in the brain each one is localized. This is the sort of thinking that generalizes notions like "the gene for X" and "the brain region for Y" far beyond the (significant but limited) domains where they make sense.

A bit later in the article, Henig quotes Steve Kosslyn making a similar point:

Deception "is a huge, multidimensional space," he said, "in which every combination of things matters."

However, the article immediately goes back to talking in atomistic terms:

Each type of lie might lead to activation of particular parts of the brain, since each type involves its own set of neural processes.

After discussing Langleben's research program, the article gives a misleading impression of the experimental apparatus used in it:

His research involved taking brain images with a functional-M.R.I. scanner, a contraption not much bigger than a kayak but weighing 10 tons. Unlike a traditional M.R.I., which provides a picture of the brain's anatomy, the functional M.R.I. shows the brain in action. It takes a reading, every two to three seconds, of how much oxygen is being used throughout the brain, and that information is superimposed on an anatomical brain map to determine which regions are most active while performing a particular task.

There's very little about being in a functional-M.R.I. scanner that is natural: you are flat on your back, absolutely still, with your head immobilized by pillows and straps. The scanner makes a dreadful din, which headphones barely muffle. If you're part of an experiment, you might be given a device with buttons to press for "yes" or "no" and another device with a single panic button.

I guess that this passage is strictly true as written, but it is likely to leave most readers with the idea that "a functional-M.R.I. scanner" is a special kind of device, different from the device that produces "a traditional M.R.I." But in fact it's exactly the same piece of apparatus, just used in a different way. Here readers are invited to extend the fallacy one step further: the one-to-one relationship assumed for concepts and terms seems to apply to devices as well. An "fMRI scanner" performs a different function from an "MRI scanner", so it must be a different device. But it isn't. (Some technical tutorials on how MRI and fMRI work can be found here and here).

Thus linguists and logicians shouldn't feel picked on. It's true that science writers (or their editors, it's always hard to tell) are often careless with linguistic concepts, but unfortunately they're often no more careful in dealing with issues in physics, chemistry, biology and psychology.

Henig's article deals with a timely and important topic, and surveys a range of interesting research (including Paul Ekman's work on facial expressions). It's written in a clear and engaging style, and has interesting things to say about the role of lies in everyday life, and the unintended consequences that might follow from a genuinely effective technology of lie detection. And it emphasizes the important point that there isn't a single, simple phenomenon of "deception" that might have a single, easily-identifiable physiological correlate. But it's too bad that the article isn't more careful to avoid pushing the same idea back a stage, to the view that there might be a fixed number of well-defined types of deception, each with its own physiological signature.

A good review of some of the issues involved in hi-tech lie detection can be found in Paul Root Wolpe, Kenneth R. Foster and Daniel D. Langleben, "Emerging Neurotechnologies for Lie-Detection: Promises and Perils", American Journal of Bioethics, Volume 5, Number 2 / March-April 2005.

Posted by Mark Liberman at 12:30 PM

February 04, 2006

The proper treatment of snowclones in ordinary English

In October of 2003, Geoff Pullum noted a peculiar sort of "bleached conditional":

If Eskimos have dozens of words for snow, Germans have as many for bureaucracy.
[The Economist, October 11th, 2003, p. 56, col. 2]

I pointed out that a search for {"if Eskimos"} turns up many examples of phrases like "If Eskimos have N words for snow, <some question or assertion about X's vocabulary for Y>", used to frame the idea that Y is especially interesting or important to X. A few days later, Geoff happened to notice another phrasal cohort of the same type -- "In space, no one can hear you X", and wondered what we should call patterns like this. Glen Whitman suggested the term snowclone, and we've been using it ever since. Wikipedia has a snowclone page, and a list of common snowclones.

But two things have been bothering me about all this. First, phrasal templates like those on the Wikipedia list are often more protean -- and therefore more interesting -- than the descriptions suggest. And second, the original "If Eskimos" example is not really an example of the same thing at all.

First things first. Most of the snowclones in the Wikipedia list are crisp phrasal templates, with one or two open slots to be filled, generalizing a well-documented specific quotation: "To X, or not to X" (from Hamlet's soliloquy); "X for fun and profit" (from a series of how-to books starting with one on stamp collecting); "I, for one, welcome our new X overlords" (from The Simpsons episode "Deep Space Homer"). Others are similarly crisp, even if the origin is uncertain: "X is the MIT of Y"; "X is the new Y".

It's easy to multiply examples of this general type. "Through X with gun and camera"; "ask not for whom the X tolls"; "how do I X thee"; "teach an old X new tricks"; "an X in the hand is worth Y in the Z"; and so on. The procedure is simple and common: take a familiar quotation, title, proverb etc., and fit it to a new context by substituting for one or more of its words.

But the discussion of snowclones so far has suggested that such patterns have a simple and well-defined set of substitution slots, while in fact, people can (and often do) generalize in many different ways from a given phrase. Thus the proverb "A bird in the hand is worth two in the bush" gives rise to the following -- a small sample of the variants available on the web:

A toad in the hand is worth two in the bush.
A bird in the hand is worth two in the grocery store.
A harp in the hand is worth two in the willows.
A bird in the nesting box is worth two in the bush.
A bird in the head is worth two in the textbook.
A book in the hand is worth two in the bin.
A man in the house is worth two in the street.
A cat in the bath in worth two in the litter tray.
A stone on the board is worth two in the bowl.
A bike on the road is worth two in the shed.
An engine under the hood is worth two in the shop.
A palm in the hand is worth a visor in the mail.
Frank in the bush is worth three in the field!
A beer in hand is worth six in the fridge
a bratwurst on the bbq is worth six in the fridge
Three in the hand is worth about 14 on the black market.
A bird in the hand is worth one from the Bush.*
Two Udalls in the Congress are worth more than two Bushes in the statehouses as far as protecting the environment goes
A bird in the hand is a messy proposition.
A bird in the hand is always safer than one overhead.
A bird in the hand is a certainty, but a bird in the bush may sing.
Would you take the bird in the hand, or a 75% chance at the two in the bush?
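To make the contrast concrete, here is a minimal sketch of what a "crisp" two-slot template would look like if we wrote it down as a pattern, and how few of the variants above it actually captures. The regular expression, the slot names x and y, and the function name match_template are illustrative assumptions, not a proposed analysis.

```python
import re

# Hypothetical rigid template with two open slots:
#   "A(n) X in the hand is worth two in the Y"
TEMPLATE = re.compile(
    r"^an? (?P<x>\w+) in the hand is worth two in the (?P<y>[\w ]+?)\.?$",
    re.IGNORECASE,
)

def match_template(sentence):
    """Return the (X, Y) slot fillers, or None if the rigid template fails."""
    m = TEMPLATE.match(sentence.strip())
    return (m.group("x"), m.group("y")) if m else None

# A few of the web variants quoted above.
variants = [
    "A toad in the hand is worth two in the bush.",
    "A bird in the hand is worth two in the grocery store.",
    "A harp in the hand is worth two in the willows.",
    "A bike on the road is worth two in the shed.",
    "A bird in the hand is a messy proposition.",
]

results = [match_template(v) for v in variants]
for v, r in zip(variants, results):
    print(r, "<-", v)
```

The first three variants fit the two-slot pattern; the last two are still recognizable plays on the proverb, but fall outside it, which is the sense in which a fixed set of substitution slots undersells what people do.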

Many of the cited snowclone patterns don't support such elaborate elaboration, because there's not much to work with. When your starting point is "Got milk?", for example, or "Mmm, beer!", your options are limited. But where the base form has more substance, or at least more words, the only real constraint seems to be that the starting point should remain somehow recognizable. And that requirement is conceptually circular, since if the source were no longer recognizable, we wouldn't recognize the result as a snowclone. I don't know of any good taxonomic description of the sorts of transformations that are involved here, much less a theory able to predict the distribution of observed variants, but I suspect it would be productive to look into this further. (The issues involved seem to overlap with the problems of construction grammar, whatever exactly we take those to be...)

And now for something completely different: the "Eskimo snow words" examples.

In that case, there's no proverb or quotation or anything like it to start with, and in fact there isn't even a crisply defined pattern of structured words. The template is really an abstract rhetorical structure, which has become such a commonplace trope that we can get a sort of imitation snowclone by cashing the rhetoric out in a particular way: "If Eskimos have N words for snow..."

As Geoff Pullum pointed out in the post that started all this off, the Economist's writer used the conditional

If Eskimos have dozens of words for snow, Germans have as many for bureaucracy.

to mean something like

Eskimos have dozens of words for snow, as everyone knows. Well, Germans have just as many for bureaucracy.

This was intended as an entertaining way to introduce what Geoff calls "the stereotypical bureaucratic ubiquity of the Teutonic world". The basic method is to use the analogical form Eskimos:snow::X:Y to communicate something like "Y is really important to X, so much so that you should think of it as you think of the relation between Eskimos and snow". For extra flavor, add the notion that X is therefore motivated to make many fine conceptual and/or terminological distinctions among types of Y.

It would have been boring -- and perhaps offensive -- to write something like "Germans are notoriously concerned with bureaucracy". So instead, the writer decided to call on the Eskimos:snow::X:Y analogy, instantiated as Eskimos:snow::Germans:bureaucracy, and then to phrase this as "If Eskimos..."

There are lots of other ways to do the same sort of thing. A bit of web searching turns up:

Just as Eskimos have many words for "snow," you probably need many concepts to sort out your thinking on "love."
Just as Eskimos have lots of words for snow, the Welsh language has more than a few words for green.
...just as Eskimos have a rich and precise vocabulary for discussing a wide range of complex snow conditions, so, too, teachers need to develop a language for describing and talking about teaching strategies.
I learned that just as Eskimos have several words for types of snow, sailors have at least six for blends of rum
Just as Eskimos distinguish between a dozen different sorts of snow, so too do the Swiss appreciate a wide range of Gruyeres.
Just as Eskimos are said to have a hundred words to describe types of snow, we’d figured out at least ten for rain.
Just as Eskimos have more than twenty words to describe snow, Americans have as many words to describe comfort.

You know how Eskimos have several different words for snow, and the British several ways of saying how it can rain ...? Well, the Psalmist had a couple of words in Hebrew that we translate blessed.
...you know how Eskimos have about 30 words for snow? We're just as serious about clams.
You know how Eskimos have like 100 words or so for snow? Why don't people in Washington have 100 different words to rain?
You know how Eskimos have, like, a hundred different words for snow? ''The Perfect Storm'' has a hundred different shades of wet.
Do you know how Eskimos have a hundred-odd words for snow because its all around them?.... Well, Japanese has many words for "pervert" too... Guess why. ...
Know how Eskimos have 173 words for the word "snow"? Well, Dixon's has more types of chili than it has chairs at lunchtime.
You know how eskimos have three hundred words for snow? jews have over three hundred words to use to complain about how humid it is.
You know how Eskimos have a zillion words for snow? The French have a zillion words for doughs.

They say that Eskimos have many words for snow. Cats have many words for sleep:
They say that Eskimos have close to 17 words to describe snow. The English vocabulary does not possess adequate adjectives to describe my gratitude to ALL of you
They say that Eskimos have something like forty words for snow. From what we saw in our two days in San Sebastian, the Spaniards in the Basque region must have at least that many for rain.
They say that Eskimos have 52 different words to describe snow... The music of Sibelius and Riisager has at least that many ways to describe the frigid landscape of Finland and Greenland in musical terms.
You know how they say that Eskimos have 100 different words for ice. A student of the Psalms I was reading claimed that he had found 94 different ways to say enemy in the Psalms.

They say that Eskimos have a hundred words for "snow". That may or may not be true, but Runners have far more than a hundred words for guns.
They say that Eskimos have over 200 words for snow. I can’t help but wonder if Nebraskans have just as many words for “flat.”
They say that eskimos have thousands of names for snow, the British have millions that can be used as insults.

...among many, many others. And you can see a very different use of the same analogical schema in the title and opening paragraph of Joseph Reagle's W3C Note "Eskimo Snow and Scottish Rain: Legal Considerations of Schema Design".

Eskimos have many words for snow; Scots have numerous words related to rain. This concept has achieved near urban myth status -- though it continues to be contentious amongst linguists [Who40]. The idea is compelling because it speaks to our belief that the mechanism of speech itself is a reflection or [sic] our world and what we wish to say. Within this paper I examine the mechanisms by which our computer agents will express and understand what we wish to say in order to form online agreements.

Here the message seems to be that internet applications need many words for... well, see if you can figure it out for yourself, this post is long enough already.

[Note that the title of this post refers to Richard Montague's famous 1973 paper "The proper treatment of quantification in ordinary English". Needless to say, there is no real implication that other discussions of this topic are "improper": the title is a joke, not a (nominalized) statement. For that matter, there's no reason to refer to "ordinary English" except to echo Montague's title, since all discussions of this topic have dealt with phrasal and rhetorical patterns in ordinary English. Needless to say, there are a fair number of prior PTQ snowclones in the literature, such as:

On the proper treatment of opacity in certain verbs
The proper treatment of optimality in computational phonology
The proper treatment of symbols in a connectionist architecture
On the Proper Treatment of Context in NL
Discourse structure and the proper treatment of interruptions.
The Proper Treatment of Quantifiers in Ordinary Logic]


[Update: some thoughts from Russell Lee-Goldman:

It certainly seems like categorizing and describing (at least certain) types of snowclones is certainly very similar to the sorts of things that construction grammar is often used to describe. Of course CxG per se doesn't, AFAIK, have anything to say about what exactly makes two expressions seem close enough that one can evoke the memory of the other. For instance, consider this popular tagline:

sometimes you feel like a nut, sometimes you don't

For me, anyway, any time someone says "sometimes you feel," the whole phrase is "activated." On one level, it's common enough to juxtapose [sometimes [positive proposition]] [sometimes [negative proposition]], so why not with "feel." But on another level, you can draw on the stored memory of a particular famous instance of that construction in order to build from it.

Similarly, you could say that there's some sort of

[[cardinal number] X in the hand is worth [cardinal number+n] Y in the bush]

construction. Or more generally

[[quantity] X ['in control/possession'] BETTER-THAN [quantity] Y ['only potentially available']]

But this is getting very abstract, and is more like a guiding principle in life rather than a linguistic construct. That's why something like:

--I'd rather have a single book on my bookshelf than several available in the library.

is barely recognizable as an extension of the "bird-hand-bush" idiom, because it's only vaguely syntactically similar, and expresses an idea that may well exist independently from the initial idiom. On the other hand,

--A bird in the hand is always safer than one overhead.

Does not express the semantics OR syntax of that very general construction above, but maintains the "bird in the hand" part, which clearly marks it as an attempt to evoke the idiom.

This sort of exercise reminds me of what Paul Kay called "patterns of coining," though his argument was in particular against the idea that some constructions (like "resultative" or "caused motion") were productive, instead claiming that there are some non-productive patterns that speakers nonetheless use to coin new instances of familiar patterns. Snowclones are slightly different since the test of productivity is not grammaticality (in the narrow sense), but rather effectiveness in evoking some familiar concept.


Certainly the evocation of familiarity plays a special role in these allusions to well-known phrases. But allusion and construction share the tension between analogy-as-cause and analogy-as-result. On one view, the process of analogical pattern generalization is the psychological source of new instances (of snowclones or of constructions), making new phrases from old ones. On another view, the perceived similarities among phrases (and their associative effects) are an evoked response to patterns created by some non-analogical process, in which particular phrasal models (with or without a few variables added) play no role. Of course, you could easily believe that both sorts of process are involved, or that true understanding reveals this to be a false dichotomy, as in connectionism's rejection of the distinction between remembering and inventing.]

Posted by Mark Liberman at 10:41 PM

Commercial hybridity, Super Bowl edition

I was intrigued to read the news that one of the high-profile commercials running during the Super Bowl on Sunday would be bilingual, mixing English and Spanish. Fittingly enough, the ad is for the new Toyota Camry that runs on both gas and electricity, with a father and son carrying on a hybrid conversation in their hybrid car. One article painted the commercial as a harbinger of increased Spanish-English code-switching in U.S. advertising, part of a general trend of companies "probing an undertapped market of bilingual consumers." When I tracked down the commercial on Toyota's website (viewable here), I was less than impressed by the code-switching on display, though I found the metalinguistic aspects of the father-son interaction rather thought-provoking.

Here is a transcript of the conversation between the father and son:

Son: Papá, why do we have a hybrid?
Father: For your future!
Son: Why?
Father: It's better for the air, and we spend less because it runs on gas and electrical power. (Points to dashboard display.) Mira, mira aquí. It uses both.
Son: Like you, with English and Spanish!
Father: Sí!
Son: Why did you learn English?
Father: (Pauses.) For your future!

Clearly the agency that designed the ad tried to make the actual Spanish content of the dialogue as unobtrusive as possible. The son's use of the vocative Papá presents no problems for an Anglophone audience, nor does the father's response of "Sí!" That leaves one sentence, "Mira, mira aquí" ('Look, look here'), as the only bit that might perplex viewers with no knowledge of Spanish. [*] Considering how cautiously the sentence is interjected (it sounds as if it were dubbed over in post-production), I'm surprised that the father didn't immediately translate the sentence for his son into English. But he does point conspicuously to the fancy dashboard display of the energy monitor just in case there's any confusion.

Even if the dialogue only has a light patina of Spanishness so as not to disturb the Super Bowl-viewing public, the commercial is still notable for its metalinguistic foregrounding of bilingualism itself. The son makes the quick connection between the car's hybridity and his own father's hybrid language use. Mixing gas and electrical power is just like mixing English and Spanish! (Out of the mouths of babes!) Previous commercials for gas-electric hybrid cars have played with the notion of hybridity — as, for instance, this commercial for Honda's Civic Hybrid showing images of paint-mixing, turntable-mixing, and so forth, culminating in the tagline, "the perfect mix of fuel efficiency and performance." But the Toyota commercial makes the bold leap of equating the practicality of having a hybrid car with the practicality of being bilingual.

That might be a subtly subversive message for Americans comfortable with the hegemony of the monoglot Standard, as Michael Silverstein has termed it. The message is made more palatable, however, by being couched in stereotypical images of the American dream: the father has "made it" and is now able to provide for his son's future by buying a hybrid car. And the linguistic side of his success story is not so much that he has remained bilingual but that he has learned English — for his child's future. Thus the image of the monoglot Standard remains unsullied, with Latino families happily incorporated into an English-speaking consumer culture.

The creators of these "code-switching" commercials are trying to perform a difficult balancing act. On the one hand they want to grab the attention of Latino viewers with the salient use of Spanish. But the ads apparently can't sound too Spanish, for fear of offending Anglo sensibilities. Better then to fall back on innocuous images of the nuclear family (as with a new bilingual Cheerios commercial which also revolves around father-child interaction). Then bilingualism can be deemed acceptable, as long as it remains in the confines of the family (and only then with the occasional sprinkling of Spanish into English discourse). Then again, the fact that a commercial is emphasizing bilingualism in the midst of the all-American ritual of the Super Bowl is some sort of advance in the public recognition of the nation's linguistic and cultural diversity. Baby steps, baby steps.

[* Update: Count me as one of those viewers perplexed by Spanish. I originally transcribed the sentence as Mire, mire aquí, but Pat Schweiterman and Jena Barchas Lichtenstein have pointed out that the familiar imperative mira is used rather than the formal mire, as befits a father addressing a son.]

Posted by Benjamin Zimmer at 01:35 AM

February 03, 2006

Standing up to linguistic terrorism

The world is full of menace these days. Ayman al-Zawahiri boasts that Al Qaeda will bring Americans "catastrophes and tragedies", unless perhaps George W. Bush converts to Islam. Fatah "militants" are promising to "shell the headquarters of the EU and all European countries" if the governments of Denmark, France and Norway don't "officially apologise" for failing to stop newspapers in their countries from printing cartoons of Muhammad. But the most consistent and widely applicable (if least serious) threats are those of home-grown Anglophone grammar terrorists. In a recent example, automotive journalist Jeremy Clarkson has put us all on notice that he will respond to the misuse of pronouns with an especially atrocious form of assault:

If you send a letter to a client saying “my team and me look forward to meeting with yourself next Wednesday”, be prepared for some disappointment. Because if I were the client I’d come to your office all right. Then I’d stand on your desk and relieve myself.

Talk about your "dirty bombs"...

Well, we here at Language Log are united in our determination not to surrender to terrorists. We defy al-Zawahiri, we support the freedom of cartoonists, and we stand firm against Clarkson's threats. Hey, Jeremy, I can't say that we're looking forward to the experience, but my colleagues and me are prepared to watch yourself soil the reception desk at Language Log Plaza. It's due for replacement anyhow. After we rub your nose in it.

In case I haven't offered enough provocation, I'll also respond to the opening paragraph in Clarkson's Times opinion piece, in which he names what he considers to be the "worst word in the world":

Wog. Spastic. Queer. Nigger. Dwarf. Cripple. Fatty. Gimp. Paki. Mick. Mong. Poof. Coon. Gyppo. You can’t really use these words any more and yet, strangely, it is perfectly acceptable for those in the travel and hotel industries to pepper their conversation with the word “beverage”.

But it's not just "those in the travel and hotel industries", Jer. The OED cites Caxton, Shakespeare and Boswell, among others:

1475 CAXTON Jason 52 Metes delicious and with al beuurages and drynkes sumptuous.
1611 SHAKES. Wint. T. I. ii. 346 If from me he haue wholesome Beueridge.
1791 BOSWELL Johnson (1831) I. 297 Tea..that elegant and popular beverage.

Mark Twain used this "worst word in the world" in Life on the Mississippi, writing of Burlington, Iowa:

It was a very sober city, too -- for the moment -- for a most sobering bill was pending; a bill to forbid the manufacture, exportation, importation, purchase, sale, borrowing, lending, stealing, drinking, smelling, or possession, by conquest, inheritance, intent, accident, or otherwise, in the State of Iowa, of each and every deleterious beverage known to the human race, except water.

And Thomas Jefferson used it too, writing about beer in a letter that

I wish to see this beverage become common instead of the whisky which kills one-third of our citizens, and ruins their families.

Charles Dickens tells us that

Mr. Pickwick could not resist so tempting an opportunity of studying human nature. He suffered himself to be led to the table, where, after having been introduced to the company in due form, he was accommodated with a seat near the chairman and called for a glass of his favourite beverage.

Walt Whitman, in a prose piece, modified the Worst Word with the same adjective:

We took our seats round the same clean, white table, and received our favorite beverage in the same bright tankards.

Charlotte Bronte has Jane Eyre exclaim

How fragrant was the steam of the beverage, and the scent of the toast!

A search on LION turns up more than 350 poetic examples, including some from John Dryden:

Assisted by a Friend one Moonless Night,
This Palamon from Prison took his Flight:
A pleasant Beverage he prepar'd before
Of Wine and Honey mix'd, with added Store
Of Opium ...

William Wordsworth:

Yet more,---round many a Convent's blazing fire
Unhallowed threads of revelry are spun;
There Venus sits disguisèd like a Nun,---
While Bacchus, clothed in semblance of a Friar,
Pours out his choicest beverage high and higher
Sparkling, until it cannot choose but run
Over the bowl, whose silver lip hath won
An instant kiss of masterful desire---
To stay the precious waste.

and again:

But still, to a bosom susceptibly placid,
The anguish of love will but heighten the joy;
As the bev'rage uniting a sweet with an acid,
Is grateful, when nectar untempered would cloy.

John Keats:

Hence Burgundy, Claret, and Port,
Away with old Hock and Madeira,
Too earthly ye are for my sport;
There's a beverage brighter and clearer.

Robert Browning:

Thomas stands abashed,
Sips silent some such beverage as this ...

and again

I did not call him fool, and vex my friend,
But quietly allowed experiment,
Encouraged him to spice his drink, and now
Grate lignum vitæ, now bruise so-called grains
Of Paradise, and pour now, for perfume,
Distilment rare, the rose of Jericho,
Holy-thorn, passion-flower, and what know I?
Till beverage obtained the fancied smack.

So, Jeremy, after a frank exchange of views in the reception area at Language Log Plaza, we'll invite yourself to retire to the cafe, where we look forward to offering yourself a generous sample of additional quotations, along with a tankard or two of your favorite beverage. You might even get that fancied smack, if you insist on it.

Posted by Mark Liberman at 06:51 AM

Who wants to die, asks Baby David

There's an urban legend that an early speech recognition system heard "recognize speech" as "wreck a nice beach". That one's made up, but it's not a legend that BBN's Podzinger recently transcribed "say Jesus is Lord" as "Beijing this morning", or "a moment in your life" as "remote wooded delight". The perils of ASR should be getting some sympathy these days from the publisher of that Elmo book on potty training, the one where you press Baby David and hear (something that sometimes sounds like) "who wants to die?" for what was recorded as "who has to go?".

This appears to be an unfortunate artifact of carelessly done compression, which was not detected before publication because of the effects of lexical priming: as everyone who ever gave a speech synthesis demo knows, perceptions of distorted speech are strongly influenced by expectations. Thus the folks responsible for quality control on this book weren't irresponsible, they were just primed. Well, maybe they were a little irresponsible -- if you're going to be in that business, you should know that you need to check how the compressed and re-created sounds are perceived by people in the position of the product's users, rather than its creators.

I don't have a lot to add to Brent Edwards' discussion, except to observe that the actor who produced Baby David's voice is using a fundamental frequency (pitch) that peaks (in "go"/"die") at about 690 Hz, or about f2 (the second F above middle C), well into the soprano range. This is higher than the lowest vocal tract resonance for a vowel like [o], and produces a set of overtones (690, 1380, 2070, 2760) that will not fill in the standard resonance pattern for such a vowel very clearly, since the resonance peaks are likely to be at roughly 500, 1000 and 2500 Hz. In fact, it's a bit of a mystery why such high voices can be understood at all -- presumably it has to do with the brain's ability to recover the resonance pattern from the way that time-varying overtones sweep across it -- but listeners will still sometimes mistake overtones for resonances.
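For readers who want to check the arithmetic, here is a minimal sketch of it in Python. The 690 Hz fundamental and the rough resonance peaks (500, 1000 and 2500 Hz) come from the paragraph above; the 120 Hz comparison voice, the 3000 Hz cutoff and the function names are my own assumptions, chosen only for illustration.

```python
# Sketch: how sparsely do the overtones of a high-pitched voice sample
# the resonance (formant) peaks of a vowel like [o]?

FORMANTS_HZ = [500.0, 1000.0, 2500.0]  # rough peaks quoted in the text


def harmonics_up_to(f0_hz, max_hz):
    """Return the overtones f0, 2*f0, 3*f0, ... that do not exceed max_hz."""
    count = int(max_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]


def nearest_harmonic_gaps(f0_hz, formants_hz, max_hz=3000.0):
    """For each formant, return the distance in Hz to the closest overtone."""
    overtones = harmonics_up_to(f0_hz, max_hz)
    return [min(abs(h - f) for h in overtones) for f in formants_hz]


if __name__ == "__main__":
    # 690 Hz is the peak pitch from the text; 120 Hz is an assumed low voice.
    for f0 in (690.0, 120.0):
        print(f0, harmonics_up_to(f0, 3000.0)[:4],
              nearest_harmonic_gaps(f0, FORMANTS_HZ))
```

At 690 Hz no overtone lands within 190 Hz of any of the three peaks, while the assumed 120 Hz voice puts an overtone within 40 Hz of each one. That is only a crude way of saying that the high voice leaves the resonance pattern badly under-sampled; it says nothing about how listeners actually recover the vowel.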

I guess I could also observe that aside from the effects of compression distortion and high fundamental frequency, many American youth have hardly any high back vowels left at all anyhow, having fronted /u/ and /o/ in pretty much every context. But the quality of the available recording in this case is very poor -- I recorded it from the compressed stream of a TV clip available on the internet, so that we have the original compression, the TV technician's recording of the book's playback over its tiny little speaker, and the compression involved in the TV clip's distribution as well. With a low-quality nth-generation copy like this, it's hard to assign clear causes to the obvious effect.

Posted by Mark Liberman at 06:35 AM

February 02, 2006

Annals of animalistic analogies

Here's one of those odd coincidences. Earlier today I read Mark Liberman's post about Vladimir Nabokov's prophetic vision of emoticons, which links back to a post Mark wrote in 2003 about Nabokov. In the earlier post, Mark mentions the famous line attributed to Roman Jakobson when asked if Nabokov should be given a faculty position at Harvard:

"I do respect very much the elephant, but would you give him the chair of Zoology?"

A few hours later I ended up at Brendan Wolfe's entertaining blog The Beiderbecke Affair, led there by a blog feed that picked up on a post about Arnold Zwicky's disquisition on "the vocabulary of toadying." In the post right below that one is an excerpt from a column by Martin Peretz in The New Republic (subscription required). Peretz sternly takes Garrison Keillor to task for his hilarious evisceration of Bernard-Henri Lévy's American Vertigo in Sunday's New York Times Book Review. (The subtitle of Lévy's book is "Traveling America in the Footsteps of Tocqueville," but Keillor remarks that the book is really about the French: "There's no reason for it to exist in English, except as evidence that travel need not be broadening and one should be wary of books with Tocqueville in the title.") Here's the excerpt from Peretz:

There is no philosophical argument in his hostile review, only a litany of ridicule. Keillor is an American mythographer, the nostalgic and reverse-snobbish creator of Lake Wobegon, his hugely successful line in middlebrow American sentimentality and a much greater assault on America's gray matter than Lévy's reflective visits to Las Vegas and Dealey Plaza in Dallas. What does the inventor of  "A Prairie Home Companion" know about the tradition in which Lévy is working? Suppose Sartre had followed Tocqueville to the United States and written about his journey. To whom should the Times have assigned its review? To Thornton Wilder? Yes, that's it: the author of Our Town. There's the connection: Tocqueville was fixated on small-town America, and many of the classic community studies of the early decades of the last century were written about places called Middletown and Elmtown, Plainville and Hilltown, Yankeetown and Southern Town, false names for actual polities with a few thousand souls. So maybe Keillor was actually an inspired choice. Why shouldn't a bird review an ornithologist?

Posted by Benjamin Zimmer at 11:25 PM

Tong-maker the Kong-maker, and other translational follies

I recently came across a press release about an online English-Malay translation tool that promises "real-time translation and searching of the whole Internet in Malay." The Malaysia-based company, Linguamatix, claims that their product can translate between English and Malay at a rate of 500,000 words per minute, compared to the mere 5,000 words per minute achieved by commercial translation systems for other languages. The company's goal is to allow the Malay-speaking public to surf the Web and read all English-language webpages instantaneously translated into their native language. Linguamatix is also planning to apply its high-speed translation engine, LinguaBASE, to various other language pairs.

The company is currently offering its translation service, known as LinguaWeb, in a free online trial version. According to the press release, the trial version has been made publicly available "for a limited period while Linguamatix assesses its current capacity to continue providing such services." I took LinguaWeb out for a spin, and it looks like they're still working out the kinks. One can surf the Web in either English or translated Malay by entering a search term, but the Malay option frequently returns timeout errors. However, as with other translation tools like Altavista's Babel Fish and Google Translate, one can also supply a URL and get a translated version of the page in return (either from English to Malay or Malay to English). That feature works quickly and cleanly, and the resulting translations seem roughly on par with Babel Fish et al. in terms of accuracy. But like other automatic translators, LinguaWeb has some peculiar stumbling blocks.

The first page that I chose to translate was my Language Log entry "Nias, Komodo, and 'Kong'," about Indonesian connections in the original 1933 version of King Kong, directed by Merrian Cooper. The output from LinguaWeb shows how much trouble most translation tools have with any snippet of text that is even slightly idiomatic or noncompositional. The first sentence of the entry begins:

I have yet to find three hours to devote to Peter Jackson's remake of King Kong...

And here is the Malay version from LinguaWeb, with my item-by-item gloss:

Saya  tetapi  ada   untuk  mencari   tiga   jam    untuk  menumpukan  untuk  buat semula  Peter Jackson  bagi  Raja  Kong
I     but     have  for    look for  three  hours  for    devote      for    remake (v.)  Peter Jackson  for   King  Kong

Needless to say, the Malay version only makes the vaguest of sense when the sentence is strung together (I'd gloss it back into English as, "I but have for looking for three hours for devoting for remaking Peter Jackson for King Kong"). As LinguaWeb's press release acknowledges, the translation tool is mostly useful for "gisting purposes." (See the Wikipedia entry on machine translation, or MT, for more on "gisting.") Not that other automatic translators do a better job with such tasks — hence the bizarre results that quickly generate from serial Babelfishing, or even a single translational cycle of something particularly idiomatic, such as the English-to-Italian-to-English output from "Rapper's Delight." (Here's how my "Kong" post looks translated back into English.)

Finding humor in automatic translation is nothing new, by the way. Take this old MT urban legend, as it appeared in Art Buchwald's syndicated column of July 2, 1959 (Buchwald is discussing the International Conference on Information Processing):

At the beginning of the conference one of the lecturers was describing a machine which translates English into Russian. The first phrase put through the machine was, "The spirit is willing, but the flesh is weak." But the Russian equivalent which came out read: "The whisky is fair, but the meat is foul."

As it turns out, that joke is actually more than a century old, long predating the advent of MT. Here's the earliest version I've found in the newspaper databases:

Decatur (Ill.) Herald, Jan 20, 1903, p. 5
A student at Berkeley contributes the following: Many ludicrous mistakes are made by foreigners in grasping the meaning of some of our common English expressions. A young German attending the state university translated "The spirit is willing, but the flesh is weak" into "The ghost is willing, but the meat is not able." And a Filipino youth fairly set the class in an uproar by the statement that "Out of sight, out of mind" meant "The invisible is insane."

(The "out of sight, out of mind" line has also frequently been hauled out for the MT age, with the purported mistranslation sometimes appearing as "invisible idiot" or "blind idiot.") But beyond repeating stale jokes about the hazards of translating idioms literally, I'm curious about another common problem with automatic translators: the (mis)recognition of proper names.

When I was first skimming through the Malay translation of my "Kong" post, I noticed the collocation "Pembuat Tong" in places referring to the film's director, Merrian Cooper. Since pembuat means 'maker' in Malay, my first thought was, "That's strange... Did I write 'the maker of Kong' and it's coming out as 'the maker of Tong'?" Then it dawned on me that tong is Malay for 'barrel(s),' and LinguaWeb had translated Cooper's name according to the literal (but rare) meaning of cooper: 'one who makes or repairs casks or barrels.' Similarly, a mention of Mark Liberman in another post comes out as "tanda Liberman," or '(the) mark or sign (of) Liberman.'

Again, LinguaWeb is no worse than other translation tools in this department. Here's how Babel Fish does with these two names, regardless of the context of their appearance:

Language      Merrian Cooper                  Mark Liberman
Dutch         Kuiper Merrian                  Teken Liberman
German        Merrian Faßbinder               Markierung Liberman
French        tonnelier de Merrian            marque Liberman
Spanish       fabricante de vinos de Merrian  marca Liberman
Portuguese    cooper de Merrian               marca Liberman
Italian       cooper di Merrian               contrassegno Liberman
Russian       бондарем Merrian                меткой Liberman
Chinese-simp  Merrian 木桶匠                  标记Liberman
Chinese-trad  Merrian 木桶匠                  標記Liberman
Japanese      Merrian のたる製造人            印Liberman
Korean        Merrian술장수                   표Liberman

Babel Fish managed to recognize "Mark Liberman" as a proper name in only one language, Greek. But the Greek translation for "Merrian Cooper" came out as "βαρελοποιός Merrian." That first word is given as a Greek term for cooper by Answers.com, but when I feed it back into Babel Fish I'm told that it means "gravimeter," bafflingly enough.

Clearly Babel Fish and LinguaWeb are working from lexicons where mark means 'sign' and cooper means 'barrel-maker,' without any information about common given names like Mark or common surnames like Cooper. But how hard would it be to include a heuristic that guesses whether a given collocation is a proper name, especially when it is in the form Given Name + Surname? I would think in many cases there would be dead giveaways — little things like capitalization, or the relative frequency of the barrel-making cooper vs. the surname Cooper in contemporary English usage. But perhaps I'm expecting too much of tools intended merely for "gisting purposes."
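To make the suggestion concrete, here is a toy sketch in Python of the kind of heuristic I have in mind. Everything in it is hypothetical: the tiny name list, the made-up "surname share" figures standing in for real corpus frequencies, and the two-entry lexicon (whose Malay glosses are the ones LinguaWeb produced above). It is not how Babel Fish or LinguaWeb work, just an illustration of the capitalization and frequency cues.

```python
# Toy proper-name guesser for two-word collocations (Given Name + Surname).
# All data below is assumed for illustration, not drawn from a real corpus.

GIVEN_NAMES = {"mark", "peter", "arnold"}      # hypothetical given-name list
SURNAME_SHARE = {"cooper": 0.9, "mark": 0.1}   # made-up share of uses as a surname
LEXICON = {"cooper": "pembuat tong", "mark": "tanda"}  # English -> Malay glosses


def looks_like_proper_name(tokens):
    """Guess whether a two-token sequence is a personal name."""
    if len(tokens) != 2:
        return False
    first, last = tokens
    # Cue 1: both words must be capitalized.
    if not (first[:1].isupper() and last[:1].isupper()):
        return False
    # Cue 2: the first word is a known given name.
    given_cue = first.lower() in GIVEN_NAMES
    # Cue 3: the second word is used mostly as a surname.
    surname_cue = SURNAME_SHARE.get(last.lower(), 0.0) >= 0.5
    return given_cue or surname_cue


def gloss(tokens):
    """Translate word by word, but pass likely proper names through untouched."""
    if looks_like_proper_name(tokens):
        return list(tokens)
    return [LEXICON.get(token.lower(), token) for token in tokens]


if __name__ == "__main__":
    print(gloss(["Merrian", "Cooper"]))
    print(gloss(["Mark", "Liberman"]))
    print(gloss(["the", "cooper"]))
```

With these assumed lists, both names pass through untranslated while the lowercase "cooper" still gets its barrel-maker gloss. A real system would need far larger name lists and actual frequency counts, and it would still be fooled by capitalized common nouns in titles or at the start of a sentence.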

Posted by Benjamin Zimmer at 05:00 PM

Invention of the supine round bracket

As a footnote to the business about Cingular trying to patent emoticon-entry methods, Pekka Karjalainen has reminded me that I should have linked to Vladimir Nabokov's suggestion in a 1969 NYT "interview":

Q: How do you rank yourself among writers (living) and of the immediate past?

Nabokov: I often think there should exist a special typographical sign for a smile – some sort of concave mark, a supine round bracket, which I would now like to trace in reply to your question.

As far as I know, no patent application followed.

I've put "interview" in scare quotes because the version that I've found on line is due to Nabokov himself, who describes it as follows:

In April, 1969, Alden Whitman sent me these questions and came to Montreux for a merry interview shortly before my seventieth birthday. His piece appeared in The New York Times, April 19, 1969, with only two or three of my answers retained. The rest are to be used, I suppose, as "Special to The New York Times" at some later date by A. W., if he survives, or by his successor. I transcribe some of our exchanges.

Note that 22 of Nabokov's "interviews" with representatives of various publications can be found here (scroll down to the section labelled "Interviews"). All are worth reading, including the 1964 interview with Alvin Toffler which I referenced in a previous post for its elegant little joke about Fulmerford. I'll reproduce here Nabokov's introductory description:

This exchange with Alvin Toffler appeared in Playboy for January, 1964. Great trouble was taken on both sides to achieve the illusion of a spontaneous conversation. Actually, my contribution as printed conforms meticulously to the answers, every word of which I had written in longhand before having them typed for submission to Toffler when he came to Montreux in mid-March, 1963. The present text takes into account the order of my interviewer's questions as well as the fact that a couple of consecutive pages of my typescript were apparently lost in transit. Egreto perambis doribus!

Extra points for decrypting the final "quotation" :-)...

[Update: I suspect that VN would have enjoyed this.]

Posted by Mark Liberman at 10:40 AM