January 31, 2006

The vocabulary of toadying

A little while back, on his blog Daily Dish, Andrew Sullivan took a quick swipe at Fred Barnes's adulatory biography Rebel-in-Chief: How George W. Bush Is Redefining the Conservative Movement and Transforming America (Crown Forum, 2006), using a reference to oral sex.  Here's Sullivan on 1/15/06:

BREMER'S BOMB-SHELL: And Fred Barnes' fellatial biography of Bush. (He makes Powerline read like the Daily Kos.) I try and make sense of each here. More on Fred's book soon.

The very next day, on the National Review's blog, John Podhoretz sputtered about "Andrew Sullivan's anti-gay invective", in a posting that seems confused on several fronts: Podhoretz apparently thinks that words have one single meaning in all contexts, and that gay men (like Sullivan), who presumably have a positive attitude (once again, in all contexts) towards performing fellatio, are being hypocritical when they characterize toadying, negatively, by analogy to one man fellating another.  (It's not easy to unpack Podhoretz's unhinged, shouting-in-capitals rhetoric, so my analysis here might be off the mark.)

Later that day, James Wolcott (of Vanity Fair), on his blog, mocked Podhoretz's fuming but embraced the imputation of homoeroticism in neocon gushing over GWB.

This was how things stood when I first learned about these exchanges, from John Calendo on the entertaining gay blog Nightcharm (yes, I'll give you a link, but that will come below the fold) on 1/19/06, in a piece titled "A Blowjob By Any Other Name":

It's a wonder so useful a word [as fellatio] was never put in adjectival form until last Sunday, when it was invented by conservative gay pundit Andrew Sullivan, the man we love to hate.

He used it in a book review, to indicate the fawning, on-their-knees way conservative blowhards write about this, the worst of all possible presidents (save one: Nixon still wins that derby.)

Recency Illusion alert!  This was by no means the first time Sullivan had used fellatial, and plenty of other people have used it.  In fact, fellatial is one of NINE attested adjectives in the fellat- family; it's not even the most frequent in Google webhits, coming in way behind fellatory, which is the only one to make it into the OED.

Now to go through these things systematically.  First, the fellat- family of words in English.  Then, some reflections on the vocabulary of toadying, the many uses of the sexual lexicon (or, why fellatial isn't necessarily a homophobic slur), and the attitudes of gay men toward fellatio (or, why it's not necessarily hypocritical for a gay man to use this word disparagingly).



Two words of warning.  First, the Nightcharm site offers, in its own words, "gay porn, blog and naked men pictures".  The blog part brashly covers all sorts of things of interest to gay men, and potentially to many people.  But there's really no avoiding the gay porn and the naked men, so if it makes you uncomfortable to be close to this stuff, don't go there.  If you're cool with that, or positively disposed, the "Blowjob By Any Other Name" posting is here.  The accompanying photos are mostly naked French rugby hunks, plus a Japanese sex doll (female) "looking very fellatial".

Second, at this point I'm going to abandon the elevated register of the English sexual lexicon ("oral sex", "perform fellatio", "fellate" above) in favor of the vernacular -- what I think of as "plain talk" -- because I dislike the distancing and shrinking-away effect of the more technical vocabulary.  You might well feel otherwise, but at least I've warned you about what's to come.

Ok, on to a brief introduction to the fellat- family.  In three dictionaries I respect (and have extremely easy access to while I'm sitting at my desk at the Stanford Humanities Center outpost of Language Log Plaza) -- OED2, AHD4, and NOAD2 -- there are listings for the nouns fellatio (cocksucking, the act) and fellator (cocksucker, the actor) and the verb fellate (suck cock, perform the act).  NOAD2 has only these three basic items.  AHD4 has the anglicized variant fellation for the act noun.  And OED2 also gives the feminine actor noun fellatrix (the variant fellatrice is attested, but not in the OED) and the adjective fellatory, noting that fellate is a back-formation and that fellatory is built on it.  (There are also occurrences of fellatiate, built directly on fellatio, or possibly a blend of fellatio and fellate, as an alternative to fellate.)

(A digression.  People have occasionally objected to the noun fellation to me on the ground that the "correct" noun is the Latin fellatio -- even though fellatio is already anglicized in pronunciation, to rhyme with ratio, not patio or potty-o.  It is true that fellatio came first, and was used as an unassimilated Latin word in medical or other "scientific" discussions of sex, and as a coded word in elegant pornography, where all the racy bits were for some time in Latin or Greek.  But some time ago the word passed into ordinary language, though as part of an elevated register; fellatio is the word you use to talk about cocksucking in "polite" contexts that allow this as a possible topic of discourse at all.  At that point, it's natural to anglicize it fully, like the zillions of other English nouns in -ation that trace back to Latin nouns with a nominative singular in -a:tio: and genitive singular in -a:tio:nis, some taken directly from Latin, some indirectly via French.  The noun fellation is, in fact, reasonably common, though nowhere near as common as fellatio.)

(Another digression.  A few people object to the verb fellate on the ground that it's a back-formation; presumably, fellatory would be objectionable as a result.  Well, there are people who object to any back-formation they perceive as recent, which is to say, any back-formation they recognize as one -- but, eventually, back-formed verbs cease to be seen as innovations, especially if they're really useful.  OED2's first citations for fellate are from 1968 and 1969 -- Updike's Couples and Legman's Rationale of the Dirty Joke, respectively -- but I have no doubt that some sleuthing will take the dating back at least another decade or two, so the verb is not exactly a recent thing.  In raw Google webhits as of 1/30/06, fellate gets 74,900, which is pretty respectable for an item from an elevated register.  In any case, if you want to talk about cocksucking in an elevated register, it's hard to do without fellate as the verb, since the alternatives -- perform fellatio on, perform oral sex on, copulate orally with, etc. -- are wordy and clunky.)

Back to the rest of the fellat- family.  Attested as alternatives to the actor nouns fellator/fellatrix (fellator is sometimes used of women, by the way, another sign that we're moving away from Latin) are fellatist and fellationist, though they're much less frequent than fellator/fellatrix.

Here I point out that every single member of the fellat- family that I've mentioned so far has both literal uses (referring to events in which actual dicks are in actual mouths) and metaphorical, or figurative, uses, referring to praising, admiring, pandering, fawning, sycophancy, obsequiousness, and the like -- acts, relationships, and attitudes in what I'll call "the toadying domain".  Situations in the toadying domain involve two participants, an ADULATOR and a RECIPIENT of the adulation, and there are at least three relevant aspects of the relationship between them: (1) REGARD: the adulator appreciates, admires, possibly worships the recipient, regards the recipient highly; (2) DEFERENCE: the adulator shows deference, submission, or subservience to the recipient; and (3) EAGERNESS TO PLEASE: the adulator is eager to please the recipient.  All three aspects can vary in degree.  Some situations in the toadying domain show a fourth component: (4) THE ICK FACTOR: the adulator is willing to do things they find unpleasant or humiliating in the service of the recipient.  Figurative cocksucking often has a pretty big ick factor.

Some examples:

fellatio: Anyone who claims that artistic fellatio is not rampant in the arts in general ... Unsurprisingly the writing is a veritable Johnny Wadd of fellatio. (link)

fellation: Even I was getting fed up with the non-stop fellation of Brady and Belichick by Michaels...  [in discussion of 2005 NFL playoff] (link)

fellator: We've been waiting almost a week, you acne crippled terrorist fellator, yet you've yet to address this... (link)

fellationist: What makes Donald Wildmon think his fundy fellationist knuckle dragging 'Deliverance' inbred followers could afford A Ford truck? (link)

fellatist: Wow. After reading that, I once again have to wonder just what the hell Bush is thinking. Then I remember Fox is renowned to be a consummate fellatist.  (link)

fellate: There was never an enemy of the US that Klintoon DIDN'T fellate.  (link)

    And as long as you continue to fellate at least some of my favorites I'll keep coming back ... Sorry, I can't fellate everyone's favorite band. Farewell. (link)

    it seems that the art world is very insular and artists merely metaphorically fellate one another while simultanaously ripping off rich idiots who think ... (link)

fellatiate: Now we have a two weeks of Packer Luv Orgy on all the networks. I can't wait to hear how Madden will verbally fellatiate Favre this week. And Berman will go into some sort of ecstatic wet dream on ESPN about Favre and what a super human being he is. (link)

On to the adjectives in the fellat- family.  There are nine attested adjectives, four of them with 200 or more raw webhits on 1/25/06:
    1.  fellatory: 3,390 hits
    2.  fellatial: 853
    3.  fellative: 346
    4.  fellatic: 212
    5.  fellational: 23
    6.  fellationary: 20
    6.  fellationic: 20
    8.  fellatorial: 5
    9.  fellatiary: 4
(Not attested, on the web or in newsgroups: fellatoric, fellatorian, fellatoriary, fellatistic, fellatonic, fellationistic, fellationical.)  People have certainly been creative with English morphology in order to get an adjective related to fellatio.

The first seven adjectives are all attested in both literal and figurative uses.  The adjective fellatorial (#8) is attested only in literal uses, fellatiary (#9) only in figurative uses, but this is probably just a consequence of the small numbers involved.  Some metaphorical examples:

fellatory: The interviews range all the way from obsequious to fawning to fellatory. Two of the worst are those with Sylvia Benso and ... (link)

    As for "barbaric and backward", well, that pretty much sums up my attitude toward Europe's fellatory attitude toward Arab-Muslim tyrants and terrorists. (link)

fellatial: WENNER TAKES ALL
... And we're no longer shocked to find that Wenner's indebtedness to Clinton translates into fellatial coverage of the president in the pages of Rolling Stone. And this toadying to a man who expanded the drug war to new and invidious heights!  (Andrew Sullivan's Daily Dish, 2/22/01)

    THAT TONY BENN INTERVIEW: Like many former apologists for Soviet terror, the British lefty, Anthony Wedgewood Benn, has a soft spot for Saddam Hussein. His interview with the monster will surely rank high up there in the annals of moral obtuseness along with Jimmy Carter's fellatial interactions with various mass murderers.  (Andrew Sullivan's Daily Dish, 2/2/03)

    In other chess news, World Champion Vladimir Kramnik now has his own website, just in time for his title defense against Peter Leko later this month. The site features fellatial sponsor profiles of the "cosmopolitan" Russian champ and the "ascetic" Hungarian challenger.  (Archives de Colby Cosh, 9/2/04)

fellative: I give you the classic Washington mode of the fellative. self-consciously literary interview instead. The kind Cox would and does ridicule online. (link)

    The new Bob Woodward book, Plan of Attack, is out on shelves, now, to fellative Hosanna by the Times' Pulitzer prize-winning Michiko Kakutani...  (link)

fellatic: Now that the son of a bitch is dead, the media is, of course, back in a full fellatic frenzy. How well they remember their beloved position, on their knees, ... [note extended metaphor] (link)

fellational: Nowhere else in Blogistan can we find such sensational, fellational Minaya-hyping AND Boras-flacking posted with such impunity. (link)

fellationary: In addition to discouraging fellationary interviews with the terrorists who raped Russian schoolchildren, Putin may have also made a crude calculation. (link)

fellationic: ... thanks to their decades-long uncritical (nay, fellationic) regard for any Republican whatever, regardless of his actual track record on Second Amendment ... [about the NRA] (link)

fellatiary: ... the criminal underuse of Chris Mortensen, and the fellatiary treatment of anything even remotely connected to Southern California football. (link)

Note the Andrew Sullivan examples of metaphorical fellatial from 2001 and 2003, and one non-Sullivan example.  I don't know when Sullivan started using this adjective in print (he seems to have some fondness for it), nor do I know who used it first, and these questions don't much interest me.  All these adjectives seem quite likely to have been created many times, by different writers; they're all possible English words, built from the stem fellat- or from the noun fellation, using suffixes appropriate for material in the classical stratum of the English vocabulary.  And indeed there are numerous literal uses of fellatial from before 2001 -- "my novice fellatial powers", "the fellatial arts", "fellatial talent", "fellatial fanatics", "the sloppy fellatial act", "cunnilingual and fellatial stimulation", "fellatial facial" (all from 2000) -- including the expected Lewinsky references,  as in this excerpt from a Virginia Vitzthum piece on Salon, 9/25/98:

Monica and the president explored an amazing span of fellatial landscape over the course of those nine "encounters." Monica's immediate eagerness to suck presidential dick offsets the encounters' one-sidedness and makes her seem less victim, more vixen.

Now I'm ready to move on to the commentary on Sullivan's swipe at Barnes.  Here's John Podhoretz:

ANDREW SULLIVAN'S ANTI-GAY INVECTIVE

Andrew Sullivan calls my old friend Fred Barnes's admiring book about President Bush "fellatial." Imagine if someone had used such a word about an Andrew Sullivan blog item about, say, John McCain. Andrew would have been OUTRAGED! He would have demanded an APOLOGY! Andrew, you see, is gay. So any comparison of his rhetoric to homosexual conduct would be UNACCEPTABLE. But Andrew, being gay, is free to use slighting sexual references to homosexual conduct when discussing the rhetoric and ideas of others. Why? Because, in Andrew's eyes, he is beyond reproach solely because he shares a bed with other men. And Fred Barnes? Married to a...(I know it's unimaginable) woman. How contemptible of Fred. Doesn't he know marriage is only for gay people? UPDATE: Yes, the act Andrew S. analogizes to Fred Barnes's treatment of President Bush is not exclusively one performed by homosexuals. But since Sullivan uses the word for a male writer's analysis of another male, his use of the word "fellatial" therefore has an unmistakably gay tinge.

(Note Podhoretz's modifier from the very mild edge of the toadying domain, "admiring".  The unsigned review in the 1/28/06 Economist (pp. 81-2) calls the book "gushing", which is a bit more negative.  In my introduction to this posting I used the stronger "adulatory", taking things further into the toadying domain.  "Worshipful" would have gone a bit further still.  Sullivan goes all the way with "fellatial"; "suck-up" would have been a bit less extreme.  No doubt other writers have characterized the book with other vocabulary choices from the toadying domain.  "Sycophantic" and "fawning" would not be bad choices from the fairly negative region of this territory.  "Boot-licking" has a lot of ick factor going for it, and "ass-kissing", "ass-licking", and "shit-licking" have, in turn, progressively more.)

Now if I understand Podhoretz's position here -- not at all a sure thing -- he's saying that "fellatial" is a homophobic slur (a piece of anti-gay invective), period.  Presumably because it refers to cocksucking, and the act of sucking cock is strongly associated with gay men and so picks up the negative affect that attends homosexuality, especially male homosexuality; after all, "cocksucker" is an insult, right?

Well no, not really.  "Cocksucker" can be used literally, it can be used metaphorically to mean 'toady', it can be used as an insult directed at a gay man, it can be used as an all-purpose insult, it can be used as a taboo-word filler noun, otherwise like "jobbie" ("I've got to get all these cocksuckers washed and dried by 6" -- said of a pile of dirty dishes), it can be used as an affectionate taboo-word sign of solidarity ("Any of you cocksuckers got a beer?" -- said by one straight guy to a bunch of his straight buddies), and probably in other ways as well.  It isn't just one thing; it's a lot of different things, depending on context.  That's the way language works.

Now, Sullivan surely meant to pour on the ick factor, but that doesn't mean that he takes a generally negative view of sucking cock, or of cocksuckers, as Podhoretz seems to think Sullivan's use of "fellatial" commits him to.  It would be sufficient for Sullivan to believe that FRED BARNES would find it unpleasant or humiliating to suck another man's dick -- and surely he would -- so that comparing Barnes's writing about GWB to sucking GWB's dick introduces the ick factor, suggesting that Barnes the adulator would go even to such lengths to satisfy GWB the recipient.

But, of course, Sullivan's use of "fellatial" will be read -- correctly, I think -- as more generally disparaging, and Podhoretz seems to take it this way.  Sullivan not only shares his bed with another man, but he undoubtedly sucks his boyfriend's cock (sucking dick being the most ordinary of sex acts between two gay men, the meat and potatoes of gay male sex, so to speak), with enthusiasm and pleasure.  But gay men (like Sullivan and me) don't suck cock to show regard or deference, but because cocksucking pleases us (as well as our partners); this is literal, not metaphorical, cocksucking.  In addition, cocksucking is not some unalloyed good thing, independent of context.  Gay men are not interested in dick, any dick, every dick, any time or place; literal cocksucking can be accompanied by a considerable ick factor.  The idea of sucking off GWB is deeply repellent to me, as I'm sure it is to Sullivan, and that repulsion carries over from the literal sphere to the metaphorical one.

A tale from my sexual life...  My first boyfriend found kissing other men -- me, in particular -- enormously pleasurable, and I reciprocated, passionately.  Yet he once described an event he found decidedly unpleasant as "like kissing Richard Nixon".  (You will see how long ago this was.) Instant ick.  It wasn't kissing men, period, that was the problem, but the details of the event.  (Sullivan could have characterized Barnes's book as lavishing kisses on GWB, and that would have worked, but it wouldn't have been as powerful, simply because, as people see such things, sucking cock is a much more intimate act than kissing.)

So far: "fellatial" isn't necessarily a homophobic slur, and it's not necessarily hypocritical for a gay man to use this word disparagingly.  I turn now to James Wolcott's critique of Podhoretz.  Here's the bit I want to focus on:

"Gay tinge" is a rather prissy phrase on Podhoretz's part, as if Sullivan were trying to slip by a sly innuendo. There's no need to be sly. I won't presume to speak for Sullivan, but it's clear that there's a homoerotic ardor for Bush by neonconservatives that bypasses reason and reduces them to hero-worshipping mush.

My problem here is with "homoerotic".  We seem to have moved from literal "fellatial" to figurative "fellatial" 'servile, etc.' back to a more literal use, imputing homo-desire (though without actual cocksucking).  But this isn't really about language; it's about relationships between people.  Wolcott is connecting an adulatory relationship to homo-desire, a connection that someone could make regardless of what vocabulary is used to describe the adulation.  But why would anyone make that connection?

I can see two contributions towards making this connection.  One is very general in the modern world.  Since Freud, we have come to appreciate the significance of the erotic in our lives.  But that has led many people to see sexual desire in virtually every kind of relationship between two people.  For them, sex is always part of the story.  While not denying the importance of sexual feelings (after all, I write sexually explicit memoirs of my life and pornographic fiction and analysis of the fantasy world of gay male desire, and I create pornographic collages), I resist the idea that they're the mainspring of social life.  There are many other, equally important, factors that organize human relationships: affiliation, physical contact, nurturance, power, play, mentoring, respect, and more.  These can, of course, co-occur with sexual desire, but they need not.  I respect many of my colleagues, but (in general) I don't desire them sexually.  (I feel reasonably assured in saying this, since I'm exceptionally well in touch with my inner sexpig.)  Fred Barnes respects and admires GWB, but that's no reason to think he has the hots for him.

The other contribution is a sense of bafflement that many of us -- I am one -- have over the respect and admiration that some people (like Fred Barnes) have for GWB.  We wonder: how could anyone have such regard for someone who is so transparently unworthy of it?   And so we cast about for explanations other than an appreciation of GWB's merit.  Stupidity and gullibility are two possibilities.  A desire for a strong authority figure is another.  The hope of advancement is yet another.  No doubt there are other possibilities.  Meanwhile, especially if you see sex in all relationships, desire is always available as an explanation.  So you end up discerning homoeroticism.  I think this is just silly.  And annoying, because it trivializes the enormous power of homoerotic desire, for those of us who experience it.  (Well, some gay men find that consequence attractive, since trivializing homoerotic desire means normalizing it: look, ALL guys desire other men, so there's nothing special about me!  Spare me.)

Fascinating as all this is, none of it's about language.  So let's return to language, with John Calendo's (tongue in, um, cheek) proposal in Nightcharm for a definition of the "new word" fellatial:

fellatial (fel-lay-shel) adj.  1.  Of or suitable for a blowjob.  2.  Of the nature of blowjobs, servile, fawning, with involvement of the mouth in a hoovering motion.  3.  Ready to suck off those in authority, usually in exchange for favors, prestige or political appointments.  4.  The way things work in Washington.

Ok, you knew it, I'm going to object to the claim that it is in the NATURE of blowjobs to be servile and/or fawning.   I'm not going to lecture here on the complex and varied emotional pleasures of sucking cock for a gay man (though I have written at some length on the topic in the newsgroup soc.motss over the years), though I will note that for a lot of gay men it makes a big difference whether the cock you're sucking belongs to a gay guy or a straight guy (straight guys can be problematic in a number of ways, including the strong possibility that they will understand your blowjob as an act of servility, whatever you might think about it; on the other hand, some gay men positively desire straight cock, on the basis that straight guys are "more masculine" than gay guys), and that in any case though serving another man (not servility) can be one of those pleasures on some occasions, it's often a minor component, and may be entirely absent.  In fact, both in gay porn and in real life, the man enthusiastically taking the dick may understand the event as one in which the man providing the dick is serving HIM, by providing a cock for him to enjoy; in my experience, this is especially common for cocksuckers who generally identify themselves as "tops", in two ways: they like to be in charge, to run the show, and they fuck guys but don't get fucked themselves.  The world of sexual emotions and relations is astonishingly rich.

Calendo's dictionary entry moves quickly from the neutral (definition 1) to the negative in tone (all that follows) and thus mirrors what has long been a view of cocksucking -- the act -- as perverse, dirty, and abnormal.  I'm fighting that view by talking about it in positive and joyous ways.  Meanwhile, young Americans seem to be increasingly configuring it as routine and not perverse, in fact not really sex at all.  The January/February 2006 issue of the Atlantic Monthly has a review (pp. 167-82), by Caitlin Flanagan, of one nonfiction book, two young-adult novels, and a television show, all treating adolescent sex.  Flanagan notes "the genuine and perplexing rise of oral sex among teenagers--specifically of oral sex performed by young girls on boys" (p. 173).  Their parents are horrified, of course.

Once again, we've moved from words to acts, and there's not a lot of work for a linguist to do, qua linguist.  As a final reward, though, here's the delightful AHD4 account of the history of the word toady:

The modern sense... has to do with the practice of certain quacks or charlatans who claimed they could draw out poisons.  Toads were thought to be poisonous, so these charlatans would have an attendant eat or pretend to eat a toad and then claim to extract the poison from the attendant.  Since eating a toad is an unpleasant job, these attendants came to epitomize the type of person who would do anything for a superior, and toadeater (first recorded 1629) became the name for a flattering, fawning parasite.  Toadeater and the verb derived from it, toadeat, influenced the sense of the noun and verb toad and the noun toady, so that both nouns could mean "sycophant" and the verb toady could mean "to act like a toady to someone."

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 05:27 PM

All your emoticons are belong to Cingular?

The Register reports that "Cingular, the United States' largest mobile phone network this week applied to patent emoticons, better known as smileys".

The text of the patent application lists

    ) Smile
    ;-) or ;)             Wink
    :-D or :D             Big smile
    :-)) or :))           Very happy
    :0)                   Big nose smiley
    |-)                   Cool!
    >:-) or >:)           Evil grin
    >;-> or >;>           Evil grin with a wink
    :-X or :X             My lips are sealed
    }:-) or }:)           Devilish
    :-^) or :^)           Tongue in cheek
    :-P or :P             Sticking out tongue
    :-& or :&             Tongue tied
    :op                   Puppy face
    0:-) or 0:)           Saint
    :-)8 or :)8           Happy wearing a bow tie
    8-) or 8)             Happy with glasses
    #-)                   I partied all night
    %-) or %)             Drunk
    :-###.. or :###..     Being sick
    %-( or %(             Confused
    :-0 or :0             Shocked
    :-o or :o             Surprised
    :-| or :|             Indecision
    :'-( or :'(           Crying
    :'-) or :')           Crying of happiness
    :-( or :(             Sad

and motivates the "invention" as follows:

Written communication plays an integral role in modern social, economic, and cultural life. Writing facilitates the transfer and preservation of information and ideas. However, without direct access to facial expression, body language, and voice inflection, the potential for misunderstanding written communications is considerable.

The application seems to focus on the means of entering the emoticons, rather than the emoticons themselves:

A method and system for generating a displayable icon or emoticon form that indicates the mood or emotion of a user of the mobile station. A user of a device, such as a mobile phone, is provided with a dedicated key or shared dedicated key option that the user may select to insert an emoticon onto a display or other medium. The selection of the key or shared dedicated key may result in the insertion of the emoticon, or may also result in the display of a collection of emoticons that the user may then select from using, for example, a key mapping or navigation technique.

Still, this is my nomination for the lamest patent application ever. Have these people no sense of decency?

The article in The Register links to

"... the most delightful commentary ever written about the practice of using emoticons, Geoffrey Nunberg's celebrated radio piece A Wink is as Good as a Nod - where Nunberg imagines the literary greats employing the technique."

[Update: Chris Waigl draws my attention to Scott Fahlman's page on his (co-?)invention of emoticons, including a description of the "Digital Coelacanth Project" which found the original Bboard Thread in which :-) was proposed.]

[Update #2: Several readers have written to emphasize the point that I made above, though perhaps not strongly enough, that Cingular is not seeking a patent on emoticons as such -- contrary to the implication of the quote from The Register -- merely a patent on any method for entering emoticons via special keys, key sequences or menus. The lameness of the application, in my opinion, is due to the unprecedentedly high "duh factor" of this "invention", not because emoticons have been around for a quarter of a century or more. I expect that there is some prior art for keyboard shortcuts and the like for emoticon entry -- but even if there weren't, no one should be able to patent something as obvious as this. It's like asking for a patent on the idea of putting portable mp3 players in a shopping bag so that purchasers can carry them home more easily.

In Cingular's defense, they may feel driven to such silly gestures by the extraordinary pliability of the USPTO in the hands of "patent trolls", such as Acacia Media Technologies. That is, Cingular's aim may not be to derive revenue from licensing their simple-minded "invention" to others, but rather to protect themselves against having to pay ransom to the likes of Acacia for the right to do business in the obvious way.]

Posted by Mark Liberman at 05:09 PM

Jeopardy! strikes the wrong tone

The game show Jeopardy! has something of a mixed record when it comes to language-related clues. Last night's installment had a whole category on "Language," which was rather unremarkable except for the $400 clue:

In the Kootenai tongue, a word's pitch changes its meaning, as in this most widely spoken world language.

This clue follows a typical formula for Jeopardy!, where the wording may refer to something quite obscure even though the correct response is nice and obvious. In this case the requisite obscurity is Kootenai, also known as Kootenay, Kutenai, or Ktunaxa, a nearly extinct language isolate still in use by only a handful of speakers in southeastern British Columbia, northern Idaho, and northwestern Montana. I checked online and could find no references to Kootenai being a tonal language, and fellow Language Logger Sally Thomason verified that Kootenai is indeed not tonal.

So what's the deal? Did the usually meticulous clue-writers have some other tonal language in mind? My best guess is that they confused Kootenai with one of the many Athabaskan languages that are tonal. The most likely candidate for confusion is Gwich'in, also known as Kutchin or Kootchin. I can imagine the show's researchers consulting a list of North American languages, with Kutenai/Kootenai listed right after Kutchin/Kootchin. Of course, it didn't matter what exotic-sounding language was mentioned, as long as it served to spice up an otherwise uninteresting clue about Mandarin tonality.

Posted by Benjamin Zimmer at 01:46 PM

January 29, 2006

Dubious quotation marks

Gratifying though it is to see myself quoted in print, I'm peeved to see myself represented as using quotation marks for emphasis.  Like, 'for emphasis', meaning for emphasis.  But that's what happens in Leslie Savan's Slam Dunks and No-Brainers, chapter 2 ("Pop Talk is History"), in the section (pp. 33-4) titled "Who needs Esperanto when you've got Coca-Cola?"  I'm not entirely sure how this happened.  More interestingly, this case illustrates an issue in the mention (rather than use) of linguistic material, including quotation: faithfulness vs. well-formedness (shades of OT!).

Here's the whole quotation (with two bits boldfaced that were not boldfaced in the original):

    Coca-Cola is so ubiquitous that it's not always considered American.  The Stanford University linguist Arnold Zwicky recalled how, about thirty years ago, his wife, Ann Daingerfield Zwicky, "was teaching an ESL [English as a Second Language] class at Ohio State and used one of her ice-breaker topics: words borrowed into English from your native language.  Alas, this time the first word offered was 'Coca-Cola,' by (I believe) a speaker of Hindi.  An Arabic speaker ... denied this with scorn; 'everybody' knows 'Coca-Cola' is an Arabic word.  Pandemonium ensued.  Even a female student from Japan, normally silent in class, was moved to dispute the others' absurd claims.  The only thing they were agreed on was that the idea that the headquarters of the Coca-Cola Company could possibly be in 'Atlanta'--or anywhere in the U.S.--was preposterous (or evidence that America just grabbed everything away from the rest of the world)."
    [p. 278: from his post to the ADS and an e-mail interview, April 2000.]

Now, there are several ways in which this version differs from what I originally wrote.  Point 1: I used double quotes on "Coca-Cola"; these have now been replaced by single quotes, because they're inside a quotation from me that Savan has enclosed in double quotes.  Point 2: Most of my rapid writing is all lowercase, but this has been altered to conventional capitalization.  Point 3: I originally typed
    ... was "coca-cola", by ...
with quot-punc order, and this has been altered to
    ... was 'Coca-Cola,' by ...
with punc-quot order.  In all of these cases, Savan (or, more likely, her editors) opted against faithfully reproducing what I wrote, in favor of conforming to a style sheet different from the one I prefer.  Well-formedness trumps faithfulness.

The boldfaced words were originally typed inside asterisks, to indicate emphasis in text that sticks to ASCII characters:
    ... *everybody* knows "coca-cola" is ...
    ... could possibly be in *atlanta* -- or anywhere ...
The equivalent in handwriting would be underlining; in print, usually italics, or possibly boldface or small caps, depending on your style sheet.  But not any kind of quotation marks (single or double, smart or plain).  Emphatic quotation marks are usually mocked as an illiteratism; but in any case, they aren't standard.  Yet I have been represented as using them.  I feel sullied, and frankly, I'm puzzled as to how this happened; either Savan, or someone at Knopf, apparently thinks this is an ok way to indicate emphasis.

The larger point -- the conflict between faithfulness and well-formedness in linguistic mention -- is a gigantic one.  I originally started a Language Log posting on the topic back during the discussion of taboo words in titles of books and movies, but it quickly bloated up horribly.  But for fun, here's an unsubtle example (there are subtle and complex ones) that also provides a little homework assignment for the more enthusiastic readers:

Of the two major political parties in Britain, one is known there as the Labour Party; the -our spelling is the British one.  Here in the U.S., the party is referred to in print (in political stories in the New York Times, for instance) as the Labor Party, with the American -or spelling.  Once again, well-formedness, in this case conformity to the local spelling conventions, trumps faithfulness: references to the party are re-spelled.

Now, the homework assignment, in two parts: to find American writing with Labor within a quotation (referring to the political party, of course) that is then itself quoted in a British publication, like the Economist or the Guardian; and to find British writing with Labour within a quotation that is then itself quoted in an American publication, like the New Yorker or the New York Times.  Are references within quotations re-spelled?  (The earlier examples were outside of quotations, in the main text.)

For extra credit, look for occurrences of British Labour in book titles that are then cited in American footnotes or bibliographies, and the reverse: occurrences of American Labor in book titles that are then cited in British footnotes or bibliographies.

General observation: well-formedness tends to trump faithfulness, but not always.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 08:46 PM

Language Log final exam

For those of you who are taking Language Log for credit in the fall semester of 2005 and want to make progress toward your Diploma, you've had your January reading period (we are on the Harvard-style semester system, as you know), and it is now time for your final exam. No cheating; independent work only; essay-style answers. We will be judging you on neatness, originality, coherence, clarity, knowledge of elementary linguistic terminology and conceptual distinctions, and of course creative ranting. Your exam follows below. Submit the answers to your favorite Language Log contributor as usual, enclosing a stamped addressed envelope and a bottle of single malt scotch whisky.

LANGUAGE LOG FINAL EXAM, FALL SEMESTER 2005

Read this piece from The New York Times Real Estate section and answer the questions that follow below.

The yearning for a smooth transition from the surging [real estate] market is seen in the increasingly frequent use in the last six months of the phrase "soft landing."

"Soft landing is everyone's big hope," said Paul JJ Payack, president of the Global Language Monitor (languagemonitor.com), which analyzes language trends and their impact on politics, culture and business.

Mr. Payack, who graduated from Harvard with a bachelor's degree in comparative literature, calculated the popularity of some 36 buzzwords chosen by a reporter. He used his Predictive Quantities Indicator, or P.Q.I., an algorithm that tracks words and phrases in the media and on the Internet in relation to frequency, contextual usage and appearance in global media. It is a weighted index that takes into account year-to-year increases and acceleration in the last several months.

Among the market buzzwords he ranked, "soft landing" and "pause" had the highest P.Q.I.'s. They were ranked first and second respectively, while the more ominous sounding "housing bubble" ranked seventh. " 'Pause' is another one of these hopeful things," Mr. Payack said.

(Mr. Payack can also verify that "O.K." is the most frequently spoken word, that "outside the mainstream" was the top phrase of 2005 and that as of Jan. 26 at 10:59 a.m. Eastern time, the number of words in the English language was 986,120.)

  1. Discuss some reason why the text frequency of buzzwords might not tell us much about anything economic.
  2. Give the definition of the word "algorithm", and consider what might be meant by saying someone has an algorithm "that tracks words and phrases in the media" and so on. Compare with other gee-whiz locutions about computing, like "We ran him through the computer" in police procedural shows on TV and the like.
  3. Construct three or four interpretations of what Mr Payack might have meant by saying "'Pause' is another one of these hopeful things", and discuss them critically.
  4. Explain in detail why it is patently stupid to try and say exactly, down to a single word, how many words the English language has at a given instant, and why one would have to be a moron to think that a figure of about a million was right anyway.
  5. Rant a little about how silly this all is and how journalists really need to develop a bit more skepticism and a bit more knowledge about language than the typical 9-year-old has, and so on and so forth.

Posted by Geoffrey K. Pullum at 05:38 PM

The cran-morphing of -dango

On his blog Evolving English II, Mike Pope (aka "WordzGuy") reflects on the name of a new job-searching website for the Pacific Northwest: Jobdango, evidently inspired by the movie ticketing service Fandango (rather than the Spanish dance):

They've broken off -dango and used it to mean, I am guessing, something like "place where you get something":

Fandango = place where you get movie tickets
Jobdango = place where you get jobs

The flaw in this theory is that Fan- doesn't map cleanly to "movie tickets." But who says that the -logy part of etymology has to mean "logic"? Not I.

Pope has previously blogged about similar breakaway segments, such as -zilla and -palooza. This sort of reanalysis relies on what linguists sometimes call "cranberry morphemes." The segment cran- in cranberry is opaque, though it looks like it's a modifier for the transparent morpheme -berry. Indeed, cranberry was only ever fully transparent in the Low German dialects from which the term was borrowed, where it was kraanbere or 'crane-berry.' Since English underwent the Great Vowel Shift, the semantic connection between the cognate forms cran- and crane has been lost. But the opacity of cran- has allowed for a reanalysis of the morpheme to "stand for" cranberry in new compounds like cran-grape and cran-raspberry. Such cran-morphing has yielded many productive suffixes in the 20th century: -burger, -(o)holic, -(o)rama, -(a)thon, -(o)mat, -(o)nomics, -gate, etc. (In the case of -burger, the new morpheme quickly became lexicalized as the standalone burger.)

Though Pope was unable to find many other examples of the cran-morphing of -dango (besides an airbrush template with a flame pattern named "flame-dango"), he missed one obvious predecessor: fundango. Among the tens of thousands of Googlehits for "fundango" or "fun-dango" are a company promoting online activities for children, a circus festival, and a juggling convention. All three of these operations are based in the United Kingdom, but fundango also has a long history in American English as a fanciful name for a fun activity. The earliest examples I've found in digital newspaper databases date to 1961. On Feb. 26 of that year, the Los Angeles Times ran a photo feature in its TV section with the headline "Fun-Dango," about a flamenco act appearing on NBC's "Galaxy of Music." That obviously is still connected to the terpsichorean sense of fandango, but the same can't be said for this citation, from an article about Missouri osteopaths attending the annual convention of the American College of Osteopathic Surgeons in Denver, Colorado:

Chillicothe (Mo.) Constitution Tribune, Oct. 28, 1961, p. 1
Highlights of the convention will include a formal banquet, inaugural and ceremonial conclaves on Monday evening [and] the "Colorado Fundango" on Wednesday evening.

(Who says osteopaths don't know how to have fun?) A couple of years later, Chicago Tribune entertainment writer Herb Lyon began using fundango, as in his "Tower Ticker" column of Apr. 29, 1963: "Bob Hope will take up most of Johnny Carson's Thursday night TV fundango." These examples rely on the preexisting sense of fandango as a bit of tomfoolery, popular in American English since the 19th century (along with the apparently related form fandangle).

The switch from fandango to fundango only requires changing one vowel, but it may have opened the door to the reanalysis of -dango as a bound morpheme. (As Pope suggests, the use of the name "Fandango" as a ticketing source for movie fans also contributes to the reanalysis, particularly as an analogical basis for a service like Jobdango.) A similar phenomenon occurred with the cran-morphing of -tastic, which began with the easy shift from fantastic to funtastic. I've found examples of funtastic all the way back to 1939 in Jimmie Fidler's syndicated column, "Fidler in Hollywood," as in these three citations:

Los Angeles Times, Apr. 27, 1939, p. 13
In-a-word description of the Ritz zanies: Fun-tastic.

Nevada State Journal, Oct. 27, 1942, p. 4
Fantastic and fun-tastic; manna for theater-goers who want "something different."

Nevada State Journal, Nov. 17, 1942, p. 4
Fun-tastic nonsense guaranteed to tickle your sense of humor.

By the 1960s, X-tastic had become a productive formation for US advertisers. A quick scan of the newspaper databases turns up shoe-tastic (1966), carpet-tastic (1966), fang-tastic (1968), shag-tastic (1969), swim-tastic (1970), and so forth. (During the NFL players' strike of 1987, David Letterman had a Top Ten list called "Top 10 Slogans of the Scab NFL"; the number one slogan was "It's scab-tastic!") Since the '90s, the suffix -tacular has begun to rival -tastic, though they often attach to the same root (craptastic and craptacular each return hundreds of thousands of Googlehits).

One quick measure of the relative success of cran-morphemes in contemporary online discourse is to see how often they get attached to that promiscuous blend component, blog- (as in blogosphere, blogorrhea, blognoscenti, etc.). Google currently finds about 93,700 pages with blogtastic, 39,200 with blogalicious, 15,800 with blogeriffic, 984 with blogapalooza, but only 89 with blogdango. And most of those 89 don't count, since they refer to the Japanese website Blog Dango, which I'm assuming is a blog about rice dumplings. Filtering those out leaves just three examples: one about bloggers "getting ready to trip the light blogdango" (playing on Procol Harum's reanalysis of the old expression trip the light fantastic), one suggesting "Blogdango" as a blog-related project for Kevin Costner (playing on his 1985 movie "Fandango"), and one joking about calling "1-800-BLOGDANGO" (playing on the Fandango movie ticket service). So clearly -dango has a long way to go to catch up with its crantacular morphoriffic colleagues.

[Update: Grant Barrett suggests that the reanalysis of fandango as fan + dango has been encouraged by the Fandango ticket service ever since they began screening promotional ads in Sony/Loews theatres featuring paper-bag puppets. In one early promo, one of the puppets says, "I know 'fan' means for the fans, but I don't know 'dango'...what that means." Grant speculates that this widely viewed advertisement "might be a reinforcer of any cranalytical neologizing of 'fandango' now going on."]

[Update, 2/8/06: Mark Peters of Wordlustitude reports on another form: fuckdango.]

Posted by Benjamin Zimmer at 04:27 PM

Rare in the same sentence

What I immediately noticed about Ben Zimmer's post about hiphop research was that the claim that "It's rare to use the words ‘hip hop’ and ‘serious academic research’ in the same sentence" is one more case of someone writing for the press (this time a publicist rather than a journalist) deliberately and quite pointlessly disguising a topic of discussion as linguistic when it isn't. Journalists are drawn to irrelevant but checkable corpus-content statements like moths to a flame. They'll write that some word is always accompanied by the qualifier such-and-such, or that some other word is invariably followed by the phrase thus-and-so, and they'll never try to substantiate these claims, which are in any case nothing to do with what they want to discuss.

Sometimes the linguistic claims are screamingly, massively false. But I'm not really interested in the question of the truth of the present rarity claim. It took about ten seconds to find the sentence "I would also like to express my appreciation for Brother [Cornel] West, who, with the recording of his hip-hop albums, showed me that, even in the midst of academia, non-scholastic pursuits don't have to be put on hold for serious academic research" at Generally Awesome, but that's not the interesting thing. The interesting thing is that although it contains the two phrases in question, it very clearly implies by what it says that hiphop does not have anything to do with academic research. The mere presence of the two phrases has nothing to do with how common it is to find academics studying hiphop. Why would it ever seem sensible to a journalist or publicist to take a claim like "Serious academic research on hiphop is extremely rare" (ravingly false, but that's not my point) and transmute it into a claim with different subject matter, about frequency of co-occurrence of phrases, which the journalist or publicist in question knows nothing about and has no interest in, but we at Language Log can so easily fact-check? It's very odd.

Posted by Geoffrey K. Pullum at 10:58 AM

January 28, 2006

The dissing of hiphop linguistics

On the anthroblog Savage Minds, Kerim Friedman takes note of a recent press release from the University of Calgary under the title "Hip hop and linguistics: you ain't heard no research like it":

It's rare to use the words 'hip hop' and 'serious academic research' in the same sentence, but a University of Calgary linguistics professor has relied on rap music as source material for a study of African American vernacular English.

Dr. Darin Howe recently contributed a book chapter that focuses on how black Americans use the negative in informal speech, citing examples from hip hop artists such as Phonte, Jay Z and Method Man. Howe is believed to be the only academic in Canada and one of the few in the world to take a scholarly look at the language of hip hop.

As Friedman remarks, a little basic fact-checking would have helped here. There's been plenty of serious academic research on hiphop, including linguistic research, for quite some time now. Friedman quickly Googled up a bibliography of hiphop scholarship compiled by John Ranck of Simmons College, to which I'd add the even more extensive bibliography maintained at the Hiphop Archive website.

Linguistic research on rap lyrics and hiphop culture more broadly is certainly nothing new. The founder of the Hiphop Archive, Stanford communications professor Marcyliena Morgan, has been writing about hiphop from a sociolinguistic perspective since at least 1993 (in a paper presented at the American Anthropological Association annual meeting, "Hip Hop Hooray!: The Linguistic Production of Identity"). Geneva Smitherman of Michigan State, author of Talkin and Testifyin: The Language of Black America (1977) and Black Talk: Words and Phrases from the Hood to the Amen Corner (1994), also has an extensive list of hiphop-related publications. Dissertations on the language of hiphop have been produced since at least 1997 (Jon Abdullah Yasin's thesis at Columbia University, "In yo face! Rappin' beats comin' at you: A study of how language is mapped onto musical beats in rap music"). Newer scholars include Samy Alim and Cecilia Cutler, both of whom were involved in the recent PBS documentary "Do You Speak American?" (Alim on "Hip Hop Nation Language" and Cutler on the "crossing" performed by white suburban teenagers using hiphop talk).

As for whether Howe is the "only academic in Canada" studying hiphop language, I sincerely doubt that. He's certainly not the first. Though no longer affiliated with a Canadian university, Awad Ibrahim wrote his 1998 dissertation at the University of Toronto on language-learning by Francophone African youths at a Toronto high school. He found that a crucial aspect of their learning process was the acquisition of Black English through hiphop, which assisted them in "becoming black." (See also Ibrahim's contribution to Black Linguistics: Language, Society and Politics in Africa and the Americas, based on his doctoral work.)

Nowadays it's not unusual to find presentations or even entire panels on linguistic aspects of hiphop at academic conferences. The range of research topics is quite broad, from deixis in gangsta rap to the self-consciousness of the hiphop register, from conversational pragmatics to copula absence, from /aw/ variation to communicative failure in freestyle rapping. But what is unremarkable in scholarly gatherings is apparently still bizarre and exotic at the University of Calgary press office.

Posted by Benjamin Zimmer at 07:40 PM

The parts of speech

Luckily, the late Peter Ladefoged had a really good sense of humor, and as he sits in the faculty club at the University of Heaven and reads the AP story about his death, I'm sure the club booms with his rich laughter. The first sentence of what the Associated Press put on the wires (which appears in, for example, the San Jose Mercury News today) says:

LOS ANGELES - Peter Ladefoged, a UCLA linguistics professor emeritus who made it his life's work to record the parts of speech used in human languages, has died.

But (although it is easy to see how this howler might arise) the parts of speech are not in fact the parts of your body that you use for speech. More journalistic ignorance of even the most absolutely basic notions of linguistic science. Sigh.

"Parts of speech" is an old-fashioned name for lexical categories — classes to which words with similar grammatical properties belong, e.g., noun, verb, adjective, adverb, preposition. Categories of this sort are in fact the key element that lifts grammatical description up to a level of abstraction where you are not talking about speech, you are talking about higher-level units to which various grammatically equivalent small stretches of speech can be treated as belonging. The job of a phonetician is to describe in minute detail the speech sounds themselves; so classifying into parts of speech (lexical categories) is exactly what the phonetician must never do, in his capacity as phonetician. And if you will pardon my being a little irritable (I'm sorry, but a friend of mine recently died), I do think we have a right to expect better from the AP than this. What they have done is like writing up Einstein's demise as the passing of a chemist. It is rank ignorance. Journalists just don't know anything about the language sciences, but instead of asking they just write nonsense. Peter deserved better than to have his passing commemorated with an embarrassing goof that any of the students that he taught in UCLA's excellent Department of Linguistics could have fact-checked.

Posted by Geoffrey K. Pullum at 05:02 PM

January 27, 2006

Peter Ladefoged

I'm confident that I speak for the entire profession when I say that we are all deeply saddened by Peter Ladefoged's passing this week at the age of 80. Here are some links to find out more about this extraordinary phonetician.

You may also be interested in reading about Ladefoged's career in his own words (.pdf), which appears to have been written sometime within the last few years.

And for a little comic relief from the sadness: a linguist joke in which "the Devil [is] a tall, handsome man with a voice rather like Peter Ladefoged's."

Posted by Eric Bakovic at 04:47 PM

The late Peter Ladefoged

I'm just not ready to write an obituary for Peter Ladefoged, whose death I just learned of today. But I will just say a word here about my own grief at the death of this man, a good friend and University of California colleague, who at his death was the most distinguished and important phonetician in the world and the active holder (at the age of 80) of one of the largest NSF grants ever given for pure linguistics research. He was loved by everyone who knew him. His works dominated the field; I have taught phonetics out of Ladefoged texts since 1982, and I treasure my copy of The Sounds of the World's Languages. We plan to simply steal the title of the latter book as the name for a new freshman course at UC Santa Cruz in 2006-2007, and I know Peter would have been very pleased. He was a fine raconteur, a tireless investigator of languages, a pioneer in archiving and digital teaching aids, an original thinker, a pillar of the International Phonetic Association, a true gentleman, a wonderful human being. And he had this deep, dark, rich British voice, part James Earl Jones, part Christopher Lee. It is a very sad thought that never again, when I call the UCLA Phonetics Laboratory that he founded, will I hear that voice saying, "Peter Ladefoged here." All those of us who knew him will miss him so much. Those who know nothing much about phonetics but would like to could learn a great deal about it by consulting his relatively popular book Vowels and Consonants (2000; ISBN 0631214127).

Posted by Geoffrey K. Pullum at 04:40 PM

Bring the bling

David Giacalone at f/k/a, commenting on Ben Zimmer's post "Blawgs, phonolawgically speaking", observed:

After this brief exposure to linguistics, it seems to me that linguists are science-minded persons, who like words more than numbers, and are too nice to want to be lawyers.

After giving some thought to this hypothesis, I've concluded that it's roughly true, except that the sorting process is an imperfect one, and a certain number of people who should have been lawyers have ended up as linguists. Perhaps the errors are statistically symmetrical, and a certain number of linguists have also ended up as lawyers.

David's kind observation about the niceness of linguists, in any case, is a sort of rhetorical head-fake to set up a carefully-worded complaint. Linguists may be too nice to be lawyers, he concedes, but

Like lawyers, however, they apparently do tend to take liberties when describing the positions of others. Thus, where I said I was surprised, Benjamin says I am "shocked." Where I merely gave a prominent example, he says I am "troubled."

One thing for sure, I bet Benjamin and Mark would be quite annoyed, if someone wanted to permanently call their weblog a "bling", merely because weblogs by linguists are so unique.

We're certainly as unique as they come. But speaking for myself, I don't think I'd be annoyed by someone's desire "to permanently call" Language Log a "bling", though perhaps I don't adequately understand how that desire would impinge on me.

Meanwhile, Denise Howell at Bag and Baggage, who invented the term blawg, came up with two nifty titles for a comment on Ben's post -- "Pain In The Low Back" and "Better Than A Stick In The Eye-Dialect" -- before deciding on "I, Sandwich Dominatrix". All three titles are hilarious, in a quiet word-nerdy sort of way, but it'll spoil the jokes to explain them, so you'll have to read the sequence of posts.

This back-and-forth between law and linguistics reminds me, for some reason, of Walker Percy's proposed solution to the perceived problems of teaching poetry and biology in a way that allows "a student who has the desire to get at a dogfish or a Shakespeare sonnet [to salvage] the creature itself from the educational package in which it is presented". He describes two methods that he rejects as impractical -- catastrophe and apprenticeship -- and concludes that

since neither of these methods ... is pedagogically feasible ... I wish to propose the following educational technique which should prove equally effective for Harvard and Shreveport High School. I propose that English poetry and biology should be taught as usual, but that at irregular intervals, poetry students should find dogfishes on their desks and biology students should find Shakespeare sonnets on their dissecting boards ...

Posted by Mark Liberman at 10:35 AM

January 26, 2006

Surprising crocodile kin

It's great having a brother who's a noted science writer, especially one who's a fellow blogger. Today Carl Zimmer's blog ("The Loom") has an entry about his New York Times article describing a fascinating new paleontological discovery: the fossil remains of an ancient reptile related to modern crocodiles and alligators, with a body much like a dinosaur's. What's surprising is that the fossil, named Effigia okeeffeae, dates to 210 million years ago, or about 80 million years before dinosaurs evolved similar bodily structures.

In the comments section for the blog entry one can find knowledgeable discussion about whether Effigia should be considered an "ostrich mimic mimic mimic." (This relies on the peculiar sense of "mimic" used by paleontologists, which is evidently applied when a newly discovered fossil resembles a previous discovery — thus when early ostrich-like dinosaurs were found, they were dubbed "ornithomimids," or 'ostrich mimics.') But what caught my eye was a comment about the headline of the Times article: "Fossil Yields Surprise Kin of Crocodiles." A commenter known as "Clueless" wrote:

When I saw the headline, I was wondering how a fossil yield could surprise crocodiles (or their kin), and it took a few moments to figure out what it was intended to mean. Does the author have any control over the headline, or is it completely up to the editors at the newspaper?

To answer the commenter's question, journalists rarely if ever have control over the headlines that are put on their articles, much to the chagrin of writers who wake up to find their painstaking work undercut by a misleading headline. In this case, the headline wasn't factually misleading, only syntactically so. It's a great example of the kind of ambiguous sentence that teachers of introductory syntax classes often present to their students (like the old standby, "I hate visiting relatives"). If this were a diagramming exercise in Syntax 101, the students would have to come up with phrase-structure trees to account for the structural ambiguity:

The ambiguity hinges on whether "yields" is understood as a noun or a verb. Once a reader decides to parse "yields" as a plural noun (with "fossil" understood as an attributive modifier), then the garden path has been established. The unusual headlinese of "surprise kin" further encourages the alternate parsing.
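For readers who like to see the two structures spelled out, here is a minimal Python sketch that enumerates the bracketings of the headline. The lexicon and the six phrase-structure rules are my own toy assumptions, chosen only to be big enough for this one headline; they are not a serious grammar of English (or of headlinese), and the function name `parses` is likewise just illustrative.

```python
from functools import lru_cache

# Toy lexicon (an assumption for illustration): "yields" and "surprise"
# are each listed as both noun and verb, which is the source of the ambiguity.
LEXICON = {
    "fossil": {"N"},
    "yields": {"N", "V"},
    "surprise": {"N", "V"},
    "kin": {"N"},
    "of": {"P"},
    "crocodiles": {"N"},
}

# Toy phrase-structure rules (also an assumption): unary or binary only.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("N", "N")),    # attributive noun + head noun
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
]


def parses(words, symbol="S"):
    """Return every labeled bracketing of `words` as `symbol` under the toy grammar."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def build(sym, i, j):
        trees = []
        # A single word can be labeled with any category the lexicon allows.
        if j - i == 1 and sym in LEXICON.get(words[i], ()):
            trees.append(f"[{sym} {words[i]}]")
        for lhs, rhs in RULES:
            if lhs != sym:
                continue
            if len(rhs) == 1:
                # Unary rule: wrap each analysis of the same span.
                trees.extend(f"[{sym} {t}]" for t in build(rhs[0], i, j))
            else:
                # Binary rule: try every split point inside the span.
                for k in range(i + 1, j):
                    for left in build(rhs[0], i, k):
                        for right in build(rhs[1], k, j):
                            trees.append(f"[{sym} {left} {right}]")
        return tuple(trees)

    return list(build(symbol, 0, len(words)))


for tree in parses("Fossil Yields Surprise Kin of Crocodiles".split()):
    print(tree)
```

Under these assumptions the sketch finds exactly two bracketings: one with "yields" as the verb (the intended reading) and one with "fossil yields" as a noun phrase and "surprise" as the verb (the garden path). A richer grammar would of course admit more analyses, for instance by letting the prepositional phrase attach to the verb phrase.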

A similar ambiguous headline occasionally gets hauled out for the amusement of linguistics classes: "British Push Bottles Up German Rear." Again, the key to the battling interpretations is whether a single word (in this case "push") is parsed as a noun or a verb. I always figured that this headline was apocryphal (one also sometimes sees "French" in place of "British"). But I've seen two references online that say there was an actual headline from World War II along these lines, evidently reproduced in Fritz Spiegl's What The Papers Didn't Mean to Say (1965). The headline given in Spiegl's book reads: "Eighth Army Push Bottles Up German Rear." For American readers this isn't quite as elegant as using "British" or "French," since the ambiguity of Spiegl's headline requires construing "Eighth Army" as plural. That's not a problem for British readers, but in American usage so-called "collective nouns" typically take singular verbs. The ethnonyms "British" and "French," much like "Chinese," can be construed as plural and thus lend themselves to ambiguous readings.

(Another variant on the headline offered by the author Terry Pratchett is "Russian Push Bottles Up German Rear." That doesn't work nearly as well, since the noun "Russian" can only be construed as singular and thus doesn't agree with the verb "push" — unless, of course, one reads "Russian" as a vocative and "Push Bottles Up German Rear" as an imperative. Ouch.)

Posted by Benjamin Zimmer at 07:05 PM

Chinian, not Chinese?

In an interesting twist on the "Shanghainese" issue, Kevin Keqing Liu at China Daily argues that it's time to retire "Chinese" in favor of "Chinian". His reasoning starts this way:

Group I: American, Australian, Austrian, Canadian, German, Italian, Norwegian, Russian...

Group II: Chinese, Congolese, Japanese, Nepalese, Portuguese, Sudanese, Vietnamese...

In the State of Ohio in the United States, what do local residents call themselves? Ohioese? Wrong. Ohioan. In Toronto, Canada, the people there call themselves -- yes, you guessed it -- Torontonian. Never Torontonese.

Not enough to make you feel superior should you fall into Group I, or inferior if you unfortunately happen to be in Group II? Let's look at the Longman Dictionary of Contemporary English, 1978, for the definition of "-ese": suffix, 1. (the people or language) belonging to (a country); 2. (usually derogatory) literature written in the (stated) style. Examples: Johnsonese; journalese.

Or MSN Encarta Dictionary online: ... 3. The style of language of a particular group (disapproving). Example: officialese. [Via Old French -eis; Italian -ese]

He continues the argument:

The English-speaking founding fathers of Singapore were well aware of the subtle significance behind the "-ese" and "-an" distinction, and opted for Singaporean when the nation became independent in 1965.

India has a different story. The Indians stemmed from Europe. Europeans saw Indians as relatives. You wouldn't want to use harsh descriptions for your relatives, would you?

The same is true of Central and South Americans, who are cousins of North Americans and Mexicans.

You may ask: What about the Portuguese, also Europeans? Well, a few hundred years back, Portugal was a powerful nation warring fiercely with other major European countries for resources in overseas colonies, and was victimized by being hated and looked down upon by their European rivals.

and concludes:

In the 21st century, the world has evolved into an era when racial discrimination is not tolerated. It is time the names in Group II were abolished.

I don't know the history in detail, but I believe that the development of the derogatory suffix for writing or speaking styles followed, rather than preceded, the use of -ese for adjectival forms of toponyms. That's what the OED says:

A frequent mod. application of the suffix is to form words designating the diction of certain authors who are accused of writing in a dialect of their own invention; e.g. Johnsonese, Carlylese. On the model of derivatives from authors' names were formed Americanese, cablese, headlinese, journalese, newspaperese, novelese, officialese, etc.

The earliest citation for this development is from 1898:

1898 F. HARRISON in 19th Cent. June 941 As Mat Arnold said to me..‘Flee Carlylese as the very devil!’ Yes! flee Carlylese, Ruskinese, Meredithese, and every other ese.
1899 Golf Illustr. 14 July 134 American ‘golfese’.
1906 Daily Chron. 2 Aug. 3/2 Deplorable guide-bookese.

As for the story of the affix itself, the OED gives it this way:

forming adjs., is ad. OF. -eis (mod.F. -ois, -ais): -- Com. Romanic -ese (It. -ese, Pr., Sp. -es, Pg. -ez):-- L. ēnsem. The L. suffix had the sense ‘belonging to, originating in (a place)’, as in hortēnsis, prātēnsis, f. hortus garden, prātum meadow, and in many adjs. f. local names, as Carthāginiēnsis Carthaginian, Athēniēnsis Athenian. Its representatives in the Romanic langs. are still the ordinary means of forming adjs. upon names of countries or places. In Eng. -ese forms derivatives from names of countries (chiefly after Romanic prototypes), as Chinese, Portuguese, Japanese, and from some names of foreign (never English) towns, as Milanese, Viennese, Pekinese, Cantonese. These adjs. may usually be employed as ns., either as names of languages, or as designations of persons; in the latter use they formerly had plurals in -s, but the pl. has now the same form as the sing., the words being taken rather as adjs. used absol. than as proper ns. (From words in -ese used as pl. have arisen in illiterate speech such sing. forms as Chinee, Maltee, Portugee.)

There's clearly a story to be told about the concentration of -ese derivatives in East Asia, but I don't think that the story Liu tells is the right one, at least historically.

In sorting -ese and -ian, we need to note that English has other processes for forming adjectives from place names, including -ish (Irish, British, Flemish, Polish, Scottish, Spanish, Swedish), -i (Afghani, Iraqi, Israeli, Kuwaiti, Pakistani) and the motley collection of processes involved in cases like French and Greek.

In this context, we should note that -ish also has a disparaging or belittling tinge in nonce formations, as the OED observes:

In recent colloquial and journalistic use, -ish has become the favourite ending for forming adjs. for the nonce (esp. of a slighting or depreciatory nature) on proper names of persons, places, or things, and even on phrases, e.g. Disraelitish, Heine-ish, Mark Twainish, Micawberish, Miss Martineauish, Queen Annish, Spectator-ish, Tupperish, West Endish; all-over-ish, at-homeish, devil-may-care-ish, how-d'ye-doish, jolly-good-fellowish, merry-go-roundish, out-of-townish, and the like.

This can hardly be because the adjectival forms of toponyms with -ish are themselves generally deprecated.

Reforming English to regularize all adjectival forms of toponyms using -an or -ian would, ironically, align everyone with the usage attributed to George W. Bush in what were (among) the earliest reported "Bushisms": Grecians, East Timorians, Kosovians. On this line, I guess, you could pitch it as an educational reform to make it easier for schoolchildren to learn standard English, rather than as an exercise in political correctness designed to avoid negative connotations attached to anyone's morphemes.

But perhaps we'll see an alternative movement to rescue these morphemes from their historical degradation at the hands of elitist irony: "Say it loud: -ish and proud!"

[Update: Aaron "Dr. Whom" Dinkin asks whether Spaniard uses the same morpheme as words like bastard, canard, mallard, coward, buzzard, drunkard, laggard, sluggard, etc. It does, I believe.]

Posted by Mark Liberman at 06:51 AM

January 25, 2006

Podzinger rejects Jesus

If you haven't already done so, go check out BBN's Podzinger service for searching podcasts. More exactly, according to its current banner, it's "searching 48797 podcasts" -- and growing. Podzinger applies automatic speech recognition to turn podcasts into text, and lets you search the stored texts for words or strings. You can sort results by date or by "Relevancy". Each hit is shown with the harvested title and abstract of the podcast, and 25 words or so of textual context around the matched search term -- more if there are several matches within the window shown -- and an indication of the time point in the podcast where the match occurred. In principle, Podzinger lets you access the audio of the podcast at the point of the match (although in my experience this often doesn't work due to server load or other issues), and it gives you links for the original source of the podcast (URL or RSS).

I should say right up front that I think Podzinger is terrific. I've been using it for several days with considerable satisfaction. And it's an excellent display of the strengths and weaknesses of state-of-the-art speech recognition technology.

To start your own tour, try a search term like Beijing. You'll get some very plausible hits, especially from podcasts of news programs. One that I got this morning was this stretch, from 04:34 in NPR's January 24th 10:00 a.m. news summary:

Deputy secretary of state robert zoellick is in beijing where he began talks today with senior chinese officials the nuclear standoff with iran and north korea are high on the agenda The two sides also are expected to discuss bilateral relations and preparations for a strategic dialogue later this year China is the host of six party talks aimed at ending north korea's nuclear weapons ambitions -- visit to beijing follows a recent visit by north korean leader kim jong il i'm carl -- NPR news in washington ...

This has the ring of truth -- without even bothering to check, I'm confident that this transcript is mostly correct. Except for the lack of appropriate punctuation and capitalization, it's pretty readable. And all things considered, I think this is an extraordinary achievement. Before we get to some of the areas where today's speech recognition technology still needs improvement, we should pause and reflect on how good these programs have gotten to be.

Sometimes.

Speech-to-text (STT) programs are still heavily dependent on their "language model" -- their statistical appreciation of what words and word sequences are likely to occur -- and still find reverberant (and otherwise distorted) recordings difficult. I imagine that these are the factors that led Podzinger to return, as the first hit on my search this morning for Beijing, a passage at 0:28:42 of a sermon titled "Crown Him with Many Crowns", which it renders as:

... by the dot all eaten the curse and the -- in beijing this morning -- finest art so against -- has -- decade remote wooded delight When music scene it is my lord ...

Though I can't identify any particular theological error, this hardly seems like a suitable message to be delivered from the pulpit. When I listen to the appropriate section of the podcast, I hear it as:

(no one speaking) by the spirit of God calls Jesus a curse, and no one can say Jesus is Lord except by the spirit. So yes, you s- have said there came a moment in your life when you said Jesus is my Lord ...

The recording is a bit reverberant, and it's about topics that are not often featured in the newswire text that Podzinger's language model is apparently trained on, but it's not at all hard to follow for a human listener.

You can see what has happened, to some extent, if we line up the passages wordwise:

by the           dot  all  eaten the curse
by the spirit of God calls Jesus  a  curse

Here the "spirit" is missing, probably because the phrase "spirit of God" is spoken very rapidly, and the word sequence "God calls Jesus" has been rendered as "dot all eaten".

and the -- in  beijing this morning
and no one can say Jesus is Lord

This time "say Jesus is Lord" has been rendered as "beijing this morn(ing)".

The double hyphens in the Podzinger transcript represent unknown words, or rather (I presume) regions where none of the program's hypotheses reached its threshold of confidence. As this example indicates, the state of the art in assigning confidence ratings to recognition hypotheses is not very good.

  decade   remote wooded delight   When music    scene it is my lord
there came a moment  in  your life when you said   Jesus  is my Lord

Here "you said Jesus" has been rendered as "music scene it". I think we've seen enough to suspect that Podzinger is not yet ready to accept Jesus into its vocabulary, much less into its stony little silicon heart.
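[An aside for readers who want to try this kind of wordwise comparison themselves: here is a minimal Python sketch. It has nothing to do with how Podzinger works or how recognizers are scored; it just uses the standard library's difflib.SequenceMatcher to line up a machine transcript against a human one. The function name align_words is made up for this illustration.

```python
from difflib import SequenceMatcher


def align_words(hypothesis, reference):
    """Line up two transcripts word by word (illustrative helper only).

    Returns a list of (tag, hypothesis_words, reference_words) tuples, where
    tag is one of difflib's opcode names: 'equal', 'replace', 'delete' or
    'insert'. Case is ignored.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    # autojunk=False switches off difflib's heuristic for very frequent items.
    matcher = SequenceMatcher(None, hyp, ref, autojunk=False)
    return [(tag, hyp[i1:i2], ref[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


if __name__ == "__main__":
    machine = "by the dot all eaten the curse"
    human = "by the spirit of God calls Jesus a curse"
    for tag, hyp_words, ref_words in align_words(machine, human):
        print(f"{tag:8} {' '.join(hyp_words):20} | {' '.join(ref_words)}")
```

On the first pair of lines above, this keeps "by the" and "curse" as matches and lumps everything in between into a single replaced stretch. It works purely on spelling, so it has no idea that "dot all eaten" sounds like "God calls Jesus", which is exactly the information a hand alignment uses.]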

But no -- if we search for {Jesus}, we find 7,990 hits. Some are plausible, if not entirely correct. The 3rd hit I got, for instance, was at 0:04:05 in The Bible Podcast's reading of Genesis 38, which Podzinger rendered as:

... turned to prostitution and as a result she has become pregnant Jesus said Bring her out and let her be burned While they were bringing her route she sent word her father in ...

This is almost entirely correct, except that of course it's Judah, not Jesus, who is featured in the story of Tamar and Onan in Genesis 38.

Podzinger's first hit for {Jesus} this morning was at 0:09:50 of Rounders - The Poker Show for January 22, 2006, in a passage which it rendered as

... six names including daniel le grande do when jennifer harman and jesus ferguson and also -- -- had reaction doctor do we instead of tonight with the winner but at a -- that's ..

Not knowing much about poker, I figured this instance of "jesus" was another error, but in this case Podzinger had it right. At least the "jesus" part. My transcription of the corresponding stretch:

... six names including Daniel Negreanu and Jennifer Harman and uh Jesus Ferguson and also Robert Williams, and so had we actually talked to him two weeks ago instead of tonight, we wouldn't have uh chatted with that, so ...

So it seems that Podzinger is ready to accept Jesus after all, at least as the nickname of the poker player Chris "Jesus" Ferguson.

[I should make it clear that this post's focus on mistranscriptions of "Jesus" is just a humorous way to highlight some issues with STT technology. If you search Podzinger for "Jesus", you will certainly find plenty of examples where the word has been correctly recognized, and I certainly don't mean to suggest that Podzinger has any special problems with religious as opposed to secular words, or with Christian words as opposed to those associated with any other religion. Bill O'Reilly need not get indignant.

On the other hand, the examples cited above are exactly those that came up as I explored the Podzinger service this morning in writing this post. I first searched for "Beijing", and checked two of the top three hits, one of which looked good while the other looked bad; having observed some problems in recognizing the word "Jesus" in one of the podcasts, I tried a search for "Jesus", and again checked two of the top three hits. The one I left out (from 36:56 of the PK & J Show) was transcribed by Podzinger as "embedded this can all learn sues outside community of (%EXPLETIVE) jesus was laying in finance -- awesome -- And The ..." Since this is a family weblog, you'll have to find out for yourself what the transcription should actually have been. Suffice it to say that "Jesus" is one of the few words that Podzinger got right. ]

Posted by Mark Liberman at 10:26 AM

Further evidence of declining university standards?

Don't get me wrong, I'm thrilled that my University's Office of Graduate Studies and Research and the Graduate Student Association are co-sponsors of the annual All-Grad Research Symposium, now in its sixth year. I think it's a great opportunity for "graduate students from all fields to present their work to peers in a professional and intellectually stimulating atmosphere." Best of all, registration is free and includes breakfast, lunch, and a wine reception. How cool is that?

Not cool enough to ignore the fact that the keynote speaker is none other than Richard Lederer -- nor, apparently, cool enough for Lederer to list this appearance in the Upcoming Speaking Appearances section of his Have Tongue, Will Travel page. Just reading Lederer's own bio is enough to make you wonder: what in the world were they thinking? As I and others have noted several times before, Lederer may be an award-winning punster and quick with a dictionary or two, but he's certainly no linguistic scholar. But perhaps the graduate student masses just want to be entertained.

Posted by Eric Bakovic at 01:01 AM

January 24, 2006

A four-letter word beginning with F...

What's a four-letter word beginning with F that's guaranteed to make everyone laugh?

Back in November, Barbara and I went to see The Capitol Steps perform their satire show live in Harvard's Sanders Theater. At the beginning of every event in the Sanders Theater there is a calm-voiced announcement over the speaker system telling everyone to turn off their cell phones and to look round and check the location of the nearest exit to where they are sitting. But on this occasion the voice continued: "In the event of an emergency, do not leave the theater. Remain in your seats, and wait for FEMA to arrive."

And of course the place erupted in laughter. Everyone roared. That's what last fall's bumbling in New Orleans by Michael "Heckuva Job" Brown has done to the reputation of a once important Federal agency. The very word is a joke. Roughly like smut.

Posted by Geoffrey K. Pullum at 05:40 PM

A four-letter word beginning with P...

Scott Adams revealed on The Dilbert Blog yesterday that his editor has objected to a panel in an upcoming strip because a familiar four-letter word beginning with P appeared in the dialog. Want to guess?

The word was porn. Believe it or not, the recommended change was to another four-letter word, smut. A rather old-fashioned word for a character in Dilbert to use, I would have thought. Tom Lehrer wrote a wonderful song under that title, and even back then (in the sixties), the word was sort of jocular.

Posted by Geoffrey K. Pullum at 05:36 PM

Blawgs, phonolawgically speaking

Mark Liberman commented last week on some complaints lodged against the neologism blawg, meaning 'a law-related blog.' David Giacalone of f/k/a dismissed the term as "an insider pun by a popular lawyer-webdiva (which should have been passed around and admired briefly as a witty one-off)." (The lawyer-webdiva in question, by the way, is Denise Howell of Bag and Baggage, who began keeping a "blawg roll" in early March 2002. An article in Legal Times gives Howell sole credit for the coinage.)

Mark noted that blawg is "an unusual sort of portmanteau word" — unusual in that "the sound of one of the words (law) is completely contained within the sound of the other word (blog)." I'd agree that the blending of law and blog into blawg is a peculiar formation (even for a "witty one-off"), but not simply because one of the words is phonologically contained within the other.

First, let's consider the structural possibilities for "blends" or "portmanteaus" — words that combine two or more forms, with at least one of the forms getting shortened in the process. In "Blends, a Structural and Systemic View" (American Speech 52:1/2, Spring 1977, pp. 47-64), John Algeo discerns three main categories of lexical blending:

  • Blends with overlapping (and no other shortening): slanguage < slang + language, sexpert < sex + expert
  • Blends with clipping (and no overlapping): fanzine < fan + (maga)zine, smog < sm(oke) + (f)og
  • Blends with clipping and overlapping: motel < mot(or) + (h)otel, feminazi < femin(ist) + Nazi
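The first of Algeo's categories is mechanical enough to sketch in a few lines of Python. This is only an illustration under a big simplifying assumption, namely that overlap can be judged from spelling alone rather than from sound; the function name overlap_blend is invented here and comes from neither Algeo nor any library.

```python
def overlap_blend(first, second):
    """Join two words on the longest stretch of letters shared by the end
    of `first` and the start of `second`.

    Returns None when the spellings share nothing, i.e. when a blend would
    need clipping instead. Works on letters only, not on pronunciation.
    """
    a, b = first.lower(), second.lower()
    # Try the longest possible overlap first, then shorter ones.
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return first + second[size:]
    return None


if __name__ == "__main__":
    print(overlap_blend("slang", "language"))  # overlap on "lang"
    print(overlap_blend("sex", "expert"))      # overlap on "ex"
    print(overlap_blend("fan", "magazine"))    # no overlap: needs clipping
```

The two overlapping examples come out as slanguage and sexpert, while fan + magazine returns None, since fanzine can only be reached by clipping. Blends of the third type, like motel, would need both steps, and a spelling-based check is too crude to decide where the clipping should fall.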

For all three types of blending, the majority of items combine their components sequentially: a segment of the first word is followed by a segment of the second word, with possible overlapping between the two segments. But Algeo notes that blending sometimes occurs through the insertion of one form into another, again with possible overlapping of segments. Following the terminology of Harold Wentworth, Algeo dubs such inserted blends "sandwich words." Note that sandwich words, like other blends, still require that at least one form is shortened in the process of combination; if there's no shortening then it's simply a case of infixation, like fanfriggintastic (expletive infixation) or scrumdiddlyumptious ("diddly" infixation with partial reduplication).

Here are examples of sandwich words given by Algeo to fit each of his three categories:

  • Overlapping: autobydography < autobiography + by dog, in-sin-uation < insinuation + sin
  • Clipping: chortle < ch(uck)le + (sn)ort, miscevarsitation < misce(gen)ation + varsit(y)
  • Clipping and overlapping: slithy < sli(m)y + lithe, ambisextrous < ambi(d)extrous + sex

Though two of Lewis Carroll's classic portmanteaus — chortle and slithy — are represented among Algeo's sandwich words, most are what Giacalone would call "witty one-offs," or what linguists call nonce formations. Thus we have autobydography 'an autobiography written by a dog,' in-sin-uation 'the insinuation of sin,' miscevarsitation 'marriage between attendants of different colleges,' and ambisextrous 'sexually ambidextrous.' (Michael Quinion notes that ambisextrous is not so nonce, as it dates from 1929 and "has achieved a modest continuing circulation.")

Every generation seems to create its own sandwich words, but we are blessed (and cursed) to live in an era where every nonce formation is likely to be recorded on some website somewhere, occasionally gathered up in such repositories of fleeting usage as Urban Dictionary, Langmaker, or most recently Merriam-Webster's Open Dictionary. (Such collaborative enterprises tend to be utterly chaotic, as opposed to the more methodical cataloguing of innovative forms by Grant Barrett at Double-Tongued Word Wrester or Mark Peters at Wordlustitude.) It's easy enough to find latter-day sandwich words on these sites, e.g.: satiscraptory = satisfactory + crap, fantASStic = fantastic + ass, and specyackular = spectacular + yack. Elsewhere one can find sandwich words of a less profane nature, e.g.: specTECHular = spectacular + tech, fan-Kaz-tic = fantastic + Kaz (i.e., the baseball player Kaz Matsui), and ter-RIF-fic = terrific + RIF ("Reading is Fundamental").

Certain words seem to lend themselves to sandwich blending. Once ridonkulous and other silly variants of ridiculous began to spread several years ago, the word ridiculous became a prime target for nonce sandwich blends. Urban Dictionary is full of examples like redorkulous, redrunkulous, reboozulous, and recrunkulous (in these cases, the blending has led to a reanalysis of the first syllable as re-). In fact, ridonkulous itself has been interpreted as a blend of ridiculous and donk(ey), though this strikes me as an ex post facto rationalization. Another popular target among left-leaning Netizens is the word Republican, which gets the sandwich treatment in such epithets as Rethuglican, Resmuglican, Repiglican, Redumblican, Rebooblican, Reporklican, Repooplican, Reputzlican, Repukelican, etc., etc.

The recipe for such sandwich words is pretty constant: take a polysyllabic word and replace the primarily-stressed syllable with a punchy monosyllabic word of your choice. It's clear, however, that blawg is a different beast, morphophonologically speaking. Denise Howell took a monosyllabic word (blog) and inserted another monosyllable (law), such that the "bread" for the sandwich consists merely of one initial consonant (b-) and one final consonant (-g). I know of no other sandwich word so dominated by its filling.
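The polysyllabic recipe at the start of the paragraph above can be written out as a toy Python function. Everything about it is an assumption made for illustration: the syllable divisions are supplied by hand and follow spelling rather than any phonological analysis, the stressed syllable is picked out by its position, and the name sandwich is invented here.

```python
def sandwich(syllables, stressed_index, filling):
    """Build a sandwich word by swapping the stressed syllable for `filling`.

    `syllables` is a hand-made, spelling-based split of the host word, and
    `stressed_index` is the position of its primarily-stressed syllable.
    """
    parts = list(syllables)          # copy, so the caller's list is untouched
    parts[stressed_index] = filling  # replace the stressed syllable
    return "".join(parts)


if __name__ == "__main__":
    republican = ["Re", "pub", "li", "can"]
    ridiculous = ["ri", "dic", "u", "lous"]
    print(sandwich(republican, 1, "thug"))
    print(sandwich(republican, 1, "smug"))
    print(sandwich(ridiculous, 1, "donk"))
```

This yields Rethuglican, Resmuglican and ridonkulous. It also shows why blawg doesn't fit the mold: with a one-syllable host there is no unstressed material left over to serve as the bread.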

What's more, the two component words are maximally overlapping for some speakers and nearly so for others. For speakers with the cot-caught merger of low back vowels (such as most residents of the western U.S.), the vowel in blog merges with the vowel in law, with the result that blawg is homonymous with blog. Speakers without the merger tend to use the cot vowel for most words ending in -og, with the exception of dog and occasionally other common words. Blog is not (yet!) common enough to be subject to this lexical diffusion and thus remains distinct from blawg for most speakers lacking the merger.

The low back merger is clearly a point of confusion in the blawg wars. The editor of Blawg Review evidently has the merger and doesn't seem to be aware that others might not:

Interestingly, the word blawg is pronounced the same as the word blog, so there is absolutely no confusion in oral communication. In the written word, blawg is easily intelligible and conveys additional meaning to readers and to search engines.

Conversely, David Giacalone doesn't have the merger and expressed shock that there are those who do:

Frankly, I was surprised to read that you pronounce "blog" and "blawg" in the same way... That underscores the notion that the word is just an insider gimmick, because the two words don't need to be homophones. Merriam-Webster online, for example, does not pronounce "blog" in a manner that makes it homophonic with "blawg." ... I believe most "blawgers" pronounce the words blawg and blog differently -- otherwise, making the distinction seems pointless. If one has to pronounce them the same way for the uninitiated to understand what you are talking about, you are making my confusion argument for me.

Both sides of this argument seem odd to me. The Blawg Review editor presents it as a virtue that blog and blawg are pronounced the same (for everyone, he thinks). I'd have guessed that this would be a strike against blawg, since the distinction with blog becomes difficult to make in spoken interaction, potentially leading to more confusion, not less. (Indeed, Giacalone links to a post by Trevor Hill, who also has the merger, but sees it as a drawback to blawg: "it's homophonous with blog, making it useless in actual English speech.")

On the other hand, I don't think that the presence of the low back merger for some speakers renders the blog-blawg distinction "pointless," as Giacalone would have it. It would simply make blawg a sandwich blend with maximal overlap, like in-sin-uation, fantASStic, ter-RIF-fic, or ri-dick-ulous. True, the punniness of those polysyllabic blends can be driven home by exaggerating the stress on the inserted segment, a prosodic device that isn't available for blawg (unless a peculiar contrastive pronunciation developed, like "buh-LAW-guh"). But blawg has been doing just fine as a visual blend, regardless of whether readers think it's pronounced the same as blog or not. Since the term has thus far existed primarily in online interaction in the blawgosphere, complaining about its potential pronunciation makes about as much sense as complaining about the typographical conventions of l33t. But if blawg really does start taking off in spoken discourse, it will be interesting to see if these arguments over the word's pronunciation become intensified.

A companion to the phonological argument over blawg is the aesthetic one. Hill thinks the word "looks ugly," and Giacalone is troubled by the similarity to dawg as eye dialect for dog. (I say eye dialect because, as I mentioned, even speakers lacking the low back merger tend to use the caught vowel for dog. But dawg may also represent a pronunciation spelling if it represents an exaggerated pronunciation of the vowel; cf., rock vs. rawk.) Giacalone writes:

Most members of the public are far more likely to think its a take-off on the incredibly overused "dawg" for dog, rather than a reference to law-related weblogs. Insiders know what it is, outsiders do not and are very likely to view it as adolescent jargon.

Personally, I think most "outsiders" are perceptive enough to avoid seeing blawg as merely "adolescent jargon." Surely context is key. I can't imagine many readers would have difficulty distinguishing between, say, "Blawgs can be used for practitioners to give information about what is happening in his/her area" on the one hand, and "Kewl blawg, dood!" on the other. And if there are any concerns about misconstrual, one can always opt for the more orthographically distinct bLAWg. Aesthetically, though, that's pretty darn odd-looking.

[Update #1: On a side note, Karen Davis emails to comment on the awkwardness of the above quote from the Maryland Bar Bulletin: "Blawgs can be used for practitioners to give information about what is happening in his/her area." As Karen notes, the writer "puts 'practitioner' in the plural and then *still* uses the clunky "his/her" instead of the natural — and totally permissible — 'their.'" I suspect this is simply an editing error, since the previous sentence uses the singular "practitioner." Or perhaps it's a case of pronominal hypercorrection brought upon by an aversion to singular they.]

[Update #2: John Lawler contributes the following:

A data point -- I come from DeKalb, IL, just above the Northern/Midland isogloss, and have distinguished between 'cot' and 'caught' all my life. Indeed, my surname constitutes a test case, since to me it rhymes with 'caller', 'taller', and 'hauler' but *not* with 'collar', 'dollar', or 'holler'.

However, *I* pronounce *both* 'log' and 'blog' (as well as 'dog', 'hog', 'fog', 'frog', 'smog', and 'bog') with the same vowel as 'caught' (open O), and *never* with /a/.

By contrast, I *always* have /a/ in 'cog', and I'm ambivocalic with 'slog', 'sog(gy)', 'tog(gle)', and 'trog(lodyte)'.

So 'blog' and 'blawg' do mean the same thing for me, and in fact when I first saw 'blawg' I assumed it was just an eye dialect spelling of 'blog', just as 'dawg' is of 'dog'.

I guess the moral is that Paper's Law [1] applies here.

[1] Named after my former colleague Herb Paper, the law is succinctly stated as "It's not that simple".
]

[Update #3: More from David Giacalone, Denise Howell, and Mark Liberman.]

Posted by Benjamin Zimmer at 11:44 AM

Unlike dangling

Last November Bob Tess mailed me from Macomb County in Michigan to bring to the attention of the Fellowship of the Predicative Adjunct a nice dangling adjunct case that does not involve a participle. I meant to comment on this at the time, but it got lost in the shuffle. It's about a piece on NPR's Morning Edition. Says Bob:

In a report on the lack of excitement generated by the Lewis and Clark expedition's bicentennial, NPR's Kirk Ziegler reported: "Unlike Lewis and Clark however, people do want to talk about the budget deficit."

Bob adds that Messrs. Lewis and Clark might have been very pleased to talk about a budget deficit of proportions that would have seemed like science fiction to them, only they can't, on account of being dead. That is, he found he was forced to understand Lewis and Clark as an understood subject of want rather than as object of about. He heard the sentence as saying that people want to talk about the budget deficit but Lewis and Clark don't. We were supposed to hear it as saying that people want to talk about the budget deficit but they don't want to talk about Lewis and Clark.

Unlike is an interesting word. It hovers on the boundary between adjectives and prepositions. When it was formed it must have been an adjective, because un- doesn't really attach to anything else. (You don't find *unbetween, *unover, *unwith.) But like now acts a lot more like a preposition in a number of syntactic ways, and unlike has been left in an odd position, not knowing whether to follow the syntax of its root or the syntax suggested by its derivational prefix (as it were; I'm anthropomorphizing lexemes here, which is a bit ridiculous, but I hope you see what I mean).

The relevant difference is that adjective phrases are predicative and mustn't be left to dangle with nothing to predicate about, while preposition phrases are allowed not to be predicative. So you can contrast the behavior of ahead (a preposition, albeit of the kind that does NOT take a noun phrase complement) with that of asleep (an adjective). Ahead doesn't need an understood subject, but asleep does:

  •  Ahead, there was nothing but the open road.
  • *Asleep, there was nothing but the open road.

Which way does unlike go? You decide. It seems to me to be on the cusp. The question is whether you found you read the NPR sentence the way Bob Tess did, or the way that the NPR scriptwriter intended. I guess I lean in the same direction Bob does, which supports the view that unlike is a (highly anomalous) adjective.

Why do I say it's anomalous? Because adjectives hardly ever take noun phrase complements, but unlike does.

Posted by Geoffrey K. Pullum at 09:59 AM

January 23, 2006

Wordplay's big splash at Sundance

A couple of months ago we were pleased to bring you the news that Patrick Creadon's documentary Wordplay had been accepted into competition at the 2006 Sundance Film Festival. Creadon's film focuses on New York Times crossword guru Will Shortz and his cultish followers, as well as providing a glimpse into the world of competitive cruciverbalism. Now it's Sundance time, and the buzz from Park City is quite promising.

As I suspected, noted crossword nut Bill Clinton is among the celebrities to make an appearance in Wordplay. The AP reports that other "self-professed word nerds" appearing in the film are "Daily Show" host Jon Stewart, folk-rock duo The Indigo Girls, and brainy New York Yankees pitcher Mike Mussina. (Stewart sounds reliably manic: he can be seen "assaulting the Times crossword, shouting 'Come on, Shortz! Bring it!'")

But the film's real excitement derives not from celebrity cameos but from its depiction of the American Crossword Puzzle Tournament, which Shortz directs. According to the AP, "Sundance crowds were so caught up in the film's footage of last year's crossword tournament that viewers groaned over a bitter agony-of-defeat moment in the dramatic finale."

Prospects for a distribution deal are looking good. An article on indieWIRE says that representatives from four independent distributors (Picturehouse, Warner Independent, Fox Searchlight and Roadside Attractions) expressed interest at a Sunday brunch with the filmmakers. Though no deal has been announced yet, it's looking more and more likely that Wordplay will be coming to an art house (or at least a video store) near you.

(For first-hand accounts of the Sundance excitement, be sure to follow the blogs of crossword champs Ellen Ripstein and Trip Payne, as well as Diary of a Crossword Fiend.) [Update: Tyler Hinman and Stella Daily are also blogging from Sundance.]

[Update, 1/25: IFC has acquired the distribution rights to Wordplay for $1 million (indieWIRE, Hollywood Reporter).]

[Update, 1/27: The website for the Sundance Channel has an interview with Shortz and Creadon, with clips from the movie. (Via Diary of a Crossword Fiend.)]

Posted by Benjamin Zimmer at 10:55 PM

A rejoinder from our president

I am grateful to President George W. Bush for pointing out to me that there have been other occasions when he used impermissibly antecedentless reflexive pronouns. One widely quoted remark of his was: "when I'm talking about myself and when he's talking about myself, all of us are talking about me." The second myself clearly has no antecedent, as he notes in his letter to myself.

However, this does not mean that we are dealing with anything other than sporadic slips. The president fully agrees with me that antecedentless reflexives are indeed ungrammatical in his dialect, and he lends no support at all to the contrary opinions of Chris Culy. So I hope that clears up the matter of the reflexives.

President Bush does, however, take issue with the suggestion that "Is our children learning?" is ungrammatical, and he makes a good point. First, this sentence gets more than 62,000 Google hits now, and frequency should count for something in linguistic inquiry. But second, as he himself pointed out in a lecture given at the Radio-Television Correspondents Association 57th Annual Dinner, the example has been misanalyzed:

Then there is my most famous statement: "Rarely is the question asked, is our children learning." Let us analyze that sentence for a moment. If you're a stickler, you probably think the singular verb "is" should have been the plural "are." But if you read it closely, you'll see I'm using the intransitive plural subjunctive tense. So the word "is" are correct.

As always, Language Log are happy to correct the record on those rare occasions where some grammatical subtlety slips past ourselves. Particularly when the wronged party is the leader of the free world and could quite easily blow Language Log Plaza straight to hell with a cruise missile.

Posted by Geoffrey K. Pullum at 01:20 PM

Unheimisch op straat

Rita Verdonk, the Dutch Immigration Minister, has recently called for a national code of conduct forbidding the public use of languages other than Dutch. Apparently the city of Rotterdam already has such a code. According to the article in de Volkskrant,

'Nederlands praten op straat is heel belangrijk. Ik krijg van veel mensen mailtjes dat zij zich unheimisch voelen op straat’, zei de minister zaterdag op een VVD-congres over integratie in Rotterdam.

"Speaking Dutch in the street is very important. I get email from many people who feel uneasy in the street", said the minister Saturday at a VVD meeting on immigration in Rotterdam.

What is at issue is not a law, but a set of rules of conduct (what Ms. Verdonk calls gedragsregels) that would include a commitment to use Dutch in all interactions in public places. I think the Rotterdam code might be here, though given my complete ignorance of Dutch I'm not certain that I've found the one that Ms. Verdonk favors rather than some alternative or opposing proposal. [Later: Dutch correspondents confirm that this is indeed the Rotterdam Code.] This burgerschapscode ("citizenship code") lists seven points:

Wij Rotterdammers
1. nemen verantwoordelijkheid voor onze stad en voor elkaar en discrimineren elkaar niet;
2. gebruiken Nederlands als onze gemeenschappelijke taal;
3. accepteren geen radicalisering en extremisme;
4. voeden onze kinderen op tot volwaardige burgers;
5. behandelen vrouwen gelijk aan mannen en met respect;
6. behandelen homoseksuelen gelijk aan heteroseksuelen en met respect;
7. behandelen (anders-) gelovigen en niet-gelovigen gelijk en met respect.

We Rotterdammers
1. take responsibility for our city and for one another without discrimination;
2. use Dutch as our community's language;
3. accept no radicalism or extremism;
4. raise our children as full citizens;
5. treat women equally with men and with respect;
6. treat homosexuals equally with heterosexuals and with respect;
7. treat adherents of different religions and atheists equally and with respect.

The context of all this is the situation symbolized by the murder of Theo van Gogh.

The de Volkskrant article quotes Laetitia Griffith, a member of Amsterdam's College of Aldermen, and a native of Suriname:

'Ik vind dat te ver gaan. Amsterdam is een wereldstad met buitenlandse investeerders, die juist de tolerantie prijzen. Als ik met een vriendin Surinaams op straat spreek en wij veroorzaken geen overlast, dan is daar niks mis mee.'

"I think that goes too far. Amsterdam is a world city with foreign investors, who praise precisely that tolerance. If I speak Surinamese [Sranan?] with a friend in the street and we don't cause any trouble, there's nothing wrong with that."

The Amsterdam mayor, Job Cohen, has been quoted as calling Ms. Verdonk a "hot-head with a cold heart", though he later back-pedaled a bit.

I'm not clear whether the part about speaking Dutch on the street is being emphasized because Ms. Verdonk and her party are emphasizing it, or because it's the part of the proposed gedragsregels that her political opponents object to. I imagine (though I don't know) that points 5 through 7 of the cited code are goals that the VVD (a "liberal", i.e. right-wing, party) shares with the Dutch left, though not with all of the immigrant communities; while point 2 (and perhaps the interpretation of point 3) is where the "liberals" and the left part company.

[Thanks to Bruno van Wayenburg for a pointer to the de Volkskrant article. Apologies to the Dutch nation for my (mis)translations from their gemeenschappelijke taal. Bruno also contributed this observation:

The rules are attempts to take a tough stance on the, admittedly very real, problems of integration and social exclusion of significant groups of Moroccan, Antillian and Turkish descent, but I doubt if lapsing into 19th century style language suppression will do the trick. No response yet from the Frisian language minority in the North by the way, although columnist Remco Campert predicts that they might definitely declare independence now.

]

[ Readers should be aware that in this post I'm writing about both political and linguistic matters where I have little or no personal knowledge. I invite you to read the comments below and form your own opinions.]

[Marrije Schaake writes:

Thank you for your article on Language Log about Minister Verdonk's plans.

The discussion about it is (today) focussing on the larger explanation of the rules of conduct in the Rotterdam code.

On page 4 of the code you link to (which is the code in question in the whole matter, you found the correct one!), there's more about 'our shared language', and particularly that everybody should speak this language at work, at school, in the street and at the community centre. That's where the rub is: should people be allowed to speak their own language in the street?

The minister of course denies now she would like to implement a 'language police', but she does keep repeating the bit about getting mails from people who feel unheimisch in the street. And I bet she doesn't mean people who are upset about American tourists speaking English: it's Surinamese speaking Surinamese, Turkish people who speak Turkish, and most of all bearded Moroccan men in djellaba who speak Arabic or Berber.

To me, it's quite funny that she uses the word 'unheimisch', since that's a loan from German. It's a correct and accepted word, but still somewhat funny in view of the troubled relationship we used to have with the Germans. Speaking German in the street would have (at least) earned you frowns twenty or thirty years ago.

Michel Vuijlsteke adds this information:

How very, very ironic that the very word Rita Verdonk uses to describe "uneasiness" is a German word.

I'm from Flanders (the Dutch-speaking part of Belgium) and I'd never heard the term before, even if Taalunieversum says it is a generally accepted loan word (cfr. http://taaladvies.net/taal/advies/vraag/826/).

]

[Steve from Language Hat points out:

I was delighted to see that unheimisch is borrowed from German; by my count that makes a majority of loanwords among the content words in Ms Verdonk's quote: straat, mailtjes, unheimisch, minister, zaterdag (I guess you could just count the zater- if you were being strict), congres, integratie. Amazing how unaware these linguistic nationalists are.

Also, as far as I know there is no language called "Surinamese"; the main languages of Surinam are Dutch and Sranan (oddly, an English-based creole), and I suppose the latter is being referred to.

]

[Rob Malouf writes:

Interesting discussion of language attitudes in the Netherlands! One thing that struck me too is the irony of Verdonk's use of a German loanword. It's not just that it's a foreign word -- the Dutch have a very complicated relationship with the German language. As a non-native I won't claim to understand it, but I will note that when Dutch racists painted anti-foreigner slogans on a Turkish-owned gas station in my neighborhood, they did so in German. The use of a German word by Verdonk (or her correspondents) carries a lot of meaning.

I also noticed that in the story about this in the Algemeen Dagblad (a less politically-activist paper than the left-leaning de Volkskrant), Verdonk is quoted somewhat differently:

Veel mensen voelen zich niet prettig omdat op straat geen Nederlands wordt gesproken
Many people feel unpleasant because no Dutch is spoken on the street.

In a related article, they quote Leonard Geluk, a Christian Democrat in the Rotterdam city government, as saying:

Veel autochtone Rotterdammers voelen zich ’unheimisch’ als op straat buitenlands wordt gesproken
Many native Rotterdammers feel 'uneasy' if a foreign language is spoken on the street.

Here "unheimisch" gets scare quotes, but "autochtone" goes by without notice. "Autochtoon" is the opposite of "allochtoon", which usually gets translated as "foreigner" or "immigrant". Technically, it's any first or second generation immigrant from a country outside of Europe besides the US, Canada, Australia, Indonesia, or Japan.

]

[Stefan Tilkov wrote:

"Unheimisch" is not a German word, at least not one that I (as a native German speaker) have ever heard. There is "unheimlich", which means "sinister", "strange", or "uneasy", there is "heimlich", which means "secret", and there is "heimisch", which means "at home" (in the sense of "heimisch fühlen" -- "feel at home").

I asked him what he makes of the Taalunie page on the topic, and he responded

A Google search for "unheimisch" in German pages only yields 231 results; in Dutch pages, it's 688. Not really significant, I guess ... the first few of the German results use "unheimisch" in the sense of "not at home".

My Dutch is almost non-existent, still: this document - the first hit for "unheimisch" in German - mentions

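"MAAR/ABER: "unheimisch" is Pseudo-Duits, dit woord bestaat niet in het Duits. Er is alleen "unheimlich""
(Roughly: "BUT: 'unheimisch' is pseudo-German; this word does not exist in German. There is only 'unheimlich'.")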

]

[Bruno van Wayenburg commented further:

Thanks, Mark, your story is quite accurate, as far as I can judge. Here are some comments, for your information (unless of course you want to change the log name into Dutch Language Policy Log):

Job Cohen didn't really back-pedal: he later declared that his 'hot-head' qualification was used in a different context in the interview, -more importantly- the journalist acknowledged this and apologized quickly and publicly. (However, Verdonk is predictably backpedaling now, as Marije Schaake mentions)

Although I noted it myself, I think Rob Malouf and (especially) Steve of Language Hat make a bit too much of the German loan unheimisch. It's recognizably German, a bit learned but quite an ordinary word to use (though apparently not for Belgians), probably something like 'Deja vu' in English.

Besides, as far as I can see, the issue is not so much linguistic nationalism, purism or even language at all, but tolerance of foreign cultures. Verdonk might as well have started the old discussion about head-scarfs again, using English loans.

German and Germany are not by far as loaded as they used to be (even leading magazines to declare Germany 'cool' again). Still, I think Dutch racists use German for an obvious reason: the Nazi associations. (Although I didn't ask them).

]

[Lane Greene wrote:

You've already had many e-mails on this, and this isn't really a correction so much as a perception, but the VVD wouldn't normally be considered "right-wing" in the European context, though it is "liberal" in Europe. Liberal parties tend to be something like what we call libertarian: small government, lower taxes, but also socially permissive. I'm sure you know a bit or more of this. In the context of language policy, though, surely it's the social permissiveness, and not economic policy, that's at issue, and in this context the VVD isn't particularly right-wing at all. (European countries, including the Netherlands, have Christian Democratic parties for their social conservatism.)

I think the real story here is how, since Pim Fortuyn and Theo van Gogh were murdered, even the traditionally socially liberal parties (also including the standard center-left Labor party) are starting to be tempted by the anti-immigration bandwagon, though the socially left parties (including VVD and Labor) tend to dress it up as a need for "integration" of immigrants, not hostility.

It's easier, I think, to translate between Dutch and English -- or even Berber and Dutch -- than to translate between European and American political parties. By putting "liberal" into scare quotes in writing about VVD, I meant to clarify that this word doesn't mean in Europe what it means in the U.S. I guess that "libertarian" would be a better translation than "right-wing", since European liberal parties are also not generally counted as being on the right. But their small-government, tax-cutting, deregulation-oriented outlook tends to make them look Republican, even if their laissez-faire social attitudes don't... Maybe the best American translation would be "South Park Republicans", without the overall contempt for government.]

Posted by Mark Liberman at 06:23 AM

January 22, 2006

Truthiness in journalism

I didn't go to the voting session for the ADS "word of the year" this time, but I sent a proxy with Erin McKean, and when she told me that truthiness had won, I was surprised. It seemed to me to be an overly-specific reference to a particular episode of a TV show, which probably wouldn't have gotten much circulation at all if the NYT hadn't mis-reported it. However, I'm starting to think that I was wrong: truthiness might have some staying power.

Frank Rich's most recent column, "Truthiness 101", is behind the Times Select wall, but as usual is available elsewhere on the web. It contains the best explanation of truthiness that I've seen, in the form of an unusual journalistic admission:

It’s the power of the story that always counts first, and the selling of it that comes second. Accuracy is optional.

Rich intends to describe a state of affairs that he dislikes and blames on politicians. But in fact he's describing the ethos of journalism, as it's generally practiced rather than as it's traditionally preached.

The only thing that prevents the complete fictionalization of journalism, it seems to me, is the adversarial process of complaints from powerful people and contradictory stories from alternative sources, with the implicit threat that these pose to journalistic reputations. The introduction of weblogs into this process is presumably an annoyance for traditional journalists. The task of weaving the raw fibers of truth into an attractive tapestry of truthiness is difficult enough, without millions of bloggers constantly picking at the fabric.

(And the reason that science journalism is so particularly bad, I think, is that scientists have never been especially powerful, and few of them have had easy access to public channels of information. )

From a blogger's point of view, the truthiness of the mainstream media is simply part of what H.L. Mencken called "the daily panorama of human existence", which

is so inordinately gross and preposterous, so perfectly brought up to the highest conceivable amperage, so steadily enriched with an almost fabulous daring and originality, that only the man who was born with a petrified diaphragm can fail to laugh himself to sleep every night, and to awake every morning with all the eager, unflagging expectation of a Sunday-school superintendent touring the Paris peep-shows.

Rich ends his column by raising the curtain on a particularly juicy scene:

Fittingly enough against this backdrop, last week brought the re-emergence of Clifford Irving, the author of the fake 1972 autobiography of Howard Hughes that bamboozled the world long before fraudulent autobiographies and biographies were cool. He announced that he was removing his name from “The Hoax,” a coming Hollywood movie recounting his exploits, because of what he judged its lack of fidelity to “the truth of what happened.” That Mr. Irving can return like Rip van Winkle after all these years to take the moral high ground in defense of truthfulness is a sign of just how low into truthiness we have sunk.

That's a robustly truthy peroration. It might even be true. Rich seems to be referring to a paragraph in Ben Sisario's "Arts, Briefly" column from 1/16/2006:

"The Hoax," a movie based on Clifford Irving's memoir about his infamous publishing scam - his 1972 "autobiography" of Howard Hughes - is not to be released until later this year, but Mr. Irving has already asked that his name be removed from the credits as its technical consultant. Mr. Irving made the request in a brief letter recently sent to Mark Gordon, one of the producers, and copied to others on the project, including the director, Lasse Hallstrom. In the letter he gave no reason for his decision, but in a recent telephone interview said: "My feeling, based on the script, is that there was more concern for the kind of cigarettes I smoked and the type of suitcase I carried than there was for the truth of what happened." The film's producers responded last week, saying in a statement issued by the studio, Miramax Films: "Clifford Irving's book, 'The Hoax,' contributed greatly to Bill Wheeler's screenplay. Throughout development and production, we reviewed Mr. Irving's notes and incorporated many of them into the script. We deeply regret that he feels this way in advance of seeing the finished movie." Mr. Irving, who spent 16 months in prison for his involvement in the fraudulent Hughes memoir, has also taken umbrage with the film's characters, including his own (portrayed by Richard Gere), as being largely unlikable. PAT H. BROESKE

The implication seems to be that Irving thinks the movie is truthful, in matters like cigarette and suitcase brands, but not truthy, in terms of his (un)likability. If so, then he's taking the moral high ground in defense of truthiness, not truthfulness. Though I admit it makes a better story the other way around.

Anyhow, you can see that the word truthiness is showing signs of liveliness, and maybe even of life.

Posted by Mark Liberman at 12:13 PM

Who let the 'n' in?

I think it was the Dutch, actually, though the influence is an indirect one. Victor Mair emailed to ask why the language and people of Shanghai are known in English as "Shanghainese" (210,000 Google hits vs. 25,500 for "Shanghaiese"). The response that first comes to mind for most linguists, as Victor knows, would reference the universal preference for consonant-vowel alternation, the resulting uneasiness about vowels in hiatus (i.e. vowel-vowel sequences across morpheme boundaries), and the status of coronal (i.e. tongue-tip) consonants as the least-marked option for consonant epenthesis in repairing cases of hiatus.

However, Victor also cites these other counts:

Shandongese   Shandongnese   Vietnamese    Vietnamnese
236           1,760          34,500,000    1,230

Now, many Chinese languages/dialects would pronounce Shandong with a final nasalized vowel rather than a velar nasal, but that's not the way it works in the English version of the place name, so why is "Shandongnese" with intrusive -n- preferred by 7 to 1?
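For anyone who wants to check the arithmetic behind these preferences, here is a minimal sketch. The counts are just the Google hits quoted in this post; the names `HITS` and `n_preference` are invented for illustration, and raw hit counts are of course a crude measure.

```python
# Google hit counts as quoted in this post (a January 2006 snapshot).
HITS = {
    "Shanghainese": 210_000,
    "Shanghaiese": 25_500,
    "Shandongnese": 1_760,
    "Shandongese": 236,
    "Vietnamnese": 1_230,
    "Vietnamese": 34_500_000,
}


def n_preference(with_n, without_n, hits=HITS):
    """Return how many times more hits the -nese form gets than the -ese form."""
    return hits[with_n] / hits[without_n]


# Hypothetical usage: compare each -nese spelling with its -ese counterpart.
PAIRS = [
    ("Shanghainese", "Shanghaiese"),
    ("Shandongnese", "Shandongese"),
    ("Vietnamnese", "Vietnamese"),
]

for with_n, without_n in PAIRS:
    ratio = n_preference(with_n, without_n)
    print(f"{with_n} vs. {without_n}: {ratio:.3g} to 1")
```

On these numbers the intrusive -n- form wins by roughly 8 to 1 for Shanghai and roughly 7.5 to 1 for Shandong, while for Vietnam the ratio is a tiny fraction of 1, so the form without -n- dominates overwhelmingly.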

I haven't done a careful study of this -- nor have I checked carefully to find the existing careful studies that may well exist. But my guess is that this starts with the analogical shadow cast by the place names ending in 'n' -- Japan, Taiwan, Canton, Bhutan -- whose adjectival forms (and the corresponding language names and/or ethnonyms) add '-ese' -- Japanese, Taiwanese, Cantonese, Bhutanese. Then there are the cases where a final syllable is elided in the place names to get adjectival forms that happen to end up ending in '-nese': Chinese, Lebanese.

Finally -- and most relevantly -- there are some long-established cases where there is an intrusive 'n': Java → Javanese, Sunda → Sundanese, Bali → Balinese, etc. The oldest of these seems to be Javanese, which the OED traces back to 1704:

1704 CHURCHILL Collect. Voy. III. 724/1 The Javaneses and Mardykers.

and which may derive from an earlier Javan:

1606 SCOTT (title) An exact Discovrse..of the East Indians, as well Chyneses as Iauans.

The preference for -ese as the adjectival ending for places in the "East Indies" presumably reflects the influence of Dutch, which also (I think) regularly has intrusive -n- in such words: Javanees, Sundanees, Balinees, etc. I don't have access to a historical dictionary of Dutch -- is there one? -- but I assume that these words date back at least to the early 17th century, if not the 16th. I also don't know whether the use of intrusive -n- to repair hiatus is the general pattern in Dutch, or whether (as in English) it's just one of many quasi-regular local options.

Anyhow, Shanghainese follows this well-established pattern, though the OED's earliest citation is from 1964:

1964 Asia Mag. 12 July 22/3 The Chinese [in Hong Kong]..speak no less than seven tongues -- Cantonese, Hoklo,..Shanghainese, Chiuchow and Fukienese.

As for "Shandongnese" and "Vietnamnese", I guess that people have started to re-analyze the ending as -nese rather than -ese. In the case of Shandongnese, there is no established English term, so the coinage is a recent one, and the 7-to-1 preference for "Shandongnese" over "Shandongese" apparently is telling us about the state of the net's collective neural net in this connection, so to speak. In the case of "Vietnamnese", there has been a standard English form "Vietnamese" for some time, so that "Vietnamnese" has the status of a rare mistake -- though I was surprised to learn that the OED's earliest citation is from 1947:

1947 H. R. ISAACS New Cycle in Asia viii. 157 Matters came to a head in Hanoi on December 19, 1946, when clashes in that city resulted in generalized warfare... The French charged that the Vietnamese were the instigators of the outbreak.

It's interesting that we chose the Dutchlike form (compare Vietnamees) rather than the Frenchy one (compare Vietnamien). They gave us their war, but not their word.

[Update: Steve of Language Hat emails to point out an obvious fact that I'd forgotten about:

I imagine the reason the OED's earliest citation is from 1947 is that until WWII and Ho's independence movement, there was no such thing as "Vietnam" -- what we think of as Vietnam was three provinces of French Indochina, and you'd use Tonkinese, Annamese/Annamite (interesting that there was no settled form), or Cochin-Chinese as called for.

He goes on to observe

Interesting also that the OED has no entry for Cochin-Chinese; they do have one for Cochin-China, which is defined as "Name of a country in the Eastern Peninsula"! I had never heard or seen that phrase used in that way, but a little googling turned up "Geography of the Eastern Peninsula: comprising a descriptive outline of the whole territory, and a geographical, commercial, social and political account of each of its divisions, with a full and connective history of Burmah, Siam, Anam, Cambodia, French Cochin-China, Yunan, and Malaya," by Henry Croley (1878). Forgotten geography...

]

[Update #2: Rogier Blokland writes:

As an avid reader of Language Log I couldn't resist answering your question ('I don't have access to a historical dictionary of Dutch -- is there one?'). There is one indeed, and it might be larger than the OED, or at least that's what we like to think.

The 'Woordenboek der Nederlandsche Taal' has some 43 volumes and is a historico-descriptive dictionary of Dutch from 1500 to 1976 ('Het Woordenboek der Nederlandsche Taal (WNT) is een historisch, wetenschappelijk, beschrijvend woordenboek van het Nederlands van 1500-1976').

See ( link).

I don't have access to one at the moment, nor have I found one on the web (the German Grimm is!), so I cannot check when 'Javanees' or 'Balinees' was first recorded, but I can have a look one of these days and get back to you, if you're interested.

I'll look forward to learning from Prof. Blokland, or another reader, what the WNT says about the antiquity of Javanees and similar words. ]

[Ben Zimmer observes

The historically older forms in Dutch are "Javaan" (pl. "Javanen") to refer to a Javanese person and "Javaansch" (now usually spelled "Javaans") to refer to the Javanese language or ethnic group. I know that both of these forms were in use at the time of the first Dutch expedition to Java in the 1590s. I don't have sources at hand, but I seem to remember that this and similar ethnonyms were borrowed from the Portuguese, who may have based their forms on Latin ("javana, javanensis"?).

He added in a later note

The historical Portuguese ethnonym is actually "Jãos" (= 'Javanese people'), used by João de Barros in 1553 (mentioned in the Hobson Jobson entry for "Java"). That appears to be based on Arabic "Jawi", which was a broader term used to refer to inhabitants of island Southeast Asia.

Hmm, looks like Language Hat has covered this...

http://www.languagehat.com/archives/002135.php quoting: http://www.crise.ox.ac.uk/pubs/workingpaper14.pdf

]

[And Gene Buckley observes

With regard to your post on intrusive "n", there's also of course the "l" in some African placename adjectives, such as Congolese and Togolese. These seem to follow "o" and I recall the story that the analogy is based on Angolese (a word that exists but which I'm not used to hearing, given the prevalence of Angolan). There may be other models as well.

]

[Update 1/23/2006: Marina Muilwijk writes:

In your post on "Who let the 'n' in" you wonder about the date of Javanees and Balinees. Since I'm sitting in a library, I could easily look them up in the Woordenboek der Nederlandsche Taal.

"Javanees" can't be found anywhere in the WNT. The first example for the form "Javaansch", that Ben Zimmer mentions, is from 1688, but that is in an example for a not very relevant word.

The earliest example for "Balineesch" is from 1726 (as an example with "vrouwentimmer (women's quarters)").

So I'm afraid the WNT isn't of much help here.

]

Posted by Mark Liberman at 09:32 AM

January 20, 2006

Guest post: Getting ourselves in trouble

In response to Geoff Pullum's post "People that would do ourselves harm", Chris Culy submitted the following:

Darrell Waltrip, successful NASCAR driver and sports commentator, had this to say about talking and getting into trouble:

"I had a reporter one time tell me, 'Waltrip, you're a great interview, but you talk too much,'" Waltrip said. "He told me I talked and talked and talked, and eventually I'd say something that would get myself in trouble.

"I was always like that. Talk, talk, talk, talk."

So, what did you think when you read that? Did you choke on your morning double mocha soyaccino and sputter, as Geoff Pullum might, "That's totally ungrammatical, in all dialects." Or did you calmly sit back and think to yourself, as Ricky Rudd fan DDniteOwl21 does, "Ignore him/them. Don't even risk saying something back that might get yourself in trouble. It's just NOT worth it."?

Unlike Geoff Pullum and more like Mark Liberman, I wasn't shocked by the reflexive pronoun in President Bush's statement (cited on Reuters here):
And so long as the war on terror goes on, and so long as there's a threat, we will inevitably need to hold people that would do ourselves harm.

The issue, as Geoff points out, is that

Reflexive pronouns like ourselves must (to put it roughly -- there are some codicils) have an antecedent earlier in the same clause, agreeing with it in person, number, and gender.

The codicils include the idea that stressed reflexive pronouns are not subject to this constraint, and there are certain other constructions that allow a reflexive pronoun without a same-clause antecedent. For example, Geoff pointed out to me in e-mail, that "between ourselves" in the sense of "confidentially" is one such construction. (More on that below.) And of course, other languages (e.g. Ewe, Japanese, Latin, and many more) do not necessarily have the same-clause antecedent constraint on particular pronouns. Those kinds of pronouns lacking the same-clause antecedent constraint have been referred to variously as "long distance reflexives," "non-clause bound reflexives," and "logophoric pronouns" — but that's another topic.

But English does have (some form of) the same-clause antecedent condition on its reflexives, so Bush's statement, as well as those by Waltrip and DDniteOwl21, which are syntactically parallel to Bush's, I take to be one of those mysterious aspects of English that Geoff likes.

So, just for my own curiosity I tried to find more examples of unstressed ourselves used without a clausal antecedent. I did two kinds of searches. One type of search was to use Google to look for examples parallel to Bush's: ourselves as the direct object of a subject relative. The other type of search was to look for ourselves in a few (about two dozen) 18th-early 20th century works, fiction and non-fiction, that I happened to have scattered across my hard drive. Obviously, neither search was exhaustive of its kind. The 30 examples I found via Google are here, for brevity(?!) of this posting.

Here are some observations about the Google searches (examples parallel to Bush's utterance):

  • Looking for things that are/seem ungrammatical can lead to sites using fake English to fool search engines
    e.g. look for: "that did ourselves" -bush
  • The examples are rare, but they do exist, and they are probably (though I haven't tried to check) no rarer than some examples from syntax articles. Also, I didn't look for any other pronoun or any other position or function of the pronoun.
  • Many of the examples are from religious-themed pages.
  • Many, though not all, of the examples have another first person plural pronoun in the same sentence.
  • "X that GET ourselves Y" is easy to find, and seems among the most natural to me.
  • Past tense of the main verb of the relative clause is rare (though less so with GET).
  • I didn't find any examples with the perfect (present or past) in the relative clause.
  • I excluded examples where the relative clause modifies a noun that is predicated on a first person e.g. "One aspect of Human Selection says that we are a species that can reinvent ourselves and what I am saying is that we have done so before and it's a consequence of learning and surviving the lessons." (Source: http://groups.yahoo.com/group/Fountain_Society/message/3448)
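Searches of the kind listed above (a quoted phrase such as "that did ourselves" with -bush excluded) could be generated mechanically rather than typed one at a time. What follows is only a sketch under stated assumptions: the helper `build_queries`, the verb list, and the choice of relativizers are all hypothetical and are not a record of how the searches were actually run. The only query string attested in this post is the first one it reproduces.

```python
def build_queries(verbs, pronoun="ourselves", relativizers=("that", "who"),
                  exclude=("bush",)):
    """Build quoted phrase queries such as '"that did ourselves" -bush'.

    Every argument is an illustrative assumption; only the shape of the
    query (quoted phrase plus minus-prefixed exclusions) comes from the post.
    """
    exclusions = " ".join(f"-{term}" for term in exclude)
    queries = []
    for rel in relativizers:
        for verb in verbs:
            phrase = f'"{rel} {verb} {pronoun}"'
            # strip() removes the trailing space when nothing is excluded
            queries.append(f"{phrase} {exclusions}".strip())
    return queries


# Hypothetical verb list, chosen only for illustration.
for query in build_queries(["did", "do", "get", "gets"]):
    print(query)
```

The resulting strings still have to be checked by hand, since (as noted in the first bullet) hunting for odd-looking strings turns up pages of fake English written to fool search engines.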

The four examples I found from the historical works are very different, and none are of the object in a relative clause type. The numbers are too small to make any general observations. The examples are at the end of the post.

While all these people could have forgotten what the subject of the clause was, as Geoff suggested Bush did, it seems unlikely to me. Not that I have an explanation of what is going on, mind you. That would need more, and more systematic, data. And while I'm willing to admit that they could all be errors, it's worth taking another look.

Finally, a side benefit to me of looking for these examples was finding other things that piqued my curiosity. For example, "between us" meaning "confidentially" is pretty common in a cursory look on Google, but very uncommon in the few historical sources I looked at. Another example is that the ratio of among + pronoun to amongst + pronoun is significantly lower when the pronoun is a reflexive (ourselves, yourselves, themselves) than when it is a non-reflexive (us, you, them), and the ratios vary widely across persons. (Counts were the extremely crude measure of Google hits.) I didn't expect this disparity, given the synonymy of among and amongst; amongst doesn't even rate a separate entry in the collegiate dictionary I looked in.

So, whether or not Bush's utterance is ungrammatical in some or all dialects, taking it as a mystery to be explored leads to some (potentially) interesting results.


Here are the four examples from historical sources of unstressed ourselves without a same-clause antecedent. None of them are in the pattern discussed above (object of a verb in a subject relative clause). All the books were from Project Gutenberg. Emphasis on ourselves added throughout.

Source: Robinson Crusoe by Daniel Defoe

In this distress the mate of our vessel laid hold of the boat, and with the help of the rest of the men got her slung over the ship's side; and getting all into her, let go, and committed ourselves, being eleven in number, to God's mercy and the wild sea; for though the storm was abated considerably, yet the sea ran dreadfully high upon the shore, and might be well called DEN WILD ZEE, as the Dutch call the sea in a storm.

Source: Emma by Jane Austen

"Oh!" she cried in evident embarrassment, "it all meant nothing; a mere joke among ourselves."

Source: Thomas Jefferson's Autobiography (2 examples)

It was argued by Wilson, Robert R. Livingston, E. Rutledge, Dickinson and others ... [a long list of propositions, all starting with "that"] ... That it was prudent to fix among ourselves the terms on which we should form alliance, before we declared we would form one at all events: And that if these were agreed on, & our Declaration of Independance ready by the time our Ambassador should be prepared to sail, it would be as well as to go into that Declaration at this day.
... and the amendment against the reeligibility of the President was not proposed by that body. My fears of that feature were founded on the importance of the office, on the fierce contentions it might excite among ourselves, if continuable for life, and the dangers of interference either with money or arms, by foreign nations, to whom the choice of an American President might become interesting.

[Posted by Mark Liberman 1/20/2006 on behalf of Chris Culy]

Posted by Mark Liberman at 01:24 PM

Mr. Understanding-the-key-point Chen

This is a quick link to an interesting post at Pinyin News on "Chinese names, stroke counts, and fengshui", commenting on a WSJ story about a current Chinese fad for name changes:

The article repeatedly talks about this as if it were part of fengshui (fēngshui / 風水 / 风水). Coming up with a lucky name, however, traditionally belongs to fortune-telling, an entirely different field, though I suppose it’s possible that the two have become combined in modern China, where the traditional ways were broken.

Here's one of the name-changing narratives from the WSJ piece:

Chen Mingjian changed his given name in 1998, after he was fired from an investment consultancy he co-founded. A feng shui master said Mr. Chen's original given name, Jian, or "healthy," attracted money and success but mainly for his employers, not for himself. The name Mingjian, which means "understanding the key point," would give him a better chance at earning a personal fortune, the master said.

Mr. Chen now owns HollyHigh International Capital Co., a mergers-and-acquisitions consultancy with offices in Beijing and Shanghai. "Name changing ... gives you psychological assurance in difficult times," says Mr. Chen. "My career didn't take off until I changed my name."

Mr. Chen says eight of 32 of his classmates from prestigious Tsinghua University, most of them bankers and investment bankers, have followed his example and changed their names in recent years.

"I may change my name again if there are dramatic changes in my life," says Mr. Chen.

Pinyin News provides extra flavor by comparison to Jerome Rothenberg's Gematria Poems, and several (nonlinguistic but interesting) fengshui links.

Posted by Mark Liberman at 09:30 AM

Who let the blawgs out?

It started with an open letter to the editor of Blawg Review from David Giacalone of f/k/a, under the heading "let's make the word 'blawg' obsolete":

I've come to know you as an articulate lover of the English language. As far as I know, you don't say "lawgic" or "lawnguage," drink "lawtte," bill clawents, or use Blawk's Dictionary. You don't call lazy associates "slawkers," and have yet to dub Jack Abramoff a "lawbbyist."

You're usually a skeptic and no fan of "cute." If linguists called their weblogs "blings" (or argonauts called theirs "blargs"), you'd probably smirk. But, note: no one else uses such verbal oddities in naming their weblogs. So, Ed, why do you, and other otherwise-serious members of the legal community, refer to law-oriented weblogs as "blawgs?" Why take an insider pun by a popular lawyer-webdiva (which should have been passed around and admired briefly as a witty one-off) and help perpetuate it?

The Blawg Review's editor responded under the heading "Who let the blawgs out?". I'll let you read the closely-argued brief for yourself, but (s)he ends the argument by playing the trump card of lexicographic democracy:

Enough of this bloggerel, David. ... Fact is, 'Blawg' use in law firms is on the rise.

In an ex parte communication, Ed. wrote to me to ask for expert testimony on the matter:

There's probably a lot more that could be said about the portmanteau "blawg" from a linguist's point of view, and we'd be very interested in your thoughts.

I'll observe that "blawg" is an unusual sort of portmanteau word -- it is indeed "a word formed by merging the sounds and meanings of two different words, as chortle, from chuckle and snort". However, the sound of one of the words (law) is completely contained within the sound of the other word (blog). At the moment, I can't think of any other examples of that kind. (I'm sure there are some others, at least among what Giacalone calls "witty one-offs", but they don't come to mind at the moment. Send them to me and I'll add them here.)

Beyond that, I don't have much to contribute, except the standard quotation from Horace about norma loquendi:

...quid autem
Caecilio Plautoque dabit Romanus ademptum
Vergilio Varioque? ego cur, adquirere pauca
si possum, invideor, cum lingua Catonis et Enni
sermonem patrium ditaverit et nova rerum
nomina protulerit? licuit semperque licebit
signatum praesente nota producere nomen.
ut silvae foliis pronos mutantur in annos,
prima cadunt: ita verborum vetus interit aetas,
et iuvenum ritu florent modo nata vigentque.
...
mortalia facta peribunt,
nedum sermonum stet honos et gratia vivax.
multa renascentur quae iam cecidere cadentque
quae nunc sunt in honore vocabula, si volet usus,
quem penes arbitrium est et ius et norma loquendi.

But why should the Romans grant to Plautus and Caecilius a privilege denied to Virgil and Varius? Why should I be envied, if I have it in my power to acquire a few words, when the language of Cato and Ennius has enriched our native tongue, and produced new names of things? It has been, and ever will be, allowable to coin a word marked with the stamp in present request. As leaves in the woods are changed with the fleeting years; the earliest fall off first: in this manner words perish with old age, and those lately invented flourish and thrive, like men in the time of youth. ... Mortal works must perish: much less can the honor and elegance of language be long-lived. Many words shall revive, which now have fallen off; and many which are now in esteem shall fall off, if it be the will of custom, in whose power is the decision and right and standard of language. [translation from C. Smart]

The crucial phrase is "licuit semperque licebit signatum praesente nota producere nomen", meaning something like "it has always been allowed, and always will be allowed, to coin a word stamped with the current year". And as Horace observes, whether a new word is accepted as the coin of the realm, and for how long, is not determined by lawyers or linguists.

Posted by Mark Liberman at 08:57 AM

January 19, 2006

A guest rant: "All we want are the facts"

Just as I was posting my little complaint about Technology Review, this fine rant was submitted by Paul Kay for publication in Language Log:

Is truth really under attack in American society?

Or does it just seem that way? Does the apparent decline of respect for veracity in public discourse amount to just another case of "the country going to the dogs" -- as people seem to have been repeatedly rediscovering from time immemorial? In a penetrating essay on the significance of the Million Little Pieces dust-up (N.Y. Times, Jan. 17, 2006, "Bending the Truth in a Million Little Ways"), Michiko Kakutani (MK) makes a pretty convincing case that this time there may be a real wolf at the door.

MK highlights the no-big-deal attitude of author James Frey, his publisher Doubleday and his major promoter Oprah Winfrey to the fact that Frey's self-styled memoir contains some undeniable, and for that matter undenied, fiction, which makes him out to have been a significantly bigger loser and thug than he really was and thus fulsomely inflates the drama of his supposed redemption. (Oprah used the phrase "much ado about nothing" to appraise Frey's lies in the context of his emotional message.) MK proposes that this is not an isolated incident, not even an isolated case of an inflated memoir. She points to staged reality shows, phony biographies of both the gilding and tarring varieties, slanted opinion-mongering that masquerades as news, ... and specifically to a Bush aide's dismissive characterization of reporters "who live in the reality-based community ... we're an empire now and when we act we create our own reality." That declaration would be laughable if it wasn't terrifying.

MK also mentions several unfortunate turns of phrase that have become part of our everyday language; for example "virtual reality", "creative non-fiction" and the word "survivor" applied to those who have overcome bad credit or obesity. She could have added "war on terror(ism)", "They hate us for our freedom", and countless others, including Fox News's self-identification as "fair and balanced." One of the most alarming of the recent truth-obliterating usages, it seems to me, is "deniability." The first time I heard this expression was when Admiral John Poindexter, the Reagan administration's uber-point man on Iran-Contra, explained some administrative skullduggery as justified because it provided President Reagan with deniability. What Poindexter meant was that his own dishonesty was admirable because it enabled his boss to claim unassailably, albeit untruthfully, that he didn't know what was going on.

Lest the reader conclude too quickly that the fault for the decline of truth in public discourse lies exclusively with the political right, MK points to the culpability of the overwhelmingly left-leaning post-modernists of our humanities and social science departments. In "deconstructing" all historical texts and arguing that they merely express the power of the interests their authors represent, postmodernists apotheosize the obstacles to objectivity rather than combating them. She cites in this connection an elegant line of Stanley Fish's: "the death of objectivity 'relieves me of the obligation to be right'; ... it 'demands only that I be interesting.'" And the most visible language-based strategist of the current left, George Lakoff, urges liberals not simply to tell the unvarnished truth, but to "frame" issues in ways that will combat the propaganda of conservatives and so aid the achievement of liberal goals. Lakoff makes forcefully the familiar point that the facts never speak for themselves; they have to be recounted in human languages, which are fraught with connotation and presupposition. Fair enough. But if the truth is seldom plain and never simple, it is nonetheless the only truth we've got. I'm all for the achievement of liberal goals, but I worry a little that even some of my best friends seem to care less about plain facts than they used to.

Sgt. Friday, where are you when we need you?

[posted by Mark Liberman 1/19/2006 on behalf of Paul Kay]

Posted by Mark Liberman at 06:14 PM

Truthiness: a flash in the pan?

Has the golden era of truthiness already passed? The above graph, generated by BlogPulse, suggests that inhabitants of the blogosphere are already losing interest in Stephen Colbert's term for faux truth, less than two weeks after the American Dialect Society named it Word of the Year and Colbert launched his offensive against those who would deny him credit for the coinage. The recent controversies in the literary world over the pseudo-memoirs of James Frey and J.T. Leroy may have provided one final boost, as the word was featured prominently in commentary on the scandals in USA Today, the Chicago Tribune, and the San Francisco Chronicle. But even if truthiness is already on the wane, at least it was a fun ride.

Wordanistas are split on the viability of the term. Even before the ADS vote, noted etymologist Anatoly Liberman speculated on Minnesota Public Radio that truthiness might enter new dictionaries in the next year or two (presumably with a sense differing from the archaic meaning of 'truthfulness' found in the Oxford English Dictionary and the Century Dictionary). But even Liberman — clearly not a fan of "The Colbert Report" — disparaged the word as "rather ugly and rather useless." On the ADS mailing list Allan Metcalf wrote, "Like astronomers witnessing the birth of a nova, we are watching the nativity and infancy of a new word that has the possibility of becoming a permanent addition to the vocabulary." Ron Butters, on the other hand, countered that "truthiness is not a lexicological nova, it is a cute, stunt-wordy flash in the lexicographical pan and will go the way of Bushlips, and about as quickly." 

Bushlips, defined as 'insincere political rhetoric,' is something of an albatross for the ADS. It was named Word of the Year for 1990, the first year that the organization made such a selection, and it recalls the days of Bush the Elder's notorious backtrack from his "Read my lips, no new taxes" pledge. Needless to say, Bushlips quickly withered on the vine. Some might argue that selection as the Word of the Year isn't intended to be an indicator of future success — after all, the ADS has the category "Most Likely to Succeed" for that (in 1990 it was notebook PC and rightsizing, while in 2005 it was sudoku). Rather, the Word of the Year is meant to capture something of the annual zeitgeist, and both Bushlips and truthiness accomplished that for their respective years.

If truthiness is indeed headed for the neologistic scrapheap along with Bushlips and so many others, its rise and fall will at least serve as a fascinating case study for media observers. To that end, I'd like to add two more pieces to the puzzle. First, AP reporter Heather Clark, who Stephen Colbert declared as "Dead to Me" for neglecting to assign proper credit in the initial coverage, finally had her say in asap, the AP's "new multimedia service featuring original content designed to appeal to under-35-year-old readers." Clark was apparently inundated with emails (no doubt due to the call to arms issued by Adam Green on the Huffington Post), but she feels she is being unfairly maligned:

Now, listen up. Many of you insist that Colbert "coined" the word "truthiness."

In fact, Colbert himself is the epitome of the word — as in "truthy," not "facty." Mr. "Truthy" — witty intellectual that he claims to be [Huh? I thought he was supposed to be anti-intellectual. —BZ] — did not coin "truthiness," though he did popularize it. The Oxford English Dictionary has a definition for truthy that dates back to the 1800s and includes the derivation "truthiness."

And for the record, I did mention Colbert's show in the initial article that was read far and wide — or at least across New Mexico (though I did NOT credit him for inventing the word!). Seems, though, that the reference to Colbert was edited out by our national desk, which often tightens stories and drops information that they feel isn't all that important.

I have no problem believing that the omission of Colbert can be blamed on the AP's national desk editors, since the full version of Clark's story did emerge in some outlets at around the same time as the shortened version. But regardless of who was responsible for leaving Colbert's name out of the national desk story, the oversight turned out to be a godsend for "The Colbert Report," according to Stephen Colbert himself (the comedian, as opposed to the absurd on-air character of "Stephen Colbert"). Colbert was recently interviewed by San Francisco Chronicle TV critic Tim Goodman as part of the Chronicle's City Arts & Lectures program, and you can listen to all four parts of the interview here in podcast form. About halfway through Part 3, Colbert talks about the honor of having truthiness named Word of the Year. He goes on to say how ecstatic he was that the AP didn't mention him, since his character was in need of a persecution complex à la Bill O'Reilly.

And who knows? Maybe another successful neologism will emerge from all of this. I myself am quite fond of wordanista and will probably be using it for a long time to come.

Posted by Benjamin Zimmer at 02:33 PM

Technology review: giving new media a bad name?

As an MIT grad, I've been getting the MIT alumni magazine Technology Review in the mail for many years, and I generally read it with interest and pleasure. But Tech Review is now undergoing some changes, to make it bloggier, or at least webbier: more "immediate", more "searchable" and more "interactive". Unfortunately, these changes are not all for the better, as I evaluate them so far, because the information seems also to be becoming less reliably true. At least, in a couple of recent cases, Tech Review presented false statements dealing with simple matters of fact, about which the truth could have been learned in a few seconds of Google searching.

Ironically, old-media pundits have been complaining for years that blogs and wikis and such, lacking editorial oversight, are not factually reliable. This was never true, in my experience -- bloggers who know their areas are more reliable, on average, than journalists are. But what seems to be happening now is that Tech Review is aiming for the immediacy of blogging and other new media, in a way that really does degrade factual reliability rather than improving it.

This is a shame, because the theory behind the changes seems otherwise to be a good one. The new editor, Jason Pontin, has smelled the same new-media coffee as everyone else in the industry, and writes:

Readers want information to be immediate, searchable, and easily customized, and advertisers are demanding accountability from the publishers who take their money. Put baldly, the era when publishers could rely on print magazines to satisfy their readers and build sustainable businesses is over.

In keeping with MIT's history of innovation and leadership, Technology Review has decided to invest more of its resources in interactive media.

Specifically, Pontin explains, they're going to:

• Decrease the frequency of the print magazine to bimonthly publication;

• Focus the print magazine on what print does best: present longer-format, investigative stories and colorful imagery;

• Dramatically increase the number of stories we publish on technologyreview.com every day;

• Expand the range of media we employ online to include podcasts, blogs, RSS feeds, and a variety of new technologies;

• Focus all our editorial content on the impact of emerging technologies and discontinue our coverage of the business models and financing of new technologies.

My first indication that the dramatic increase in online content might be sacrificing accuracy was a story by Kate Greene about machine translation, datelined Wednesday, January 18, 2006, under the headline "Repetez, en anglais, s'il vous plait". It contains this paragraph:

In 2005, DARPA also announced the Linguistic Data Consortium (LDC), a project aimed at acquiring huge amounts of translated documents, for distribution to Global Autonomous Language Exploitation, another DARPA-funded project in which computers will process the data. The intention of both of these initiative is to speed up the progress in machine translation. LDC is currently in the first year and will be transcribing speech from broadcast news sources and talk shows in Arabic, Chinese, and English, and also cataloguing text newswire feeds, Web news discussion groups, and blogs in those languages. For now, the project is focused mainly on data collection from these genres, with researchers in the computer and engineering science department at the University of Pennsylvania doing much of the work.

Now, the truth of the matter is that the LDC was founded in 1992, not 2005, and has been publishing materials for speech and language research since 1993. And the LDC's goals are quite a bit broader than collecting translated documents for MT research. And only a few of the LDC's staff members are associated with Penn's CIS department. And many LDC publications are authored by researchers from other institutions around the world. I know all this because I was the P.I. on the initial DARPA grant (which ended in 1995), and continue to direct the organization. Greene could have learned the facts about the LDC by asking Google for information on {linguistic data consortium history}, or poking around on the LDC web site for a few minutes, or by contacting someone at the organization.

These are small points, which I wouldn't care much about if I didn't have a personal connection to the work. I mean, 1992, 2005, what's 13 years in the grand tapestry of human history? In some ways, Greene's story is a step up from the July 2003 NYT story on an earlier DARPA MT evaluation -- which didn't mention DARPA at all, or the LDC for that matter, though it did track statistical MT back to 1999 or so. And I'm impressed that Tech Review allows comments on its online articles, so that readers can offer corrections.

However, it bothers me to think that when I read an article in Tech Review, I have to allow for the possibility that its "facts" are plainly and simply false, in ways that anyone can discover in a few seconds of research on the web. I don't have the time to check all the facts in every article that I read, so I like to think that in a reputable and well-edited publication like Tech Review, someone will have done that for me, at least to a first order.

Is this an isolated case of an unchecked mistake of fact? Apparently not. When I took a look at the Technology Review front page this morning, one of the prominently displayed blog headlines was "Lifespan for CD-Rs Around Two Years". The blog post behind the headline, by Brad King, quotes as if it were fact a 1/10/2006 IDG News Service story, which in turn quotes Kurt Gerecke, identified as "a physicist and storage expert at IBM Deutschland":

"Unlike pressed original CDs, burned CDs have a relatively short life span of between two to five years, depending on the quality of the CD. There are a few things you can do to extend the life of a burned CD, like keeping the disc in a cool, dark space, but not a whole lot more."

That's scary stuff -- think of all the crucial stuff naively saved on CD-Rs! But is it really true?

I checked into it a bit, not to get on Tech Review's case, but because I was genuinely worried about all the crucial data that I have backed up on CD-Rs. And apparently, it ain't necessarily so. The wikipedia article on CD-Rs says:

There are three basic formulations of dye used in CD-Rs:

  1. Cyanine dyes were the earliest ones developed, and their formulation is patented by Taiyo Yuden. Cyanine dyes are mostly green or light blue in color, and are chemically unstable. This makes cyanine discs unsuitable for archival use; they can fade and become unreadable in a few years. Many manufacturers use proprietary chemical additives to make more stable cyanine discs.
  2. Azo dye CD-Rs are dark blue in color, and their formulation is patented by Mitsubishi Chemicals. Unlike cyanine, azo dyes are chemically stable, and typically rated with a lifetime of decades.
  3. Phthalocyanine dye CD-Rs are usually silver, gold or light green. The patents on pthalocyanine CD-Rs are held by Mitsui and Ciba Specialty Chemicals. These are also chemically stable, and often given a rated lifetime of hundreds of years.

The same article says that

With proper care it is thought that CD-Rs should be readable one thousand times or more and have a shelf life of several hundred years. Unfortunately, some common practices can reduce shelf life to only one or two years. Therefore, it is important to handle and store CD-Rs properly if you wish to read them more than a year or so later.

And this 1995 paper "Lifetime of KODAK Writable CD and Photo CD Media" applies an Arrhenius model to the criterion of "maximum block error rate less than 50", and finds that

That model predicts (at the 95% confidence level) that 95% of properly recorded discs stored at the recommended dark storage condition (25°C, 40% RH) will have a lifetime of greater than 217 years.

It wasn't hard to find this information: these pages were the first and third hits on a Google search for {CD lifetime}.

I'm glad to be warned that low-quality CD-Rs may lose data after a couple of years, and from now on I'll check to see what dyes are used in the CDs I buy. (I checked the ones I've been using, and I think I'm OK.) The E-MELD "School of Best Practices in Digital Language Documentation" mentions this problem in the general context of hardware and software obsolescence, but doesn't make any specific recommendations (that I could find in a quick search, anyhow), except the suggestion to

Place archival copies in a stable online linguistic archive that will:

  • Maintain a constant URL.
  • Migrate data to new formats

Good idea -- the LDC, among other outfits, stands ready to publish significant and well prepared language documentation archives -- but E-MELD ought also to tell language documenters to use CD-Rs with phthalocyanine dyes. And Tech Review should have done so, too, rather than just repeating an apparently incorrect newswire story.

Posted by Mark Liberman at 02:30 PM

A prescriptivist rant? Get a clue

I was astonished to find my musings about why people call common abbreviations acronyms described at the excellent copy-editor's blog Tongue Tied as a "thoroughly prescriptivist rant". "Pullum pounced", it says, as if I was some wild carnivorous beast; "what's with the complainin' 'n' prescribin' act?", it asked (in the original form of the post, now slightly revised), as if I had howled for vengeance and laid down ukases. I mean, really! I said, very calmly, "It's funny that people get this wrong" (about calling something like FTBSITTTD an acronym), and went on to note what acronyms and abbreviations have in common: they are both what The Cambridge Grammar calls initialisms -- words formed anomalously from lists of initial letters. And that is supposed to be a rant? Only to someone who has never seen me rant.

I've noted before that with popular beliefs about language it's all "Everything is correct" versus "nothing is relevant". Two extremes, no sensible middle. Let me say it again, as clearly as I can, in boldface this time: It is not inconsistent for a linguist to note that somebody used a word with a meaning that it does not standardly have. Even a "descriptivist" professor of linguistics like me is perfectly entitled to the view that comprise means "comprise", i.e., "embrace" or "include"; it doesn't mean "compose" or "jointly make up" (despite a century of evidence of people confusing comprise and compose -- and again, it's psycholinguistically interesting that these two words are confused with each other whereas red and green are not). And likewise, acronyms standardly denotes the initialisms that can be pronounced like words rather than lists of letter names (hence the apology offered by Slate magazine for wrongly calling FTBSITTTD an acronym). Why would anyone think that because I'm a linguistic scientist I have to pretend nobody ever misuses any words?

Tongue Tied is quite right, though, to point out that Webster's actually lists "abbreviation" as a second meaning of acronym (after an "also"). The Webster's practice is helpful to dictionary users: it enables a reader to figure out what some people mean when they say "acronym". That's good. And paying attention to me will enable you to understand why "abbreviation" is only given as a secondary meaning, and why The Cambridge Grammar uses its terms the way it does. You need to understand both that "abbreviation" is not the original or standard meaning (the American Heritage Dictionary, Tongue Tied notes, does not give that as a meaning for acronym) and that lots of people believe otherwise.

Posted by Geoffrey K. Pullum at 09:24 AM

January 18, 2006

And still they come

The market for books of word and phrase origins seems to be inexhaustible.  Most of them have no visible scholarship whatsoever, just bald assertion.  And many of the sources they propose are preposterous, or plausible-sounding but clearly wrong.  (There are books that are honorable exceptions, most recently Michael Quinion's Ballyhoo, Buckaroo, and Spuds.) Yet still they come.

The latest of these horrors to come to my attention is Albert Jack's Red Herrings and White Elephants (HarperCollins, 2004).  No references for its claims, and just opening pages at random I found three appalling entries in as many minutes.

Number one: mealy mouthed, Jack claims, derives from Ancient Greek melimuthos 'honey speak'.  There is no mention of meal (as in the OED); instead we get this elaborate and strained loan-word account.  Well, they sound sort of alike.

Number two: fell swoop Jack takes back to Shakespeare, claiming that once the bard used the word fell in this phrase in the Scottish tragedy, it came to have the meaning 'evil'.  Good grief, even without checking the OED, I knew that fell 'evil' goes back to Old English.  (I checked the OED anyway; memory is a fickle thing.)  I'm guessing that somewhere Jack heard that Shakespeare was involved in the history of the phrase and then just made up the rest of the story.  Shakespeare is in fact involved, as Quinion explains in his entry for one fell swoop: when Macduff learns that his entire family has been murdered, he does indeed cry out, "O hell-kite!  All?  What, all my pretty chickens and their dam at one fell swoop?"  And this would have been understood entirely compositionally by everybody in the audience, as a metaphorical allusion to the evil plummeting of a kite (the bird) as it seizes its prey.  Not a novel meaning of fell at all, but a wonderfully effective image.  And so one fell swoop became a memorable, and quotable, expression.  Unfortunately, fell 'evil' later pretty much dropped out of use in English, leaving this expression marooned as an idiom.  It's a nice little story about language history, much better than the story Jack invented.

Number three: the spill the beans entry provides a charming tale about voting with beans in Ancient Greece.  (Again, the Ancient Greek thing!)  Against this is the OED's assertion that the expression is originally U.S. slang and the fact that the dictionary has no cites for it earlier than 1919 (from an American source, of course).

Enough, enough.  It's a terrible book, by someone who doesn't seem to know how to use dictionaries. We need a new genre category for publications like this: "etymological fantasy", "fantasy etymology", or maybe "fantetymology".

The one customer review on amazon.com -- it's a counterweight to a positive snippet from an editorial review in the Knutsford (Cheshire) Guardian -- is rather more detailed, and even more negative, than mine.  But then I spent only 15 minutes on mine, mostly writing time; my reading time was blessedly brief.  Here is "Syntinen", writing from southeast England:

This book should carry a label saying "Warning - don't assume that any of this is true". In the foreword the author portrays himself as being inspired to write it when sitting in an olde English pub musing on the oddness of English phrases. It reads as though it had been researched in a pub as well; many of the "origins" given are exactly the kind of thing you'd be told by some wiseacre leaning up against the bar. To disprove some of them, such as "keeping danger at bay" and "on the fiddle", wouldn't even take a reference library; you'd only need to look up the words in a good dictionary. One or two of them - such as "dead ringer" - come directly from a famous internet spoof, "Life in the 1500s".

The book is sloppy in every way. Regardless of whether the explanation of a phrase's origin is broadly correct or not, many of the supporting "facts" are wrong; such as the statements that a pig's ear "cannot be eaten or used in any way" - an assertion that would startle peasant cooks from all over Europe - and that pigs are "sacred to Hindus" (!)

It's very odd that some of the "explanations" of phrases in this book don't actually explain them at all. The images evoked by phrases like "flogging a dead horse" or "scratch my back and I'll scratch yours" exactly match what we mean when we say them; the stories in "Red Herrings and White Elephants" actually make much less sense. And yet people seem to prefer the far-fetched stories. Strange.

zwicky at-sign csli period stanford period edu

Posted by Arnold Zwicky at 01:51 PM

January 17, 2006

Nabspam is kinda nice

At least, "if random messages from people I don't know count as nabspam". So says Richard Zach, who has experienced it.

Posted by Mark Liberman at 07:17 AM

January 16, 2006

We, We, Madame

I've been inspired by Dana Milbank's exercise in computational politico-linguistics over at the Washington Post -- "I, I, Sir: The Alito Hearings, Annotated" -- in which it's revealed that

... one thing united lawmakers on both sides: reverence for the first person. Republicans used the "I" word 1,180 times. Democrats used it 1,123 times. Combined, they used it well more than the nominee, who said "I" 1,907 times.

Milbank doesn't tell us whether the rate of uses of the first person singular was significantly more or less than we should have expected from the various parties in such confirmation hearings. But as I asked myself this question, it reminded me of a recently noteworthy first person (plural) pronoun: the errant reflexive "ourselves" that Geoff Pullum cited in President George W. Bush's 1/13/2006 press conference with German Chancellor Angela Merkel. Did an uncharacteristic focus on the diplomatic "we", due to the renewed emphasis on trans-Atlantic identity, lead W into over-reflexivization?

The problematic sentence came up in response to the first question from a reporter, and (according to the White House transcript) was:

The answer to your question is that Guantanamo is a necessary part of protecting the American people, and so long as the war on terror goes on, and so long as there's a threat, we will, inevitably need to hold people that would do ourselves harm in a system that -- in which people will be treated humanely, and in which, ultimately, there is going to be a end, which is a legal system. [emphasis added]

That ourselves should have been us, since the subject of its clause is "people", not "we". Why might the president have spoken as if an extra "we" had snuck into the subject slot? Well, according to the transcript, Bush's opening remarks were 576 words long, and included

26 we
10 our
2 us

for a total of 38 first person plural pronouns, or a remarkable 6.6% of his word count. In particular, it's notable that 4.5% of these 576 words were the subject form "we". If we compare his opening remarks at the most recent three visits of heads of state, we find "we" percentages between 0.4% and 2.3%, or roughly a tenth to a half of the rate in his remarks welcoming Chancellor Merkel.

Specifically: when President Bush welcomed President Saleh of Yemen on 11/05/2005, his 167 words included

1 we
3 our
3 us

for 4.2% 1st plurals, and 0.5% "we". When he welcomed Prime Minister Berlusconi to the White House on 10/31/2005, his 171 words included

4 we
3 our

for 4.1% 1st plural pronouns, and 2.3% "we".

And when he welcomed President Abbas of the Palestinian Authority on 10/20/2005, his 928 words included

4 we
4 our

for a mere 0.9% 1st plural pronouns and 0.4% "we".
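For anyone who wants to redo the arithmetic, here is a minimal sketch of the percentage calculation, using the word and pronoun counts quoted above. The names (REMARKS, rate, summarize) are made up for illustration, and the results are rounded to one decimal place, so a figure may differ from the one printed in the text in its last digit.

```python
# Word and pronoun counts as quoted in the post; all names here are illustrative.
REMARKS = {
    "Merkel, 1/13/2006": {"words": 576, "we": 26, "our": 10, "us": 2},
    "Saleh, 11/05/2005": {"words": 167, "we": 1, "our": 3, "us": 3},
    "Berlusconi, 10/31/2005": {"words": 171, "we": 4, "our": 3, "us": 0},
    "Abbas, 10/20/2005": {"words": 928, "we": 4, "our": 4, "us": 0},
}


def rate(count, words):
    """Return count as a percentage of the total word count."""
    return 100.0 * count / words


def summarize(counts):
    """Return (all first-person-plural %, subject-form 'we' %) for one set of remarks."""
    words = counts["words"]
    plural_total = sum(n for form, n in counts.items() if form != "words")
    return rate(plural_total, words), rate(counts["we"], words)


for visit, counts in REMARKS.items():
    all_pct, we_pct = summarize(counts)
    print(f"{visit}: {all_pct:.1f}% 1st plural, {we_pct:.1f}% 'we'")
```

The comparison in the text is then just a matter of setting the "we" rate for the Merkel remarks beside the rates for the three earlier visits.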

President Bush was not alone in focusing on "we" in the session with Chancellor Merkel. The 954 words of Merkel's (translated) opening remarks included

47 we
1 ourselves

for fully 4.9% "we".

It was a veritable festival of we-ity. It pegged the we-meter. So it's not surprising that a stray "we" crept into the empty subject slot of that relative clause.

Sigmund Freud, on the other hand, would have been more impressed by the literal, subversive interpretation: who indeed are those "that would do ourselves harm"? And these days, John McCain might agree with him.

Posted by Mark Liberman at 01:31 PM

The birth of truthiness?

Last week's great truthiness debate is still raging in some corners, despite the fact that both the American Dialect Society and Comedy Central's "The Colbert Report" have probably milked about as much publicity out of the spurious squabble as can be expected. At the heart of the debate is the question of what sort of ownership Stephen Colbert (or rather the truculent on-air persona known as "Stephen Colbert") has over truthiness, the word first popularized on his show and later selected as ADS Word of the Year. Colbert was appalled when the initial Associated Press story on the Word of the Year selection didn't even mention him, instead turning to an ADS member, Michael Adams, for a quick gloss. (The AP's shoddy reporting has led, bizarrely, to Colbert calling the AP the "No. 1 threat facing America"... in an article by the AP.)

Though Colbert vehemently declared that he "pulled that word right out of where the sun don't shine," Adams defended his right to define the word by pointing out (both to Colbert himself and to the AP in its followup article) that truthiness can already be found in the Oxford English Dictionary. Colbert's rejoinder — "you don't look up truthiness in a book, you look it up in your gut" — is unassailably truthy. Nonetheless, we would be failing in our mission as wordanistas if we didn't try digging a little deeper into the roots of truthiness.

Since the OED's lone 1824 citation for truthiness was first noted right here back in October, it's incumbent on us to investigate the source of this earliest known usage. The citation is taken from a book that was actually published in 1854 by Joseph Bevan Braithwaite, entitled Memoirs of Joseph John Gurney, with selections from his journal and correspondence. Gurney (1788-1847) was an English banker who gained renown as a charismatic Quaker minister, traveling to the United States and other countries to preach on behalf of world peace, the abolition of slavery and capital punishment, and abstinence from alcohol. Braithwaite was a disciple of Gurney's evangelism, and he sought to spread his mentor's teachings by presenting Gurney's collected writings posthumously.

Fortunately, both volumes of the memoirs are publicly available from the University of Michigan's Making of America digital library. And it turns out Gurney used truthiness at least twice in his writings. (I haven't found any other pre-Colbert uses of the word in printed materials, though the Usenet archive finds a number of mostly tongue-in-cheek examples in online newsgroups over the past decade.)

The first of Gurney's uses, the one that made it into the OED, describes Amelia Opie (1769-1853), a family friend who, through Gurney's influence, decided to become a Quaker herself:

The chronology here is a bit confusing. The date at the top of the page is 1824, which is what the OED used for its citation. But Gurney is describing the difficulties Opie encountered "when she found herself constrained to make an open profession of Quakerism," which didn't happen until 1825. The chapter where the passage appears actually begins with letters to Opie in 1824, but then Braithwaite injects other material that Gurney wrote about her and her decision to become a Quaker. This particular passage is from "his notice of his long valued friend," which on an earlier page Braithwaite explains is from Gurney's autobiography, a manuscript written in 1837 while he was on a voyage to America. So it looks like the OED got the dating wrong — truthiness is actually 13 years younger than we thought. (Maybe Colbert was right about not trusting reference books!)

Regardless of the exact date of the usage, it's immediately striking to the reader due to its italicization in the text, which suggests that Gurney was emphasizing the unusualness of the word, perhaps in recognition of its nonce status. I don't find any uses of truthy (or other derived forms) elsewhere in the text, so I doubt that this was a term in common use by the Quakers of the era. But certainly the word truth had a particular resonance for Gurney and his fellow Quakers. To this day, Quakers often call themselves "Friends of the Truth" and place great importance on truthful testimony. So for Gurney to trumpet Opie's "truthiness" must have been an innovative form of praise for a recent convert to Quakerism.

The second example of truthiness that I found in Gurney's writings, from a journal entry written in 1844, relates not to a personal quality but to the Scriptures themselves:

Again, the italicization of the word highlights its peculiarity. But here the usage seems positively (dare I say it?) Colbert-esque. Late in life, Gurney learned to take delight in the odd little contradictions found in the Scriptures. But these contradictions only reinforced his faith in the truth, or rather the truthiness, of the biblical text. Without those minor inconsistencies, the Scriptures would lack "genuineness and authenticity." So clearly Gurney was reaching for a concept beyond mundane truth. The Bible is no mere reference book, after all. As I'm sure Mr. Colbert would remind us, no one ever accused the Good Book of being "all fact, no heart."

[Update: Jesse Sheidlower of the OED points out that the Century Dictionary has an entry for truthiness (marked "rare") with an 1832 citation from Noctes Ambrosianæ:

So assuming this citation holds up to scrutiny, it predates Gurney's (correctly dated) first use by five years.]

Posted by Benjamin Zimmer at 11:57 AM

January 15, 2006

What Whorf would have said

[This is a guest post by Paul Kay, responding to an earlier Language Log post.]

My colleagues and I would like to express our appreciation for the nice things Mark Liberman ("What would Whorf say?" Language Log, Jan 3, 2006) had to say about our study (Gilbert, Aubrey L., Terry Regier, Paul Kay and Richard B. Ivry. "Whorf hypothesis is supported in the right visual field but not the left." PNAS. 103, 489-494, 2006). In that paper, we presented evidence for the Whorf hypothesis operating in the right visual field (RVF) but not the left visual field (LVF). This pattern is suggested by the functional organization of the brain, since the RVF furnishes visual input to the left cerebral hemisphere (LH), and the LH is significantly more dedicated to language processing than the right hemisphere (RH). In studies involving visual search for colors, we found that reaction times to target colors in the RVF were faster when the target and distractor colors had different names than when they had the same name; in contrast, reaction times to targets in the LVF were not affected by the names of the target and distractor colors.

Mark’s post gave an excellent description of our experiments and findings, and sparked some very useful email discussion. However, at the risk of seeming ungracious, we wish to contest one part of his interpretation of our results. Our disagreement arises with the following passage of his post:

One [problem] is that the explanation might have worked just as well if the experiment had come out quite a bit differently. […] Other possible results -- basically anything except a situation in which color category makes no difference, or doesn't interact with visual field -- could similarly be given a Whorfian interpretation. [Italics ours]

We think the Whorf hypothesis makes a more specific prediction than this – a prediction that is confirmed by our findings. It has previously been established that (1) other things being equal, stimuli from distinct lexical (thus, linguistic) categories are discriminated faster than stimuli from the same lexical category – a Whorfian finding, since it apparently stems from language, and (2) as mentioned above, language function tends strongly to be biased to the left hemisphere, to which the RVF projects. Under the Whorfian hypothesis that language affects perceptual discrimination, a straightforward extrapolation from (1) and (2) is that cross-category stimulus pairs should be discriminated more readily than within-category pairs to a greater extent in the RVF than the LVF. Our results confirm this specific prediction. More complicated scenarios leading to different predictions are of course possible, but, we submit, less well motivated.

[This post has benefited from email exchanges between some of us and Mark Liberman.]

Paul Kay, January 14, 2006.

Posted by Mark Liberman at 09:23 AM

January 14, 2006

Forensic linguistics, the Unabomber, and the etymological fallacy

It's been noted here at Language Log that mass-media reporting on linguistic topics very often turns out to be frustratingly simplistic or misleading. But the truth is, it's difficult to get journalists interested in writing about linguistics at all. Despite the success of Steven Pinker in popularizing cognitive linguistics and Deborah Tannen in doing the same for gender-based sociolinguistics, most research by linguists remains resolutely unsexy. (American dialectologists and lexicographers find that the only sure-fire way to get mentioned in the press is to anoint a Word of the Year — and if that selection sparks a phony feud, all the better!)

But one subdiscipline that seems tailor-made for media attention is forensic linguistics, the application of linguistic analysis in legal settings, such as criminal casework. The Washington Times, reporting on the field in its Jan. 12 edition, went with the obvious headline, "CSI: Language analysis unit." Hey, if forensic anthropology can get its own network TV show, why not forensic linguistics?

The article touches on the forensic analysis of academic scholars such as Roger Shuy, as well as work done within the FBI. James R. Fitzgerald, the acting chief of the FBI's Behavioral Analysis Unit-1 and a longtime Bureau analyst, spoke of perhaps the most famous application of forensic linguistics in a U.S. criminal case:

[Fitzgerald] recalls how a transposition of verbs in the manifesto written by the Unabomber helped lead to a closer identification of Ted Kaczynski in April 1996.
The latter used the phrase "You can't eat your cake and have it, too," instead of the usual form, which is "You can't have your cake and eat it, too." Like most people, Mr. Fitzgerald thought Kaczynski had made a mistake. But examination of other letters by him contained a similar feature, which, Mr. Fitzgerald says, "is actually a traditionally middle English way of using the term. He technically had it right and the rest of us had it wrong. It was one of the big clues that allowed us to make the rest of the comparison and submit a report to the judge who signed off on a search warrant."

There are a few problems with this account. First, by focusing strictly on forensic linguistics, the article glosses over the role of David Kaczynski, the brother of the Unabomber. It was David who first made the realization that the appearance of "you can't eat your cake and have it too" in the Unabomber manifesto might be an indication of the writer's true identity. [See Update #3 below.] Fitzgerald has elsewhere discussed how David Kaczynski's call to the FBI set the identification of the Unabomber in motion. Following David's hunch, Fitzgerald's team of agents and analysts made a more systematic comparison of the Manifesto with letters written by Ted Kaczynski to his brother and mother. The idiosyncratic use of the "cake" expression, among other stylistic evidence presented in the FBI's affidavit, was enough to convince a judge to issue a search warrant for Kaczynski's cabin in Montana. (See the abstract from a paper presented by Fitzgerald at the 2001 conference of the International Association of Forensic Linguistics.)

But what of Fitzgerald's assertion that Kaczynski's particular usage of the "cake" phrase is "actually a traditionally middle English way of using the term"? Well, the "eat your cake and have it" ordering is indeed older than "have your cake and eat it," though its first dating of 1562 (in John Heywood's A Dialogue Conteynyng Prouerbes and Epigrammes) only makes it Early Modern English, not Middle English. But beyond that nitpick, Fitzgerald's claim that Kaczynski "technically had it right and the rest of us had it wrong" is a clear variant of the etymological fallacy frequently observed by Arnold Zwicky and others (see here, here, and here). As with "could care less" developing from "couldn't care less," it's often claimed that the historically later idiom is less "logical" and therefore incorrect.

But does "you can't have your cake and eat it" really lack the inherent logicality of "you can't eat your cake and have it"? Only if you consider the ordering of the two conjoined verb phrases to imply sequentiality: you can't eat your cake and then (still) have it, but you can have your cake and then eat it. On the other hand, if the and conjoining the VPs implies simultaneity of action rather than sequentiality, then neither version is more "logical" than the other: cake-eating and cake-having are mutually exclusive activities, regardless of the syntactic ordering.

Fitzgerald seems to suggest that Kaczynski's "correct" use of the idiomatic phrase helped guide FBI profilers into looking for an exacting academic type (rather than a mere raving crank), one who knows how to use the "right" language that ordinary folks get "wrong." But of course it's only "wrong" in the sequential-and (rather than simultaneous-and) way of thinking. According to this article, "eat your cake and have it" was also the ordering that Kaczynski's mother used (probably another reason why his brother spotted it in the Manifesto). If that's the case, then it's understandable why he would have grown up scorning the "have your cake and eat it" ordering, especially if his education at top universities (Harvard and Michigan) reinforced an elitist view of language use. The FBI thought they were looking for a paragon of linguistic propriety, when they were actually just looking for a pedant.

Finally, I should note that the "wrong" version of the expression has been around for 180 years or so, at least in American usage. A search on the American Periodical Series and the Making of America databases finds the have-eat ordering in use from 1827, and firmly established by the mid-19th century:

North American Review, July 1827, p. 116
This may have its advantages, but how will he contrive to live below the common standard and above it at the same time? He cannot both have his cake and eat it.

Tennessee Farmer, Feb. 1837, p. 2
We beg of them to look about the River Towns for Farmers who will join them in getting up a sugar refinery, and in that way falsify the old proverb, which says "you cannot have your cake and eat your cake."

Cincinnati Weekly Herald, Nov. 13, 1844, p. 9
If your Jewish creed be right, you are wrong to deny its manifest deduction. If your Jewish creed be wrong, you are right in wishing to explain it away. But you cannot have your cake and eat it, too.

North American Review, Apr. 1848, p. 371
The reading public cannot have its cake and eat it too, still less can it have the cake which it ate two thousand years ago.

Daguerreotype, May 20, 1848, p. 289
The experiment will end in the discovery that "you cannot have your cake and eat your cake."

Debow's Review, Oct.-Nov. 1848, p. 271
Unfortunately, we cannot have our cake and eat it too.

United States Democratic Review, July 1849, p. 82
It would, indeed, allow the stockholders to "have their cake, and eat it, too."

The earliest example, "He cannot both have his cake and eat it," is helped along by the use of both — which, as Michael Quinion notes, assists the reader with the simultaneous-and reading. Later examples from the 1840s onwards simply append too at the end of the expression to imply simultaneity, and this remains an overwhelmingly common phrasing. But this still doesn't seem to satisfy those who consider the sequential-and reading to be somehow more "logical." In fact, Kaczynski said "you can't eat your cake and have it too" in his manifesto, so the presence of too is apparently not sufficient to establish the simultaneous sense of and for those who are committed to the sequential version.

(I certainly don't mean to tar all sequential-and types with the same brush as Ted Kaczynski. But perhaps he's a cautionary tale for what can happen when narrow-minded pedantry goes unchecked!)

[Update #1: Early American Newspapers supplies an earlier variant with keep-eat rather than have-eat in a verse entitled "Guillotina for 1797," first published on Jan. 1, 1797 in the Connecticut Courant and subsequently appearing in other papers (Chelsea Courier, Jan. 18, 1797; Providence Gazette, Feb. 4, 1797):

Thus greedy boys would gladly treat it,
Could they but keep their cake and eat it.

Here the exigencies of verse dictate the ordering, but this example still establishes that the simultaneous-and reading was already available by the late 18th century.]

[Update #2: Richard Mason takes issue with my assertion that "cake-eating and cake-having are mutually exclusive activities, regardless of the syntactic ordering," noting that one "has" cake during the process of eating it. Though this is technically correct, the "having" part of the idiom seems to me to imply possession over a long period of time, rather than the transient cake-having that occurs during cake-eating. (The 1797 example, interestingly enough, makes this sense more explicit by using keep instead of have.) Ultimately, however, such ruminations over logicality are irrelevant when it comes to the popular usage of crystallized idioms. Few people protest the expression head over heels to mean 'topsy-turvy,' despite the fact that its "literal" reading describes a normal, non-topsy-turvy bodily alignment.]

[Update #3: James R. Fitzgerald sent the following email:

I recently read your posting on "Language Log" regarding my interview with the Washington Times. I want to make a few clarifications.
Firstly, if David Kaczynski did know of his brother Ted's non-standardized usage of the proverb/idiom "you can't eat your cake and have it too," he never provided it to me or my colleagues on the Unabom Task Force in 1995 or 1996, or any other time. He was apparently aware of the term "cool-headed logicians," which was found in the Manifesto, and also known to have been used by Ted, as he told various investigators of its use. But, as valuable as he was to the FBI in providing his brother Ted's information to the Task Force, he never mentioned anything about the "cake" proverb/idiom. As I explained in chapter 14 of the book Profilers, I was the first one to recognize this unusual usage.
Secondly, years ago, upon doing some basic research re. this phrase, I dated the idiom to the Middle English period as, according to the Morris Dictionary of Words and Phrase Origins, it was first found in Heywood's "Proverbs" in 1546, but, "...it had been in circulation for centuries before that...." (1988: p 277). While the Modern English period is generally seen as beginning c. 1500, I felt it safe to say that its etymological roots are firmly planted in the Middle English period.
]

Posted by Benjamin Zimmer at 04:14 PM

More freedom but not more right or more rule

I wasn't as knocked over by the out-of-place reflexive as Geoff Pullum was, but I did notice something in President Bush's news conference yesterday with Angela Merkel. In two places, he used the phrase "rule of law" without a determiner:

We share common values based upon human rights and human decency and rule of law; freedom to worship and freedom to speak, freedom to write what you want to write.

I think the best way for the court system to proceed is through our military tribunals, which is now being adjudicated in our courts of law to determine whether or not this is appropriate path for a country that bases itself on rule of law, to adjudicate those held at Guantanamo.

This is not the first time he's made this choice. For example, on 11/04/2005, in remarks headlined "President Bush Meets with President Kirchner of Argentina", he said:

Argentina and the United States have a lot in common. We both believe in rule of law.

Sometimes, however, he (or his speechwriter) says "the rule of law" in similar contexts:

As two strong, diverse democracies, we share a commitment to the success of multi-ethnic democracy, individual liberty, and the rule of law.

My initial reaction was that the determiner is optional with freedom, but required with right and strongly preferred with rule:

I believe in freedom of speech.
??I believe in rule of law.
* I believe in right to bear arms.

Web search supports the notion that freedom is different from the other two:

                                 MSN    Yahoo   Google
believe in the freedom of      8,651   63,300   39,500
believe in freedom of         27,398  154,000  104,000
the/0 ratio                     0.32     0.41     0.38
believe in the rule of         8,443   65,400   35,400
believe in rule of               252      343      323
the/0 ratio                       35      191      110
believe in the right to       18,417   72,600   45,500
believe in right to              235      179      810
the/0 ratio                       78      405       56

However, the specific phrase makes a difference: "right to life" occurs an order of magnitude more often without an article than "right to bear arms" does. In fact, it seems that the article is dropped from "the right to life" much more often, relatively speaking, than from "the rule of law".

                                        MSN   Yahoo  Google
believe in the rule of law            7,347  57,200  31,600
believe in rule of law                  251     323     285
the/0 ratio                              29     177     111
believe in the right to life            877     903   1,190
believe in right to life                101      61     273
the/0 ratio                               9      15       4
believe in the right to bear arms       972     973   1,670
believe in right to bear arms             1       3      58
the/0 ratio                             972     324      29
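The "the/0 ratio" rows are simply the hit count for the version with the article divided by the hit count for the version without it. Here is a rough sketch of that calculation for the phrase-specific counts just given; the names HITS and the_zero_ratio are invented for the example, and the counts are the 2006 search-engine figures quoted above, not anything a current search would reproduce.

```python
# (hits with "the", hits without "the") per search engine, as quoted in the post.
# HITS and the_zero_ratio are illustrative names, not part of any library.
HITS = {
    "rule of law": {"MSN": (7347, 251), "Yahoo": (57200, 323), "Google": (31600, 285)},
    "right to life": {"MSN": (877, 101), "Yahoo": (903, 61), "Google": (1190, 273)},
    "right to bear arms": {"MSN": (972, 1), "Yahoo": (973, 3), "Google": (1670, 58)},
}


def the_zero_ratio(with_article, without_article):
    """Return how many times commoner the version with the article is."""
    return with_article / without_article


for phrase, engines in HITS.items():
    cells = ", ".join(
        f"{engine} {the_zero_ratio(*pair):.0f}" for engine, pair in engines.items()
    )
    print(f"believe in (the) {phrase}: {cells}")
```

A low ratio means the article is dropped relatively often, which is why "right to life" stands out against "rule of law" and "right to bear arms".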

"You have right to remain silent" sounds like my Russian grandfather, and indeed the web instances of that phrase that I checked seemed either to be typos or associated with writers from places like Ukraine.

The media sometimes use "rule of law" without an article:

(Salem Statesman Journal) (link) Respect for rule of law and oversight of those in power distinguish our democracy from a dictatorship.

but the article seems to be commoner (except in headlines):

(New York Times) (link) Speaking calmly, if with a continued hint of nervousness, Judge Alito provided no substantive new insights into his judicial philosophy or background as he tried to cast himself as open-minded and dedicated to the proposition that the rule of law should trump personal views and public opinion.

If there were a categorical difference between freedom on one hand, as opposed to rule and right on the other, we could attribute it to the fact that freedom is often used as a mass noun:

I just want some freedom.
They want more freedom.

whereas rule and right are not:

??I just want some rule.
??They want more rule.

However, it's clear that some native speakers think that it's fine to use "rule of law" or "right to life" without an article. The more I read the examples, the more plausible they sound to me as well. I suppose that these phrases are on the way to becoming conventional names for belief systems -- intellectual brand names, so to speak -- so that their use without articles follows the pattern of phrases like "I believe in verificationism" or "we share a commitment to democracy".

One historical note on "rule of law": the OED suggests that this phrase began (and also continues in lawyers' use) as a way of referring to specific legal principles, rather than as a term for the general idea that "no person ..., no matter how high or powerful, is above the law, and no person ... is beneath the law" (as Judge Alito put it in his confirmation hearings).

1765 BLACKSTONE Comm. I. 70 Whenever a standing rule of law..hath been wantonly broke in upon by statutes or new resolutions.

1827 JARMAN Powell's Devises (ed. 3) II. 89 This case was considered to have fixed, beyond controversy, the rule of law upon this subject.

1861 MAINE Anc. Law ii. (1876) 26, I employ the expression ‘Legal Fiction’ to signify any assumption which conceals, or affects to conceal, the fact that a rule of law has undergone alteration.

1994 (title of Act) Sale of Goods (Amendment) Act: An Act to abolish the rule of law relating to the sale of goods in market overt.

[gloss for maxim] 2. a. Law. A proposition (ostensibly) expressing a general rule of law, or of equity

The earliest citation for "the rule of law" in the sense of "political supremacy of an independent legal system" is from 1981:

1981 Times 10 Feb. 9/3 Geriatric judges with 19th century social and political prejudices only bring the rule of law into disrepute.

1988 Representations Autumn 117 Pudd'nhead Wilson translates mob opinion into the rule of law at the conclusion of Twain's novel.

But I expect that Ben Zimmer will find that Horace Greeley, if not Benjamin Franklin, used this term in its contemporary sense.

[Update: Ben writes:

I can't take it back to Greeley or Franklin, but the generalized sense of "rule of law" was definitely in common use by the fall of 1914, shortly after the outbreak of World War I. Both American and British scholars used the phrase in their justifications for war against the Central Powers.

"The Meaning of the War", New York Times, Sep. 23, 1914, p. 8
By David Starr Jordan, Former President of Leland Stanford University.
The invasion of Belgium changed the whole face of affairs. As by a lightning flash the issue was made plain: the issue of the sacredness of law; the rule of the soldier or the rule of the citizen; the rule of fear or the rule of law. ... However devious her diplomacy in the past, Britain stands today for the rule of law.
Reprinted in The New York Times Current History of the European War, Vol. 1

"Oxford Historians Defend England," New York Times, Oct. 14, 1914, p. 4
"The war in which England is now engaged with Germany is fundamentally a war between two different principles — that of raison d'état, and that of the rule of law."
Quoting Why We Are At War: Great Britain's Case

]

[Update: Margaret Marks points out, in a post on her Transblawg, that the OED actually has a specific entry for "rule of law", which I carelessly missed, and which (uncharacteristically) Ben Zimmer didn't catch me on:

c. rule of law: (a) with a and pl. : a valid legal proposition; (b) with the : a doctrine, deriving from theories of natural law, that in order to control the exercise of arbitrary power, the latter must be subordinated to impartial and well-defined principles of law; (c) with the : spec. in English law, the concept that the day-to-day exercise of executive power must conform to general principles as administered by the ordinary courts.

What's at issue here is b and (especially) c; and for b the earliest citation is

  1883 J. E. C. WELLDON tr. Aristotle's Politics iii. §16. 154 The rule of law then..is preferable to the rule of an individual citizen.

while for c the earliest one is

  1885 A. V. DICEY Law of Constitution v. 172 When we say that the supremacy or the rule of law is a characteristic of the English constitution, we generally include under one expression at least three distinct though kindred conceptions. We mean, in the first place, that no man is punishable or can be made to suffer in body or goods except for a distinct breach of law established in the ordinary legal manner before the ordinary courts of the land.

It's embarrassing to have missed this.

In any case, both of these uses are explicitly said to require the.

This leaves me with two questions:

First, since the concept was a prominent one in 18th-century political philosophy, why does the phrase not appear until the late 19th century? What phrases were used instead? Where was Whorf for that century and a half?

Second, when did the phrase start being used without the definite article?

]

Posted by Mark Liberman at 02:00 PM

January 13, 2006

No tattooed acronym

The hugely newsmaking exposé of Oprah-touted memoir-faker James Frey on The Smoking Gun reports that he "wears the tattooed acronym FTBSITTTD (Fuck The Bullshit It's Time To Throw Down)." He does not. No matter what the mendacious and vomit-bespattered self-deaggrandizer may have tattooed on his body, or on what part, FTBSITTTD is not an acronym. It's an abbreviation. It's funny that people get this wrong. What the two terms have in common is that they are composed of the initial letters of a phrase. The difference is whether you can read out the initial letters as if they were a word (as with AIDS, but not TB). Try pronouncing [ftbsitttd] as a word if you like, but if your tongue gets tangled into a knot, don't come complaining to me. Just get your boyfriend to untangle it.

Posted by Geoffrey K. Pullum at 08:17 PM

People that would do ourselves harm

I jumped when I heard it ten minutes ago on NPR's "All Things Considered", and turned to Google immediately to double-check transcripts. And sure enough, during his press conference with German chancellor Angela Merkel, President Bush used a reflexive pronoun with no permissible antecedent — the noun phrase it was co-referring with was not in a position that the grammar allows. I didn't mishear. Reuters has already quoted it:

"Guantanamo is a necessary part of protecting the American people. And so long as the war on terror goes on, and so long as there's a threat, we will inevitably need to hold people that would do ourselves harm."

That's totally ungrammatical, in all dialects. Reflexive pronouns like ourselves must (to put it roughly — there are some codicils) have an antecedent earlier in the same clause, agreeing with it in person, number, and gender. This isn't a subtle usage or style point. It isn't a matter of dialect variation. Bush really does have a problem with spontaneously uttering sentences that respect the syntax of Standard English. He balks even on short ones. I'm not generally one of the picky-picky "Bushism" collectors, but even I sometimes have to wonder, are our president learning?

Of course, the wires are burning up with people emailing me examples of emphatic reflexives and telling me Bush might have been using one of those. He wasn't. The example cited above isn't a possible context for an emphatic reflexive, and he spoke it with no stress. He just forgot what the subject of his clause was. So did the people who wrote these examples, sent to me by Chris Culy:

So we put our collective heads together and came up with a moniker that does ourselves proud. (http://www.cufsnorth.org/newslett1.htm)

I think labeling them as "evil" is a demonization which does ourselves a disservice by adding a religious bias to our relationship with that country. (http://68.166.163.242/cgi-bin/readart.cgi?ArtNum=7777)

I'm not buying it, Chris: sometimes people write down sentences that just don't cut it grammatically. These are painfully ungrammatical.

What I will grant you, though, is that ourselves does sometimes occur (stressed) with the meaning "us ourselves", as in the example you sent me:

'I said come here, cell-sword! This concerns you as much as it does ourselves', hissed the rogue, his rat-like face pulling itself into a grimace of agitation. (http://www.pulpanddagger.com/pulpmag/dark/cobra1.html)

That's a genuine emphatic (notice, syntax nerds, it's the obligatorily focused final constituent in a pseudogapping construction).

Posted by Geoffrey K. Pullum at 05:32 PM

Colbert immortalized again

Wordanista Michael Adams may no longer be "On Notice" over at the Colbert Report, though AP reporter Heather Clark is still shunned; but now another journalist has taken lexicographical notice of Stephen Colbert, and this time it's not for truthiness. Lane Greene, from economist.com, has nominated Colbert to be cited in the Eggcorn Database for copywrite, copywritten and copywrote.

Here's Lane's letter to Language Log, dated 1/11/2006:

Yesterday's post on the Colbert Report made me watch it last night [this refers to the show of Tuesday, 1/10/2006 - ed.] Not only did he return to "truthiness", but another linguistic item popped up. He noted to Carl Bernstein that "-gate" had become a common scandal suffix, and asked him something like "have you copywritten that?" He referred back to a word he'd used earlier, "sexageterrorists", and then said "I copywrote that." Obviously he meant "copyrighted" in both cases.

IP law aside (you copyright a work, not a single word; you can try to trademark a word), it's an interesting eggcorn candidate. Perhaps Colbert-the-character was engaging in another language joke, but instead it seemed to me that Colbert-the-actor, speaking quickly and without a script, might have actually made the mistake.

If so, he's not alone.

Some forms of "copywritten" might be an inflection of a phrasal verb like "to write copy", but most seem to be inflections of the eggcornic "copywrite". (Looking for "copywritten material", "copywritten music", etc. confirms this.) It's also an interesting candidate because most people wouldn't use "copywrite" in its noun form. I imagine that a lot of doubletakes would accompany a CD that carried the label "Copywrite 1998", but when people go for the participle, "copyrighted" seems wrong and they go for "copywritten".

There's already a citation in the Eggcorn Database for copywrite, entered by Chris Waigl on 2/25/2005, and the nominal form is reasonably common even on respectable journalistic sites -- searching Google News this morning for copywrite turns up 12 hits like this one:

CNN/money 1/7/2006: Sony (Research) CEO Howard Stringer brought out Hanks, the star of its new film the "Da Vinci Code," ... to talk about the importance of copywrite protection.

But Colbert surely deserves special mention, if only for using so many of the principal parts of the verb "copywrite", whether in jest or in earnest. And he shouldn't feel in any way disrespected, since here at Language Log we consider eggcorns to be a poetic form even more compact than the haiku.

Posted by Mark Liberman at 09:04 AM

The truthiness wars rage on

It was round two of Colbert vs. Adams Thursday night.

In our last installment, Comedy Central's mock-newsman Stephen Colbert put Michael Adams of North Carolina State University "on notice" for his quote in an AP article about the selection of truthiness as the American Dialect Society's 2005 Word of the Year. Colbert excoriated the Associated Press for its failure to recognize him as the source for the word, and Adams, who provided a Colbert-free definition for the article, ended up being one of the targets of his righteous indignation.

But Adams had the opportunity to fight back on Thursday's "Colbert Report," in a debate via telephone about the ownership rights to truthiness. While Colbert claimed to have invented the word, Adams pointed out that it already appears in the Oxford English Dictionary (first noted here, with the OED's 1824 citation, back in October — though to be totally truthy, the truthiness of 1824 simply meant 'truthfulness').

Transcript follows. [Update: Video from Comedy Central here.]

You know, by now everyone's aware of the conspiracy against me by the Associated Press. The American Dialect Society named truthiness as the Word of the Year. So far, so good.

But then the AP picked up the story and didn't even call me for a definition. They asked one "Dr." (in air quotes) Michael Adams, visiting associated professor at North Carolina State — which I think may be a made-up school. This "professor" told the AP that truthiness means, quote, "truthy, not facty," earning him a place on my "Notice" board.

Anyway, this Adams guy now claims he can explain himself. And since I am nothing if not generous to those I have crushed, I talked to Dr. Adams by phone earlier today. Here's how it went.

(Beginning of taped interview.)

COLBERT: Hello.

ADAMS: Hello.

COLBERT: Hello, is this Dr. Michael Adams?

ADAMS: Yes it is.

COLBERT: Uh, Dr. Adams, this is Stephen Colbert from "The Colbert Report." Are you familiar with the show?

ADAMS: Ummm, no.

COLBERT: OK, are you the same Dr. Adams who took it upon himself to define truthiness to the Associated Press last week?

ADAMS: Uh, yes I am.

COLBERT: Um, sir, where do you get off defining a word that I made up?

ADAMS: You didn't make it up. It's in the dictionary.

COLBERT: What dictionary?

ADAMS: It's in the Oxford English Dictionary.

COLBERT: OK, stop right there. I pulled that word right out of where the sun don't shine on October 17th.

ADAMS: Umm...

COLBERT: All right, you are aware that you are "on notice" and this phone call's not helping. Do you understand the implications of that?

ADAMS: Um, no, I don't understand the implications.

COLBERT: Well, they're many.

ADAMS: How do I get off notice?

COLBERT: You could apologize.

ADAMS: Apologize for...?

COLBERT: I accept! Thank you! It takes a big man to admit he was wrong.

ADAMS: I didn't apologize.

COLBERT: Too late, I forgive you. Good day!

ADAMS: But...

COLBERT: I said good day, sir!

(End of taped interview.)

You hear that, Associated Press? I am standing by for your formal apology. And that means engraved. Good night, citizens. We'll see you tomorrow.

(Just in case anyone was wondering, that was indeed the voice of Michael Adams, though the nerdish "visual approximation" they used bears no resemblance.)

[Update: Colbert and Adams also face off in a new Associated Press article, this time by entertainment writer Jake Coyle. The OED entry for truthy and its derived form truthiness comes up again, but Colbert counters this lexicographical reproach in expected fashion:

"The fact that they looked it up in a book just shows that they don't get the idea of truthiness at all," Colbert said Thursday. "You don't look up truthiness in a book, you look it up in your gut." ]

Posted by Benjamin Zimmer at 01:04 AM

January 12, 2006

The evolution of "birdflu"

Two headlines from today's Reuters wire...

"Gene tests show birdflu virus is evolving"

"Birdflu spreads, World Bank approves funds"

Is bird flu mutating into birdflu right before our eyes?

(For now the mutation may be limited to isolated cases, such as the Reuters headline writers who have been trying it out since at least July. But look for the compact single-word form to catch on as the virus — or at least hysteria about the virus — continues to spread.)

Posted by Benjamin Zimmer at 11:34 PM

The [sic]ing of the President

In November, when the White House Press Office sought to change transcripts of a briefing by Scott McClellan (who either thought that it was "accurate" or "not accurate" that Karl Rove and Scooter Libby were known to have had conversations about Valerie Plame), liberal bloggers were quick to invoke the usual dystopic Orwellian imagery. Though that suspicious incident has still not been fully explained, I continue to give the White House transcribers the benefit of the doubt, since the official transcripts rarely give the appearance of being "cleaned up," even to correct trivial (but potentially embarrassing) slips of the tongue. Two examples of transcript problems involving President Bush this week put this idea to the test.

One possible case of transcript-cleansing occurred on Monday, on the occasion of Bush's appearance with Judge Samuel Alito before his confirmation hearing. Eric Pfeiffer on Wonkette reported that this sentence appeared in an early transcript emailed to the White House press pool:

Sam Alito is imminently qualified to be a member of the bench.

It took about fifteen minutes for the White House Press Office to catch this and email out a correction informing the press corps that Bush actually said:

Sam Alito is eminently qualified to be a member of the bench.

This version is also what went into the official White House transcript. But the damage had been done, as Pfeiffer and several other bloggers took the opportunity to ridicule Bush's supposed implication that Alito is not quite qualified but should be soon. (News organizations were roughly split on the matter: a Google News search currently finds 31 appearances of "eminently" and 39 appearances of "imminently." This includes two comments on the transcript correction itself, one from Wonkette and one from Townhall.com.)

In the video accompanying the official transcript, one can clearly hear Bush say ['ɪmɪnəntli] rather than ['ɛmɪnəntli]. But how do we know that this was a malapropistic gaffe as the bloggers imply, rather than simply an example of the pin-pen merger? The merger of /ɪ/ and /ɛ/ before nasals, typically with /ɛ/ raising to the position of [ɪ], is a dialectal feature encompassing most of Texas, and Bush identifies himself as a native Texan. (His family moved to Texas when he was two, though it's often claimed that his Texan accent is a relatively new phenomenon and lacks authenticity.) I haven't made an exhaustive study, but I believe Bush frequently exhibits the pin-pen merger (especially when he's in a folksy Texan mode), though the feature is not always evident. When he called Harriet Miers "eminently qualified" (whoops!) on Oct. 4, the audio suggests that he raised the initial vowel to [ɪ]. But when he said it again about Miers on Oct. 12, the word sounded more like ['ɛmɪnəntli]. And on Nov. 6, when he wanted to make it "eminently clear that the United States is a friend of Brazil" the initial vowel again seemed to be in the neighborhood of [ɪ] (though the audio is not entirely clear).

A transcriber lacking the pin-pen merger might misconstrue Bush's pronunciation of "eminently" as ['ɪmɪnəntli], and this appears to be what happened when the initial transcript of Monday's comments was released with "imminently." The later correction by the White House Press Office might not have been remarked upon in another presidency, but since it has become such a sport to lampoon Bush's disfluencies, this simply added more fuel to the fire. Despite the clumsy handling of the correction, I don't think Bush's use of ['ɪmɪnəntli] is necessarily proof of a lexical confusion between "imminently" and "eminently." Of course, such a confusion could be encouraged by the existence of the pin-pen merger, since the two words would become homonymous. But this homonymy also means that we have no way of knowing if a speaker with the merger has the lexical confusion solely based on spoken evidence. (The confusion would be easier to spot in written form, but in that case we'll just have to wait until Bush's presidential papers are released for verification.)
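To make the homonymy point concrete, here is a minimal sketch of the merger as a rewrite rule over phoneme lists. The broad transcriptions (stress marks omitted), the set of nasals, and the function names are assumptions chosen for illustration; this is a toy model of the rule as described above, not an account of how any particular speaker's phonology works.

```python
# Toy model of the pin-pen merger: /ɛ/ raises to [ɪ] before a nasal consonant.
# The nasal inventory and transcriptions below are illustrative assumptions.
NASALS = {"m", "n", "ŋ"}


def apply_pin_pen_merger(phonemes):
    """Return a new phoneme list with ɛ raised to ɪ when a nasal follows."""
    merged = []
    for i, seg in enumerate(phonemes):
        next_seg = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if seg == "ɛ" and next_seg in NASALS:
            merged.append("ɪ")
        else:
            merged.append(seg)
    return merged


def homophonous_under_merger(a, b):
    """True if two words sound the same once the merger has applied."""
    return apply_pin_pen_merger(a) == apply_pin_pen_merger(b)


# Broad transcriptions, assumed for this example.
EMINENTLY = ["ɛ", "m", "ɪ", "n", "ə", "n", "t", "l", "i"]
IMMINENTLY = ["ɪ", "m", "ɪ", "n", "ə", "n", "t", "l", "i"]
PEN = ["p", "ɛ", "n"]
PIN = ["p", "ɪ", "n"]
PET = ["p", "ɛ", "t"]  # no nasal after the vowel, so the rule does not apply

if __name__ == "__main__":
    print(homophonous_under_merger(EMINENTLY, IMMINENTLY))
    print(homophonous_under_merger(PEN, PIN))
    print(apply_pin_pen_merger(PET) == PET)
```

Once the rule has applied, the two outputs are identical, so a listener (or transcriber) has nothing in the sound itself to tell them which word the speaker had in mind.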

Let's see how the White House transcribers deal with a more obvious slip of the tongue from Bush. On Tuesday, in his address to the Veterans of Foreign Wars, Bush uttered this unfortunate remark:

You took an oath to defend our flag and our freedom, and you kept that oath underseas and under fire.

The sentence as constructed by Bush's speechwriters has multiple parallel structures, such as "took an oath...kept an oath" and "our flag and our freedom." But Bush flubbed the final parallel structure of "overseas and under fire" by overextending the parallelism to "underseas and under fire," thus committing a kind of anticipatory assimilation. Again, Eric Pfeiffer on Wonkette and other bloggers were quick to snicker. (Jacob Weisberg must have been busy, since neither of these has appeared on his list of Bushisms yet.) Fair enough, a clear gaffe. But how does the official transcript read?

You took an oath to defend our flag and our freedom, and you kept that oath underseas [sic] and under fire.

No coverup here. Indeed, the White House transcribers have no problem deploying a well-placed [sic], as in these recent examples from President Bush:

12/4/05: In his capacity to grow and to excel as an artist, Robert Redford has shown very few limitations. In 1980, he decided to try working behind the camera. The result was "Ordinary People," and it won him the Oscar for best actor [sic].

12/7/05: An Iraqi battalion has consumed [sic] control of the former American military base, and our forces are now about 40 minutes outside the city.

1/6/06: I can't imagine a tax code that penalizes marriage. It seems like to me we ought to be encouraging marriage to [sic] our tax code.

In the first example, Bush commits a factual error: Kennedy Center honoree Robert Redford won the Oscar for best director, not best actor in 1980. The next example is an assimilatory slip like "underseas and under fire"; in this case, "assumed control" is transformed into "consumed control." The third [sic] flags a clumsy prepositional usage, since we can assume Bush is in favor of encouraging marriage through, not to, our tax code.

In fact, Bush got [sic]ed twice in the same speech on Monday at a Maryland elementary school, commemorating the fourth anniversary of the signing of the No Child Left Behind Act. From the transcript:

And as I mentioned, there was a lot of non-partisan cooperation -- kind of a rare thing in Washington. But it made sense when it come [sic] to public schools.

Laura and I's [sic] spirits are uplifted any time we go to a school that's working, because we understand the importance of public education in the future of our country.

First, "it come..." is given the [sic] treatment, evidently Bush's latest stumble over agreement in number. (Perhaps Bush was wavering between "came" and "come(s)" since the main clause uses a past-tense construction, "it made sense.")  It would have been easy enough to change the transcript to read "it comes" without anyone noticing, but the transcriber remained meticulous. In the second case, Bush takes a common route for dealing with a coordinate possessive structure in which the last item is a pronoun. English is notoriously vexing when it comes to such structures, and Bush's solution of "Laura and I's spirits" may actually be a slight improvement for some speakers over the putatively standard but no less awkward "Laura's and my spirits." Nonetheless, it too gets the [sic].

Could it be that the transcriber is making a point of correcting Bush, particularly in a speech about education? (During the speech Bush made one of his usual self-effacing remarks about his own disfluencies: "I can remember [Laura] reading to our little girls all the time. Occasionally, I did, too, but stumbled over a few of the words and might have confused them.") One blogger seemed to think so, commenting, "Hell, even the transcript guy is marking Bush down."

But President Bush isn't the only one who gets [sic]ed by the White House transcription team. For the month of December, I found two [sic]s for Vice President Cheney and three for Scott McClellan:

Cheney, 12/6/05: One unit of the 40 (sic) I.D., the "Fighting 69th" from New York City, showed its toughness in confronting insurgents around Baghdad.

Cheney, 12/20/05: I don't believe for a minute that the vast majority of Americans are prepared to accept defeat, to retreat in the face of terror, to turn over Iran (sic) or Afghanistan to the likes of Osama bin Laden.

McClellan, 12/12/05: And it's important that all of us, not only in the coalition, but the entire international community and the Arab world, stand behind the Iraqi people during this time of transition to a peaceful and democratic future, because the Iraqi people have shown through their courage and determination that they want a freedom of future [sic].

McClellan, 12/14/05: Then the members were able to hear from Ambassador Khalilzad, who was on with General Casey from Baghdad, video conference. And General Khalilzad [sic] gave an update on the elections and talked about how there are more than 300 political parties that are participating in the elections.

McClellan, 12/16/05:  I think these are difficult issues that you have to address in a post-September 11th world. Some people go back to a post-9/11 [sic] mind-set now that we're four years after the attacks of September 11th.

Most of these misspeaks appear to be factual errors (besides McClellan's odd invocation of "a freedom of future") and thus are obvious candidates for [sic]ing. By contrast, Bush gets [sic]ed not just on errors of fact but also on seemingly minor grammatical lapses. All in all, the transcribers at the White House seem at pains to demonstrate that they are not, in fact, sanitizing any potential embarrassments in the public comments of Bush and other officials. This is good to know, especially for those of us in the reality-based community.

[Update #1: The author of the blog Whatever It Is, I'm Against It (noted above) writes in with more on White House [sic]ing:

I've been sporadically noting the whitehouse.gov [sic] phenomenon, which I personally attribute to the very understandable annoyance of someone assigned to transcribe the speeches of George W. Bush. They're payback sics. More:

"Our journey from national independence to equal injustice [sic] included the enslavement of millions, and a four-year civil war." (blog link)

"armies of compassions [sic]" & inspectors generals [sic] (blog link)

"An Iraqi battalion has consumed [sic] control of the former American military base" (blog link)

I'm not so sure I buy that these are "payback sics" from a resentful transcriber, but I guess you never know.]

[Update #2: Jacob Weisberg finally got around to the "underseas" Bushism, more than a week late.]

Posted by Benjamin Zimmer at 01:20 AM

January 11, 2006

Trying to talk alike and not succeeding

There were a lot of great talks at the LSA annual meeting in Albuquerque, and I wish I had the time to tell you about them. But for now, I'll dash off a note on one presentation, because it included a quote that caught my eye and my inward ear.

Alexandra Jaffe spoke on the topic "Transcription in Sociolinguistics: Nonstandard Orthography, Variation and Discourse". She started with her own work on the "polynomic" orthography of Corsican, where "variation in spelling is understood to be a systematic representation of coherent linguistic systems (regional dialects of Corsican)". In contrast, she observed, we Americans most often use respelling to index stigmatized dialects. This effect is especially striking when the respelling represents ubiquitous, pan-dialectal pronunciations, like "wuz" for was, "hist'ry" for history, or "subjecks" for subjects.

Jaffe described an experiment by Jennifer Nguyen that brought this out clearly ("Transcription as Methodology: Using Transcription Tasks to Assess Language Attitudes", NWAV 32). Jaffe's summary:

Novice transcribers in Michigan listened to two speakers with accents that were different from their own: one stigmatized (Appalachian English) and one not (British English). They were given instructions to transcribe them in such a way that anyone reading their transcriptions would "get the same impression of the speaker that the participants got listening to the samples" and were told that they could represent speakers in any way they wanted, that dictionary spellings were not required. Nguyen found that the percentage of respellings in these novice transcripts was significantly higher for Appalachian vs. British English...

The quote that caught my attention in Jaffe's handout was a passage written by a Glaswegian poet, Tom Leonard:

Yi write doon a wurd, nyi sayti yirsell, that's no thi way a say it. Nif yi tryti write it doon thi way yi say it, yi end up wi thi page covered in letters stuck thigither, nwee dots above hof thi letters, in fact yi end up wi wanna they thingz yi needti huv took a course in phonetics ti be able ti read. But that's no thi way a think, as if ad took a course in phonetics. A doan't mean that emdy that's done phonetics canny think right—it's no a questiona right or wrong. But ifyi write down "doon" wan minute, nwrite doon "down" thi nixt, people say yir beein inconsistent. But ifyi sayti sumdy, "Whaira yi afti?" nthey say, "Whut?" nyou say "Where are you off to?" they don't say, "That's no whutyi said thi furst time." They'll probably say sumhm like, "Doon thi road!" anif you say, "What?" they usually say "Down the road!" the second time—though no always. Course, they never really say, "Doon thi road" or "Down the road!" at all. Least, they never say it the way it's spelt. Coz it izny spelt, when they say it, is it?

[quoted in Ronald Macaulay, "Coz it izny spelt when they say it: Displaying dialect in writing". American Speech 66(3): 280-291.]

In fact, I think there's an important sense in which Leonard's last point is wrong. Because human speech has what Hockett called "duality of patterning", it's fair to say that it is spelt, when they say it. Maybe not spelt spelt, but still, in some sense, spelt...

Jaffe missed the chance to cite Mark Twain's well-known "Explanatory" from the start of Huckleberry Finn, where he takes a much more positive and self-confident line on respellings.

In this book a number of dialects are used, to wit: the Missouri negro dialect; the extremest form of the backwoods Southwestern dialect; the ordinary "Pike County" dialect; and four modified varieties of this last. The shadings have not been done in a haphazard fashion, or by guesswork; but painstakingly, and with the trustworthy guidance and support of personal familiarity with these several forms of speech.

I make this explanation for the reason that without it many readers would suppose that all these characters were trying to talk alike and not succeeding.

It's interesting to see how Twain uses such normal and invariable respellings as "wuz", which appear to do nothing more than represent the ubiquitous pronunciation of words whose spelling is phonetically irregular. He mostly reserves "wuz" for Jim, representing the "Missouri negro dialect" -- though sometimes he has Jim say "'uz" for was, even in the same sentence as "wuz":

But looky here, Huck, who wuz it dat 'uz killed in dat shanty ef it warn't you?

But he also gives "wuz" to Old Mrs. Hotchkiss, who (I guess) represents the "extremest form of the backwoods Southwestern dialect":

Don't tell me, s'I; there wuz help, s'I; 'n' ther' wuz a plenty help, too, s'I ...

Twain distinguishes Jim's rendition of and as "en" from Mrs. Hotchkiss' rendition of the same word as "'n'". I wonder whether (and especially how) this really reflects the sociolinguistic facts of the time? Anyhow, his usage supports the idea that this sort of respelling is used to index stigmatized dialects of various sorts. However, it also underlines the fact that this connection is by now a highly conventionalized one, not something that is invented anew by each transcriber.

Time was that alternative spellings in English meant -- as far as I can tell -- absolutely nothing at all. According to the Textbase of Early Tudor English, John Skelton (1460-1529), "poete laureate in the unyversite of Oxenforde" and also poet-laureate to Henry VIII, spelled should in his poems as "shold", "sholde", "should", "shoulde", "shuld", "shulde", and "xuld". In the first of his poems in the LION database, "An Elegy on Henry Fourth Earl of Northumberland", Skelton uses two of these spellings in one line:

41 What shuld I flatter? what shulde I glose or paynt?

and at least one other spelling a few lines later:

67 To the right of his prince which shold not be withstand;

He manages to spell one three-word phrase in two completely different ways, within the space of 48 lines:

130 Of this lordis dethe and of his murdrynge.

178 Thys lords death, whose pere is hard to fynd

Proper names are not spared:

43 In Englande and Fraunce, which gretly was redouted;

179 Allgyf Englond and Fraunce were thorow saught.

I wonder, in which contemporary orthographies is this sort of catch-as-catch-can spelling used? One that I've encountered personally is Somali; but in that case, the orthography is only a few decades old, and the educational system that promulgated it has been defunct for much of that time.

[Update: Gene Buckley points out that the spelling "boyz" has become a conventional orthographic index of AAVE, although the voicing of plural /s/ after vowels has been normal in most variants of English for centuries.

And Ben Zimmer wonders whether "wuz" might actually have meant something about sound, in Twain's time:

I've often wondered whether Twain's "wuz" is properly understood as eye-dialect (i.e., a mere respelling indexical of the quoted speaker's low status, education, etc.) or as a pronunciation spelling indicating a real dialectal difference. It's possible it could have been the latter when used by Twain or other keen-eared 19th c. writers if, for instance, "was" had a standard pronunciation with an open back rounded vowel (IPA turned script-a, as in the British pronunciation given by the OED), while "wuz" represented a once-nonstandard (now standard) Amer. pronunciation with an open mid back unrounded vowel (IPA wedge). I don't have any evidence for this shift in the pronunciation of "was", but it's something to consider.

There's one small and indirect piece of support for this view in the quotes that I gave. At least judging from contemporary BBC pronunciations, the vowel in was will in any case be reduced to a schwa/wedge sort of quality except where the word is emphasized ("she *was* there") or phrase-final ("so it was"). In the quote from Mrs. Hotchkiss, the first "wuz" is given emphasis with italics, as well as by the sense of the passage. And in the quote from Jim that I happened to pick, the was spelled "wuz" is arguably emphasized, while the one spelled "'uz" is reduced. However, there are plenty of cases where "wuz" is used to render Jim's speech with no basis for assuming any emphasis, e.g.

Well, I wuz dah all night. Dey wuz somebody roun' all de time.

And Huck's narrative voice never uses "wuz", although he shows other non-standard features ("There was things which he stretched, but mainly he told the truth") and other eye dialect spellings like "di'monds" and "s'pose". Nor is it used in the quoted speech of Tom Sawyer ("Well, Ben Rogers, if I was as ignorant as you I wouldn't let on"), though again Tom is rendered with some spellings like "A-rabs". Likewise Huck's father is given plenty of indices of non-standard speech, like "afeard", "ain't" and double negatives, but all of his examples of was are spelled "was" ("There's a hand that was the hand of a hog; but it ain't so no more; it's the hand of a man that's started in on a new life, and'll die before he'll go back.")

Anyhow, I guess it's possible that in Twain's youth, "the ordinary Pike County dialect" and its "four modified variants" all had [wɒz] or [wʊz], while the "Missouri negro dialect" had [wʌz] or [wəz]. I'll ask someone who knows about the history of American speech patterns.

]

Posted by Mark Liberman at 04:18 PM

Stupid machine-generated spiritual blather

The site established by the Devi Press (find them if you want at http://www.devipress.com/, but I am damned if I am going to give them a link) for the purpose of advertising its books on Christian, gnostic, and mystical topics has a set of pages containing 1,185 (one thousand, one hundred eighty-five) articles on religious topics, each with an accompanying link to a page advertising a book called The Mystic Christ. The article titles, indexed in ASCII order, run from "1 John God Is Love" and "A Love Sent From God Above" down to "Youth Group Devotions" and "Zohar Kabbalah". And a typical piece of prose from one of them looks like this:

Abounding opposites present a few pages but he was carried out of members who went to our conversation ends where it really fulminating on ecumenically united states submitted as well i daniel had died of morality and a final analysis be fighting with which are by subject browse for an unbearable. The fourth year of heaven on a thousand strong and he had the christians about. We will of your policy who are his entrance into sticking it was called the only your sects so that will take the whole world or unpopular they do you?

That's right. Every single article was generated by an extremely crude random text-generation algorithm. (Fantastically crude. Computational linguists can do a lot better than this. Heck, a trained trunk monkey could do better.) The articles even bear a notice saying (lest the program actually write something intelligible) "DISCLAIMER: The text for this article was generated automatically by a computer. As such, nothing in this article should be construed as a statement of fact or as the opinion of the maintainers of this site." And each article has ten links to others on the list. The entire fraudulent assemblage is just an exercise in Google-bombing: Devi Press is trying to raise its Google ranking by having more than a thousand pages that link to ads for its crappy books, each of those pages being the target of links by multiple other articles.

Would you like me to begin my rant now?

<RANT>What I'm objecting to is not that this crap is religious drivel. It's that it's dishonest drivel. It's an illicit attempt to get advertising space (in the form of appearances on Google search results lists) that other people ultimately pay for. It's like having thousands of huge styrofoam cubes with your company's name on them delivered to a public landfill so that others will see them (only that doesn't happen because neither styrofoam nor landfill space is free).

The poor Google corporation is buying new CPUs and disk drives every day as it tries to keep up with the growth of the web, and every byte of this asyntactic Christian-gnostic-mystical garbage, this useless verbal waste, has to be stored in some huge refrigerated data barn somewhere and indexed and searched every single day. Every legitimate site and every genuine shopper (and — declaration of interest — every honest syntactician trying to explore language using Google as a corpus) is being slowed down (at least a tiny bit), and sometimes baffled and misled, by the totally fake pseudo-text these venal morons are stashing on their server for the sole purpose of masking the fact that nobody is very interested in their boring useless crappy books, and it makes me mad, OK? As Stephen Colbert would say, Devi Press, you're dead to me. You're on my Dead To Me board. All right, I'm done.</RANT>

Posted by Geoffrey K. Pullum at 10:00 AM

January 10, 2006

Colbert fights for truthiness

On Friday the American Dialect Society chose as its 2005 Word of the Year Stephen Colbert's sublimely silly neologism truthiness. In a post submitted that night from the ADS/LSA meetings in Albuquerque, I surmised that the initial Associated Press coverage of the voting, which didn't even mention Colbert, would "serve as more fodder for Colbert's put-upon persona of perpetual outrage."

Well, "The Colbert Report" returned to Comedy Central from an extended break on Monday night, and sure enough Colbert was in high (faux) dudgeon. At the end of the show he called out not only AP reporter Heather Clark, but also wordanista Michael Adams (author of the excellent Slayer Slang), who happened to be the ADS member that Clark buttonholed for a quick definition of truthiness. Colbert even dug up Adams' academic title and course information at North Carolina State University, in homage to the over-the-top ad hominem attacks perfected by the likes of Bill O'Reilly. At Language Log Plaza, our hearts go out to Adams, the blameless victim of a pseudo-anchor's pseudo-wrath.

A transcript of the segment follows. [Update: A video clip is now available from Comedy Central. It can also be viewed here.]

Before we go, I want to say something about the first "Word" from the first ever broadcast of this show. Jimmy, roll the tape.

(Video from first show: "Truthiness. Now I'm sure some of the Word Police, the wordanistas over at Webster's, are gonna say, 'Hey, that's not a word.'")

Turns out I underestimated those wordanistas. On Friday the American Dialect Society chose truthiness as the 2005 Word of the Year (applause), beating words like podcast and Katrinagate. We kicked their asses. And I've never been so honored and insulted at the same time.

You see the Associated Press article announcing this prestigious award, written by one Heather Clark, had a glaring omission: me. I'm not mentioned, despite the fact that truthiness is a word I pulled right out of my keister. Instead of coming to me, here's where Ms. Clark got the definition.

Quote: Michael Adams, a professor at North Carolina State University who specializes in lexicology, said (subquote) "truthiness" means "truthy, not facty."

First of all, I looked him up. He's not a professor, he's a visiting associate professor. And second, it means a lot more than that, Michael. I don't know what you're getting taught over there in English 201 and 324 over at Tompkins Hall, Wolfpack. But it isn't truthiness.

You know what? Bring out the board, bring out the board. (Stagehand brings out the "On Notice" board, with entries including "Black hole at center of galaxy," "E Street Band," "grizzly bears," "Bob Woodruff," "the Toronto Raptors," "The British Empire," "business casual," and "Barbara Streisand.")

Visiting associate professor Michael Adams: you, sir, are on notice. OK, somebody's gotta go. E Street Band, this is your lucky day. (Colbert pulls out card for "E Street Band," replaces it with "Michael Adams.")

OK, there it is. Deal with it.

But the real culprit here is so-called reporter Heather Clark. This is her sleaziest piece of yellow journalism since "New Mexico Poll Watchers See Smooth Election Day." Now I already tore her a new one for that. Heather Clark, you are dead to me.

Let's bring out the board. (Stagehand brings out "Dead To Me" board, with the entries "CNN en Español," "cast of Friends," "owls," "screw-cap wines," "bowtie pasta," "California's 50th district," "New York intellectuals," and "men with beards.")

Get ready, Heather. Get ready, brace yourself. (Colbert adds card for "Heather Clark" to the board.) How does that feel? Does that sting? Now that you're dead to me, you're gonna wish you were never born.

I'm sorry you had to see that, nation. But in the interest of truthiness, it had to be done. Good night.

[Update #1: Adams has been enshrined on the Wikipedia page for "The Colbert Report," in a section now moved to the rapidly expanding entry for truthiness.]

[Update #2: Adam Green of the Huffington Post suggests that defenders of truthiness should ask Heather Clark to correct the record, even supplying her email address. To be fair, she did file a later wire story that credited Colbert, albeit indirectly. (Yet another iteration of Clark's story gives Colbert direct credit.)]

[Update #3: Steve Kleinedler recommends this column for anyone who is still puzzled by the concept of truthiness.]

[Update #4: Colbert and Adams went mano a mano on the Jan. 12 show.]

Posted by Benjamin Zimmer at 12:32 AM

January 09, 2006

Nias, Komodo, and "Kong"

I have yet to find three hours to devote to Peter Jackson's remake of King Kong, but I did catch the original 100-minute version on Turner Classic Movies over the holidays. I hadn't seen it in its entirety since I was a kid, but now I can see why Jackson has said it was the movie that inspired him to become a filmmaker. It's an extremely appealing adventure tale, despite the now-quaint special effects, occasionally clunky storytelling, and typical Hollywood exoticization of "primitive" lands.

Since one of my areas of research is Indonesia, my ears perked up when Carl Denham, the leader of the expedition, shows Captain Englehorn their destination on a chart, saying it is "way west of Sumatra." Englehorn then tells Denham, "I know the East Indies like my own hand, but I was never here." My interest was further piqued by the captain's early suspicion that "Kong" was "some Malay superstition, a god or a spirit or something." When they finally arrive at Skull Island, Englehorn says the speech of the natives "sounds something like the language the Nias Islanders speak."

Nias is an island off the west coast of northern Sumatra, most recently in the news for the heartbreaking devastation wrought by the one-two punch of the Dec. 2004 tsunami and the less-reported earthquake of Mar. 2005. The first language of most of the island's estimated 600,000 inhabitants is also called Nias (known locally as "Li Niha") and is related to the Batak languages of northern Sumatra and more distantly to Malay and other languages on the Sundic branch of the Austronesian family tree.

The film's depiction of the Skull Islanders is notoriously racist, with mostly African-American actors enlisted to prance around like generic savages, but I thought the specific references to Sumatra and Nias could mean that their linguistic interaction with Captain Englehorn might carry a shred of verisimilitude. From what I could catch, there was only the tiniest shred. When the chief makes an offer to trade six of his women for Ann Darrow (as a "gift for Kong"), Englehorn declines by saying "Tida, tida!" That seems to be modeled on Malay-Indonesian tidak /tidaʔ/, meaning 'no, not.' Also, when Englehorn buys time by telling the chief that they'll come back tomorrow, he says "dulu," which in Malay can mean 'for the time being' (as in tunggu dulu /tuŋgu dulu/ 'wait for now'). Other than that, nothing in the exchange between the chief and Englehorn sounds much like Malay or related languages.

But should we expect the dialogue to be anything but gibberish? A recent article by Kenneth Turan in the Los Angeles Times looking back on the original Kong suggests otherwise:

To understand the 1933 version's success, you have to start with how close two of its key characters, director Denham (the irresistibly intense Robert Armstrong) and cameraman Jack Driscoll (Bruce Cabot), were to producer-directors [Merian C.] Cooper and [Ernest B.] Schoedsack. In fact, as related in Orville Goldner and George E. Turner's "The Making of King Kong," when Cooper hired his wife, tyro writer Ruth Rose, to do the final polish on the "Kong" script, he told her flatly, "Put us in it ... Give it the spirit of a real Cooper-Schoedsack expedition."

For with Cooper as the driving visionary and Schoedsack as the unflappable director-cameraman, these two were adventurers before they were filmmakers. As related in a new Cooper biography, "Living Dangerously" by Mark Cotta Vaz, the two had made a pair of successful ethnographic documentaries in faraway places — "Grass" in what was then Persia, "Chang" in Siam — that fully lived up to Cooper's celebrated determination to keep his films "distant, difficult and dangerous."

In fact, when Denham complains that critics are always bemoaning the lack of a love interest in his films, he's echoing what was actually said about the Cooper-Schoedsack films. And the language Rose created for the natives of Skull Island was based on the idiom of the Nias Islanders, near Sumatra, whom she and Cooper had visited. Fearful that disguised indecent language might sneak on-screen, the Production Code Administration reportedly insisted on a translation of all Skull Island dialogue before giving the film its approval.

(One correction to Turan's article: Ruth Rose was Schoedsack's wife, not Cooper's.)

I thought I'd look for this supposedly Nias-based dialogue online, and I found what purports to be a draft of the screenplay on Val Lewton's Whiskey Loose Tongue website. Sure enough, the Skullese dialogue is provided with "translations," presumably for the skittish Production Code Administration. A sampling:

Chief:
Bado! Maka mini tau ansaro.

(Wait! Two warriors come with me.)

Watu! Tama di? Tama di?

(Stop! Who are you? Who are you?)
Englehorn: Tabe! Bala kum nono hi. Bala! Bala!

(Greeting! We are your friends. Friends! Friends!)
Chief: Bala reri! Tasko! Tasko!

(We don't want friends, Go! Get out!)
Englehorn: Vana di humya? Malem ani humya vana?

(What are you doing? What is that woman doing?)
Chief: Ani saba Kong!

(She is the bride of Kong!)

Sita! Malem! Malem ma pakeno!

(Look! The woman! The woman of gold!)

Malem ma pakeno! Kong wa bisa! Kow bisa para Kong!

(The woman of gold! Kong's gift! A gift for Kong.)

Dama, tebo malem na hi?

(Strangers, sell woman to us?)

Sani sita malem ati - kow dia malem ma pakeno.

(I will give six women like this for your woman of gold.)
Englehorn: Tida, tida! Malem ati rota na hi.

(No, no! Our woman stays with us.)

Dulu hi tego. Bala. Dulu.

(Tomorrow we come. Friends. Tomorrow.)

Most of the dialogue and "translation" accords with the 1932 novelization of the script adapted by Delos Lovelace (searchable on Amazon).

The reader is welcome to search for any vague correspondences between the screenplay and this Nias word list prepared by Robert Blust for the Austronesian Basic Vocabulary Database. (Those familiar with Indonesian can also consult the Nias-Indonesian dictionary maintained here.) Suffice to say, whatever Ruth Rose Schoedsack used as the basis for Skullese, it surely wasn't Nias or any other related language. The word for "woman" in Nias is a-lawe, not malem; "who" is ha, not tama; "you (plural)" is yaʔami, not di; "we (exclusive)" is yaʔaga, not hi; "come" is möi, not tego. The Nias-Indonesian dictionary supplies some more examples: "six" is önö, not dia; "gold" is anaʔa, not pakeno; "tomorrow" is mahemolu, not dulu (hey, at least the final syllable for that one is right!).

I picked up the new biography of Merian C. Cooper, Living Dangerously by Mark Cotta Vaz, to see if there was any mention of Cooper or the Schoedsacks going to Nias Island. There is a brief account of Cooper passing through the Toba Batak region of Sumatra on a round-the-world expedition with explorer Edward Salisbury, and another part describes the Schoedsacks' trip to Aceh on Sumatra's northernmost tip to shoot the orangutan movie Rango. But the only time Nias comes up is later in the book when stop-motion animator and hardcore Kong fan Ray Harryhausen recalls going to the island with his wife in search of the model for Skull Island:

Well, Nias Island actually exists, although they're not black but Asian people, and we thought we'd go there. We arrived early in the morning, in the fog, but there was no skull, no ancient wall. I stepped out on the pier and there was this native guy and I thought I'd try out Ruth Rose's language and I said, "Bala, bala Kong nna hee." And the native put his hands on his hips and said, "What are you talking about?" (Vaz, p. 407)

It turns out another Indonesian island probably had more of an influence on the making of Kong: Komodo, one of the Lesser Sunda Islands (which also include Flores, Sumba, and Timor). When Cooper was first formulating Kong in 1929-30, he contacted another adventurer named W. Douglas Burden. In 1926 Burden had led an expedition sponsored by the American Museum of Natural History to bring the first live Komodo dragons to the West. The following year Burden wrote a book about his expedition, Dragon Lizards of Komodo, describing how he and herpetologist F.J. Defosse were enchanted by the "lost world" of Komodo. Before leaving, Defosse told Burden, "I would like to bring my whole family and settle here, and be King of Komodo."

Cooper was inspired by Burden's story of "primeval monsters" on a faraway island and his description of how the creatures' spirits were broken once they were taken back to New York in captivity. (The two live Komodo dragons were brought to the Bronx Zoo and quickly died there.) Here is an excerpt from a 1964 letter from Burden to Cooper reminiscing about their conversations:

I remember, for example, that you were quite intrigued by my description of prehistoric Komodo Island and the dragon lizards that inhabited it. ... You especially liked the strength of words beginning with 'K,' such as Kodak, Kodiak Island, and Komodo. It was then, I believe, that you came up with the idea of Kong as a possible title for a gorilla picture. I told you that I liked very much the ring of the word...and I believe that it was a combination of the King of Komodo phrase in my book and your invention of the name Kong that led to the title you used much later on, King Kong. (Vaz, p. 193)

In response to Burden's letter, Cooper wrote, "Everything you say is right on the nose." He did add that he conceived of a "Giant Gorilla" story before reading Dragon Lizards of Komodo, which reminded Cooper of his own expedition to the Andaman Islands and the giant lizards he saw there. But at least we know where the K in Kong came from!

Posted by Benjamin Zimmer at 01:14 AM

January 08, 2006

New swords for old

There's an article by Anne Kornblut and Glen Justice in this morning's NYT about the Alexander Strategy Group: "Officials Focus on a 2nd Firm Tied to DeLay". It ends with this quote:

"It's a double-edged sword, being known as DeLay Inc.," said one Republican lobbyist. "They are on the sharp edge of the sword now."

Swords are not part of Americans' everyday experience these days, and so the sword-related metaphors that we've inherited from earlier times are open for creative re-interpretation. A double-edged blade is traditionally one that has two cutting edges, and being sharp on both sides, can "cut both ways". This can make a double-edged sword or knife dangerous to its user. Instead, the anonymous lobbyist has taken the expression to refer to the now-standard kind of blade, with one sharp edge and one dull one, and rekeyed the metaphor to the contrast between two different sides, not the symmetry of two similar ones. The new interpretation is like the familiar use of "two-sided coin", where the whole point is that the two sides are different -- see this article headlined "the two-sided coin of PA credentialing" for an example.

We've previously noted that this sort of thing has been happening to terminology associated with the harnessing of animals: "reigns of power", "unbrided fury", "yolked to the coloniser". In those cases, though, the result was an eggcorn; here it's just a new interpretation of an old expression. And unlike the various new interpretations of "beg the question", this new interpretation happens to mean essentially the same thing as the old one.

Or does it? Well, that depends on what means means, I guess.

Posted by Mark Liberman at 08:46 AM

January 06, 2006

The wordanistas have spoken

Back in October, when Comedy Central's Stephen Colbert kicked off his faux-news show The Colbert Report, he promoted a new word that nailed the malleability of "truth" in today's mediascape. His word was truthiness, and he used his blustery O'Reillyesque persona to launch a preemptive strike against naysaying "wordanistas":

Now I'm sure some of the Word Police, the wordanistas over at Webster's, are gonna say, "Hey, that's not a word." Well, anybody who knows me knows that I'm no fan of dictionaries or reference books. They're elitist. Constantly telling us what is or isn't true, or what did or didn't happen. Who's Britannica to tell me the Panama Canal was finished in 1914? If I wanna say it happened in 1941, that's my right. I don't trust books. They're all fact, no heart.
[Video here, here, and here.]

Well, the wordanistas have heeded his call. Earlier today, the American Dialect Society selected truthiness as its 2005 Word of the Year.

Truthiness edged out Katrina (and Katrina-related words) in the annual voting, with other nominees such as podcast, intelligent design, and refugee trailing behind. (The full results, with the voting in other categories, are available in PDF form here.) For the ADS voters, Colbert's creation just seemed to capture a certain ineffable zeitgeistiness.

Fittingly, truthiness has circulated in the media with only a tenuous connection to those pesky "facts." As we noted here, the New York Times rendered the word as trustiness, apparently due to an errant spellchecker. (The redfaced Times not only issued a correction but also elevated truthiness to a place on its list of year-defining buzzwords.) Now that the ADS has coronated it as Word of the Year, the media coverage has once again come up short. The Associated Press article, published around the nation and indeed the world (in the Washington Post, the Los Angeles Times, the UK's Guardian, Australia's Age, etc.), doesn't even mention the genesis of the word on Colbert's show.

Ah well. Perhaps this will serve as more fodder for Colbert's put-upon persona of perpetual outrage.

[Update, 1/6/06: A later and longer version of the Associated Press wire story gives the background on Colbert (though it implies that his show is still part of "The Daily Show with Jon Stewart") and adds some other new details. But I suspect the incomplete article that first hit the wires will be the one picked up by most papers.]

[Update, 1/10/06: On his Jan. 9 show, Colbert responded with all the phony indignation he could muster. Details here.]

[Update 1/13/06: The truthiness battle continues.]

Posted by Benjamin Zimmer at 11:43 PM

January 05, 2006

Shakespeare used they with singular antecedents so there

Not happy that I cite Sean Lennon as a source of evidence concerning the way they can be used in modern English? Feeling that only something 400 years older would really convince you that it's OK? Has Coby Lubliner got news for you! Coby writes from Berkeley to point out the following lines from Shakespeare's The Comedy of Errors, Act IV, Scene 3:

There's not a man I meet but doth salute me
As if I were their well-acquainted friend

It's not just a case of they with singular antecedent; like Lennon's example, it uses they despite the fact that the sex of the antecedent's referent (male) is known! And there's more.

Marilyn Martin writes from Cornell to say that she's O.K. with singular they normally, but this example was a bit more than she could take ("somehow bothers me", she wrote):

UK scientists have identified the part of the brain that determines whether a person perceives themselves as fat. (BBC News, Tuesday, 29 November 2005, 11:52 GMT)

What she doesn't like, I'm quite sure, is that the reflexive form themselves is morphologically marked as plural (self / selves), yet still it is used with singular antecedent. Don't flinch, Marilyn! Look at this example of Shakespeare's (from the poem The Rape of Lucrece):

Now leaden slumber with life's strength doth fight;
And every one to rest themselves betake,
Save thieves, and cares, and troubled minds, that wake.

So even the reflexive form of the pronoun lexeme they is used in Shakespeare with a singular antecedent (every one, spelled everyone in modern English).

[Added later: I would have to agree with you if you said that the above example is quite difficult to parse. It is indeed. Having direct objects before subjects is never helpful for those of us speaking SVO languages, but that's what Tudor English poetry is like. After some discussion with Marilyn Martin and Mark Liberman, I think I am satisfied that leaden slumber is understood as the subject of betake, and every one is its object. The reason betake doesn't have a final -s is not that it's agreeing with a plural subject (its subject is leaden slumber, singular), but rather that it is understood as doth betake with the doth omitted: it is not a present-tense verb, finite; it's in what The Cambridge Grammar calls the "plain form", as required by doth. So, in other words, the sense of the passage is roughly as follows (I change "doth fight" to "fights" in accord with contemporary English syntax, and simply murder the poeticality): "Now leaden slumber fights with life's strength; and takes everyone off to rest themselves, except for thieves, and worries, and troubled minds, which remain awake." The relevant point is unchanged by this clarification: the antecedent of themselves is the singular noun phrase every one. That's the current thinking in the halls of 1 Language Log Plaza, anyway. I did warn you that it was difficult.]

By all means, avoid using they with singular antecedents in your own writing and speaking if you feel you cannot bear it. Language Log is not here to tell you how to write or speak. But don't try to tell us that it's grammatically incorrect. Because when a construction is clearly present several times in Shakespeare's rightly admired plays and poems, and occurs in the carefully prepared published work of just about all major writers down the centuries, and is systematically present in the unreflecting conversational usage of just about everyone including Sean Lennon, then the claim that it is ungrammatical begins to look utterly unsustainable to us here at Language Log Plaza. This use of they isn't ungrammatical, it isn't a mistake, it's a feature of ordinary English syntax that for some reason attracts the ire of particularly puristic pusillanimous pontificators, and we don't buy what they're selling.

Posted by Geoffrey K. Pullum at 11:43 AM

January 04, 2006

Maybe Globalization Isn't As Advanced as We Think

I was watching Commander in Chief and, not having watched it regularly enough or with sufficient attention, was unclear as to one character's role, so I googled the show in hope of clarification. I ended up at this site. It proved satisfactory - it had the information I wanted - but one thing was peculiar. It described Nathan Templeton as el portavoz de la Casa Blanca, that is, "White House spokesman". Inexpert as I was, I knew this was wrong. Actually, Nathan Templeton is Speaker of the House of Representatives or portavoz de la Casa de Representantes. Evidently, the translator (it's an American series and further googling suggests that the Spanish blurb is a translation of one provided in English by the network) confused the White House and the House of Representatives. Maybe we should find it heartening that the details of American government are not so universally familiar as to render such mistakes impossible.

Update 2006-01-04: some people have commented that the House of Representatives should be Cámara de Representantes. Both Casa and Cámara are in use, as you can easily establish by googling for the two terms. It's possible that there is some sort of dialectal basis for the choice, but if so, I don't know what it is.

Posted by Bill Poser at 01:21 AM

January 03, 2006

Happy Abramoffukkah!

Another legal brouhaha, another celebratory blend. Last year we had Fitzmas and Kitzmas. This year kicks off with Abramoffuk(k)ah, commemorating Republican lobbyist Jack Abramoff's guilty plea earlier today.

As with Fitzmas, it looks like there were multiple discoverers of this felicitous blend. Maximus Clarke (aka "Artifice Eternity") used it in a comment on Metafilter on Dec. 21 ("First comes Fitzmas, then comes Abramoffukah!"). It showed up on Ed's Daily Rant the same day with the "Abramoffukkah" spelling. (Both were reacting to the news that Abramoff was looking for a plea deal that could implicate Tom DeLay and other top Republican legislators.) The day after that, it was used by "DCeiver" guest-blogging on Wonkette. The expression was further popularized in a Dec. 30 post on Daily Kos by "Sherlock Google," who credited Clarke with the coinage — though if time stamps are to be trusted, it looks like Ed's Daily Rant beat out Clarke by several hours.

It's not too surprising that several online wags should independently hit upon Abramoffukkah. First, the wild success of the Fitzmas blend was an easy model for liberal bloggers to follow, with the new coinage conjuring up the same schadenfreude at the legal follies of top Republicans. Secondly, Abramoff's Orthodox Judaism makes a blend with the seasonally appropriate Hanukkah a natural fit. And finally, the blendability of -ukkah has already been established over the last couple of years by the jocular pseudoholidays of Chrismukkah (celebrated by the fictional inhabitants of "The O.C.") and Chrismahanukwanzakah (featured in tongue-in-cheek advertising from Virgin Mobile), along with several other variants. If there's such a thing as an overdetermined neologism, this is certainly an example of one.

[Update: Ella Earp-Lynch swiftly spotted yet another factor contributing to the creation of Abramoffukkah:

You made a lot of valid points about why this blending might occur to multiple people at once. However, I do think that you missed one, being the orthographic and phonological similarities between either spelling of the neologism and many pseudo-dialectal alternative spellings of the common pejorative (fucker). The coincidental presence of the word 'off' in Mr Abramoff's name makes it even more evocative. ]

Posted by Benjamin Zimmer at 11:40 PM

No problem

I couldn't resist the opportunity provided by Mark's mention of the latest MS Windows security crisis to point out that those of us running GNU/Linux or other Unix variants are blissfully unaffected by this, and most other, security problems.

Posted by Bill Poser at 10:55 PM

The Chemical Composition of Words

Mark Nandor, a math teacher at Wellington School in Columbus, Ohio, has posted a list of all of the English words that can be spelled using the symbols for the first 111 elements, as well as lists of magic squares made up of chemical symbols. His definition of English word is "listed in the ENABLE word list", which is used by Scrabble players. You can get your own copy here if you like: enable.zip.

Nandor says that he computed the list using Mathematica in about 25 hours including programming time. Mathematica is a wonderful tool for doing mathematics, but it isn't ideal for this sort of problem. I solved the same problem by matching this regular expression case-insensitively against the ENABLE word list:

^((ac)|(ag)|(al)|(am)|(ar)|(as)|(at)|(au)|(b)|(ba)|(be)|(bh)|(bi)|(bk)|(br)|(c)|(ca)|(cd)|(ce)|(cf)|(cl)|(cm)|(co)|(cr)|(cs)|(cu)|(db)|(ds)|(dy)|(er)|(es)|(eu)|(f)|(fe)|(fm)|(fr)|(ga)|(gd)|(ge)|(h)|(he)|(hf)|(hg)|(ho)|(hs)|(i)|(in)|(ir)|(k)|(kr)|(la)|(li)|(lr)|(lu)|(md)|(mg)|(mn)|(mo)|(mt)|(n)|(na)|(nb)|(nd)|(ne)|(ni)|(no)|(np)|(o)|(os)|(p)|(pa)|(pb)|(pd)|(pm)|(po)|(pr)|(pt)|(pu)|(ra)|(rb)|(re)|(rf)|(rg)|(rh)|(rn)|(ru)|(s)|(sb)|(sc)|(se)|(sg)|(si)|(sm)|(sn)|(sr)|(ta)|(tb)|(tc)|(te)|(th)|(ti)|(tl)|(tm)|(u)|(v)|(w)|(xe)|(y)|(yb)|(zn)|(zr))+$

using the GNU version of the standard Unix utility grep (specifically, its egrep avatar). It took ten minutes or so to locate and download the ENABLE list and construct the regular expression. The computation time? Less than one second on my 1.6GHz P4 with 512MB of RAM, not exactly a supercomputer. Moreover, I think that I got the correct result. Nandor's program somehow missed the valid words berg and urges, but included the non-words cryosurg ical, urg es, and v irgins.
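For readers without egrep handy, the same search can be sketched in Python. This is only an illustration, not the command used above: the symbol list is copied from the regular expression in this post, while the names SYMBOLS, PATTERN, spellable, and filter_words are made up for the example. The pattern is anchored with fullmatch rather than ^ and $, and uses a non-capturing group, which should not change which words match.

```python
import re

# Element symbols, copied from the alternatives in the regular expression above.
SYMBOLS = """
ac ag al am ar as at au b ba be bh bi bk br c ca cd ce cf cl cm co cr cs cu
db ds dy er es eu f fe fm fr ga gd ge h he hf hg ho hs i in ir k kr la li
lr lu md mg mn mo mt n na nb nd ne ni no np o os p pa pb pd pm po pr pt pu
ra rb re rf rg rh rn ru s sb sc se sg si sm sn sr ta tb tc te th ti tl tm u
v w xe y yb zn zr
""".split()

# One or more symbols in a row, matched case-insensitively.
PATTERN = re.compile("(?:" + "|".join(SYMBOLS) + ")+", re.IGNORECASE)


def spellable(word):
    """Return True if the whole word is a concatenation of element symbols."""
    return PATTERN.fullmatch(word) is not None


def filter_words(words):
    """Keep only the words that can be spelled with element symbols."""
    return [w for w in words if spellable(w)]


if __name__ == "__main__":
    for w in ["berg", "urges", "jazz"]:
        print(w, spellable(w))
```

To run it over the ENABLE list, one would read the file one word per line, strip the newlines, and pass the result to filter_words; the file name depends on how the archive unpacks, so it is left out here.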

Personally, I don't find this sort of exercise all that fascinating though I know some people do. It does, however, provide a nice illustration of the utility of regular expression matching for linguistic searching.

Posted by Bill Poser at 10:13 PM

What would Whorf say?

Something about most anything, it seems. I've recently come across two papers about the influence of language on thought and action. Both papers strike me as suggestive (in roughly the same way) and also not entirely convincing (in roughly the same way). Otherwise, the two papers are just about as different as they could possibly be.

The first paper is Heesook Kim, "What would Sapir and Whorf talk about the social conflicts in the South Korean Society" [sic], in 언어학 [Eoneohag -- Journal of the Linguistic Society of Korea], No. 40, December 2004. Actually, it's just the translated title and abstract of a paper published in Korean, which I haven't read. (The English is somewhat imperfect, though infinitely better than my Korean.) Here's the abstract:

Like Taiwan, South Korea is a somewhat lately democratized society. However, comparing two countries, South Korea has been featured with more social conflicts, we find. In modern times, both have shared a similar experience in social, political and economic aspect. They have been neighbors even geographically. Then, how could one society reveal more confrontations among the members than the other? With the help of Sapir-Whorf hypothesis, we tried to show that honorifics in Korean, which is believed the most complex in the world, is reponsible for the distinction. We proved that honorifics, which was born and developed in the pre-modern social structure, tends to prevent equality, which is necessary for people to face one another on equal footing, from being established among the individuals and make people seek collective actions to make their voices heard and resolve difference in their interests in equal terms.

The idea, I guess, is that if your language forces you to use honorifics all the time, drawing your attention to your place in society relative to others, you will be more conscious of distinctions in social status; and therefore you will be more likely to interpret issues in terms of social status, and/or to feel a greater sense of solidarity with your social peers. That makes some sense.

But if the facts about social conflict had been different -- more recent conflict in Taiwan than in Korea -- you might have taken Whorf the other way. Maybe the use of honorifics helps keep people grounded in their traditional roles, and less likely to challenge the traditional division of power. And in fact, if we take a slightly larger geographical and historical frame, it's not obvious to me that China/Taiwan has had less social conflict than Korea.

Any way you slice it, it seems to me that it's hard to make a strong case based on two data points. If you had a survey of 10 or 20 dominant national languages, convincingly quantifying the degree to which they reflect the relative social status of conversational participants; and an independent survey of the degree of social conflict in the countries in which these languages are spoken; and a clear correlation between the two measures...

(My evaluation of this paper is obviously limited by the fact that I've only read the abstract -- perhaps the body of the paper presents other arguments or addresses these points in some way.)

The second paper is Aubrey L. Gilbert, Terry Regier, Paul Kay, and Richard B. Ivry, "Whorf hypothesis is supported in the right visual field but not the left", PNAS, vol. 103, 489-494, January 10, 2006. Here's the abstract:

The question of whether language affects perception has been debated largely on the basis of cross-language data, without considering the functional organization of the brain. The nature of this neural organization predicts that, if language affects perception, it should do so more in the right visual field than in the left visual field, an idea unexamined in the debate. Here, we find support for this proposal in lateralized color discrimination tasks. Reaction times to targets in the right visual field were faster when the target and distractor colors had different names; in contrast, reaction times to targets in the left visual field were not affected by the names of the target and distractor colors. Moreover, this pattern was disrupted when participants performed a secondary task that engaged verbal working memory but not a task making comparable demands on spatial working memory. It appears that people view the right (but not the left) half of their visual world through the lens of their native language, providing an unexpected resolution to the language-and-thought debate.

There were 13 subjects, all Berkeley students. The stimuli were made up of colored squares drawn from a set of four, spanning a series from "green" to "blue":

On each trial, the subject was shown a ring of 12 squares surrounding the fixation point. Eleven of the twelve squares were the same color, and one (in a random location) was a different color.

The subject's task was to indicate as rapidly as possible whether the different-colored square was in the left half or the right half of the array. The oddball square could be of the "same" basic color-name category (in English!) or of a different category.

The crucial thing about the experimental design is that the left side of the visual field projects to the right (non-dominant) side of the cerebral cortex, while the right side projects to the left (language-dominant) side of the cortex.

And here's some of the results:

In the "no interference" condition, the subjects were significantly faster when making between-color-category judgments (i.e. the oddball was green and the rest were blue, or vice versa) in the right visual field. (Eyeballing the figure, the difference was apparently only about 15-30 msec. out of about 420 msec., or about 5% -- but that's what reaction time experiments are usually like.)

When subjects were distracted by having to silently rehearse an 8-digit number during a block of trials (and they had to recall it at the end of the block), the results were quite different:

In this case, all the reaction times were a bit longer, of course. But now in the right visual field, the within-category trials were significantly faster than the across-category trials! The effect of (English) color category was reversed. In fact, curiously, the RVF within-category trials were now also significantly faster than the LVF within-category trials, while the LVF between-category trials were faster than the RVF between-category trials.

In a second experiment, the authors compared a different verbal-interference task (remembering an irrelevant color word, like "red") with a non-verbal interference task (remembering the arrangement of a spatial grid of squares). They found that with the non-verbal interference, RVF between-category judgments were faster (similar to the no-interference condition), while with the verbal interference, RVF within-category judgments were again faster -- and again, the same curious inversion of effects obtained in the verbal interference case, with the RVF within-category trials being significantly faster than the LVF within-category trials, while the LVF between-category trials were faster than the RVF between-category trials.

Well, you can read the details for yourself, if you want. It's a great piece of work, and the authors' interpretation makes a lot of sense, and might well be true. But a couple of things about it worry me.

One is that the explanation might have worked just as well if the experiment had come out quite a bit differently. For example, if the LVF between-category reaction times had been slower instead of faster, you could say that it's because the color names are interfering with a faster non-linguistic process. Indeed, you have to say something like that to explain the (unpredicted) result that the verbal interference task actually reverses the effect, making between-category reaction times significantly slower than within-category reaction times.

Other possible results -- basically anything except a situation in which color category makes no difference, or doesn't interact with visual field -- could similarly be given a Whorfian interpretation.

A second cause of worry is that the experiments, though extensive, are somewhat limited. There are just four colors, and one basic color category distinction. The particular colors used have lots of other properties besides their relationship to English color-name boundaries. It would be unfortunate (for example) to learn that things are quite different if we use purple-to-red instead of green-to-blue, or if we use a green-to-blue sequence with different saturation or brightness.

We might also wonder what happens with subjects who have various other sorts of cerebral lateralization, for language and perhaps other knowledge and skills. Or what happens if you ask subjects to judge whether the oddball square is in the top half or the bottom half of the array, instead of left vs. right.

And finally, the cross-linguistic shoe hasn't dropped yet. The authors observe that "a majority of the world's languages" use a single word for the basic color categories of the green-to-blue sequence they used in this experiment. So the prediction is that the speakers of these languages will not show any effect of the boundary between stimuli B and C; and similarly for other such boundaries on other color sequences.

A crucial difference between the paper on honorifics and the paper on colors is that it's a straightforward job of work to do more color experiments of the same general sort, and no doubt we'll see some along these lines in the future. Furthermore, the same LVF-vs.-RVF RT paradigm could be used with things other than color, e.g. a range of similarly shaped unnamable pictures vs. pictures with a range of similar-sounding names vs. pictures with a range of semantic similarities. A whole psycholinguistic industry devoted to seeking Whorfian effects in cortical RT asymmetries may be ahead of us.

[More Language Log posts referencing Benjamin Lee Whorf are here. There are shockingly many of them.]

Posted by Mark Liberman at 07:44 PM

Singular they with known sex

Sean Lennon (the singer/songwriter son of John Lennon and Yoko Ono), who is 30, would like to have a new girlfriend, and for some reason talked to the people at the Page Six department of the New York Post about it and "playfully pleaded with The Post to find him a girlfriend for the new year", his relationship with Bijou Phillips having broken up a while ago (in 2003, I'm told, though you wouldn't know that from the Post, which has a big picture of them together as if this were recent news). The Post publishes some self-descriptions sent in by young women who thought they might suit his needs ("I lead a full life and would like to share it with someone," says Betsy Head, 27). The linguistic angle (this is Language Log) is that what he is reported as having said to the Post about those needs provides a nice example of the way singular they is going in the speech of younger people (and 30 counts as young in this context). Said Sean (Thursday, December 29, 2005, page 9):

Any girl who is interested must simply be born female and between the ages of 18 and 45. They must have an IQ above 130 and they must be honest.

The antecedent of they, both occurrences, is any girl who is interested, a singular noun phrase. Yet because of the head noun girl we know that semantically the quantifier that noun phrase denotes ranges only over female humans. Thus the sex of the 18-to-45-year-old honest person with the 130+ IQ that Sean hopes to find is fixed by his stipulation. Yet he still says they.

Notice that the verb phrase must have an IQ above 130 clearly needs a singular subject (each girl has one IQ; he could have said "They must have IQs above 130" if he was talking about the whole group of hopeful applicants). Clearly, in the speech of Sean Lennon, they not only can have a morphosyntactically (and semantically) singular antecedent, but it can do so even if the gender of the referent is known, and syntactically overt, as with any girl.

The context that most favors they with singular antecedents, I think, is where it roughly corresponds to what logicians call a bound variable. Lennon's two sentences above convey a meaning something like:

"For any girl X such that X is interested, X must be born female and X must be at least 18 years old and X must be not more than 45 years old and X must have an IQ above 130 and X must be honest."

The use of they is, to put it informally, a reminder that the pronoun is not referring to any one person. In English, gender reflects the sex of the person referred to, and it is evident that Sean Lennon feels that with no definite reference for the pronoun, they is more appropriate than she.
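For those who like to see such things spelled out, the bound-variable paraphrase can be rendered roughly as a first-order formula. This is only a sketch: the predicate names are my own labels, and the force of "must" is treated as part of the requirement being stated rather than modeled as a separate modal operator.

```latex
\forall x\,\bigl[(\mathrm{girl}(x) \land \mathrm{interested}(x)) \rightarrow
  (\mathrm{bornFemale}(x) \land 18 \le \mathrm{age}(x) \le 45
   \land \mathrm{IQ}(x) > 130 \land \mathrm{honest}(x))\bigr]
```

Every x to the right of the arrow is bound by the quantifier at the front. None of them picks out an individual, and that is the position the pronoun they fills in Lennon's second sentence.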

It's tricky to talk about this with care. The traditional simplistic line is just that the pronoun they is plural and that's all there is to it; but that won't do. The uses of they in the quote above do not really refer to any particular person or persons. Nor does the noun phrase any girl. But any girl expresses a quantifier, and the quantifier binds variables semantically, and the pronoun they realizes the bound variables syntactically. The number agreement facts show that they is morphosyntactically plural. But the anaphora facts show that it can take an antecedent that is morphosyntactically singular (as The Times style guide once agreed but then it changed its mind back again to the old-fashioned view that says there's something wrong with that).

In some uses they is semantically "plural" in the sense that it refers to a group. But in the use illustrated here it is not semantically "plural"; it corresponds to a bound variable, and the semantic notions of singular and plural reference don't really apply to it. The semantic notion of sex reference doesn't really apply either. The fact that Lennon stipulates a range for the variables that includes only humans born of the female sex apparently is not quite enough for him to use she, though of course he would use that pronoun in a context where a specific human such as Bijou Phillips was being referred to. (The Post's picture has her in a tight red dress, and she is very clearly female. Trust me. At Language Log we do fact-checking on this sort of thing.)

Posted by Geoffrey K. Pullum at 06:13 PM

WMF vulnerability

[Update 1/6/2006: Microsoft's official patch is out, released five days ahead of "Patch Tuesday". However, note that (ironically) Linux is still vulnerable to the WMF exploit via WINE. I don't know about Virtual PC under Mac OSX.]

This has nothing to do with linguistics, but it isn't as widely known as it ought to be, and it's important, so I'll post it here. If you or yours have any computers running Windows XP, you should run, not walk, to this story at the Internet Storm Center, and consider following the instructions found there (installing a patch and de-registering a particular .dll). This may protect you until Microsoft makes a more systematic solution available.

The patch was written by Ilfak Guilfanov, who has also released a program that tests your system for vulnerability. There's a security advisory from Microsoft (but no patch), a vulnerability note from CERT, some additional information from Mikko at F-Secure, a Washington Post story, and a CVE entry.

Because this exploit was publicized on Dec. 27, bad guys around the world have had a week to work on ways to use it while most people have been busy with other things. I believe you'll be hearing more about this.

[Update: here's a ZDNet article.

Let me add that something about this situation puzzles me a great deal. It was back in January of 2002, fully four years ago, that Bill Gates was reported to be "kicking off an all-out effort to repair the company's reputation for poor security and reliability". The simplest and most obvious security vulnerabilities are those that arise because a standard, commonly-used file format includes, by design, the capability to instruct the OS to execute some arbitrary piece of code. How can it possibly be true that after a few weeks (never mind four years) of "all-out effort", some MS software engineer didn't call attention to the fact that Windows Meta Files -- a common graphics format on Windows machines -- contain such a vulnerability? If no one noticed this, then Redmond's engineers are incompetent. If someone did notice, and nevertheless up to four years went by during which no one did anything to patch the vulnerability, then Redmond's managers are incompetent. Either way, it's a bad omen for Microsoft's future.]

[Note -- I incorrectly glossed "wmf" as "windows media file" -- thanks to several alert readers for correcting the mistake. That's what I get for learning as little as possible about Windows internals...]

Posted by Mark Liberman at 07:00 AM

January 02, 2006

Everyone at The Times agrees... No they don't

I have a one-step-forward-one-step-back story. I noticed a while ago that the second edition of The Times Guide to Grammar and Usage (ed. by Simon Jenkins; London, 1992, and now long out of print, I think), explicitly states that they with singular antecedent (the example Everybody should bring their lunch is cited) is "acceptable usage" and will often constitute a good solution to the problem that one is otherwise forced to choose between he (which says the referent is male) or she (which says the referent is female) or he or she (often much too clumsy, as anyone who thinks he or she might like to spend some of his or her time convincing himself or herself will soon find out for himself or herself).

Jenkins thus endorses the position that The Cambridge Grammar of the English Language was to take a decade later. And rightly so. It's a position that couldn't really be doubted by anyone who had devoted even a few minutes to looking at the facts of usage, be it literary (over the past 600 years) or everyday conversational. Excellent advice.

But if you read on, there is sadder news to come.

I wondered for a while if the old-fashioned handbooks that condemn singular antecedents for they realized they were contradicting The Times as well as CGEL. Even dyed-in-the-wool old-tyme prescriptivists, who might regard CGEL's descriptive stance with horror, I thought to myself, should surely be prepared to agree that The Times of London knows whereof it speaks with regard to the English language. If any newspaper in the world can be regarded as virtually definitive of good written Standard English over the past two hundred years it has to be The Times. But before telling you of my discovery in the 1992 edition, I had a look around to see what was the current version of The Times's style guide. And what I found plunged me back into depression.

Go to The Times Online Style Guide and take a look at what they have under "they" now. It is the following piece of ill-considered stupidness:

they   should always agree with the subject. Avoid sentences such as "If someone loves animals, they should protect them". Say instead "If people love animals, they should protect them"

Agree with the subject? Subjects have nothing to do with this, as you can see from an example like We told everyone they were free to leave, where they has a direct object as its antecedent. Here no sense can be made of the idea that they should "agree with the subject". The editors — Richard Dixon, Mike Murphy, and Denis O'Donoghue — don't know a subject from an antecedent, or doubtless from an artichoke, and they should be ashamed of themselves. (It is not clear what they would recommend. Probably some sort of rewording like We told all of them they were free to leave. This is no improvement.)

What Dixon, Murphy, and O'Donoghue are trying (ineptly) to say is clear enough: they have returned to the dopey position that says they must never have a morphosyntactically singular antecedent. They are back in tune with American backwardness: with Strunk & White, and their thousands of latter-day co-religionists such as Stanley Fish and the style guides of the Modern Language Association and the American Psychological Association. Why do these sources continue to damn singular antecedents for they in defiance of all the evidence of its constant use by respectable authors during at least the past six centuries? I have no idea.

But you can look the matter up for yourself: the wonderful Merriam-Webster's Dictionary of English Usage would be a very good place to start. Look at their list of literary examples, and then decide, freely and of your own will, uninfluenced by me, whether to side with The Cambridge Grammar or the atavistic loonies. (Hint: The atavistic loonies would not be a good choice.)

Posted by Geoffrey K. Pullum at 08:59 PM

Dinner at the L.S. Cabal: the sequel

Claire Bowern of anggargoon.org has suggested another blogging-oriented dinner at the annual LSA meeting, which is in Albuquerque this year (preliminary program here). Last year's dinner was interesting and fun. Claire suggests either Friday 1/6/2006 or Saturday 1/7/2006. My preference would be for Saturday, leaving around 7:30 from the reception after the Presidential Address, but Friday is also possible for me. Let me know if you'd like to come so we can get a headcount for a reservation.

[Note: I've changed my (careless) wording from "bloggers' dinner" to "blogging-oriented dinner", to make it clear that friends, correspondents, readers, and any other interested parties are welcome. 10 blogging-oriented people have signed up so far, but there are a few who can't make it on Friday and a few who can't make it on Saturday, so the schedule is still up in the air.]

Posted by Mark Liberman at 02:32 PM